KR101743861B1

KR101743861B1 - Methods of image fusion for image stabilization

Info

Publication number: KR101743861B1
Application number: KR1020157034366A
Authority: KR
Inventors: 마리우스 티코; 지안핑 조우; 아니타 나리아니 슐츠; 롤프 토프트; 폴 후벨; 웨이 순
Original assignee: 애플 인크.
Priority date: 2013-06-06
Filing date: 2014-05-06
Publication date: 2017-06-05
Also published as: KR20160004379A; CN105264567A; US20140363087A1; CN105264567B; WO2014197154A1; US9262684B2

Abstract

Systems, methods, and computer readable media for improving image stabilization operations are described. Novel approaches are disclosed for fusing non-reference images with a pre-selected reference frame in a set of commonly captured images. The fusion method may use a soft transition, applying a weighted average to ghost/non-ghost pixels to avoid abrupt transitions between neighboring, nearly similar pixels. In addition, the ghost/non-ghost decision may be made based on a set of neighboring pixels rather than independently for each pixel. An alternative approach may include performing a multi-resolution decomposition of all the captured images, using temporal fusion, spatial-temporal fusion, or a combination thereof at each layer, and combining the different layers to generate an output image.

Description

METHODS OF IMAGE FUSION FOR IMAGE STABILIZATION

This disclosure relates generally to the field of digital photography. More particularly, but not by way of limitation, this disclosure relates to still image stabilization techniques. As used herein, image stabilization refers to a collection of techniques for reducing motion-induced blurring during image capture operations. Such motion may result from movement of the camera, of objects in the scene, or both.

Taking high-quality photographs in low ambient light, or photographing dynamic scenes (e.g., sports scenes), is difficult because of camera motion and/or the motion of objects within the scene during image capture. One way to reduce motion blur without amplifying an image's noise is to capture and fuse multiple short-exposure images of the scene. Such operations are often called "still image stabilization." Shortening the exposure time can reduce motion blur artifacts, but at the expense of a noisier and/or darker image.

A common approach to image stabilization consists of (1) selecting a reference image from a set of multiple short-exposure images, (2) globally registering all non-reference images with respect to the reference image, and (3) synthesizing an output image by fusing all captured images to the reference image. In this way the output image represents the scene as it was when the reference image was captured, while the non-reference images are used to reduce noise in the reference image by averaging/merging multiple observations of each reference pixel across all images.

A common approach to synthesizing an output image by fusing all registered non-reference images to the reference image is to average the images directly. Direct averaging reduces noise in the static areas of the image, but it can also introduce ghosting artifacts. Ghosting artifacts often occur when some of the pixels in the reference image are occluded in some of the non-reference images because of moving objects in the scene. When there is motion between the captured images, averaging them directly can produce significant ghosting artifacts in the final output. An example is shown in FIG. 1, which shows the output generated by directly averaging globally registered images; significant ghosting artifacts are visible.

One way to avoid ghosting artifacts is for the fusion procedure to distinguish between occlusion and noise and to exclude all occluded areas from the fusion. This can be achieved by excluding from the average every non-reference pixel whose value differs greatly from that of its corresponding reference pixel. One way to determine the amount of acceptable difference is to calculate it based on the expected noise at a particular pixel. Once the acceptance threshold has been determined, non-reference pixels that differ from their corresponding reference pixels by more than this threshold may be excluded from the average.
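The threshold-based exclusion described above can be sketched as follows. This is a minimal illustration for a single pixel, not the method of any particular implementation: the function name `threshold_average`, the multiplier `k`, and the choice of `k * noise_sigma` as the acceptance threshold are all assumptions made for the example.

```python
def threshold_average(ref, others, noise_sigma, k=3.0):
    """Average a reference pixel with the non-reference pixels close to it.

    ref         -- value of the reference pixel
    others      -- values of the corresponding non-reference pixels
    noise_sigma -- expected noise standard deviation at this pixel
    k           -- hypothetical multiplier defining the acceptance threshold
    """
    threshold = k * noise_sigma
    # Hard decision: a non-reference pixel is either fully kept or fully dropped.
    accepted = [ref] + [p for p in others if abs(p - ref) <= threshold]
    return sum(accepted) / len(accepted)


# The sample at 160 is treated as occluded and left out of the average.
result = threshold_average(100.0, [102.0, 98.0, 160.0], noise_sigma=2.0)
```

Because each sample is either kept or dropped, two neighboring pixels that fall on opposite sides of the threshold are averaged over different numbers of samples.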

However, using a set threshold for ghost/non-ghost pixel classification can itself produce image artifacts, particularly in the presence of heavy noise, which may be the typical case for image stabilization. This is because the acceptance threshold is a statistical estimate that has a certain rate of failure. Neighboring pixels can easily fall on either side of the threshold, creating abrupt transitions between ghost and non-ghost (that is, noisier and cleaner) pixels. Thus, currently used fusion methods can be improved.

In one embodiment, a method to fuse a captured reference image with a captured non-reference image is provided. The method includes obtaining a first image of a scene captured at a first time, the first image having a plurality of pixels, and obtaining a second image of the scene at a second time, where each of the plurality of pixels in the first image has a corresponding pixel in the second image. The method can then involve selecting a first pixel from the first image and determining a non-binary weight value for the first pixel's corresponding pixel in the second image. The first pixel may then be combined with its corresponding pixel from the second image, using the non-binary weight value, to obtain a first fused pixel. The selecting, determining, and combining may be repeated for each of a plurality of other pixels in the first image to obtain a fused image.

In another embodiment, an alternative method for fusing a captured reference image with a captured non-reference image is provided. A method in accordance with this approach includes obtaining a first image of a scene captured at a first time, the first image having a plurality of pixels, and then obtaining a second image of the scene at a second time, where each of the plurality of pixels in the first image has a corresponding pixel in the second image. A non-binary weight value for the first pixel's corresponding pixel in the second image may then be determined. If the non-binary weight value is greater than a specified threshold, the method may combine the first pixel with its corresponding pixel from the second image to obtain a first fused pixel. If the non-binary weight value is less than or equal to the specified threshold, the first pixel and its corresponding pixel from the second image need not be combined. The selecting, determining, and combining may then be repeated for each of a plurality of other pixels in the first image to obtain a fused image.

In yet another embodiment, a captured reference image may be fused with a captured non-reference image in an alternative way to obtain a fused image. This approach involves obtaining a first image of a scene captured at a first time, the first image having a plurality of pixels, and then obtaining a second image of the scene captured at a second time, where the second image has a plurality of pixels and each pixel in the second image has a corresponding pixel in the first image. A multi-layer pyramid representation of the first image can then be generated, in which the top layer comprises a low-resolution representation of the first image, the bottom layer comprises a high-resolution representation of the first image, and each layer between the top and bottom layers comprises a high spatial frequency representation of the first image corresponding to the resolution of that layer. The method may then generate a multi-layer pyramid representation of the second image, in which the top layer comprises a low-resolution representation of the second image, the bottom layer comprises a high-resolution representation of the second image, and each layer between the top and bottom layers has a corresponding layer in the multi-layer pyramid representation of the first image. The method may then generate a layer of an output multi-layer pyramid representation of the scene for each layer of the first and second multi-layer pyramid representations by identifying, for each group of pixels in a layer of the first representation, a corresponding group of pixels in the second representation, and fusing the identified groups of pixels from the first and second representations. Finally, by combining the layers of the output multi-layer pyramid representation, an output image representing the scene may be generated and stored in a memory.

In yet another embodiment, a captured reference image may be fused with a captured non-reference image in another way to obtain a fused image. This approach involves obtaining a first image of a scene captured at a first time, the first image having a plurality of pixels, and then performing a multi-resolution decomposition of the first image to generate a first multi-layer pyramid representation of the first image. A second image of the scene may then be obtained, where the second image is captured at a different time from the first image and each of the plurality of pixels in the first image has a corresponding pixel in the second image. A multi-resolution decomposition of the second image can then be performed to generate a second multi-layer pyramid representation of the second image. The method then generates a layer of an output multi-layer pyramid representation of the scene for each layer of the first and second multi-layer pyramid representations by selecting one or more pixels from the first image, determining a non-binary weight value for the one or more pixels in the second image that correspond to them, combining the one or more pixels from the first image with their corresponding pixels from the second image to obtain a first fused pixel when the non-binary weight value is greater than a specified threshold, and not combining them when the non-binary weight value is less than or equal to the specified threshold. The process may then be repeated to generate a layer of the output multi-layer pyramid representation for each layer of the multi-resolution decomposition of the first image. The different layers of the output multi-layer pyramid representation may be combined to generate an output image.

FIG. 1 shows an example of image fusion operations in accordance with the prior art.
FIG. 2 shows, in flowchart form, a fusion operation in accordance with one embodiment.
FIG. 3 shows an exemplary temporal fusion operation in accordance with one embodiment.
FIG. 4 shows an exemplary temporal fusion operation using block pixels in accordance with one embodiment.
FIG. 5 shows an exemplary spatial-temporal fusion operation in accordance with one embodiment.
FIG. 6 shows, in flowchart form, a multi-resolution fusion operation in accordance with one embodiment.
FIG. 7 shows an exemplary captured image sequence in accordance with an alternative embodiment.
FIGS. 8A and 8B show fused output images generated by fusing the captured image sequence of FIG. 7 in accordance with yet another embodiment.
FIG. 9 shows exemplary images generated by multi-resolution decomposition in accordance with one embodiment.
FIG. 10 shows, in flowchart form, a multi-resolution fusion operation in accordance with one embodiment.
FIG. 11 shows, in block diagram form, a multi-function electronic device in accordance with one embodiment.

This disclosure pertains to systems, methods, and computer readable media to improve image stabilization operations. In one embodiment, a novel approach to fusing registered non-reference images with a reference image in a set of commonly captured images may be used. The fusion approach may provide a soft transition between pixels by applying a weighted average to ghost/non-ghost pixels, avoiding abrupt transitions between neighboring, nearly similar pixels. In an alternative embodiment, the ghost/non-ghost decision may be made based on a set of neighboring pixels rather than independently for each pixel. An alternative approach may involve performing a multi-resolution decomposition of all the captured images, determining which pixels to fuse at each layer by using a weighted average and/or examining a set of neighboring pixels at that layer, and combining the different layers to generate an output image.

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the inventive concept. As part of this description, some of this disclosure's drawings represent structures and devices in block diagram form in order to avoid obscuring the invention. In the interest of clarity, not all features of an actual implementation are described. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, so that resort to the claims may be necessary to determine such inventive subject matter. Reference in this disclosure to "one embodiment" or to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and multiple references to "one embodiment" or "an embodiment" should not be understood as necessarily all referring to the same embodiment.

It will be appreciated that in the development of any actual implementation (as in any development project), numerous decisions must be made to achieve the developers' specific goals (e.g., compliance with system- and business-related constraints), and that these goals may vary from one implementation to another. It will also be appreciated that such development efforts might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the design and implementation of image stabilization systems having the benefit of this disclosure.

One novel approach to image stabilization involves creating the output image through temporal fusion of the registered non-reference images with the reference image. Referring to FIG. 2, in one embodiment according to this approach, image stabilization operation 200 begins when an image sequence R is received (block 205). One of the first steps in a typical image stabilization operation is to select one of the images in the sequence as the reference image (block 210). Many methods for selecting the reference image are known in the art. One such method is described in the U.S. patent application (application number not known) entitled "Reference Frame Selection for Still Image Stabilization," filed concurrently with this application and incorporated herein by reference in its entirety. After the reference image has been selected, the remaining images in the sequence may be globally registered with respect to the reference image (block 215). One approach to global registration of non-reference images with respect to the reference image is discussed in the U.S. patent application entitled "Image Registration Methods for Still Image Stabilization," filed concurrently with this application and incorporated herein by reference in its entirety.

Once the non-reference images are globally registered, corresponding pixels in all the images of the sequence may have the same spatial coordinates (x, y). Because the images are acquired at different instants of time, each pixel can be identified by a third coordinate, representing time, that simply corresponds to the image index: (x, y, t). For example, pixel (x, y, 3) denotes the pixel located at spatial coordinates (x, y) in the third image.

Temporal fusion involves fusing pixels along their temporal dimension. This is illustrated in FIG. 3, in which line 305 represents the reference image and each of lines 310, 315, and 320 represents one registered non-reference image in the image sequence. For simplicity, only one spatial coordinate, denoted s, is shown. The horizontal axis represents the time coordinate, along which the received frames are placed. Pixel 325 represents a pixel in the reference image that needs to be fused with corresponding pixels in the non-reference images. As can be seen, in temporal fusion the current pixel may be fused with pixels that have the same spatial coordinates in all the images. Thus, in FIG. 3, pixel 325 may be fused with pixels 330, 335, and 340.

Non-reference image pixels corresponding to a reference image pixel can often be occluded by objects moving in the scene. As discussed above, fusing such pixels with the reference pixel may produce ghosting artifacts. To prevent ghosting artifacts in the final output image, temporal fusion operation 200 (referring again to FIG. 2) can decide whether a pixel in a non-reference image is a ghost. This may be done by comparing each pixel in the non-reference image with the corresponding pixel in the reference image to determine the similarity between the two (block 220).

Rather than making a hard ghost/non-ghost decision based on pixel similarity, however, the operation may calculate a weight function for each non-reference pixel (block 225). In one embodiment, the weight function takes a value between 0 and 1. A weight of 1 may correspond to a non-ghost pixel and a weight of 0 to a ghost pixel.

In one implementation, the weight may be calculated by comparing each non-reference pixel with its corresponding pixel in the reference image. In an alternative embodiment, the weight may be calculated based on the pixel similarity value and the expected noise content at the specific exposure parameters. As is known in the art, many cameras have a known expected noise content for each pixel at specific exposure parameters. A pixel's expected noise content can be used in calculating its weight function. This may be done by calculating a weight W(x, y) for a pixel (x, y) based on a noise standard deviation S(x, y) and a pixel similarity value D(x, y). The pixel similarity value may be the pixel difference value between pixel (x, y) and its corresponding reference pixel. Assuming the images are represented in the YUV color space, every pixel has three pixel value differences (Dy, Du, Dv) and three noise standard deviations (Sy, Su, Sv).

The specific weight function used can vary and is a matter of design choice. In one embodiment, the weight function may be a Gaussian function. In another embodiment, it may be linear. Equation (1) gives an example weight function.

$$\omega_t = \omega_Y \cdot \omega_U \cdot \omega_V \qquad (1)$$

Here ω_t denotes the weight assigned to non-reference pixel (x, y, t), ω_Y the weight component corresponding to the Y channel, ω_U the weight component for the U channel, and ω_V the weight component for the V channel. In this embodiment, the calculated weight ω_t represents the probability that pixel (x, y, t) is a non-ghost pixel. In embodiments where the weight is calculated from the expected noise content and the pixel value differences, the weight parameters may be calculated according to the following equations.

[Equations (2), (3), and (4) are not reproduced in this text.]

The constants appearing in these equations may take values set according to design preference.
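Since equations (2) to (4) are not available in this text, the following is only a sketch of how per-channel weights could be computed and combined as in equation (1). It assumes a Gaussian form, exp(-k (D/S)^2), which is one of the options the description mentions; the exact expression, the function names, and the constants `ks` are assumptions made for illustration.

```python
import math


def channel_weight(diff, sigma, k=1.0):
    """Assumed Gaussian weight for one channel: exp(-k * (diff / sigma)**2).

    diff  -- pixel value difference to the reference pixel (D)
    sigma -- expected noise standard deviation for this channel (S)
    k     -- hypothetical design constant
    """
    return math.exp(-k * (diff / sigma) ** 2)


def pixel_weight(diffs_yuv, sigmas_yuv, ks=(1.0, 1.0, 1.0)):
    """Combine the Y, U, and V components by multiplication, as in equation (1)."""
    weight = 1.0
    for diff, sigma, k in zip(diffs_yuv, sigmas_yuv, ks):
        weight *= channel_weight(diff, sigma, k)
    return weight


# A difference well within the noise level gives a weight near 1 (non-ghost);
# a difference far above it gives a weight near 0 (ghost).
w_similar = pixel_weight((0.5, 0.2, 0.1), (2.0, 2.0, 2.0))
w_ghost = pixel_weight((20.0, 0.0, 0.0), (2.0, 2.0, 2.0))
```

With this form the weight stays between 0 and 1 and falls off smoothly as the difference grows relative to the expected noise, which matches the soft ghost/non-ghost behavior the description asks of the weight function.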

An alternative approach to deciding whether a pixel is a ghost is to compare blocks of pixels with each other. For every pixel (x, y), this may be done by analyzing a block of pixels centered on (x, y) instead of analyzing the individual pixel alone. An example is shown in FIG. 4, where image 400 represents the reference image and images 420 and 450 represent two non-reference images. When a weight needs to be calculated to fuse pixel 405 of image 400 with corresponding pixels 425 and 455, block 410, centered on pixel 405, may be compared with blocks 430 and 460, each of which is centered on the corresponding pixel in its non-reference image. Then, instead of calculating the difference between individual pixels, the difference between corresponding blocks, for example block 410 and block 430 or block 410 and block 460, may be calculated. This may be done, for example, by calculating the mean absolute difference (MAD) or the mean square difference (MSD) between the blocks. The MAD may be calculated according to the following equation.

[Equation (5) is not reproduced in this text.]

In equation (5), one coordinate pair denotes the pixel in the non-reference image whose ghost/non-ghost status is being determined, and the other denotes the corresponding pixel in the reference image. The indices i and j are summation indices that span the block surrounding each pixel, and '| |' denotes the absolute value operator. Once the pixel difference value has been calculated for the block according to equation (5), it is used to calculate the weight parameters (ω_Y, ω_U, ω_V) as discussed above. The size of the block can vary and is a matter of design choice. In one embodiment, the block may be 3×3. In another implementation, it may be 5×5.
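The block comparison can be sketched as below. Equation (5) itself is not available in this text, so the sketch uses the textbook definition of mean absolute difference, dividing the summed absolute differences by the number of pixels in the block. The function name `block_mad`, the row-major list-of-lists image layout, and the assumption that the block lies fully inside both images (no border handling) are choices made for the example.

```python
def block_mad(ref, non_ref, x, y, radius=1):
    """Mean absolute difference between two blocks centered on (x, y).

    ref, non_ref -- single-channel images as lists of rows, already registered
                    so that corresponding pixels share coordinates
    radius       -- 1 gives a 3x3 block, 2 gives a 5x5 block
    """
    total = 0.0
    count = 0
    for j in range(-radius, radius + 1):      # vertical offset within the block
        for i in range(-radius, radius + 1):  # horizontal offset within the block
            total += abs(non_ref[y + j][x + i] - ref[y + j][x + i])
            count += 1
    return total / count


ref_img = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
non_ref_img = [[10, 10, 10], [10, 19, 10], [10, 10, 10]]
mad = block_mad(ref_img, non_ref_img, x=1, y=1)
```

The resulting value would take the place of the single-pixel difference when the weight components are computed, so one noisy pixel has less influence on the ghost/non-ghost decision than it would on its own.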

Referring again to FIG. 2, once a weight has been calculated for each non-reference pixel, the resulting values may be used to calculate the value of the corresponding pixel in the output image (block 230). The following equation may be used for this calculation.

$$P(x, y) = \frac{p(x, y, 0) + \sum_{t \neq 0} \omega_t \, p(x, y, t)}{1 + \sum_{t \neq 0} \omega_t} \qquad (6)$$

where $P(x, y)$ represents the final output pixel value, $p(x, y, t)$ represents the pixel value at spatial coordinates $(x, y)$ in image t, and $\omega_t$ is the weight assigned to non-reference pixel $(x, y, t)$. The reference image may be assumed to have time coordinate t = 0.
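The per-pixel fusion step can be illustrated with a short Python sketch. It assumes that the output is a weighted average in which the reference pixel (t = 0) carries unit weight and each non-reference pixel carries its own weight; the function name `fuse_pixel` and its argument layout are hypothetical.

```python
def fuse_pixel(ref_value, non_ref_values, weights):
    """Weighted average of one reference pixel and its non-reference pixels.

    ref_value      : value of the reference pixel (implicit weight 1).
    non_ref_values : values of the corresponding non-reference pixels.
    weights        : one weight per non-reference pixel, assumed in [0, 1].
    """
    if len(non_ref_values) != len(weights):
        raise ValueError("one weight is required per non-reference pixel")
    numerator = ref_value + sum(w * p for w, p in zip(weights, non_ref_values))
    denominator = 1.0 + sum(weights)
    return numerator / denominator
```

Under this assumption, a non-reference pixel with weight 0 (a likely ghost) has no effect on the output, and a pixel with weight 1 contributes as much as the reference pixel; for example, `fuse_pixel(100, [110, 200], [1.0, 0.0])` averages only 100 and 110.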

By using a weight function rather than a set threshold, the temporal fusion operation can provide a smooth transition between ghost and non-ghost pixels, avoiding abrupt transitions and the image artifacts they cause. However, considering only pixels that have the same (x, y) spatial coordinates in all images limits the noise reduction the process can achieve to what the number of short-exposure images in each sequence allows. For example, if only four images are received in an image sequence, a reference pixel can be averaged with at most three non-occluded pixels from the other images. If the pixel in one or more of those images is a ghost, the number of pixels that can be fused is reduced further. In such cases, it has been found that having more pixels to choose from significantly improves the quality of the final output image. In one embodiment, this may be done using an operation referred to herein as spatio-temporal fusion.

In one embodiment, spatio-temporal fusion may involve extending the temporal fusion approach by fusing not only pixels that have the same spatial coordinates but also other candidate pixels. Thus, pixel (x, y) of the reference image may be matched with pixels that have different spatial coordinates in the non-reference images. This is illustrated in FIG. 5, in which pixel 525 of reference image 505 is fused with multiple pixels in each of non-reference images 510, 515, and 520.

Referring to FIG. 6, in one embodiment according to this approach, image stabilization operation 600 begins when an image sequence R is received (block 605). A reference image may then be selected from the image sequence (block 610), and the non-reference images may be registered with respect to the selected reference image (block 615). Each pixel (x, y) of the non-reference images may then be compared with the corresponding pixel of the reference image (block 620) to calculate a weight function for each non-reference pixel (block 625). Once the weight has been calculated, it may be compared with a predetermined threshold to determine how likely the pixel is to be a ghost (block 630).

The predetermined threshold may be selected from any of the possible values of the weight. In one embodiment, the threshold may be 10% of the weight's maximum value. Thus, if the weight values range between 0 and 1, the threshold may be 0.1.

In one embodiment, the determination of whether a pixel represents a ghost may be made in the same manner as in temporal fusion, by comparing a small image block centered on the corresponding pixel of the reference image with a similar block centered on the pixel being analyzed.

If the weight is determined at block 630 to be greater than the threshold, temporal fusion may be used to calculate a value for the corresponding pixel of the output image (block 635). A weight that is less than the threshold indicates that the pixel is likely a ghost pixel, in which case operation 600 may perform a spatial search in the image containing pixel (x, y) to find a better fusion candidate (block 640). In one embodiment, the spatial search may be performed by considering all other spatial positions in the neighborhood of pixel (x, y), that is, all pixels of the non-reference images whose spatial positions lie within a particular neighborhood around (x, y). An alternative embodiment uses only a subset of the non-reference pixels located in a particular neighborhood of spatial coordinates (x, y). As discussed previously for temporal fusion, an alternative approach may involve matching a pixel block centered on the reference pixel against the corresponding pixel block surrounding each of the candidate pixels selected from the non-reference images. The subset of non-reference candidate pixels may also change from one reference pixel to another. This means that the pattern of non-reference candidate pixels used when processing a new reference pixel may differ from the pattern used when processing the previous reference pixel.
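The threshold test of block 630 and the spatial search of block 640 might be sketched as follows. This is a minimal illustration under stated assumptions: the weight function `weight_from_mad` (a linear ramp with a hypothetical `scale` parameter) merely stands in for the weight function of the disclosure, the search visits every position in a square window, and all function and parameter names are invented for this example.

```python
def block_mad(ref, non_ref, ref_xy, cand_xy, radius=1):
    """Mean absolute difference between a reference block and a candidate block."""
    height, width = len(ref), len(ref[0])

    def at(img, x, y):
        # Assumed border policy: clamp coordinates to the nearest edge pixel.
        return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    (xr, yr), (xc, yc) = ref_xy, cand_xy
    offsets = range(-radius, radius + 1)
    diffs = [abs(at(non_ref, xc + i, yc + j) - at(ref, xr + i, yr + j))
             for j in offsets for i in offsets]
    return sum(diffs) / len(diffs)


def weight_from_mad(mad, scale=32.0):
    """Hypothetical weight in [0, 1]: 1 for identical blocks, 0 once mad >= scale."""
    return max(0.0, 1.0 - mad / scale)


def select_fusion_pixel(ref, non_ref, x, y, threshold=0.1, search_radius=2):
    """Return ((x, y), weight) of the non-reference pixel to fuse with ref (x, y)."""
    height, width = len(ref), len(ref[0])
    weight = weight_from_mad(block_mad(ref, non_ref, (x, y), (x, y)))
    if weight > threshold:
        # Not a likely ghost: plain temporal fusion at the same coordinates.
        return (x, y), weight

    # Likely ghost: search the neighborhood for a better-matching block.
    best_xy, best_weight = (x, y), weight
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            cx, cy = x + dx, y + dy
            if not (0 <= cx < width and 0 <= cy < height):
                continue
            w = weight_from_mad(block_mad(ref, non_ref, (x, y), (cx, cy)))
            if w > best_weight:
                best_xy, best_weight = (cx, cy), w
    return best_xy, best_weight
```

Because the search loop runs only when the initial weight does not exceed the threshold, static regions of the scene incur a single block comparison per pixel.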

Regardless of the approach used for the search, once one or more pixels have been found in each non-reference image for fusion with the corresponding pixel of the reference image, weights for the selected pixels may be calculated in a manner similar to that described above (block 645). The calculated weight values may then be used to determine a value for the corresponding pixel of the output image (block 635).

Combining temporal and spatial fusion in this manner has been found to increase efficiency while also improving the quality of the output image. This is because the search for a better pixel is performed only when a pixel is determined to be a likely ghost. The search operation is therefore generally performed on a limited number of pixels rather than on every pixel of the images, which significantly improves efficiency.

The temporal and spatio-temporal fusion operations discussed above work well when all of the images in the image sequence are relatively sharp. There is no guarantee, however, that every image in the sequence will be sharp. While the reference image is generally a sharp image, some of the non-reference images may be blurred because of camera motion or fast object motion. When images are fused as described for the preceding operations, blur present in any non-reference image may become visible in the output image. This is illustrated in FIGS. 7 and 8.

FIG. 7 shows four short-exposure images 705, 710, 715, and 720 of an image sequence being processed by an image stabilization operation in accordance with this disclosure to generate an output image. As can be seen, image 705 of the image sequence exhibits significant motion blur artifacts. FIG. 8A shows output image 805 resulting from the fusion of images 705, 710, 715, and 720 using temporal or spatio-temporal fusion. As can be seen, the blur present in input image 705 is also present in final output image 805. To avoid this problem, a multi-resolution fusion strategy may be used that fuses the images differently in different frequency bands, effectively removing the blurred regions present in some of the input images.

The degradation caused by blurry frames appears mainly in the neighborhood of image edges and high-frequency textures. In contrast, in smooth image regions (e.g., low spatial frequency bands), the contribution of blurry frames can be useful in reducing noise. In one embodiment, the multi-resolution fusion approach takes advantage of this understanding by using blurry frames when fusing low spatial frequency content and excluding them when fusing image edges or high-frequency textures.

One way to implement this approach is to decompose each image of the input image sequence into different spatial frequency bands and to fuse each of those frequency bands independently. Multi-resolution decomposition of images into different frequency bands is known in the art and can be achieved in various ways. One procedure may use a high-pass pyramid decomposition algorithm. Another approach may use wavelet decomposition. Other alternatives are also possible.

In a preferred embodiment, a high-pass decomposition algorithm may be used. This algorithm may involve generating a sequence of copies of the original image in which the sample density and resolution are reduced in regular steps, producing multiple intermediate layers of the original image. To accomplish this, the image may first be low-pass filtered and then down-sampled by a predetermined factor to obtain the next pyramid layer of the image. The predetermined factor may vary and is a matter of design choice. In one embodiment, the predetermined factor is four. In an alternative embodiment, the predetermined factor is two. The number of layers generated for each image also varies with the needs and processing capability of the device used. In one embodiment, the number of layers is four. In an alternative embodiment, the number of layers is three.

Once all of the intermediate layers have been generated in this manner, each layer may be up-sampled and low-pass filtered to the resolution of the previous layer, and the result subtracted from the previous layer to obtain the high-frequency band component corresponding to that resolution at each layer. It should be noted, however, that the high-frequency band of the top layer generally cannot be obtained in this manner. In the resulting pyramid, each layer is smaller than the previous layer and contains the high spatial frequency band at that resolution. The top layer of the pyramid resembles a low-resolution version of the original image and contains the low spatial frequency band. An example comprising three layers 905, 910, and 915 is shown in FIG. 9. Top-layer image 905 is a low-frequency representation of the image. Moving from top to bottom, the layers exhibit progressively higher resolution and frequency. In this manner, the original image can be decomposed into different spatial frequency bands, and those bands can be fused separately.
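The decomposition described above can be illustrated with a short Python sketch. It assumes a down-sampling factor of two, a 2×2 box average as the low-pass filter, and pixel replication as a stand-in for the up-sample-and-filter step; these choices, the function names, and the requirement that the image dimensions be divisible by 2^(levels − 1) are simplifications made for this example rather than details taken from the disclosure.

```python
def downsample(img):
    """Low-pass filter with a 2x2 box average and decimate by 2 in each axis."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]


def upsample(img):
    """Scale up by 2 in each axis by pixel replication (assumed interpolation)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img, levels):
    """Return [finest high-pass layer, ..., coarsest low-pass (top) layer]."""
    # Step 1: successively smaller low-pass copies of the original image.
    lowpass = [img]
    for _ in range(levels - 1):
        lowpass.append(downsample(lowpass[-1]))

    # Step 2: each high-pass layer is a copy minus the scaled-up next copy.
    pyramid = []
    for fine, coarse in zip(lowpass, lowpass[1:]):
        up = upsample(coarse)
        pyramid.append([[f - u for f, u in zip(frow, urow)]
                        for frow, urow in zip(fine, up)])

    # The top layer has no coarser copy to subtract, so it stays low-pass.
    pyramid.append(lowpass[-1])
    return pyramid
```

For a 4×4 input and three levels, the sketch yields layers of size 4×4, 2×2, and 1×1, with the last holding the mean of the image.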

Once the various layers have been generated for each image in the image sequence (including the reference image), multi-resolution fusion operation 1000 in accordance with FIG. 10 begins by receiving all of the layers for each image of the image sequence (block 1005). The top-layer image may then be selected for processing (block 1010), and temporal fusion may be performed at the top layer. This may be done by comparing each pixel in the top layer of the non-reference images with the corresponding pixel of the reference image (block 1015) and calculating a weight function for each non-reference pixel, as discussed above. After the weight function has been calculated for each pixel, the weight may be compared with a predetermined threshold to determine whether the pixel is a ghost (block 1025). If the calculated weight exceeds the predetermined threshold, the non-reference pixel may be used to calculate a value for the corresponding pixel in the top layer of the output image (block 1045).

If the weight is less than the predetermined threshold, operation 1000 may use the spatio-temporal technique and perform a spatial search in the neighborhood of the selected pixel to find a better match (block 1030). The relative position or coordinates of the best match may then be stored in a correspondence field for the selected pixel (block 1035). The correspondence field refers to a field of the identified corresponding pixels. The operation may then calculate a weight function for the best-matching pixel (block 1040) and use this value to determine a value for the corresponding pixel in the top layer of the output image (block 1045).

Once processing is complete for all of the pixels in the top layer, the operation may move to block 1050 to determine whether there is another layer to process. If all layers have been processed, values are available for the pixels in every layer of the output image. The layers may then be synthesized or combined to generate the final output image (block 1060). This may be done by starting at the top pyramid layer, scaling up the output layer (i.e., up-sampling and low-pass filtering it), and then adding it to the next output layer. This operation may be repeated until all layers have been combined and the output image has the same resolution as the input image. If it is determined at block 1050 that another layer remains, the correspondence field for each best match found at the current layer may be updated (block 1055). This may be done by taking into account the predetermined factor by which each layer was down-sampled and scaling up the position information in the correspondence field by the same factor to match the resolution of the next layer. The next layer may then be selected for processing (block 1065), and the process repeats beginning at block 1015. For this layer, however, the updated correspondence field may be used as an initial estimate of where to look for the pixel corresponding to each reference pixel. Steps 1005-1055 may be repeated until all layers have been processed and the final output image is generated in accordance with block 1060.
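The layer combination of block 1060 and the correspondence-field update of block 1055 might be sketched as follows. The sketch assumes a down-sampling factor of two, pixel replication in place of the up-sample-and-filter step, layers ordered from finest to coarsest, and a correspondence field stored as one (dx, dy) displacement per pixel; all names and the data layout are hypothetical.

```python
def upsample(layer, factor=2):
    """Scale a 2-D list up by `factor` in each axis by replication (assumed)."""
    out = []
    for row in layer:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def collapse_pyramid(layers, factor=2):
    """Combine output layers (finest first, top layer last) into one image."""
    out = layers[-1]
    for detail in reversed(layers[:-1]):
        # Scale up the running result, then add the next, finer output layer.
        up = upsample(out, factor)
        out = [[d + u for d, u in zip(drow, urow)]
               for drow, urow in zip(detail, up)]
    return out


def scale_correspondence_field(field, factor=2):
    """Carry per-pixel (dx, dy) displacements to the next, finer layer.

    Displacements are multiplied by the down-sampling factor, and each entry
    is replicated so that every pixel of the finer layer has an initial
    estimate of where its best match lies.
    """
    scaled = [[(dx * factor, dy * factor) for dx, dy in row] for row in field]
    return upsample(scaled, factor)
```

A displacement of one pixel found at a coarse layer thus becomes a two-pixel initial estimate at the next layer, which the local search at that layer can then refine.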

In this manner, operation 1000 performs fusion at every pyramid layer, starting with the top layer (the low-frequency band) and ending with the highest-resolution layer. At every layer, the similarity between corresponding non-reference and reference pixels may be used to avoid ghosting artifacts. At the top layer of the pyramid, it may be assumed that corresponding pixels have the same spatial coordinates. However, because objects moving in the scene mean that corresponding pixels do not have the same spatial coordinates at subsequent layers, a correspondence field that stores the spatial displacement between each non-reference pixel and its corresponding reference pixel may be determined for every non-reference layer. If the weight is below a certain threshold, the non-reference pixel is determined to be a likely ghost. In that case, a local search around its spatial position may be performed to find a better match with the reference pixel. The best match found may then be used in the fusion with its associated weight, and the spatial displacement of the best match found relative to the reference pixel coordinates may be retained. Typically, such a search is needed for only a small percentage of pixels because most of the scene is static. Because the process performs the search only when needed, this increases efficiency. The search may be performed by block matching as discussed above. By performing these steps, approach 1000 makes use of all three of the fusion techniques discussed (temporal, spatio-temporal, and multi-resolution) to perform an efficient fusion operation that can significantly reduce or eliminate noise and blur and produce a high-quality final output image.

Alternatively, the multi-resolution fusion approach may be performed using only temporal fusion at each layer. This may be done by fusing at each pyramid layer using only those pixels that have the same spatial coordinates in all of the images. Thus, a single pixel from each non-reference layer may be fused with the corresponding reference pixel that has the same spatial coordinates (in the corresponding reference image layer).

In an alternative embodiment, to obtain the benefits of spatio-temporal fusion, the multi-resolution fusion procedure may use spatio-temporal fusion at each pyramid layer. In one embodiment, this may be done by fusing, at every pyramid layer, a reference pixel with more than one pixel from each non-reference layer. The spatial coordinates of the non-reference pixels fused with the reference pixel may lie in a particular neighborhood around the reference pixel coordinates.

Another embodiment may use a motion field to perform the fusion. This may be achieved by estimating a motion field for each non-reference image, starting from the top layer of the pyramid. At every layer of the pyramid, the motion field may associate the most similar reference and non-reference pixels to be fused. Every reference pixel may then be fused with a single pixel from every non-reference layer, but their spatial coordinates may differ in accordance with the motion field at that layer.

Yet another embodiment may be used in which a reference pixel is fused with two or more pixels from any non-reference layer. The spatial coordinates of the non-reference pixels may lie in a particular neighborhood around the spatial coordinates suggested by the motion field.

As used herein, the term "camera" refers to any electronic device that includes or incorporates digital image capture functionality. This includes, for example, stand-alone cameras (e.g., digital SLR cameras and "point-and-click" cameras) as well as other electronic devices with embedded camera capability. Examples of the latter type include, but are not limited to, mobile phones, tablet and notebook computer systems, and digital media player devices.

Referring to FIG. 11, a simplified functional block diagram of illustrative electronic device 1100 is shown according to one embodiment. Electronic device 1100 may include processor 1105, display 1110, user interface 1115, graphics hardware 1120, device sensors 1125 (e.g., proximity sensor/ambient light sensor, accelerometer, and/or gyroscope), microphone 1130, audio codec(s) 1135, speaker(s) 1140, communications circuitry 1145, digital image capture unit 1150, video codec(s) 1155, memory 1160, storage 1165, and communications bus 1170. Electronic device 1100 may be, for example, a digital camera, a personal digital assistant (PDA), a personal music player, a mobile phone, a server, or a notebook, laptop, desktop, or tablet computer. More particularly, the disclosed techniques may be executed on a device that includes some or all of the components of device 1100.

Processor 1105 may execute the instructions necessary to carry out or control the operation of many of the functions performed by device 1100. For example, processor 1105 may drive display 1110 and receive user input from user interface 1115. User interface 1115 can take a variety of forms, such as a button, keypad, dial, click wheel, keyboard, display screen, touch screen, or a combination thereof. Processor 1105 may also be, for example, a system-on-chip such as those found in mobile devices and may include a dedicated graphics processing unit (GPU). Processor 1105 may be based on reduced instruction-set computer (RISC) or complex instruction-set computer (CISC) architectures or any other suitable architecture and may include one or more processing cores. Graphics hardware 1120 may be special-purpose computational hardware for processing graphics and/or for assisting processor 1105 in processing graphics information. In one embodiment, graphics hardware 1120 may include a programmable graphics processing unit (GPU).

Sensor and camera circuitry 1150 may capture still and video images that may be processed, at least in part and in accordance with the disclosed techniques, by video codec(s) 1155 and/or processor 1105 and/or graphics hardware 1120, and/or a dedicated image processing unit incorporated within circuitry 1150. Images so captured may be stored in memory 1160 and/or storage 1165. Memory 1160 may include one or more different types of media used by processor 1105 and graphics hardware 1120 to perform device functions. For example, memory 1160 may include memory cache, read-only memory (ROM), and/or random access memory (RAM). Storage 1165 may store media (e.g., audio, image, and video files), computer program instructions or software, preference information, device profile information, and any other suitable data. Storage 1165 may include one or more non-transitory storage media including, for example, magnetic disks (fixed, floppy, and removable) and tape, optical media such as CD-ROMs and digital video disks (DVDs), and semiconductor memory devices such as Electrically Programmable Read-Only Memory (EPROM) and Electrically Erasable Programmable Read-Only Memory (EEPROM). Memory 1160 and storage 1165 may be used to tangibly retain computer program instructions or code organized into one or more modules and written in any desired computer programming language. When executed by, for example, processor 1105, such computer program code may implement one or more of the operations described herein.

It is to be understood that the above description is intended to be illustrative and not restrictive. The material has been presented to enable any person skilled in the art to make and use the claimed subject matter as described herein, and is provided in the context of particular embodiments, variations of which will be readily apparent to those skilled in the art (e.g., some of the disclosed embodiments may be used in combination with each other). For example, while FIGS. 1-11 have been described in the context of processing raw or unprocessed images, this is not necessary. Image stabilization operations in accordance with this disclosure may be applied to processed versions of the captured images (e.g., edge maps) or to sub-sampled versions of the captured images. In addition, some of the described operations may have their individual steps performed in an order different from, or in conjunction with, other steps presented herein. More generally, if there is hardware support, some operations described in conjunction with FIGS. 1-11 may be performed in parallel.

Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms "including" and "in which" are used as the plain-English equivalents of the respective terms "comprising" and "wherein."

Claims

A non-transitory program storage device, readable by a programmable control device and comprising instructions stored thereon, the instructions causing the programmable control device to:
obtain a first image of an initially captured scene, the first image having a plurality of pixels;
perform a multi-resolution decomposition of the first image to generate a first multi-layer pyramid representation of the first image;
obtain a second image of the scene, wherein the second image is captured at a different time than the first image, and each of the plurality of pixels of the first image has a corresponding pixel in the second image;
perform a multi-resolution decomposition of the second image to generate a second multi-layer pyramid representation of the second image;
generate a layer of an output multi-layer pyramid representation of the scene for each layer of the first and second multi-layer pyramid representations of the scene, in accordance with instructions causing the programmable control device to:
identify, for a group of pixels in a layer of the first multi-layer pyramid representation of the scene, a corresponding group of pixels in the second multi-layer pyramid representation of the scene, and
fuse the identified group of pixels of the first and second multi-layer pyramid representations of the scene;
repeat the instructions causing the programmable control device to generate a layer of the output multi-layer pyramid representation of the scene for each layer of the multi-resolution decomposition of the first image;
combine the output multi-layer pyramid representations of the scene to generate a single output image representative of the scene; and
store the output image in a memory,
wherein the instructions causing the programmable control device to fuse the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprise instructions causing the programmable control device to fuse one pixel of the first multi-layer pyramid representation with two or more pixels of the second multi-layer pyramid representation, and wherein the spatial coordinates of the two or more pixels are in a predetermined neighborhood of the pixel of the second multi-layer pyramid representation identified as corresponding to the one pixel.
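
The sequence of operations recited in this claim (decompose both images, fuse layer by layer, combine the output layers) can be illustrated with a minimal sketch. It is not the patented implementation: the pyramid type (a Laplacian-style detail pyramid built from 2x2 averaging), the Gaussian similarity weight, and the names `build_pyramid`, `fuse_layer`, `fuse_images`, `radius` and `sigma` are all assumptions chosen for illustration, and image sides are assumed divisible by 2 at every layer.

```python
import math

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (even sizes assumed)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)] for y in range(0, len(img), 2)]

def upsample(img):
    """Double each dimension by pixel replication."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out += [wide, list(wide)]
    return out

def build_pyramid(img, levels):
    """Return [detail_0, ..., detail_(levels-2), coarse] for one image."""
    pyramid, current = [], [[float(v) for v in row] for row in img]
    for _ in range(levels - 1):
        small = downsample(current)
        up = upsample(small)
        pyramid.append([[c - u for c, u in zip(cr, ur)] for cr, ur in zip(current, up)])
        current = small
    pyramid.append(current)
    return pyramid

def collapse_pyramid(pyramid):
    """Combine the layers of a pyramid back into a single image."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        up = upsample(current)
        current = [[d + u for d, u in zip(dr, ur)] for dr, ur in zip(detail, up)]
    return current

def fuse_layer(ref, other, radius=1, sigma=10.0):
    """Fuse each reference pixel with the pixels of `other` that lie in a
    (2*radius+1)^2 neighborhood of its corresponding (same-coordinate) pixel."""
    h, w = len(ref), len(ref[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc, wsum = ref[y][x], 1.0  # the reference pixel always counts fully
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = other[ny][nx] - ref[y][x]
                    wgt = math.exp(-(d * d) / (2.0 * sigma * sigma))  # assumed weight
                    acc += wgt * other[ny][nx]
                    wsum += wgt
            row.append(acc / wsum)
        out.append(row)
    return out

def fuse_images(first, second, levels=2):
    """Decompose both images, fuse layer by layer, then combine the layers."""
    p1, p2 = build_pyramid(first, levels), build_pyramid(second, levels)
    return collapse_pyramid([fuse_layer(a, b) for a, b in zip(p1, p2)])
```

In this sketch, dissimilar neighbors receive weights near zero, so a pixel that has no good match in the second image stays close to its reference value instead of picking up a ghost.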

The non-transitory program storage device of claim 1, wherein the instructions causing the programmable control device to fuse the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprise instructions causing the programmable control device to:
determine a weight value associated with each group of pixels in a layer of the first multi-layer pyramid representation of the scene using the corresponding group of pixels in a layer of the second multi-layer pyramid representation of the scene; and
fuse the identified group of pixels of the first and second multi-layer pyramid representations of the scene when the determined weight value is greater than a specified threshold, and not fuse the identified group of pixels of the first and second multi-layer pyramid representations when the determined weight value is less than or equal to the specified threshold.

The non-transitory program storage device of claim 2, wherein the instructions causing the programmable control device to determine a weight value comprise instructions causing the programmable control device to:
compare the similarity between each pixel of the group of pixels in the layer of the first multi-layer pyramid representation of the scene and the corresponding pixel of the corresponding group of pixels of the second multi-layer pyramid representation of the scene;
obtain a pixel similarity value based on the comparison;
obtain an expected noise content for each pixel of the group of pixels in the layer of the first multi-layer pyramid representation of the scene; and
calculate the weight value based on the pixel similarity value and the expected noise content.
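
One way to picture the weight computation and the threshold test recited in the two preceding claims is the sketch below. The claims do not fix a formula, so everything here is an assumption: the mean absolute difference stands in for the pixel similarity value, a single `noise_sigma` per group stands in for the expected noise content, and the exponential mapping, the default `threshold`, and the function names are hypothetical.

```python
import math

def group_weight(ref_group, other_group, noise_sigma):
    """Weight in (0, 1]; equals 1.0 when the two groups are identical."""
    # Assumed pixel similarity value: mean absolute difference (smaller = more similar).
    mean_abs_diff = sum(abs(a - b) for a, b in zip(ref_group, other_group)) / len(ref_group)
    # Differences are judged relative to the expected noise level.
    return math.exp(-(mean_abs_diff / noise_sigma) ** 2)

def fuse_group(ref_group, other_group, noise_sigma, threshold=0.5):
    """Fuse only when the weight exceeds the threshold; otherwise keep the reference."""
    w = group_weight(ref_group, other_group, noise_sigma)
    if w <= threshold:
        return list(ref_group)  # likely ghost: do not fuse
    return [(a + w * b) / (1.0 + w) for a, b in zip(ref_group, other_group)]
```

Scaling the difference by the expected noise means the same absolute difference is tolerated in a noisy layer but rejected in a clean one.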

The non-transitory program storage device of claim 2, wherein the instructions causing the programmable control device to determine a weight value further cause the programmable control device to:
perform a spatial search of the layer of the second multi-layer pyramid representation of the scene to find a better corresponding group of pixels when the weight value is less than or equal to the specified threshold;
determine a weight value for the better corresponding group of pixels; and
fuse the identified group of pixels of the first multi-layer pyramid representation with the better corresponding group of pixels of the second multi-layer pyramid representation.
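
The fallback search recited in this claim can be sketched as a small block-matching loop. This is an illustration under assumptions, not the claimed implementation: square `size` x `size` groups, a search window of `radius` pixels, and the weight formula are all hypothetical choices, as are the names `block`, `weight` and `best_match`.

```python
import math

def block(img, y, x, size):
    """Flatten the size x size group of pixels whose top-left corner is (y, x)."""
    return [img[y + dy][x + dx] for dy in range(size) for dx in range(size)]

def weight(ref_block, cand_block, noise_sigma):
    """Assumed weight: 1.0 for identical groups, falling toward 0 as they differ."""
    mad = sum(abs(a - b) for a, b in zip(ref_block, cand_block)) / len(ref_block)
    return math.exp(-(mad / noise_sigma) ** 2)

def best_match(ref, other, y, x, size=2, radius=2, noise_sigma=5.0, threshold=0.5):
    """Return (weight, (row, col)) of the group in `other` to fuse with."""
    ref_block = block(ref, y, x, size)
    best_w, best_pos = weight(ref_block, block(other, y, x, size), noise_sigma), (y, x)
    if best_w > threshold:
        return best_w, best_pos  # the co-located group is good enough
    h, w = len(other), len(other[0])
    # Spatial search: try every group position within `radius` of (y, x).
    for cy in range(max(0, y - radius), min(h - size, y + radius) + 1):
        for cx in range(max(0, x - radius), min(w - size, x + radius) + 1):
            cand_w = weight(ref_block, block(other, cy, cx, size), noise_sigma)
            if cand_w > best_w:
                best_w, best_pos = cand_w, (cy, cx)
    return best_w, best_pos
```

The search runs only when the co-located group fails the threshold test, so well-aligned regions pay no extra cost.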

The non-transitory program storage device of claim 1, wherein the instructions causing the programmable control device to fuse the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprise instructions causing the programmable control device to:
estimate a motion field for each layer of the second multi-layer pyramid representation of the second image, wherein the motion field associates pixels of the first image with pixels of the second image; and
fuse each pixel of the first multi-layer pyramid representation with a selected pixel from the motion field.
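
A motion field of the kind recited here can be pictured as one displacement vector per pixel. The sketch below estimates it for a single layer by exhaustive 3x3 patch matching and then fuses each reference pixel with the pixel the field points to. The matching cost, the `radius`, the equal-weight average, and the function names are assumptions for illustration; the claim does not prescribe a particular motion estimator.

```python
def patch_cost(ref, other, y, x, dy, dx):
    """Sum of absolute differences over a 3x3 patch, with clamped borders."""
    h, w = len(ref), len(ref[0])
    cost = 0.0
    for py in (-1, 0, 1):
        for px in (-1, 0, 1):
            ry, rx = min(max(y + py, 0), h - 1), min(max(x + px, 0), w - 1)
            oy, ox = min(max(y + py + dy, 0), h - 1), min(max(x + px + dx, 0), w - 1)
            cost += abs(ref[ry][rx] - other[oy][ox])
    return cost

def estimate_motion_field(ref, other, radius=1):
    """Return field[y][x] = (dy, dx) linking ref[y][x] to other[y+dy][x+dx]."""
    # Smallest displacements first, so ties resolve toward zero motion.
    offsets = sorted(((dy, dx) for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)),
                     key=lambda o: abs(o[0]) + abs(o[1]))
    field = []
    for y in range(len(ref)):
        row = []
        for x in range(len(ref[0])):
            costs = [patch_cost(ref, other, y, x, dy, dx) for dy, dx in offsets]
            row.append(offsets[costs.index(min(costs))])
        field.append(row)
    return field

def fuse_with_motion_field(ref, other, field):
    """Average each reference pixel with the pixel selected by the motion field."""
    h, w = len(ref), len(ref[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = field[y][x]
            oy, ox = min(max(y + dy, 0), h - 1), min(max(x + dx, 0), w - 1)
            row.append(0.5 * (ref[y][x] + other[oy][ox]))
        out.append(row)
    return out
```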

(Deleted)

The non-transitory program storage device of claim 1, wherein the instructions for generating a first multi-layer pyramid representation of the first image comprise instructions causing the programmable control device to generate a high-pass pyramid decomposition of the first image.

The non-transitory program storage device of claim 1, wherein the instructions for generating a first multi-layer pyramid representation of the first image comprise instructions causing the programmable control device to generate a wavelet decomposition of the first image.
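
As one example of a wavelet decomposition that yields a multi-layer representation, the sketch below applies an averaging variant of the 2-D Haar transform recursively to the low-pass band. The choice of Haar, the normalization by 4, and the names `haar_step`, `wavelet_decompose` and `wavelet_reconstruct` are assumptions; the claim does not name a specific wavelet. Image sides are assumed divisible by 2 at every layer.

```python
def haar_step(img):
    """One 2-D Haar analysis step: returns (low-pass band, (lh, hl, hh))."""
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_ll.append((a + b + c + d) / 4.0)  # local average
            r_lh.append((a - b + c - d) / 4.0)  # left-right difference
            r_hl.append((a + b - c - d) / 4.0)  # top-bottom difference
            r_hh.append((a - b - c + d) / 4.0)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, (lh, hl, hh)

def haar_inverse_step(ll, details):
    """Invert haar_step exactly."""
    lh, hl, hh = details
    out = []
    for y in range(len(ll)):
        top, bottom = [], []
        for x in range(len(ll[0])):
            s, p, q, r = ll[y][x], lh[y][x], hl[y][x], hh[y][x]
            top += [s + p + q + r, s - p + q - r]
            bottom += [s + p - q - r, s - p - q + r]
        out += [top, bottom]
    return out

def wavelet_decompose(img, levels):
    """Return (detail layers from fine to coarse, coarsest low-pass band)."""
    layers, current = [], img
    for _ in range(levels):
        current, details = haar_step(current)
        layers.append(details)
    return layers, current

def wavelet_reconstruct(layers, coarse):
    """Combine all layers back into a single image."""
    current = coarse
    for details in reversed(layers):
        current = haar_inverse_step(current, details)
    return current
```

Because the transform is exactly invertible, any change in the output image comes only from how the layers are fused, not from the decomposition itself.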

A system, comprising:
an image capture device;
a memory; and
one or more programmable control devices,
the one or more programmable control devices interacting with the image capture device and the memory to perform operations comprising:
obtaining a first image of an initially captured scene, the first image having a plurality of pixels;
performing a multi-resolution decomposition of the first image to generate a first multi-layer pyramid representation of the first image;
obtaining a second image of the scene, wherein the second image is captured at a different time than the first image, and each of the plurality of pixels of the first image has a corresponding pixel in the second image;
performing a multi-resolution decomposition of the second image to generate a second multi-layer pyramid representation of the second image;
generating a layer of an output multi-layer pyramid representation of the scene for each layer of the first and second multi-layer pyramid representations of the scene, in accordance with instructions for:
identifying, for a group of pixels in a layer of the first multi-layer pyramid representation of the scene, a corresponding group of pixels in the second multi-layer pyramid representation of the scene, and
fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene;
combining the output multi-layer pyramid representations of the scene to generate a single output image representative of the scene; and
storing the output image in a memory,
wherein fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprises fusing one pixel of the first multi-layer pyramid representation with two or more pixels of the second multi-layer pyramid representation, and wherein the spatial coordinates of the two or more pixels are in a predetermined neighborhood of the pixel of the second multi-layer pyramid representation identified as corresponding to the one pixel.

The system of claim 9, wherein fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprises:
determining a weight value associated with each group of pixels in a layer of the first multi-layer pyramid representation of the scene using the corresponding group of pixels in a layer of the second multi-layer pyramid representation of the scene; and
fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene when the determined weight value is greater than a specified threshold, and not fusing the identified group of pixels of the first and second multi-layer pyramid representations when the determined weight value is less than or equal to the specified threshold.

The system of claim 10, wherein determining the weight value comprises:
comparing the similarity between each pixel of the group of pixels in the layer of the first multi-layer pyramid representation of the scene and the corresponding pixel of the corresponding group of pixels of the second multi-layer pyramid representation of the scene;
obtaining a pixel similarity value based on the comparison;
obtaining an expected noise content for the group of pixels in the layer of the first multi-layer pyramid representation of the scene; and
calculating the weight value based on the pixel similarity value and the expected noise content.

The system of claim 10, wherein determining the weight value further comprises:
performing a spatial search of the layer of the second multi-layer pyramid representation of the scene to find a better corresponding group of pixels when the weight value is less than or equal to the specified threshold;
determining a weight value for the better corresponding group of pixels; and
fusing the identified group of pixels of the first multi-layer pyramid representation with the better corresponding group of pixels of the second multi-layer pyramid representation.

The system of claim 9, wherein fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprises:
estimating a motion field for each layer of the second multi-layer pyramid representation of the second image, wherein the motion field associates pixels of the first image with pixels of the second image; and
fusing each pixel of the first multi-layer pyramid representation with a selected pixel from the motion field.

(Deleted)

A method, comprising:
obtaining a first image of an initially captured scene, the first image having a plurality of pixels;
performing a multi-resolution decomposition of the first image to generate a first multi-layer pyramid representation of the first image;
obtaining a second image of the scene, wherein the second image is captured at a different time than the first image, and each of the plurality of pixels of the first image has a corresponding pixel in the second image;
performing a multi-resolution decomposition of the second image to generate a second multi-layer pyramid representation of the second image;
generating a layer of an output multi-layer pyramid representation of the scene for each layer of the first and second multi-layer pyramid representations of the scene, comprising:
identifying, for a group of pixels in a layer of the first multi-layer pyramid representation of the scene, a corresponding group of pixels in the second multi-layer pyramid representation of the scene; and
fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene;
combining the output multi-layer pyramid representations of the scene to generate a single output image representative of the scene; and
storing the output image in a memory,
wherein fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprises fusing one pixel of the first multi-layer pyramid representation with two or more pixels of the second multi-layer pyramid representation, and wherein the spatial coordinates of the two or more pixels are in a predetermined neighborhood of the pixel of the second multi-layer pyramid representation identified as corresponding to the one pixel.

The method of claim 15, wherein fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprises:
determining a weight value associated with each group of pixels in a layer of the first multi-layer pyramid representation of the scene using the corresponding group of pixels in a layer of the second multi-layer pyramid representation of the scene; and
fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene when the determined weight value is greater than a specified threshold, and not fusing the identified group of pixels of the first and second multi-layer pyramid representations when the determined weight value is less than or equal to the specified threshold.

The method of claim 16, wherein determining the weight value comprises:
comparing the similarity between each pixel of the group of pixels in the layer of the first multi-layer pyramid representation of the scene and the corresponding pixel of the corresponding group of pixels of the second multi-layer pyramid representation of the scene;
obtaining a pixel similarity value based on the comparison;
obtaining an expected noise content for the group of pixels in the layer of the first multi-layer pyramid representation of the scene; and
calculating the weight value based on the pixel similarity value and the expected noise content.

The method of claim 16, wherein determining the weight value further comprises:
performing a spatial search of the layer of the second multi-layer pyramid representation of the scene to find a better corresponding group of pixels when the weight value is less than or equal to the specified threshold;
determining a weight value for the better corresponding group of pixels; and
fusing the identified group of pixels of the first multi-layer pyramid representation with the better corresponding group of pixels of the second multi-layer pyramid representation.

The method of claim 15, wherein fusing the identified group of pixels of the first and second multi-layer pyramid representations of the scene comprises:
estimating a motion field for each layer of the second multi-layer pyramid representation of the second image, wherein the motion field associates pixels of the first image with pixels of the second image; and
fusing each pixel of the first multi-layer pyramid representation with a selected pixel from the motion field.

(Deleted)