TW202334902A - Systems and methods for image reprojection - Google Patents

Systems and methods for image reprojection

Info

Publication number
TW202334902A
Authority
TW
Taiwan
Prior art keywords
image
image data
environment
depth
engine
Application number
TW111149249A
Other languages
Chinese (zh)
Inventor
皮亞 佐貝爾
尤瓦 史華茲
拓 亞迪克
雅札克 馬茲西諾
羅伊 哈同
麥耳 楚爾
羅恩 蓋茲曼
耶胡達 巴斯特納克
Original Assignee
Qualcomm Incorporated
Priority claimed from US17/931,063 (external priority: US20230216999A1)
Application filed by Qualcomm Incorporated
Publication of TW202334902A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects

Abstract

An imaging system receives depth data (corresponding to an environment) from a depth sensor and first image data (a depiction of the environment) from an image sensor. The imaging system generates, based on the depth data, first motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data. The imaging system generates, using grid inversion based on the first motion vectors, second motion vectors that indicate respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective. The imaging system generates second image data by modifying the first image data according to the second motion vectors. The second image data includes a second depiction of the environment from a different perspective than the first image data. Some image reprojection applications (e.g., frame interpolation) can be performed without the depth data.

Description

Systems and methods for image reprojection

The present disclosure relates to image processing. More specifically, the present disclosure relates to systems and methods that reproject a first image captured from a first perspective, for example using grid inversion, to generate a second image that appears to have been captured from a second perspective.

A camera is a device that uses an image sensor to receive light and capture image frames, such as still images or video frames. A camera captures an image depicting an environment from a perspective corresponding to the camera's field of view.

An extended reality (XR) device is a device that displays an environment to a user, for example through a head-mounted display (HMD) or a mobile handset. The displayed environment is at least partially different from the real-world environment in which the user is located. The user can generally change their view of the environment interactively, for example by tilting or moving the HMD or another device. Virtual reality (VR), augmented reality (AR), and mixed reality (MR) are examples of XR. XR devices can include sensors that capture information from the environment.

In some examples, systems and techniques for image processing are described. In some examples, an imaging system receives depth data corresponding to an environment. The imaging system receives first image data captured by an image sensor, the first image data including a depiction of the environment. The imaging system generates, based on the depth data, first motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data. The imaging system generates, using grid inversion based on the first plurality of motion vectors, second motion vectors that indicate respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective. The imaging system generates second image data by modifying the first image data according to the first motion vectors and/or the second motion vectors. The second image data includes a second depiction of the environment from a different perspective than the first image data. The imaging system outputs the second image data. Some image reprojection applications (e.g., frame interpolation) can be performed without the depth data.

In one example, an apparatus for image processing is provided. The apparatus includes a memory and one or more processors (e.g., implemented in circuitry) coupled to the memory. The one or more processors are configured to and can: receive depth data including depth information corresponding to an environment; receive first image data captured by an image sensor, the first image data including a depiction of the environment; generate, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generate, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances that respective pixels of the depiction of the environment in the first image data move for the change in perspective; generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and output the second image data.

In another example, a method of image processing is provided. The method includes: receiving depth data including depth information corresponding to an environment; receiving first image data captured by an image sensor, the first image data including a depiction of the environment; generating, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generating, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances that respective pixels of the depiction of the environment in the first image data move for the change in perspective; generating second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and outputting the second image data.

In another example, a non-transitory computer-readable medium is provided that has stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: receive depth data including depth information corresponding to an environment; receive first image data captured by an image sensor, the first image data including a depiction of the environment; generate, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generate, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances that respective pixels of the depiction of the environment in the first image data move for the change in perspective; generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and output the second image data.

In another example, an apparatus for image processing is provided. The apparatus includes: means for receiving depth data including depth information corresponding to an environment; means for receiving first image data captured by an image sensor, the first image data including a depiction of the environment; means for generating, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; means for generating, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances that respective pixels of the depiction of the environment in the first image data move for the change in perspective; means for generating second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and means for outputting the second image data.

In some aspects, the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes at least one image depicting the environment at at least one of the first time or the third time.
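
To make the frame-interpolation aspect concrete, the following is a minimal sketch (not taken from the patent) of warping one frame toward another by scaling per-pixel motion vectors by a fractional time t. The function name, the nearest-neighbor backward gather, and the assumption that the motion field varies slowly across the image are all illustrative simplifications.

```python
import numpy as np

def interpolate_frame(frame_a, flow_ab, t=0.5):
    """Approximate the frame at fractional time t between two frames.

    frame_a: (H, W, 3) array for the earlier frame; flow_ab: (H, W, 2)
    per-pixel motion (in pixels) from frame_a to frame_b. Scales the
    motion by t and performs a nearest-neighbor backward gather.
    """
    h, w = flow_ab.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - t * flow_ab[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.round(ys - t * flow_ab[..., 1]), 0, h - 1).astype(int)
    return frame_a[src_y, src_x]
```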

In some aspects, the first image data includes a plurality of frames of video data that include parallax movement, wherein the second image data includes a stabilized variant of the plurality of frames of video data with the parallax movement reduced.

In some aspects, the first image data depicts a person gazing toward the image sensor from a first angle, wherein the second image data depicts the person gazing toward the image sensor from a second angle that is different from the first angle.

In some aspects, the change in perspective includes a rotation of the perspective about an axis by an angle. In some aspects, the change in perspective includes a translation of the perspective in a direction by a distance. In some aspects, the change in perspective includes a transformation. In some aspects, the change in perspective includes movement along an axis between an original perspective of the depiction of the environment in the first image data and a position of an object in the environment, wherein at least a portion of the object is depicted in the first image data.

In some aspects, one or more of the methods, apparatuses, and computer-readable media described above further include: identifying one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and, before outputting the second image data, modifying the second image data at least in part by filling the one or more gaps in the second image data using interpolation.
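
As a rough illustration of gap filling by interpolation, the sketch below repeatedly averages valid 4-neighbors into invalid pixels. The function and its arguments are assumptions made for illustration; the patent does not specify a particular interpolation scheme.

```python
import numpy as np

def fill_gaps(image, valid, iters=8):
    """Fill invalid pixels by averaging their valid 4-neighbors, repeatedly.

    image: (H, W, 3) float array with holes; valid: (H, W) bool mask,
    True where a reprojected pixel landed. Note np.roll wraps at the
    borders, which a production implementation would handle explicitly.
    """
    img, ok = image.copy(), valid.copy()
    shifts = ((0, 1), (0, -1), (1, 1), (1, -1))  # (axis, offset) pairs
    for _ in range(iters):
        if ok.all():
            break
        nbr = [np.roll(img, s, axis=a) for a, s in shifts]
        nbr_ok = [np.roll(ok, s, axis=a) for a, s in shifts]
        total = sum(n * v[..., None] for n, v in zip(nbr, nbr_ok))
        count = sum(v.astype(float) for v in nbr_ok)
        fill = ~ok & (count > 0)
        img[fill] = total[fill] / count[fill][:, None]
        ok |= fill
    return img
```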

In some aspects, one or more of the methods, apparatuses, and computer-readable media described above further include: identifying one or more occluded areas in the second image data based on one or more gaps in the second plurality of motion vectors; and, before outputting the second image data, modifying the second image data at least in part by filling one or more gaps in the second image data using inpainting.

In some aspects, one or more of the methods, apparatuses, and computer-readable media described above further include: identifying one or more occluded areas in the second image data based on one or more gaps in the second plurality of motion vectors; and, before outputting the second image data, modifying the second image data at least in part by filling one or more gaps in the second image data using inpainting with one or more trained machine learning models.
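
As an illustrative stand-in for the inpainting described above, the sketch below applies OpenCV's classical cv2.inpaint to the pixels that motion-vector gaps left empty; a trained machine learning model could be substituted at the same step for the ML-based variant. The mask convention and radius here are assumptions.

```python
import cv2
import numpy as np

def inpaint_occlusions(image: np.ndarray, hole_mask: np.ndarray) -> np.ndarray:
    """Fill occlusion gaps with OpenCV's classical Telea inpainting.

    image: 8-bit BGR reprojected frame; hole_mask: nonzero where
    motion-vector gaps left no data.
    """
    mask = (hole_mask > 0).astype(np.uint8) * 255
    return cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
```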

In some aspects, one or more of the methods, apparatuses, and computer-readable media described above further include: identifying one or more conflicts in the second image data based on one or more conflicting values from the first image data in the second plurality of motion vectors; and selecting one of the one or more conflicting values from the first image data based on movement data associated with the second plurality of motion vectors.
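
A minimal sketch of one way such conflicts could be resolved: when several source pixels land on the same destination pixel, keep the candidate with the highest priority value, for example the larger motion magnitude (which, as discussed later in this disclosure, tends to correspond to closer objects). All names here are illustrative assumptions rather than the patent's terminology.

```python
import numpy as np

def scatter_with_priority(dst_shape, src_pixels, dst_coords, priority):
    """Scatter source pixels to destination pixels, resolving collisions.

    dst_shape: (H, W); src_pixels: (N, 3); dst_coords: (N, 2) integer
    (x, y) destinations; priority: (N,) values such as motion magnitude.
    When multiple source pixels map to the same destination, the one
    with the highest priority wins. Returns the resolved (H, W, 3) image.
    """
    h, w = dst_shape
    out = np.zeros((h, w, 3), dtype=src_pixels.dtype)
    best = np.full((h, w), -np.inf)
    for (x, y), pix, pri in zip(dst_coords, src_pixels, priority):
        if 0 <= x < w and 0 <= y < h and pri > best[y, x]:
            best[y, x] = pri
            out[y, x] = pix
    return out
```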

In some aspects, the depth information includes a three-dimensional representation of the environment from the first perspective. In some aspects, the depth data is received from at least one depth sensor, wherein the at least one depth sensor includes at least one time-of-flight sensor.

In some aspects, outputting the second image data includes displaying the second image data using at least a display. In some aspects, outputting the second image data includes causing the second image data to be sent to at least one recipient device using at least one communication interface.

In some aspects, the depiction of the environment in the first image data depicts the environment from a first perspective, wherein the change in perspective is a change between the first perspective and a different perspective corresponding to the second depiction of the environment in the second image data.

In some aspects, the change in perspective includes at least one of a parallax movement of the perspective or a rotation of the perspective about an axis, and the method further includes receiving, via a user interface, one of: an indication of a distance of the parallax movement of the perspective, or an indication of an angle or axis of the rotation of the perspective.

In some aspects, one or more of the methods, apparatuses, and computer-readable media described above further include: identifying, based on one or more gaps in respective endpoints of the first plurality of motion vectors, that one or more gaps in the second plurality of motion vectors produce one or more gaps in the second image data; and, before outputting the second image data, modifying the second image data at least in part by filling the one or more gaps in the second image data using interpolation.

In some aspects, the apparatus is, is part of, and/or includes a wearable device, an extended reality device (e.g., a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device), a head-mounted display (HMD) device, a wireless communication device, a mobile device (e.g., a mobile telephone and/or mobile handset and/or so-called "smartphone" or other mobile device), a camera, a personal computer, a laptop computer, a server computer, a vehicle or a computing device or component of a vehicle, another device, or a combination thereof. In some aspects, the apparatus includes one or more cameras for capturing one or more images. In some aspects, the apparatus further includes a display for displaying one or more images, notifications, and/or other displayable data. In some aspects, the apparatuses described above can include one or more sensors (e.g., one or more inertial measurement units (IMUs), such as one or more gyroscopes, one or more gyrometers, one or more accelerometers, any combination thereof, and/or other sensors).

This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.

The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims, and accompanying drawings.

Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination, as would be apparent to those of skill in the art. In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the application. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.

The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the application as set forth in the appended claims.

A camera is a device that receives light and captures image frames, such as still images or video frames, using an image sensor. The terms "image," "image frame," and "frame" are used interchangeably herein. Cameras can be configured with a variety of image capture and image processing settings. Different settings result in images with different appearances. Some camera settings (such as ISO, exposure time, aperture size, f/stop, shutter speed, focus, and gain) are determined and applied before or during capture of one or more image frames. For example, settings or parameters can be applied to an image sensor for capturing the one or more image frames. Other camera settings can configure post-processing of one or more image frames, such as alterations to contrast, brightness, saturation, sharpness, levels, curves, or colors. For example, settings or parameters can be applied to a processor (e.g., an image signal processor or ISP) for processing the one or more image frames captured by the image sensor.

A depth sensor is a sensor that measures depth, range, or distance from the depth sensor to one or more portions of the environment in which the depth sensor is located. Examples of depth sensors include light detection and ranging (LIDAR) sensors, radio detection and ranging (RADAR) sensors, sound detection and ranging (SODAR) sensors, sound navigation and ranging (SONAR) sensors, time-of-flight (ToF) sensors, structured light sensors, or combinations thereof. Depth data captured by a depth sensor can include a point cloud, a 3D model, and/or a depth image.
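
For illustration, a depth image can be back-projected into a point cloud with a standard pinhole camera model, as in the sketch below. The intrinsics (fx, fy, cx, cy) and the function itself are textbook assumptions; the patent does not prescribe this exact conversion.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a 3D point cloud.

    depth: (H, W) metric depth; fx, fy, cx, cy: pinhole intrinsics.
    Returns (H*W, 3) points in the camera frame.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    z = depth
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```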

An extended reality (XR) system or device can provide virtual content to a user and/or can combine a real-world view of a physical environment (a scene) with a virtual environment (including virtual content). XR systems facilitate user interaction with such combined XR environments. The real-world view can include real-world objects (also referred to as physical objects), such as people, vehicles, buildings, tables, chairs, and/or other real-world or physical objects. XR systems or devices can facilitate interaction with different types of XR environments (e.g., a user can use an XR system or device to interact with an XR environment). XR systems can include virtual reality (VR) systems facilitating interaction with VR environments, augmented reality (AR) systems facilitating interaction with AR environments, mixed reality (MR) systems facilitating interaction with MR environments, and/or other XR systems. Examples of XR systems or devices include head-mounted displays (HMDs), smart glasses, and others. In some cases, an XR device can track a portion of the user (e.g., the user's hands and/or fingertips) to allow the user to interact with items of virtual content.

An imaging system can include a depth sensor and an image sensor of a camera. The depth sensor captures depth data that includes depth information corresponding to an environment, such as a point cloud, a 3D model, a depth image, a set of disparity values, and/or a 3D representation of the environment. The image sensor captures first image data that includes a 2D depiction of the environment.

The imaging system uses the depth data to generate a first set of motion vectors. The first set of motion vectors corresponds to a change in the perspective of the depiction of the environment in the first image data from a first perspective to a second perspective.
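
One conventional way to obtain such a first set of motion vectors, sketched below under the assumption of a pinhole camera with known intrinsics K and a rigid perspective change (R, t), is to back-project each pixel using its depth, transform the resulting 3D point into the new camera frame, and re-project it; the vector from the old pixel location to the new one is that pixel's motion vector. This is a textbook rigid reprojection offered as an illustration, not the patent's mandated method.

```python
import numpy as np

def forward_motion_vectors(depth, K, R, t):
    """Per-pixel motion vectors induced by a change in camera perspective.

    depth: (H, W) metric depth from the first perspective; K: 3x3
    intrinsics; R, t: rotation and translation from the first camera
    frame to the second. Returns (H, W, 2) vectors from each source
    pixel to its reprojected location.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T         # normalized camera rays
    pts = rays * depth[..., None]           # 3D points, first camera frame
    pts2 = pts @ R.T + t                    # points in second camera frame
    proj = pts2 @ K.T
    uv = proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)
    return uv - pix[..., :2]
```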

The imaging system applies grid inversion to the first set of motion vectors to generate a second set of motion vectors. The second set of motion vectors indicates respective distances that respective pixels of the depiction of the environment in the first image data move for the change in perspective from the first perspective to the second perspective. In some cases, to apply the grid inversion, the imaging system resolves conflicts in the grid inversion by prioritizing larger motions over smaller motions and/or by prioritizing the motion of objects that are closer in the environment over the motion of objects that are farther away in the environment. In some cases, to apply the grid inversion, the imaging system uses interpolation to fill in missing areas.
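
The following is a simplified sketch of such a grid inversion: forward motion vectors (indexed by source pixel) are splatted into a backward map (indexed by destination pixel), collisions are resolved by keeping the larger motion as a proxy for the closer object, and destinations that no source reached are flagged as gaps for later filling. The loop-based implementation and tie-breaking rule are illustrative assumptions, not the patent's exact method.

```python
import numpy as np

def invert_grid(fwd_flow):
    """Invert forward motion vectors into a backward map.

    fwd_flow: (H, W, 2) vectors from source pixels to destination
    pixels. Returns bwd_flow (H, W, 2), where bwd_flow[y, x] points
    from destination pixel (x, y) back to its source, plus a validity
    mask marking destinations no source reached (gaps).
    """
    h, w = fwd_flow.shape[:2]
    bwd = np.zeros_like(fwd_flow)
    best = np.full((h, w), -1.0)
    mag = np.linalg.norm(fwd_flow, axis=-1)
    for y in range(h):
        for x in range(w):
            dx, dy = fwd_flow[y, x]
            u, v = int(round(x + dx)), int(round(y + dy))
            if 0 <= u < w and 0 <= v < h and mag[y, x] > best[v, u]:
                best[v, u] = mag[y, x]       # larger motion wins collisions
                bwd[v, u] = (x - u, y - v)
    valid = best >= 0.0
    return bwd, valid
```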

The imaging system generates second image data by modifying the image data according to the second set of motion vectors. For example, the imaging system can modify the image data according to the second set of motion vectors by moving respective pixels of the depiction of the environment in the first image data by the respective distances indicated by the second set of motion vectors. The second image data includes a second depiction of the environment from a different perspective than the first image data. The imaging system outputs the second image data, for example by displaying the second image data or by sending the second image data to a recipient device.
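
Continuing the sketch above, generating the second image then reduces to a backward warp: each valid destination pixel gathers the source pixel that its inverted motion vector points back to. Nearest-neighbor sampling and zero-filled gaps are simplifying assumptions; the gaps would subsequently be filled by the interpolation or inpainting described earlier.

```python
import numpy as np

def backward_warp(image, bwd_flow, valid):
    """Produce the reprojected image by gathering source pixels.

    image: (H, W, 3) first image data; bwd_flow, valid: output of the
    grid-inversion sketch above. Invalid destination pixels are left
    black for later interpolation/inpainting.
    """
    h, w = bwd_flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + bwd_flow[..., 0]), 0, w - 1).astype(int)
    sy = np.clip(np.round(ys + bwd_flow[..., 1]), 0, h - 1).astype(int)
    out = image[sy, sx]
    out[~valid] = 0
    return out
```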

Generating the second image data by modifying the first image data based on the grid-inversion-based second set of motion vectors enables many useful applications of the change in perspective. For example, the change in perspective can be used for 3D stabilization of video data, for instance to reduce or eliminate parallax movement that may be caused by instability in the user's hand holding the camera and/or by the user's footsteps. The change in perspective can be used for frame interpolation, increasing the effective frame rate of a video by generating intermediate frames between two existing frames. The change in perspective can be used for a "3D zoom" effect that scales the foreground of the environment more quickly than the background of the environment, so as to appear more similar to real forward movement into the environment than to magnification. The change in perspective can be used to accommodate the offset between two sensors (e.g., two cameras, a camera and a depth sensor, etc.). The change in perspective can be used for head pose correction, for example making the camera appear to be level with a person's head when the camera is actually positioned below or above the person, as is common in video conferencing. The change in perspective can be used in XR to quickly simulate a different perspective of the environment even if the different perspective has not yet finished rendering. The change in perspective can be used for various special effects, such as effects simulating rotation around an object in the scene.

In some examples, systems and techniques for image processing are described. In some examples, an imaging system receives depth data (corresponding to an environment) captured by a depth sensor, and the imaging system receives first image data (a depiction of the environment) captured by an image sensor. The imaging system generates, based on the depth data, first motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data. The imaging system generates, using grid inversion based on the first motion vectors, second motion vectors that indicate respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective. The imaging system generates second image data by modifying the first image data according to the first motion vectors and/or the second motion vectors. The second image data includes a second depiction of the environment from a different perspective than the first image data. The imaging system outputs the second image data.

The imaging systems and techniques described herein provide several technical improvements over prior image processing systems. For example, the image processing systems and techniques described herein can provide reprojection to a different perspective for any translational and/or rotational movement of the perspective. The image processing systems and techniques described herein can use this reprojection, and the grid inversion techniques that support it, for various applications, including improving video frame quality using optical flow, aligning depth data and image data to overcome the offset distance between those two sensors, 3D-depth-based video stabilization, 3D-depth-based zoom (also referred to as cinematic zoom), aligning image data from two different cameras to overcome the offset distance between those two sensors, head pose correction, late-stage reprojection for extended reality (XR), special effects, or combinations thereof. Use of grid inversion can improve efficiency, reduce computational load, reduce power usage, reduce heat generation, and reduce the need for heat dissipation components.

Various aspects of the application will be described with respect to the figures. FIG. 1 is a block diagram illustrating an architecture of an image capture and processing system 100. The image capture and processing system 100 includes various components that are used to capture and process images of one or more scenes (e.g., an image of a scene 110). The image capture and processing system 100 can capture standalone images (or photographs) and/or can capture videos that include multiple images (or video frames) in a particular sequence. A lens 115 of the system 100 faces a scene 110 and receives light from the scene 110. The lens 115 bends the light toward the image sensor 130. The light received by the lens 115 passes through an aperture controlled by one or more control mechanisms 120 and is received by an image sensor 130. In some examples, the scene 110 is a scene in an environment. In some examples, the scene 110 is a scene of at least a portion of a user. For instance, the scene 110 can be a scene of one or both of the user's eyes and/or at least a portion of the user's face.

The one or more control mechanisms 120 may control exposure, focus, and/or zoom based on information from the image sensor 130 and/or based on information from the image processor 150. The one or more control mechanisms 120 may include multiple mechanisms and components; for example, the control mechanisms 120 may include one or more exposure control mechanisms 125A, one or more focus control mechanisms 125B, and/or one or more zoom control mechanisms 125C. The one or more control mechanisms 120 may also include additional control mechanisms besides those that are illustrated, such as control mechanisms controlling analog gain, flash, HDR, depth of field, and/or other image capture properties.

The focus control mechanism 125B of the control mechanisms 120 can obtain a focus setting. In some examples, the focus control mechanism 125B stores the focus setting in a memory register. Based on the focus setting, the focus control mechanism 125B can adjust the position of the lens 115 relative to the position of the image sensor 130. For example, based on the focus setting, the focus control mechanism 125B can move the lens 115 closer to the image sensor 130 or farther from the image sensor 130 by actuating a motor or servo, thereby adjusting focus. In some cases, additional lenses may be included in the system 100, such as one or more microlenses over each photodiode of the image sensor 130, which each bend the light received from the lens 115 toward the corresponding photodiode before the light reaches the photodiode. The focus setting may be determined via contrast detection autofocus (CDAF), phase detection autofocus (PDAF), or some combination thereof. The focus setting may be determined using the control mechanisms 120, the image sensor 130, and/or the image processor 150. The focus setting may be referred to as an image capture setting and/or an image processing setting.

The exposure control mechanism 125A of the control mechanisms 120 can obtain an exposure setting. In some cases, the exposure control mechanism 125A stores the exposure setting in a memory register. Based on this exposure setting, the exposure control mechanism 125A can control the size of the aperture (e.g., aperture size or f/stop), the duration of time for which the aperture is open (e.g., exposure time or shutter speed), the sensitivity of the image sensor 130 (e.g., ISO speed or film speed), analog gain applied by the image sensor 130, or any combination thereof. The exposure setting may be referred to as an image capture setting and/or an image processing setting.

The zoom control mechanism 125C of the control mechanisms 120 can obtain a zoom setting. In some examples, the zoom control mechanism 125C stores the zoom setting in a memory register. Based on the zoom setting, the zoom control mechanism 125C can control a focal length of an assembly of lens elements (lens assembly) that includes the lens 115 and one or more additional lenses. For example, the zoom control mechanism 125C can control the focal length of the lens assembly by actuating one or more motors or servos to move one or more of the lenses relative to one another. The zoom setting may be referred to as an image capture setting and/or an image processing setting. In some examples, the lens assembly may include a parfocal zoom lens or a varifocal zoom lens. In some examples, the lens assembly may include a focusing lens (which can be the lens 115 in some cases) that receives the light from the scene 110 first, with the light then passing through an afocal zoom system between the focusing lens (e.g., the lens 115) and the image sensor 130 before the light reaches the image sensor 130. The afocal zoom system may, in some cases, include two positive (e.g., converging, convex) lenses of equal or similar focal length (e.g., within a threshold difference) with a negative (e.g., diverging, concave) lens between them. In some cases, the zoom control mechanism 125C moves one or more of the lenses in the afocal zoom system, such as the negative lens and one or both of the positive lenses.

The image sensor 130 includes one or more arrays of photodiodes or other photosensitive elements. Each photodiode measures an amount of light that eventually corresponds to a particular pixel in the image produced by the image sensor 130. In some cases, different photodiodes may be covered by different color filters, and may thus measure light matching the color of the filter covering the photodiode. For instance, Bayer color filters include red color filters, blue color filters, and green color filters, with each pixel of the image generated based on red light data from at least one photodiode covered in a red color filter, blue light data from at least one photodiode covered in a blue color filter, and green light data from at least one photodiode covered in a green color filter. Other types of color filters may use yellow, magenta, and/or cyan (also referred to as "emerald") color filters instead of or in addition to red, blue, and/or green color filters. Some image sensors may lack color filters altogether, and may instead use different photodiodes throughout the pixel array (in some cases vertically stacked). The different photodiodes throughout the pixel array can have different spectral sensitivity curves, therefore responding to different wavelengths of light. Monochrome image sensors may also lack color filters and therefore lack color depth.
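
To illustrate how per-photosite color samples behind a Bayer filter become a full-color image, the sketch below performs a simple normalized-convolution demosaic of an RGGB mosaic. Real ISPs use considerably more sophisticated demosaicing; the pattern layout and kernel here are common textbook choices, not details from the patent.

```python
import numpy as np
from scipy.signal import convolve2d

def demosaic_bilinear(raw):
    """Simple normalized-convolution demosaic of an RGGB Bayer mosaic.

    raw: (H, W) single-channel sensor readout where the color sampled
    at each photosite follows the RGGB pattern. Returns (H, W, 3) RGB.
    """
    h, w = raw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    masks = {
        'r': (ys % 2 == 0) & (xs % 2 == 0),
        'g': (ys % 2) != (xs % 2),
        'b': (ys % 2 == 1) & (xs % 2 == 1),
    }
    kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
    out = np.zeros((h, w, 3))
    for c, name in enumerate(['r', 'g', 'b']):
        m = masks[name].astype(float)
        num = convolve2d(raw * m, kernel, mode='same')
        den = convolve2d(m, kernel, mode='same')
        out[..., c] = num / np.maximum(den, 1e-9)
    return out
```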

In some cases, the image sensor 130 may alternately or additionally include opaque and/or reflective masks that block light from reaching certain photodiodes, or portions of certain photodiodes, at certain times and/or from certain angles, which may be used for phase detection autofocus (PDAF). The image sensor 130 may also include an analog gain amplifier to amplify the analog signals output by the photodiodes and/or an analog-to-digital converter (ADC) to convert the analog signals output by the photodiodes (and/or amplified by the analog gain amplifier) into digital signals. In some cases, certain components or functions discussed with respect to one or more of the control mechanisms 120 may be included in the image sensor 130 instead or additionally. The image sensor 130 may be a charge-coupled device (CCD) sensor, an electron-multiplying CCD (EMCCD) sensor, an active-pixel sensor (APS), a complementary metal-oxide semiconductor (CMOS), an N-type metal-oxide semiconductor (NMOS), a hybrid CCD/CMOS sensor (e.g., sCMOS), or some other combination thereof.

The image processor 150 may include one or more processors, such as one or more image signal processors (ISPs) (including ISP 154), one or more host processors (including host processor 152), and/or one or more of any other type of processor 4110 discussed with respect to the computing system 4100. The host processor 152 can be a digital signal processor (DSP) and/or other type of processor. In some implementations, the image processor 150 is a single integrated circuit or chip (e.g., referred to as a system-on-chip or SoC) that includes the host processor 152 and the ISP 154. In some cases, the chip can also include one or more input/output ports (e.g., input/output (I/O) ports 156), central processing units (CPUs), graphics processing units (GPUs), broadband modems (e.g., 3G, 4G or LTE, 5G, etc.), memory, connectivity components (e.g., Bluetooth®, Global Positioning System (GPS), etc.), any combination thereof, and/or other components. The I/O ports 156 can include any suitable input/output ports or interfaces according to one or more protocols or specifications, such as an Inter-Integrated Circuit 2 (I2C) interface, an Inter-Integrated Circuit 3 (I3C) interface, a Serial Peripheral Interface (SPI) interface, a serial General Purpose Input/Output (GPIO) interface, a Mobile Industry Processor Interface (MIPI) (such as a MIPI CSI-2 physical (PHY) layer port or interface), an Advanced High-performance Bus (AHB) bus, any combination thereof, and/or other input/output ports. In one illustrative example, the host processor 152 can communicate with the image sensor 130 using an I2C port, and the ISP 154 can communicate with the image sensor 130 using a MIPI port.

The image processor 150 may perform a number of tasks, such as demosaicing, color space conversion, image frame downsampling, pixel interpolation, automatic exposure (AE) control, automatic gain control (AGC), CDAF, PDAF, automatic white balance, merging of image frames to form an HDR image, image recognition, object recognition, feature recognition, receipt of inputs, managing outputs, managing memory, or some combination thereof. The image processor 150 may store image frames and/or processed images in random access memory (RAM) 140 and/or 4120, read-only memory (ROM) 145 and/or 4125, a cache, a memory unit, another storage device, or some combination thereof.

Various input/output (I/O) devices 160 may be connected to the image processor 150. The I/O devices 160 can include a display screen, a keyboard, a keypad, a touchscreen, a trackpad, a touch-sensitive surface, a printer, any other output devices 4135, any other input devices 4145, or some combination thereof. In some cases, a caption may be input into the image processing device 105B through a physical keyboard or keypad of the I/O devices 160, or through a virtual keyboard or keypad of a touchscreen of the I/O devices 160. The I/O 160 may include one or more ports, jacks, or other connectors that enable a wired connection between the system 100 and one or more peripheral devices, over which the system 100 may receive data from the one or more peripheral devices and/or transmit data to the one or more peripheral devices. The I/O 160 may include one or more wireless transceivers that enable a wireless connection between the system 100 and one or more peripheral devices, over which the system 100 may receive data from the one or more peripheral devices and/or transmit data to the one or more peripheral devices. The peripheral devices may include any of the previously discussed types of I/O devices 160 and may themselves be considered I/O devices 160 once they are coupled to the ports, jacks, wireless transceivers, or other wired and/or wireless connectors.

In some cases, the image capture and processing system 100 may be a single device. In some cases, the image capture and processing system 100 may be two or more separate devices, including an image capture device 105A (e.g., a camera) and an image processing device 105B (e.g., a computing device coupled to the camera). In some implementations, the image capture device 105A and the image processing device 105B may be coupled together, for example via one or more wires, cables, or other electrical connectors, and/or wirelessly via one or more wireless transceivers. In some implementations, the image capture device 105A and the image processing device 105B may be disconnected from one another.

As shown in FIG. 1, a vertical dashed line divides the image capture and processing system 100 of FIG. 1 into two portions that represent the image capture device 105A and the image processing device 105B, respectively. The image capture device 105A includes the lens 115, the control mechanisms 120, and the image sensor 130. The image processing device 105B includes the image processor 150 (including the ISP 154 and the host processor 152), the RAM 140, the ROM 145, and the I/O 160. In some cases, certain components illustrated in the image processing device 105B, such as the ISP 154 and/or the host processor 152, may be included in the image capture device 105A.

The image capture and processing system 100 can include an electronic device, such as a mobile or stationary telephone handset (e.g., smartphone, cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video gaming console, a video streaming device, an Internet Protocol (IP) camera, or any other suitable electronic device. In some examples, the image capture and processing system 100 can include one or more wireless transceivers for wireless communications, such as cellular network communications, 802.11 Wi-Fi communications, wireless local area network (WLAN) communications, or some combination thereof. In some implementations, the image capture device 105A and the image processing device 105B can be different devices. For instance, the image capture device 105A can include a camera device and the image processing device 105B can include a computing device, such as a mobile handset, a desktop computer, or other computing device.

While the image capture and processing system 100 is shown to include certain components, one of ordinary skill will appreciate that the image capture and processing system 100 can include more components than those shown in FIG. 1. The components of the image capture and processing system 100 can include software, hardware, or one or more combinations of software and hardware. For example, in some implementations, the components of the image capture and processing system 100 can include and/or can be implemented using electronic circuits or other electronic hardware, which can include one or more programmable electronic circuits (e.g., microprocessors, GPUs, DSPs, CPUs, and/or other suitable electronic circuits), and/or can include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein. The software and/or firmware can include one or more instructions stored on a computer-readable storage medium and executable by one or more processors of the electronic device implementing the image capture and processing system 100.

FIG. 2 is a block diagram illustrating an example architecture of an imaging system 200 that performs reprojection operations for various applications. In some examples, the imaging system 200 includes at least one image capture and processing system 100, image capture device 105A, image processing device 105B, or combination thereof. In some examples, the imaging system 200 includes at least one computing system 4100. In some examples, the imaging system 200 includes at least one neural network 3900.

In some examples, the imaging system 200 includes one or more sensors 205. The sensors 205 capture sensor data that measures and/or tracks information about aspects of the environment in which the imaging system 200 and/or the user of the imaging system 200 are located. In some examples, the sensors 205 can capture sensor data that measures and/or tracks information about the user's body and/or behaviors by the user. In some examples, the sensors 205 include one or more cameras that face at least a portion of the environment and/or of the user. The one or more cameras can include one or more image sensors that capture images of at least a portion of the environment and/or of the user. In some examples, the sensors 205 include one or more depth sensors that face at least a portion of the environment and/or of the user. The one or more depth sensors can capture depth data of at least a portion of the environment and/or of the user (e.g., depth images, point clouds, 3D models, ranges between the depth sensor(s) and portions of the environment, depths between the depth sensor(s) and portions of the environment, and/or distances between the depth sensor(s) and portions of the environment). In some examples, depth data (such as any of the types of depth data listed above) can also be determined from image data captured by stereo cameras using stereoscopic depth sensing. In some examples, depth data can be determined from image data captured by stereo cameras by inputting the image data into a trained machine learning model trained based on training data, the training data including other images captured by stereo cameras (or other cameras in similar stereoscopic arrangements) along with corresponding depth data. In some examples, the sensors 205 include one or more other types of sensors, such as microphones, accelerometers, gyroscopes, positioning receivers, inertial measurement units (IMUs), biometric sensors, or combinations thereof. In FIG. 2, the one or more sensors 205 are illustrated as a camera icon and a microphone icon.
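
The stereoscopic depth sensing mentioned above commonly rests on the rectified-stereo relation Z = f·B/d, relating depth Z to focal length f (in pixels), baseline B, and disparity d. A minimal sketch, with argument names as assumptions; the patent does not mandate this formula:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert stereo disparity to metric depth via Z = f * B / d.

    disparity: (H, W) disparity in pixels between rectified left/right
    images; focal_px: focal length in pixels; baseline_m: camera
    separation in meters. Non-positive disparities are marked invalid.
    """
    d = np.where(disparity > 0, disparity, np.nan)
    return focal_px * baseline_m / d
```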

The sensors 205 can include one or more cameras, image sensors, microphones, heart rate monitors, oximeters, biometric sensors, positioning transceivers, inertial measurement units (IMUs), accelerometers, gyroscopes, gyrometers, barometers, thermometers, altimeters, depth sensors, other sensors discussed herein, or combinations thereof. Examples of depth sensors include light detection and ranging (LIDAR) sensors, radio detection and ranging (RADAR) sensors, sound detection and ranging (SODAR) sensors, sound navigation and ranging (SONAR) sensors, time-of-flight (ToF) sensors, structured light sensors, or combinations thereof. Examples of positioning receivers include Global Navigation Satellite System (GNSS) receivers, Global Positioning System (GPS) receivers, cellular signal transceivers, Wi-Fi transceivers, wireless local area network (WLAN) transceivers, Bluetooth transceivers, beacon transceivers, near field communication (NFC) transceivers, personal area network (PAN) transceivers, radio frequency identification (RFID) transceivers, communication interfaces 4140, or combinations thereof. In some examples, the one or more sensors 205 include at least one image capture and processing system 100, image capture device 105A, image processing device 105B, or combination thereof. In some examples, the one or more sensors 205 include at least one input device 4145 of the computing system 4100. In some implementations, one or more of the sensors 205 may complement or refine sensor readings from other sensors 205. For example, the application engines 210 and/or the image reprojection engine 215 may use sensor data from positioning receivers, inertial measurement units (IMUs), accelerometers, gyroscopes, and/or other sensors to refine and/or complement the image data and/or the depth data. For example, the application engines 210 and/or the image reprojection engine 215 may use such sensor data to help determine the pose (e.g., 3D location coordinates and/or orientation (e.g., pitch, yaw, and/or roll)) of the imaging system 200 in the environment during capture of the image data and/or the depth data, and/or in the context of image stabilization and/or motion compensation.

In some examples, the imaging system 200 includes a virtual content generator 207 that generates virtual content. The virtual content may include two-dimensional (2D) shapes, three-dimensional (3D) shapes, 2D objects, 3D objects, 2D models, 3D models, 2D animations, 3D animations, 2D images, 3D images, textures, portions of other images, characters, strings, or combinations thereof. In some examples, the imaging system 200 may combine virtual content generated by the virtual content generator 207 with sensor data from the sensors 205 to form the media data 285. In some examples, the imaging system 200 may combine virtual content generated by the virtual content generator 207 with the media data 285. In Figure 2, the virtual content generated by the virtual content generator 207 is illustrated as a tetrahedron. In some examples, the virtual content generator 207 includes one or more software elements, such as one or more sets of instructions corresponding to one or more programs, running on one or more processors of the imaging system 200 (such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof). In some examples, the virtual content generator 207 includes one or more hardware elements. For example, the virtual content generator 207 may include a processor such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof. In some examples, the virtual content generator 207 includes a combination of one or more software elements and one or more hardware elements.

The imaging system 200 includes a set of application engines 210. The application engines 210 receive media data 285 from the sensors 205. The media data 285 is captured by the sensors 205. The media data 285 may include image data, for example including one or more images or portions thereof. The image data may include video data, for example including video frames of a video. The media data 285 may include depth data, for example including depth images, point clouds, 3D models, ranges between a depth sensor and portions of the environment, depths between a depth sensor and portions of the environment, distances between a depth sensor and portions of the environment, or combinations thereof. The media data 285 may include audio data, for example including audio recorded by one or more microphones of the sensors 205. In some cases, the audio data may include an audio track corresponding to a video of the image data. In some cases, the audio data may be multi-channel audio from multiple microphones of the sensors 205, for example allowing separate audio tracks corresponding to audio reaching the sensors 205 from different directions in the environment. The media data 285 may include pose data, for example including the position of the imaging system 200 in the environment (e.g., latitude, longitude, and/or altitude), the orientation of the imaging system 200 (e.g., pitch, yaw, and/or roll), the movement speed of the imaging system 200, the acceleration of the imaging system 200, the velocity of the imaging system 200, the momentum of the imaging system 200, the rotation of the imaging system 200, or combinations thereof. In some examples, the pose data may be captured using a positioning receiver, an inertial measurement unit (IMU), an accelerometer, and/or a gyroscope of the imaging system 200. In some examples, the imaging system 200 may infer aspects of the pose data, and/or may refine the pose data based on pose determinations made using other types of media data 285, such as the image data, the depth data, and/or the audio data.

The application engines 210 include an image reprojection engine 215 having a motion vector engine 220 and a grid inversion engine 225. The motion vector engine 220 of the image reprojection engine 215 may determine and/or generate a first set of motion vectors corresponding to movement from a first perspective of the environment to a second perspective of the environment. In some examples, the motion vector engine 220 may identify or generate a 3D representation of the environment based on depth data captured by a depth sensor of the sensors 205 and/or image data captured by an image sensor of the sensors 205. The motion vector engine 220 may rotate, translate, and/or transform the 3D representation of the environment from representing the environment from the first perspective to representing the environment from the second perspective. The motion vector engine 220 may determine the first set of motion vectors based on this change in perspective from the first perspective to the second perspective.
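As a simplified illustration of the kind of computation the motion vector engine 220 may perform, the following sketch derives a per-pixel motion vector field from a depth map and a change in perspective. The pinhole-camera model, the intrinsic matrix K, and the rotation/translation (R, t) describing the perspective change are assumptions made for illustration; they are not specified by this description:

```python
import numpy as np

def motion_vectors_from_depth(depth, K, R, t):
    """For each pixel, unproject using the depth map, apply the
    perspective change (R, t), reproject, and return the 2D
    displacement (one vector per pixel)."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T

    # Unproject pixels into 3D camera-space points using the depth map.
    points = np.linalg.inv(K) @ pixels * depth.reshape(1, -1)

    # Apply the rotation/translation describing the change in perspective.
    moved = R @ points + t.reshape(3, 1)

    # Reproject into the image plane of the new perspective.
    proj = K @ moved
    proj = proj[:2] / proj[2:]

    # Motion vector = where the pixel lands minus where it started.
    return (proj - pixels[:2]).T.reshape(h, w, 2)
```

Each resulting vector would correspond to one of the "first motion vectors" described above, analogous to the solid arrows of the MV grid discussed below with respect to Figure 5.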

The motion vectors output by the motion vector engine 220 of the image reprojection engine 215 may be output to the grid inversion engine 225. The grid inversion engine 225 of the image reprojection engine 215 may perform grid inversion on the motion vectors to generate a second set of motion vectors. The image reprojection engine 215 may modify at least a subset of the media data 285 using the second set of motion vectors to generate modified media data 290. For example, the image reprojection engine 215 may receive an image of the media data 285 depicting the environment from a third perspective, and may apply the second set of motion vectors to the image to generate a modified image of the modified media data 290. The modified image may depict the environment from a fourth perspective. The change from the third perspective to the fourth perspective may match the change from the first perspective to the second perspective, for example applying the same amount, distance, and/or angle of rotation, translation, and/or transformation. For example, in some examples, the change from the first perspective to the second perspective includes a rotation of perspective by an angle, and the change from the third perspective to the fourth perspective includes a rotation of perspective by the same angle. In some examples, the change from the first perspective to the second perspective includes a translation of perspective by a direction and a distance, and the change from the third perspective to the fourth perspective includes a translation of perspective by the same direction and distance. In some examples, the change from the first perspective to the second perspective includes a transformation, and the change from the third perspective to the fourth perspective includes a change of perspective according to the same transformation.
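As a rough sketch of how the second set of motion vectors might be applied (one plausible reading, not necessarily the exact mechanism of the image reprojection engine 215), each inverted vector can be treated as a gather: each pixel of the output image fetches the source pixel that its vector points back to:

```python
import numpy as np

def apply_inverted_motion_vectors(image, inv_mv):
    """Warp `image` using per-pixel inverted motion vectors.

    inv_mv[y, x] points from output pixel (x, y) back to the source
    location in `image` that should fill it (a gather operation)."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            sx = int(round(x + inv_mv[y, x, 0]))
            sy = int(round(y + inv_mv[y, x, 1]))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = image[sy, sx]
            # Otherwise the pixel stays empty: a disocclusion hole of
            # the kind that interpolation/inpainting would later fill.
    return out
```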

In some examples, the image reprojection engine 215 includes one or more software elements, such as one or more sets of instructions corresponding to one or more programs, running on one or more processors of the imaging system 200 (such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof). In some examples, the image reprojection engine 215 includes one or more hardware elements. For example, the image reprojection engine 215 may include a processor such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof. In some examples, the image reprojection engine 215 includes a combination of one or more software elements and one or more hardware elements.

In some examples, the image reprojection engine 215 includes an ML system and/or a trained ML model that receives the media data 285 from the sensors 205 and/or the virtual content generator 207 as input. The ML system and/or trained ML model outputs the modified media data 290 based on the media data 285 and/or the virtual content. In some cases, the ML system and/or trained ML model may modify the media data 285 and/or the virtual content so that the modified media data 290 includes a depiction and/or representation of the environment from a perspective different from the perspective of the depiction and/or representation of the environment in the media data 285. In some examples, the ML system and/or trained ML model of the image reprojection engine 215 may include one or more neural networks (NNs) (e.g., the neural network 3900), one or more convolutional neural networks (CNNs), one or more trained time delay neural networks (TDNNs), one or more deep networks, one or more autoencoders, one or more deep belief networks (DBNs), one or more recurrent neural networks (RNNs), one or more generative adversarial networks (GANs), one or more other types of neural networks, one or more trained support vector machines (SVMs), one or more trained random forests (RFs), one or more computer vision systems, one or more deep learning systems, or combinations thereof.

The application engines 210 include multiple engines that apply the image reprojection of the image reprojection engine 215 (e.g., including the motion vector engine 220 and/or the grid inversion engine 225) in various ways for various applications. These engines of the application engines 210 include a time warp engine 230, a depth sensor support engine 235, a 3D stabilization engine 240, a 3D zoom engine 245, a reprojection SAT engine 250, a head pose correction engine 255, an extended reality (XR) late-stage reprojection engine 260, and a special effects engine 265. The "SAT" in the reprojection SAT engine 250 may stand for sensor alignment, spatial alignment transform, or both. The reprojection SAT engine 250 may use sensor alignment, spatial alignment transforms, or both. These engines of the application engines 210 modify at least a subset of the media data 285 to generate the modified media data 290, for example doing so using the image reprojection of the image reprojection engine 215 (e.g., including the motion vector engine 220 and/or the grid inversion engine 225).

In some examples, at least one of the application engines 210 includes an ML system and/or a trained ML model that receives the media data 285 from the sensors 205 and/or the virtual content generator 207 as input. The ML system and/or trained ML model outputs the modified media data 290 based on the media data 285 and/or the virtual content. In some cases, the ML system and/or trained ML model may modify the media data 285 and/or the virtual content so that the modified media data 290 includes a depiction and/or representation of the environment from a perspective different from the perspective of the depiction and/or representation of the environment in the media data 285. In some examples, the ML system and/or trained ML model of at least one of the application engines 210 may include one or more NNs, one or more CNNs, one or more TDNNs, one or more deep networks, one or more autoencoders, one or more DBNs, one or more RNNs, one or more GANs, one or more trained SVMs, one or more trained RFs, one or more computer vision systems, one or more deep learning systems, or combinations thereof.

In some examples, the application engines 210, including the image reprojection engine 215, may analyze (e.g., to determine motion vectors), process, and/or modify media data 285 into which virtual content generated by the virtual content generator 207 has been incorporated. In some examples, the application engines 210, including the image reprojection engine 215, may analyze (e.g., to determine motion vectors), process, and/or modify media data 285 into which virtual content generated by the virtual content generator 207 has not been incorporated. In some examples, for instance if the virtual content generated by the virtual content generator 207 is incorporated into the media data 285 that is input into the application engines 210, the modified media data 290 output by the application engines 210 (including the image reprojection engine 215) may already include the virtual content. In some examples, for instance if the virtual content generated by the virtual content generator 207 is not incorporated into the media data 285 that is input into the application engines 210, the modified media data 290 output by the application engines 210 (including the image reprojection engine 215) may lack the virtual content. In such examples, the virtual content generated by the virtual content generator 207 may be added to the modified media data 290 after the application engines 210 output the modified media data 290 but before the modified media data 290 is output using the output device 270 and/or the transceiver 275.

In some examples, at least one of the application engines 210 includes one or more software elements, such as one or more sets of instructions corresponding to one or more programs, running on one or more processors of the imaging system 200 (such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof). In some examples, at least one of the application engines 210 includes one or more hardware elements. For example, at least one of the application engines 210 may include a processor such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof. In some examples, at least one of the application engines 210 includes a combination of one or more software elements and one or more hardware elements.

In some examples, the imaging system 200 includes one or more output devices 270 that are configured to, and can, output the modified media data 290. In some examples, the output device 270 includes a display that is configured to, and can, display visual media such as images and/or video. In some examples, the output device 270 includes an audio output device, such as a speaker or headphones, or a connector configured to couple the imaging system 200 to a speaker or headphones. The audio output device is configured to, and can, play audio media such as music, sound effects, audio tracks corresponding to video, audio recordings recorded by a microphone (e.g., of the sensors 205), or combinations thereof. The output device 270 may output a representation of the environment (e.g., the media data 285 captured by the sensors 205), virtual content (e.g., generated by the virtual content generator 207), a combination of the representation of the environment and the virtual content, modifications of the representation of the environment and/or the virtual content and/or the combination (e.g., modified using the application engines 210 and/or the image reprojection engine 215), or combinations thereof. In some examples, the output device 270 may face a user of the imaging system 200. For example, a display of the output device 270 may face the user of the imaging system 200 and/or may display visual media to (e.g., toward) the user of the imaging system 200. Similarly, an audio output device of the output device 270 may face the user of the imaging system 200 and/or may play audio media to (e.g., toward) the user of the imaging system 200. In some examples, the output device 270 includes the output device 4135. In some examples, the output device 4135 may include the output device 270. In Figure 2, the output device 270 is illustrated as a display that displays visual media data and a corresponding speaker that plays audio media data.

The imaging system 200 also includes one or more transceivers 275, which the imaging system 200 may use to output the modified media data 290 generated by the application engines 210 (e.g., including the image reprojection engine 215), for example by sending the media to a recipient device. The recipient device may output the media using its own output devices, for example by displaying visual media data of the media using a display of an output device and/or by playing audio media data of the media using an audio output device of an output device. The transceiver 275 may include wired or wireless transceivers, communication interfaces, antennas, connectors, couplers, coupling systems, or combinations thereof. In some examples, the transceiver 275 may include the communication interface 4140 of the computing system 4100. In some examples, the communication interface 4140 of the computing system 4100 may include the transceiver 275. In Figure 2, the transceiver 275 is illustrated as a wireless transceiver 275 sending media data.

In some examples, the imaging system 200 includes a feedback engine 280. The feedback engine 280 may detect feedback received from a user via a user interface of the imaging system. The feedback engine 280 may detect feedback about one engine of the imaging system 200 received from another engine of the imaging system 200, for example whether one engine decides to use data from the other engine. The feedback may be feedback about any of the application engines 210 (such as the image reprojection engine 215, the motion vector engine 220, the grid inversion engine 225, the time warp engine 230, the depth sensor support engine 235, the 3D stabilization engine 240, the 3D zoom engine 245, the reprojection SAT engine 250, the head pose correction engine 255, the XR late-stage reprojection engine 260, the special effects engine 265, or a combination thereof). The feedback received by the feedback engine 280 may be positive feedback or negative feedback. For example, if one engine of the imaging system 200 uses data from another engine of the imaging system 200, the feedback engine 280 may interpret this as positive feedback. If one engine of the imaging system 200 rejects data from another engine of the imaging system 200, the feedback engine 280 may interpret this as negative feedback. Positive feedback may also be based on attributes of sensor data from the sensors 205 and/or of inputs from the user interface, such as the user smiling, laughing, nodding, pressing a button associated with positive feedback, making a gesture associated with positive feedback (e.g., a thumbs up), uttering an affirmative statement (e.g., "yes," "confirmed," "okay," "next"), or otherwise reacting positively to the media. Negative feedback may also be based on attributes of sensor data from the sensors 205 and/or of inputs from the user interface, such as the user frowning, crying, shaking his or her head (e.g., in a "no" motion), pressing a button associated with negative feedback, making a gesture associated with negative feedback (e.g., a thumbs down), uttering a negative statement (e.g., "no," "negative," "bad," "not this"), or otherwise reacting negatively to the virtual content.

In some examples, the feedback engine 280 provides the feedback to one or more ML systems of the imaging system 200 as training data to update the one or more ML systems of the imaging system 200. For example, the feedback engine 280 may provide the feedback as training data to the ML systems and/or trained ML models of any of the application engines 210, such as the image reprojection engine 215, the motion vector engine 220, the grid inversion engine 225, the time warp engine 230, the depth sensor support engine 235, the 3D stabilization engine 240, the 3D zoom engine 245, the reprojection SAT engine 250, the head pose correction engine 255, the XR late-stage reprojection engine 260, the special effects engine 265, or a combination thereof. Positive feedback can be used to strengthen and/or reinforce weights associated with the outputs of the ML systems and/or trained ML models. Negative feedback can be used to weaken and/or remove weights associated with the outputs of the ML systems and/or trained ML models.

In some examples, the feedback engine 280 includes one or more software elements, such as a set of instructions corresponding to a program, running on a processor such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof. In some examples, the feedback engine 280 includes one or more hardware elements. For example, the feedback engine 280 may include a processor such as the processor 4110 of the computing system 4100, the image processor 150, the host processor 152, the ISP 154, or a combination thereof. In some examples, the feedback engine 280 includes a combination of one or more software elements and one or more hardware elements.

Figure 3A is a perspective diagram 300 illustrating a head-mounted display (HMD) 310 that is used as an extended reality (XR) system 200. The HMD 310 may be, for example, an augmented reality (AR) headset, a virtual reality (VR) headset, a mixed reality (MR) headset, an extended reality (XR) headset, or some combination thereof. The HMD 310 may be an example of the imaging system 200. The HMD 310 includes a first camera 330A and a second camera 330B along a front portion of the HMD 310. The first camera 330A and the second camera 330B may be examples of the sensors 205 of the imaging system 200. The HMD 310 includes a third camera 330C and a fourth camera 330D that face the user's eyes when the user's eyes face the display(s) 340. The third camera 330C and the fourth camera 330D may be examples of the sensors 205 of the imaging system 200. In some examples, the HMD 310 may only have a single camera with a single image sensor. In some examples, the HMD 310 may include one or more additional cameras in addition to the first camera 330A, the second camera 330B, the third camera 330C, and the fourth camera 330D. In some examples, the HMD 310 may include one or more additional sensors in addition to the first camera 330A, the second camera 330B, the third camera 330C, and the fourth camera 330D, which may also include other types of the sensors 205 of the imaging system 200. In some examples, the first camera 330A, the second camera 330B, the third camera 330C, and/or the fourth camera 330D may be examples of the image capture and processing system 100, the image capture device 105A, the image processing device 105B, or combinations thereof.

The HMD 310 may include one or more displays 340 that are visible to a user 320 wearing the HMD 310 on the user's 320 head. The one or more displays 340 of the HMD 310 may be examples of the one or more displays of the output device 270 of the imaging system 200. In some examples, the HMD 310 may include one display 340 and two viewfinders. The two viewfinders may include a left viewfinder for the left eye of the user 320 and a right viewfinder for the right eye of the user 320. The left viewfinder may be oriented so that the left eye of the user 320 sees a left side of the display. The right viewfinder may be oriented so that the right eye of the user 320 sees a right side of the display. In some examples, the HMD 310 may include two displays 340, including a left display that displays content to the left eye of the user 320 and a right display that displays content to the right eye of the user 320. The one or more displays 340 of the HMD 310 may be digital "pass-through" displays or optical "see-through" displays.

The HMD 310 may include one or more earbuds 335, which may function as speakers and/or headphones that output audio to one or more ears of the user of the HMD 310. One earbud 335 is illustrated in Figures 3A and 3B, but it should be understood that the HMD 310 may include two earbuds, one for each of the user's ears (left and right). In some examples, the HMD 310 may also include one or more microphones (not pictured). The one or more microphones may be examples of the sensors 205 of the imaging system 200. The one or more earbuds may be examples of the output device 270 of the imaging system 200. In some examples, the audio output by the HMD 310 to the user via the one or more earbuds 335 may include, or be based on, audio recorded using the one or more microphones.

Figure 3B is a perspective diagram 350 illustrating the head-mounted display (HMD) 310 of Figure 3A being worn by a user 320. The user 320 wears the HMD 310 on the user's 320 head over the user's 320 eyes. The HMD 310 can capture images with the first camera 330A and the second camera 330B. In some examples, the HMD 310 displays one or more output images toward the eyes of the user 320 using the displays 340. In some examples, the output images can include virtual content generated by the virtual content generator 207, composited using a compositor, and/or displayed by the display of the output device 270. The output images can be based on the images captured by the first camera 330A and the second camera 330B, for example with virtual content overlaid. The output images can provide a stereoscopic view of the environment, in some cases with virtual content overlaid and/or with other modifications. For example, the HMD 310 can display a first display image to the right eye of the user 320, the first display image based on an image captured by the first camera 330A. The HMD 310 can display a second display image to the left eye of the user 320, the second display image based on an image captured by the second camera 330B. For instance, the HMD 310 can provide, in the display images, virtual content overlaid over the images captured by the first camera 330A and the second camera 330B. The third camera 330C and the fourth camera 330D can capture images of the eyes before, during, and/or after the user views the display images displayed by the displays 340. This way, sensor data from the third camera 330C and/or the fourth camera 330D can capture reactions to the virtual content by the user's eyes (and/or other portions of the user). An earbud 335 of the HMD 310 is illustrated in an ear of the user 320. The HMD 310 may output audio to the user 320 via the earbud 335 and/or via another earbud (not pictured) of the HMD 310 that is in the user's 320 other ear (not pictured).

Figure 4A is a perspective diagram 400 illustrating a front surface of a mobile handset 410 that includes front-facing cameras and can be used as an extended reality (XR) system 200. The mobile handset 410 may be an example of the imaging system 200. The mobile handset 410 may be, for example, a cellular phone, a satellite phone, a portable gaming console, a music player, a health tracking device, a wearable device, a wireless communication device, a laptop, a mobile device, any other type of computing device or computing system discussed herein, or a combination thereof.

The front surface 420 of the mobile handset 410 includes a display 440. The front surface 420 of the mobile handset 410 also includes a first camera 430A and a second camera 430B. The first camera 430A and the second camera 430B may be examples of the sensors 205 of the imaging system 200. The first camera 430A and the second camera 430B can face the user, including the eye(s) of the user, while content (e.g., the modified media output by the media modification engine 235) is displayed on the display 440. The display 440 may be an example of a display of the output device 270 of the imaging system 200.

The first camera 430A and the second camera 430B are illustrated in a bezel around the display 440 on the front surface 420 of the mobile handset 410. In some examples, the first camera 430A and the second camera 430B can be positioned in a notch or cutout that is cut out from the display 440 on the front surface 420 of the mobile handset 410. In some examples, the first camera 430A and the second camera 430B can be under-display cameras positioned between the display 440 and the rest of the mobile handset 410, so that light passes through a portion of the display 440 before reaching the first camera 430A and the second camera 430B. The first camera 430A and the second camera 430B of the perspective diagram 400 are front-facing cameras. The first camera 430A and the second camera 430B face a direction perpendicular to the plane of the front surface 420 of the mobile handset 410. The first camera 430A and the second camera 430B may be two of the one or more cameras of the mobile handset 410. The first camera 430A and the second camera 430B may be a first image sensor and a second image sensor, respectively. In some examples, the front surface 420 of the mobile handset 410 may only have a single camera.

In some examples, the front surface 420 of the mobile handset 410 may include one or more additional cameras in addition to the first camera 430A and the second camera 430B. The one or more additional cameras may also be examples of the sensors 205 of the imaging system 200. In some examples, the front surface 420 of the mobile handset 410 may include one or more additional sensors in addition to the first camera 430A and the second camera 430B. The one or more additional sensors may also be examples of the sensors 205 of the imaging system 200. In some cases, the front surface 420 of the mobile handset 410 includes more than one display 440. The one or more displays 440 of the front surface 420 of the mobile handset 410 may be examples of displays of the output device 270 of the imaging system 200. For example, the one or more displays 440 may include one or more touchscreen displays.

The mobile handset 410 may include one or more speakers 435A and/or other audio output devices (e.g., earphones or headphones, or connectors thereto) that can output audio to one or more ears of a user of the mobile handset 410. One speaker 435A is illustrated in Figure 4A, but it should be understood that the mobile handset 410 may include more than one speaker and/or other audio device. In some examples, the mobile handset 410 may also include one or more microphones (not pictured). The one or more microphones may be examples of the sensors 205 of the imaging system 200. In some examples, the mobile handset 410 may include one or more microphones along and/or adjacent to the front surface 420 of the mobile handset 410, with these microphones being examples of the sensors 205 of the imaging system 200. In some examples, the audio output by the mobile handset 410 to the user via the one or more speakers 435A and/or other audio output devices may include, or be based on, audio recorded using the one or more microphones.

Figure 4B is a perspective diagram 450 illustrating a rear surface 460 of the mobile handset 410 that includes rear-facing cameras and can be used as an extended reality (XR) system 200. The mobile handset 410 includes a third camera 430C and a fourth camera 430D on the rear surface 460 of the mobile handset 410. The third camera 430C and the fourth camera 430D of the perspective diagram 450 are rear-facing. The third camera 430C and the fourth camera 430D may be examples of the sensors 205 of the imaging system 200 of Figure 2. The third camera 430C and the fourth camera 430D face a direction perpendicular to the plane of the rear surface 460 of the mobile handset 410.

The third camera 430C and the fourth camera 430D may be two of the one or more cameras of the mobile handset 410. In some examples, the rear surface 460 of the mobile handset 410 may only have a single camera. In some examples, the rear surface 460 of the mobile handset 410 may include one or more additional cameras in addition to the third camera 430C and the fourth camera 430D. The one or more additional cameras may also be examples of the sensors 205 of the imaging system 200. In some examples, the rear surface 460 of the mobile handset 410 may include one or more additional sensors in addition to the third camera 430C and the fourth camera 430D. The one or more additional sensors may also be examples of the sensors 205 of the imaging system 200. In some examples, the first camera 430A, the second camera 430B, the third camera 430C, and/or the fourth camera 430D may be examples of the image capture and processing system 100, the image capture device 105A, the image processing device 105B, or combinations thereof.

The mobile handset 410 may include one or more speakers 435B and/or other audio output devices (e.g., earphones or headphones, or connectors thereto) that can output audio to one or more ears of a user of the mobile handset 410. The one or more speakers 435B may be examples of the output device 270 of the imaging system 200. One speaker 435B is illustrated in Figure 4B, but it should be understood that the mobile handset 410 may include more than one speaker and/or other audio device. In some examples, the mobile handset 410 may also include one or more microphones (not pictured). The one or more microphones may be examples of the sensors 205 of the imaging system 200. In some examples, the mobile handset 410 may include one or more microphones along and/or adjacent to the rear surface 460 of the mobile handset 410, with these microphones being examples of the sensors 205 of the imaging system 200. In some examples, the audio output by the mobile handset 410 to the user via the one or more speakers 435B and/or other audio output devices may include, or be based on, audio recorded using the one or more microphones.

The mobile handset 410 may use the display 440 on the front surface 420 as a pass-through display. For instance, the display 440 may display output images. The output images can be based on the images captured by the third camera 430C and/or the fourth camera 430D, for example with virtual content overlaid and/or with modifications applied by the media modification engine 235. The first camera 430A and/or the second camera 430B can capture images of the user's eyes (and/or other portions of the user) before, during, and/or after the output images with the virtual content are displayed on the display 440. This way, sensor data from the first camera 430A and/or the second camera 430B can capture reactions to the virtual content by the user's eyes (and/or other portions of the user).

Figure 5 is a conceptual diagram illustrating an example of grid inversion. The inputs to the grid inversion include a first set of motion vectors, illustrated in Figure 5 as a motion vector (MV) grid 505 using solid black arrows from a first image Img1 510 to a second image Img2 515. For each pixel (or group of pixels), the motion vector grid indicates, using a motion vector in the MV grid 505, the distance that the pixel (or group of pixels) moves between the first image Img1 510 of the environment (e.g., visual or depth) and the second image Img2 515 of the environment (e.g., visual or depth). The motion vector grid 505 may be referred to as a motion vector map for the image. The motion vectors of the motion vector grid 505 may be determined using the motion vector engine 220 (e.g., using optical flow).

The grid inversion engine 225 can perform grid inversion, which changes characteristics (e.g., direction, origin, position, length, and/or magnitude) of the motion vectors in the first set of motion vectors (the motion vector grid 505) to generate a second set of motion vectors (an inverted MV grid 520). Rather than indicating how each pixel moves from Img1 510 to Img2 515 (as in the MV grid 505), the motion vectors in the second set of motion vectors (the inverted MV grid 520) illustrate how each pixel from Img2 515 can move back to Img1 510. The motion vectors in the second set of motion vectors (the inverted MV grid 520) are illustrated in Figure 5 using dashed black arrows from the second image Img2 515 to the first image Img1 510.

The various black icons in Figure 5 represent various elements in the environment that are depicted in the two images Img1 510 and Img2 515. For example, the elements include a house, a bird, a person, a car, and a tree. According to the MV grid 505, the house and the tree do not move from Img1 510 to Img2 515, as indicated by zeroes in the MV grid 505. Likewise, in the inverted MV grid 520, the house and the tree do not move from Img2 515 to Img1 510. The house is represented by zeroes in the MV grid 505 and the inverted MV grid 520, in cell 0 where the house is located. The tree could be represented by a zero in the inverted MV grid at cell 8 where the tree is located, but there is a conflict with the car, as discussed below, indicated by a circle. The bird moves 1 grid cell to the right from Img1 510 to Img2 515 (moving from cell 1 to cell 2), as indicated by the 1 at cell 1 in the MV grid 505. The bird moves 1 grid cell to the left from Img2 515 to Img1 510 (moving from cell 2 to cell 1), as indicated by the -1 at cell 2 in the inverted MV grid 520. These values are not only inverted (multiplied by -1) from the MV grid 505 to the inverted MV grid 520, but are also moved from the cell corresponding to the element's old position in Img1 510 to the cell corresponding to the element's new position in Img2 515. The black star at cell 1 (where the bird is in Img1 510 but is missing from Img2 515) indicates that, in the inverted MV grid 520, the region of the image corresponding to cell 1 is missing and may need to be filled in (e.g., filled in using interpolation and/or inpainting). The person moves 2 grid cells to the left from Img1 510 to Img2 515 (moving from cell 6 to cell 4), as indicated by the -2 at cell 6 in the MV grid 505. The person moves 2 grid cells to the right from Img2 515 to Img1 510 (moving from cell 4 to cell 6), as indicated by the 2 at cell 4 in the inverted MV grid 520. The black star at cell 6 (where the person is in Img1 510 but is missing from Img2 515) indicates that, in the inverted MV grid 520, the region of the image corresponding to cell 6 is missing and may need to be filled in (e.g., filled in using interpolation and/or inpainting). The car moves 1 grid cell to the right from Img1 510 to Img2 515 (moving from cell 7 to cell 8), as indicated by the 1 in the MV grid 505. The car would move 1 grid cell to the left from Img2 515 to Img1 510 (moving from cell 8 to cell 7), which could be represented by a -1 in the inverted MV grid 520. However, the car and the tree are located in the same grid cell (cell 8) in Img2 515, so the circle indicates conflicting values for that cell in the inverted MV grid 520 (e.g., 0 for the tree, -1 for the car).
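A toy one-dimensional sketch of this grid inversion, written to mirror the nine-cell example of Figure 5, may help make the negate-and-move operation concrete. The representation of holes and conflicts below is an assumption for illustration, as the description above leaves their handling (e.g., interpolation, inpainting, conflict resolution) open:

```python
def invert_mv_grid(mv_grid):
    """Invert a 1D motion-vector grid: negate each vector and move it
    to the cell its element lands in. Returns the inverted grid plus
    any cells that were vacated (holes) or multiply assigned (conflicts)."""
    n = len(mv_grid)
    inverted = [None] * n
    conflicts = {}
    for cell, mv in enumerate(mv_grid):
        if mv is None:                 # no element in this cell
            continue
        dest = cell + mv               # where the element lands in Img2
        if 0 <= dest < n:
            if inverted[dest] is None:
                inverted[dest] = -mv   # negate and move the vector
            else:                      # two elements landed in one cell
                conflicts.setdefault(dest, [inverted[dest]]).append(-mv)
    holes = [c for c, mv in enumerate(mv_grid)
             if mv not in (None, 0) and inverted[c] is None]
    return inverted, holes, conflicts

# The Figure 5 example: house in cell 0, bird moving +1 from cell 1,
# person moving -2 from cell 6, car moving +1 from cell 7, tree in cell 8.
mv_grid = [0, 1, None, None, None, None, -2, 1, 0]
inverted, holes, conflicts = invert_mv_grid(mv_grid)
# inverted -> [0, None, -1, None, 2, None, None, None, -1], plus a
# recorded conflict at cell 8 (car -1 vs. tree 0), matching the circled
# cell in Figure 5; cells 1 and 6 are the starred holes (this simple
# rule also flags cell 7, vacated by the car).
```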

Figure 6 is a conceptual diagram 600 illustrating an example of depth-based reprojection. The depth-based reprojection is performed by the image reprojection engine 215. The example illustrates a camera image 610 of an environment (referred to as a world scene 605) that has a table with a toolbox on it and several chairs around the table. The image reprojection engine 215 reprojects the camera image 610 using depth data 620 for the environment (e.g., for the world scene 605) to generate a reprojected image 615. The reprojected image 615 depicts the same environment (e.g., the world scene 605) as the camera image 610, but is reprojected so that, in the reprojected image 615, the environment appears as if captured from a different perspective or viewpoint than in the camera image 610. In the example illustrated in Figure 6, the reprojected image 615 appears to be captured from a perspective or viewpoint of the environment that is translated to the left of the perspective or viewpoint of the environment depicted in the camera image 610. In some examples, the image reprojection engine 215 may perform the image reprojection using an inverted MV grid (e.g., the inverted MV grid 520) generated by the grid inversion engine 225, for example based on the depth data 620.

Figure 7 is a conceptual diagram 700 illustrating an example of time warping 705 performed by the time warp engine 230. On the left, a large, or dense, motion vector map 720 is illustrated as a solid black arrow depicting how pixels move between image frame n and image frame n-4. Image frames n and n-4 are illustrated as tall vertical lines. The time warping 705 uses grid inversion (using the grid inversion engine 225) on the large or dense motion vector map 720 to create smaller motion vector maps, illustrated for example as shorter vertical arrows from image frame n to image frame n-1, from image frame n-1 to image frame n-2, from image frame n-2 to image frame n-3, and from image frame n-3 to image frame n-4.

To create the smaller vector maps, the time warp engine 230 uses resampling. For example, to generate a reduced vector map, the time warp engine 230 makes the values in the motion vector map (which represent the distances that elements move between frame n and frame n-4) smaller, for instance by multiplying them by ¼. In addition, similarly to the movement of values in the grid inversion of Figure 5, the time warp engine 230 moves the values to each element's new position in the corresponding frame.
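A simplified sketch of this resampling is shown below, assuming a dense motion vector map spanning a gap of four frames and approximately linear motion across the gap; the splatting of scaled values to intermediate positions is an interpretation of the description above, not a definitive implementation:

```python
import numpy as np

def resample_motion_map(mv_map, step, gap=4):
    """Approximate the motion map covering one frame step, given a
    dense map `mv_map` spanning `gap` frame steps: scale each vector
    down and move it to where its element sits after `step` steps."""
    mv_map = np.asarray(mv_map, dtype=float)
    h, w, _ = mv_map.shape
    small = np.zeros_like(mv_map)
    frac = step / gap                    # e.g. 1/4: one step out of four
    for y in range(h):
        for x in range(w):
            dx, dy = mv_map[y, x]
            # Where the element at (x, y) has moved to after `step`
            # frame steps, assuming roughly linear motion.
            nx = int(round(x + dx * frac))
            ny = int(round(y + dy * frac))
            if 0 <= nx < w and 0 <= ny < h:
                # The per-step vector is the full vector scaled by 1/gap.
                small[ny, nx] = (dx / gap, dy / gap)
    return small
```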

Time warping 705 can be used to interpolate motion vector maps between existing motion vector maps, for example if optical flow is only performed every k frames. Optical flow is a computationally expensive operation that can use a significant amount of power to perform, whereas the time warping 705 demonstrated here is a less expensive and less power-hungry operation. Thus, optical flow can be used sparingly to reduce computational cost and power usage, and the time warping 705 can still allow the imaging system 200 to obtain motion vectors for every frame transition between any two adjacent frames (and, in some cases, between any two frames).

In some examples, the smaller motion vector maps generated by the time warping 705 can be used to interpolate additional frames between existing frames of a video, for example to increase the frame rate of the video from a first frame rate to a second frame rate that is higher than the first frame rate.

In some examples, the smaller motion vector maps generated by the time warping 705 can be used to improve the quality of certain frames of a video. For instance, the time warping 705 can improve the quality of a particular video frame if that frame is blurry, includes significant compression artifacts (including compression artifacts that make it difficult to clearly see the scene depicted in the image), or otherwise suffers from low quality. The time warping 705 can be used to determine motion vector maps from one or more adjacent or nearby frames of the video, and the image data from those frames can be used to generate a modified image that replaces the particular frame in question, so as to improve that frame's image quality. The conceptual diagram 700 illustrates two instances of an image of a boy: a first image 710 on the left to which the time warping 705 is not applied, and a second image 715 on the right to which the time warping 705 is applied, improving the clarity of the depiction of the boy in the second image 715 compared to the first image 710. The right image 715, improved using the time warping 705, appears sharper and clearer than the left image 710, especially at and near the various edges in the depiction of the boy, as indicated by the solid lines representing the various lines and edges of the depiction of the boy in the image 715. In addition, in some examples, patterns such as hair patterns, fabric patterns, other patterns, text, logos, and/or other designs may appear clearer and sharper in an image to which the time warping 705 is applied (e.g., as in the right image 715) than in an image to which the time warping 705 is not applied (e.g., the left image 710).

Additional examples of the time warping 705, and of image improvements using the time warping 705, are illustrated in Figures 23 and 29.

Figure 8 is a conceptual diagram 800 illustrating an example of depth sensor support 805 performed by the depth sensor support engine 235. A cluster of the sensors 205 on the imaging system 200 is illustrated as including a set of image sensors 810 and a set of depth sensors 815, which may include time-of-flight (ToF) sensors. In some cases, it can be useful in image processing to use image data from the image sensors 810 together with depth data from the depth sensors 815, for example to produce bokeh, to simulate depth-of-field blur, for object recognition, and so forth. However, the image sensors 810 and the depth sensors 815 are not co-located. Instead, the image sensors 810 and the depth sensors 815 are offset from one another by an offset 820. Thus, using image data from the image sensors 810 together with depth data from the depth sensors 815 can produce parallax issues, due to the slight mismatch in perspective caused by the offset 820. As a result, depths in the depth data may be mismatched with the objects depicted in the image data. This mismatch can be especially noticeable for objects in the environment that are close to the sensors, which can appear at considerably different positions in the image data than in the depth data. Objects that are farther away may appear more similar in the image data and the depth data.

To correct this mismatch, in some examples, the image reprojection engine 215 can reproject the depth data from the depth sensor 815 to appear to be from the perspective of the image sensor 810. In some examples, the image reprojection engine 215 can reproject the image data from the image sensor 810 to appear to be from the perspective of the depth sensor 815. Because the image reprojection engine 215 may require depth data to perform the reprojection, the image reprojection engine 215 may rely on extrinsic calibration between the image sensor 810 and the depth sensor 815 to obtain appropriate depth data.
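As a rough illustration of depth reprojection between the two viewpoints, the following sketch unprojects each depth pixel into 3D, applies the extrinsic calibration between the sensors, and projects into the image sensor's view, using an assumed pinhole model (all names, and the z-buffer policy for colliding pixels, are illustrative assumptions):

```python
# Minimal sketch: reproject a depth map from the depth sensor's viewpoint
# to the image sensor's viewpoint. K_depth, K_image are 3x3 intrinsics;
# T_depth_to_image is the 4x4 extrinsic calibration between the sensors.
import numpy as np

def reproject_depth(depth, K_depth, K_image, T_depth_to_image, out_shape):
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Unproject depth pixels into 3D points in the depth sensor's frame.
    pts = np.linalg.inv(K_depth) @ np.vstack(
        [xs.ravel(), ys.ravel(), np.ones(h * w)]) * depth.ravel()
    # Move the points into the image sensor's frame via the extrinsics.
    pts = (T_depth_to_image @ np.vstack([pts, np.ones(h * w)]))[:3]
    # Project with the image sensor's intrinsics.
    proj = K_image @ pts
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    z = proj[2]
    out = np.full(out_shape, np.inf)
    ok = (u >= 0) & (u < out_shape[1]) & (v >= 0) & (v < out_shape[0]) & (z > 0)
    # Z-buffer: keep the nearest depth when several points land on one pixel.
    np.minimum.at(out, (v[ok], u[ok]), z[ok])
    return out
```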

FIG. 9 is a conceptual diagram 900 illustrating an example of 3D stabilization 905 performed by the 3D stabilization engine 240. Traditional stabilization techniques can compensate for rotational movement, but generally cannot compensate for translational (e.g., parallax) movement in the real world. Image reprojection using the image reprojection engine 215, based on depth data for the environment, can provide true 3D stabilization 905 that corrects for parallax movement, including translational movement, rotational movement, or both. For each video frame of a video captured using the sensors 205, including the four video frames labeled original ("orig") in FIG. 9, reprojection is performed using the image reprojection engine 215 to produce a stabilized variant of the original video frame ("stable"). The reprojected video frames resulting from the reprojection have their respective perspectives all conform to a line representing a virtual, stable movement path, without any parallax movement perpendicular to the line or any rotation about the axis corresponding to the line (or any other axis). The line can be curved to represent a curved movement path, but does not have any jagged edges corresponding to such parallax movement or rotation.

For the illustrated 3D stabilization 905, the input video represented by the video frames sways in different directions - panning up, down, left, right, forward, and/or backward, and/or rotating (e.g., pitch, yaw, and/or roll). Because the image reprojection engine 215 reprojects the images to change the perspective of the environment, all of these movements in the sway are stabilized by the reprojection using the image reprojection engine 215.

In some cases, blank regions may appear in the stabilized frames, for example at the edges of a frame and/or around a person in a frame (e.g., to the right of the woman in the fourth stabilized frame in the lower-right corner of FIG. 9). These can represent occluded regions that have no corresponding data in the original image. These occluded regions can be filled in by the image reprojection engine 215, for example using interpolation and/or inpainting (e.g., deep-learning-based inpainting). An additional example 3205 of the 3D stabilization 905 is shown in FIG. 30. In some examples, these blank regions may appear black. In some examples, these blank regions may appear white. In FIG. 9, these blank regions are shown in white.

In some examples, for 3D stabilization, and for certain other applications of the image reprojection engine 215, it can be useful to treat distant pixels as if they were at an infinite distance, so that the positions of such pixels are unchanged under reprojection. In some examples, the image reprojection engine 215 can use translation attenuation to smoothly transition translation values toward values representing infinity, in order to treat distant pixels as if they were at an infinite distance.
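A minimal sketch of translation attenuation follows, assuming a smoothstep-style falloff between two depth thresholds; the exact attenuation function is not specified here, so the form below is an illustrative assumption:

```python
# Minimal sketch of translation attenuation: the translation applied to a
# pixel is scaled down smoothly with depth, so pixels beyond depth_far
# behave as if at infinity (translation ~ 0, position unchanged).
import numpy as np

def attenuate_translation(translation, depth, depth_near, depth_far):
    """translation: (3,) camera translation; depth: (H, W) per-pixel depth.
    Returns (H, W, 3) per-pixel attenuated translation vectors."""
    # Smoothstep-style weight: 1 at depth_near, 0 at depth_far and beyond.
    t = np.clip((depth - depth_near) / (depth_far - depth_near), 0.0, 1.0)
    weight = 1.0 - t * t * (3.0 - 2.0 * t)
    return translation[None, None, :] * weight[..., None]
```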

FIG. 10 is a conceptual diagram 1000 illustrating an example of 3D zoom 1005 (also referred to as cinematic zoom) performed by the 3D zoom engine 245. The 3D zoom 1005 performed by the 3D zoom engine 245 can include magnifying an image (e.g., making certain portions of the image larger while removing other portions of the image), moving a virtual camera in different directions (e.g., translating, rotating, etc.), and/or other types of zoom. In some cases, to perform a digital zoom on an image, the entire image is traditionally enlarged and cropped, as shown in the sequence of four images labeled digital zoom ("dig.zm.") in FIG. 10. These images depict a skateboarder in front of a house. Performing a digital zoom (or, in some examples, even an optical zoom using an optical zoom lens and/or switching between cameras and/or lenses) loses a large portion of the view of the house. However, if the camera were moved closer to the skateboarder, the view of the house would not be lost as much as it is using digital zoom. This is because the skateboarder is closer to the camera than the house. In other words, the skateboarder is in the foreground, while the house is in the background.

The 3D zoom 1005, or depth-based zoom or cinematic zoom, uses image reprojection by the image reprojection engine 215, based on depth data 1020 for the environment, to simulate forward movement of the camera in the environment - in this case, movement closer to the skateboarder. As shown in the sequence of four images labeled depth-based zoom ("depth.zm.") in FIG. 10, the skateboarder's figure increases in size as much as with the digital zoom, but less of the view of the house is lost. For example, in the last of the four images in the sequence, a span of four of the house's windows is at least partially in frame under the digital zoom, while a span of six of the house's windows is at least partially in frame under the 3D depth-based zoom (although one of those windows is entirely behind the skateboarder). Thus, the 3D depth-based zoom (or cinematic zoom) minimizes loss of the field of view, especially loss of background elements. An additional example of the 3D zoom 1005 (or depth-based zoom or cinematic zoom) is shown in FIG. 31.
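The contrast can be sketched as follows, where digital zoom is a center crop enlarged back to full size, while 3D zoom instead feeds a forward camera translation into depth-based reprojection (the names and the pinhole/rigid-transform conventions are illustrative assumptions; a transform of this form could be used with the motion-vector computation sketched after the FIG. 15 equations below):

```python
# Minimal sketch contrasting digital zoom with depth-based 3D zoom.
import cv2
import numpy as np

def digital_zoom_crop(image, factor):
    """Traditional digital zoom: crop the center and enlarge it back,
    losing the field of view outside the crop."""
    h, w = image.shape[:2]
    ch, cw = int(h / factor), int(w / factor)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = image[y0:y0 + ch, x0:x0 + cw]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

def dolly_forward_transform(distance):
    """Depth-based 3D zoom instead simulates moving the camera forward:
    nearer objects (small Z) grow faster than the distant background,
    so more of the background stays in frame."""
    T = np.eye(4)
    T[2, 3] = -distance  # translate the virtual camera along +Z, toward the scene
    return T
```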

FIG. 11 is a conceptual diagram 1100 illustrating an example of a reprojection 1105 performed by the reprojection SAT engine 250. The cluster of sensors 205 of the imaging system 200 is shown in FIG. 11 with a telephoto sensor 1110, a wide-angle sensor 1115, and another sensor 1125. In some cases, the imaging system 200 can switch between the telephoto sensor 1110 and the wide-angle sensor 1115, for example to provide different levels of zoom for images of the environment. However, similar to the scenario with the image sensor 810 and the depth sensor 815 of FIG. 8, the telephoto sensor 1110 and the wide-angle sensor 1115 are not co-located. Instead, there is an offset 1120 between the telephoto sensor 1110 and the wide-angle sensor 1115. Thus, switching between the telephoto sensor 1110 and the wide-angle sensor 1115 produces a parallax effect. For example, a telephoto image 1130 (labeled "tele") captured using the telephoto sensor 1110 is illustrated, as is a wide-angle image 1135 (labeled "wide") captured using the wide-angle sensor 1115 and cropped to match the telephoto field of view (i.e., digitally zoomed prior to the transition to the telephoto sensor). Both images depict a man in front of a distant background. In the telephoto image 1130, the man appears slightly to the right of the man's position in the wide-angle image 1135.

Similar to the depth sensor support 805 of FIG. 8, the reprojection SAT engine 250 can perform the reprojection 1105 to correct for the offset 1120 based on depth data 1160. For example, the reprojection SAT engine 250 can perform the reprojection 1105 to modify the telephoto image so as to modify the perspective, such that the modified telephoto image 1140 (labeled "modif.tele") appears to have been captured from the perspective of the wide-angle sensor 1115 (e.g., as in the wide-angle image 1135) rather than from the perspective of the telephoto sensor 1110 (e.g., as in the telephoto image 1130). In the modified telephoto image 1140, the man appears slightly to the left of the man's position in the unmodified telephoto image 1130. In the modified telephoto image 1140, the man appears similarly positioned to the man in the wide-angle image 1135. A black shadow appears to the right of the man in the modified telephoto image 1140, caused by the parallax movement of the image data depicting the man relative to the background. The black shadow represents a "hole" that can be filled with image data (e.g., using interpolation and/or inpainting, as discussed further herein).

In some examples, the reprojection SAT engine 250 can instead perform the reprojection 1105 based on the depth data 1160 to modify the wide-angle image so as to modify the perspective, such that the modified wide-angle image (not shown) appears to have been captured from the perspective of the telephoto sensor 1110 rather than from the perspective of the wide-angle sensor 1115. Unlike an inter-sensor transition in which a set of digitally zoomed images from one sensor is warped based on image estimates to match the second sensor before switching, the reprojection SAT engine 250 can correct for the offset based on depth data, thereby reducing parallax problems (e.g., parallax errors), particularly for closer objects (e.g., objects in the foreground and/or objects whose depth is less than a threshold). An additional example of the reprojection 1105 is shown in FIG. 32.

FIG. 12 is a conceptual diagram 1200 illustrating an example of head pose correction 1205 performed by the head pose correction engine 255. In some cases, images of a user may be captured from suboptimal and/or unflattering angles (e.g., angles other than an angle perpendicular to the user's face). For example, when a user captures a selfie image of himself or herself, or points a camera at himself or herself during a video conference, the angle of the captured image is often not aligned with the user's head pose, making the user appear to be looking down, up, left, and/or right. In some cases, a user's hand may become tired and/or uncomfortable from holding the user's phone or other imaging system 200 for a long time, which can exacerbate the problem as the user's hand droops or rests on a nearby surface.

The head pose correction 1205 performed by the head pose correction engine 255 can use the image reprojection engine 215 to perform a reprojection, reprojecting the real sensor to match a virtual sensor position in order to obtain a more optimal and/or more flattering perspective, such as a perspective from an angle perpendicular to the user's face.

For example, the original head pose of the woman in the input image 1210 is captured from an unflattering angle slightly below the level of the woman's head, emphasizing the woman's neck and chin area. The head pose correction 1205 uses the image reprojection engine 215, based on the input image 1210 and depth data 1220, to produce a reprojected image 1215 from a perspective at an angle perpendicular to the user's face. The reprojected image 1215 appears to view the woman's face from the more flattering, perpendicular angle, emphasizing the woman's facial features rather than the woman's neck and chin as in the input image 1210. Additional examples of the head pose correction 1205 are shown in FIG. 33.

FIG. 13 is a conceptual diagram 1300 illustrating an example of XR late-stage reprojection 1305 performed by the XR late-stage reprojection engine 260. Some XR devices (e.g., an HMD 1320) or other mobile devices use their sensors 205 to capture sensor data (e.g., images, videos, depth images, and/or point clouds) at a low frame rate to conserve battery power. Interpolation can be used to produce additional frames between the frames of the low-frame-rate sensor data to increase the frame rate. A high frame rate can be important for XR applications, because low-frame-rate XR can cause users to feel nauseated and/or can cause the XR to appear glitchy and unrealistic.

Interpolation techniques are not always able to realistically represent all perspective changes of an XR device (e.g., the HMD 1320). For example, interpolation may use digital zoom to simulate a user moving closer to or farther from an object, which can cause mismatches in the field of view similar to the mismatches discussed with respect to the 3D zoom 1005 of FIG. 10. Interpolation techniques may also have difficulty handling parallax movement caused, for example, by translational movement of the XR device (e.g., the HMD 1320). Interpolation techniques may also have difficulty handling rotational movement caused, for example, by changes in orientation (e.g., pitch, roll, and/or yaw) of the XR device (e.g., the HMD 1320).

The XR late-stage reprojection 1305 performed by the XR late-stage reprojection engine 260 can use the image reprojection engine 215 to perform image reprojection, reprojecting images of the environment based on changes to the position of the XR device. The change in position of the XR device (e.g., the HMD 1320) can be determined based on sensor data from pose sensors of the XR device (e.g., the HMD 1320), which can use less bandwidth and/or power than image sensors or depth sensors. The change in position of the XR device (e.g., the HMD 1320) can be inferred based on image data, depth data, and/or audio data from image sensors, depth sensors, and/or microphones of the sensors 205 of the XR device (e.g., the HMD 1320).

For example, an input image 1310 is illustrated, based on which the XR late-stage reprojection engine 260 uses the XR late-stage reprojection 1305 to produce a reprojected image 1315 based on the illustrated orientation change of the HMD 1320 (which is an example of an XR device).
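A common simplification of late-stage reprojection handles the rotation-only part of the pose change with a homography, which needs no depth; the sketch below illustrates that case (the names and the homography formulation are assumptions for illustration; translational pose changes would use the depth-based reprojection described herein):

```python
# Minimal sketch of a rotation-only late-stage reprojection.
import cv2
import numpy as np

def late_stage_reproject(rendered, K, R_delta):
    """Re-warp a rendered frame to account for a small head rotation
    R_delta (3x3) measured after rendering. For a pure rotation the warp
    is the homography H = K * R_delta * K^-1, so no depth is needed."""
    H = K @ R_delta @ np.linalg.inv(K)
    h, w = rendered.shape[:2]
    return cv2.warpPerspective(rendered, H, (w, h), flags=cv2.INTER_LINEAR)
```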

FIG. 14 is a conceptual diagram 1400 illustrating an example of special effects 1405 performed by the special effects engine 265. The special effects 1405 performed by the special effects engine 265 can use the image reprojection engine 215 to perform image reprojection, reprojecting an input image 1410 to rotate around an object, to translate past an object, to rotate the perspective about an axis, to move the perspective along a path, or some combination thereof. In the example shown in FIG. 14, an input image 1410 of an environment is reprojected from a different perspective of the environment to form a reprojected image 1415. The perspective of the environment in the reprojected image 1415 is to the left of the perspective of the environment in the input image 1410, for example making the toolbox appear rotated and/or tilted to the right in the reprojected image 1415 relative to the input image 1410.

FIG. 15 is a conceptual diagram 1500 illustrating an image reprojection transformation based on matrix operations. The conceptual diagram 1500 illustrates how the image reprojection engine 215 can reproject a captured image 1510 of an environment to produce a reprojected image 1515 of the environment from a different perspective than the captured image 1510. The image reprojection engine 215 receives the captured image 1510 from the sensors 205, specifically from a camera. The captured image depicts the environment from a first perspective ("first persp."). An example of the captured image 1510 is shown in FIG. 15. For example, using the pinhole camera paradigm along with the focal length (f) and depth, the imaging system can determine the position of an object in the environment relative to the camera. The image reprojection engine 215 can use an intrinsic matrix describing the first camera (also referred to as the original camera, the source camera, or the first perspective), a second intrinsic matrix describing a second camera or a virtual camera in the 3D world (also referred to as the target camera, or the second perspective), and a 3D transformation matrix, in order to move, or reproject, from the first camera to the second camera. In some examples, the image reprojection engine can also perform depth reprojection, creating a second depth map depicting the environment from the second perspective based on the same principles as the image reprojection described herein. Additionally, various transformation paradigms can be used for image and/or depth reprojection, such as transformation paradigms that account for lens distortion (e.g., radial distortion).

The image reprojection engine 215 receives a depth map ("depth on the image domain") (e.g., the depth data 620), for example from a depth sensor and/or based on determining depth using cameras (e.g., stereoscopic depth sensing, ToF sensors, and/or structured light). Based on the depth map, the image reprojection engine 215 can determine the exact position, in 3D coordinates (e.g., X, Y, and Z), of any given object in the captured image 1510 (such as any of the chairs, the table, or the toolbox depicted in the captured image 1510). For example, FIG. 15 identifies a set of equations for determining the X, Y, and Z coordinates of an object based on the object's depth, the camera's intrinsic matrix ($\text{intrinsic}_{cam}$), and the object's coordinates ($x_{in}$ and $y_{in}$) in the captured image 1510. The equations are as follows:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \text{depth} \cdot \text{intrinsic}_{cam}^{-1} \begin{bmatrix} x_{in} \\ y_{in} \\ 1 \end{bmatrix}$$

The camera's intrinsic matrix ($\text{intrinsic}_{cam}$) can be used to transform 3D camera coordinates into 2D image coordinates, and can be based on measurements of the focal length ($f_x$ and/or $f_y$) and/or the principal point offset ($c_x$ and/or $c_y$), as indicated below:

$$\text{intrinsic}_{cam} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

The 3D transformation can be based on the extrinsic matrices (camera poses) at the source camera position and at the target camera position corresponding to the reprojection, for example as indicated below:

$$\text{3DTransform} = \text{extrinsic}_{target} \cdot \text{extrinsic}_{source}^{-1} = \begin{bmatrix} R & T \\ \mathbf{0} & 1 \end{bmatrix}$$

where R is a 3x3 rotation and T is a 3x1 translation.

The image reprojection engine 215 receives and/or determines a reprojection matrix, which indicates how the perspective moves in the environment for the reprojection (e.g., a simulated movement of the camera). The values in the reprojection matrix illustrated in FIG. 15 are labeled R11, R12, R13, Tx, R21, R22, R23, Ty, R31, R32, R33, and Tz. In another example, the image reprojection engine can obtain this transformation directly as the 3DTransform matrix (e.g., without performing at least some of the computations indicated above). Once the image reprojection engine 215 knows how the perspective moves in the environment, in the form of the reprojection matrix, the image reprojection engine 215 can determine the new 3D position of the object in the environment after the camera movement (e.g., in the reprojected image 1515) by determining $X_{out}$, $Y_{out}$, and $Z_{out}$ as follows:

$$\begin{bmatrix} X_{out} \\ Y_{out} \\ Z_{out} \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} & R_{13} & T_x \\ R_{21} & R_{22} & R_{23} & T_y \\ R_{31} & R_{32} & R_{33} & T_z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

The image reprojection engine 215 can use the object's new position in the environment (defined by the coordinates $X_{out}$, $Y_{out}$, and $Z_{out}$) to determine the object's new coordinates in the reprojected image 1515, denoted $x_{out}$ and $y_{out}$, respectively. The object's new coordinates ($x_{out}$ and $y_{out}$) in the reprojected image 1515 are determined by the image reprojection engine 215 as follows:

$$x_{out} = \frac{f_x \, X_{out}}{Z_{out}} + c_x, \qquad y_{out} = \frac{f_y \, Y_{out}}{Z_{out}} + c_y$$

The image reprojection engine 215 can use the object's coordinates ($x_{in}$ and $y_{in}$) in the captured image 1510, along with the object's new coordinates ($x_{out}$ and $y_{out}$) in the reprojected image 1515, to determine the object's motion vector from the captured image 1510 to the reprojected image 1515. The image reprojection engine 215 can determine the horizontal value of the motion vector as $MV_x$, and the vertical value of the motion vector as $MV_y$, as follows:

$$MV_x = x_{out} - x_{in}, \qquad MV_y = y_{out} - y_{in}$$

The image reprojection engine 215 can use the motion vectors $MV_x$ and $MV_y$ to know, for any pixel of any object in the captured image 1510, where that pixel should land in the reprojected image 1515. In an illustrative example, a portion of a chair may move 4 pixels to the right from the captured image 1510 to the reprojected image 1515. Meanwhile, a portion of the toolbox may move 10 pixels to the right from the captured image 1510 to the reprojected image 1515, because the toolbox is closer to the camera than the chair. Thus, for each object, the image reprojection engine 215 can compute where the object should move to in the reprojected image 1515 relative to the captured image 1510.

The motion vectors can represent, for each pixel of the first image data, a pixel displacement to the pixel's position in the second image data, where the displacement depends on the relative viewing viewpoints of the first perspective and the second perspective, as well as on the reciprocal of the depth. As discussed above, the motion vectors can be determined based on the depth data (e.g., "depth" in the equations above). For example, in some examples, the motion vectors can be determined based on the object's position in the environment, such as 3D coordinates (e.g., X, Y, Z) that can be determined from the captured image data based on the depth data. In some examples, the motion vectors can be determined based on the output of a transformation of the object's position in the environment, such as the output (e.g., $X_{out}$, $Y_{out}$, $Z_{out}$) of a transformation (e.g., the 3DTransform) of the object's 3D coordinates (e.g., X, Y, Z).

In some examples, the camera's focal length f can also affect some of the equations above. For example, the determination of the object's X and Y coordinates in the environment, as well as the determination of the object's coordinates ($x_{out}$ and $y_{out}$) in the reprojected image 1515, can be based on the focal length f, for example as indicated below:

$$X = \frac{(x_{in} - c_x) \cdot \text{depth}}{f}, \qquad Y = \frac{(y_{in} - c_y) \cdot \text{depth}}{f}$$

$$x_{out} = \frac{f \, X_{out}}{Z_{out}} + c_x, \qquad y_{out} = \frac{f \, Y_{out}}{Z_{out}} + c_y$$
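Putting the equations above together, a minimal per-pixel sketch follows, assuming a pinhole camera with a single 3x3 intrinsic matrix K shared by the source and target views and a 4x4 3DTransform matrix (the function and variable names are illustrative):

```python
# Minimal sketch of the per-pixel reprojection math above.
import numpy as np

def reprojection_motion_vectors(depth, K, T_3d):
    """Return (MV_x, MV_y): inside-out motion vectors mapping each pixel
    of the captured image to its position in the reprojected image."""
    h, w = depth.shape
    x_in, y_in = np.meshgrid(np.arange(w, dtype=np.float64),
                             np.arange(h, dtype=np.float64))
    # Unproject: 2D pixel + depth -> 3D point (X, Y, Z) in camera space.
    pix = np.stack([x_in.ravel(), y_in.ravel(), np.ones(h * w)])
    P = np.linalg.inv(K) @ pix * depth.ravel()
    # Apply the 3D transform (simulated camera motion) to get (X,Y,Z)_out.
    P_out = (T_3d @ np.vstack([P, np.ones(h * w)]))[:3]
    # Project back to 2D with the intrinsics: x_out = f*X_out/Z_out + c_x, etc.
    proj = K @ P_out
    x_out = (proj[0] / proj[2]).reshape(h, w)
    y_out = (proj[1] / proj[2]).reshape(h, w)
    # Motion vectors: displacement from the input pixel to the output pixel.
    return x_out - x_in, y_out - y_in
```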

FIG. 16 is a block diagram 1600 illustrating a grid inversion transformation, as well as a 3D transformation based on depth data. The grid inversion transformation obtains a 3D transformation 1605 (e.g., in the form of a reprojection matrix) and a depth map 1610, and uses MV computation 1615 to produce motion vectors (MVs) 1620 that indicate the movement of objects in the environment from the captured image 1510 to the reprojected image 1515, as shown in FIG. 15. In some examples, these initial motion vectors may be referred to as the existing motion vectors. The grid inversion transformation performs grid inversion 1625 on the existing MVs 1620 to produce inverted motion vectors 1630. In some examples, the inverted motion vectors may be referred to as the desired motion vectors.
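A minimal sketch of grid inversion follows: each inside-out MV is scattered to the output cell it points at and stored there negated, with collisions resolved in favor of the smaller depth (the nearer-object-wins policy and the NaN hole marker are illustrative assumptions consistent with the conflict handling discussed below):

```python
# Minimal sketch of grid inversion (inside-out MVs -> outside-in MVs).
import numpy as np

def invert_mv_grid(mv_x, mv_y, depth):
    h, w = mv_x.shape
    inv_x = np.full((h, w), np.nan)   # NaN marks cells with no MV yet (holes)
    inv_y = np.full((h, w), np.nan)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            tx = int(round(x + mv_x[y, x]))
            ty = int(round(y + mv_y[y, x]))
            if 0 <= tx < w and 0 <= ty < h and depth[y, x] < zbuf[ty, tx]:
                zbuf[ty, tx] = depth[y, x]      # resolve conflicts by depth
                inv_x[ty, tx] = -mv_x[y, x]     # outside-in = negated MV
                inv_y[ty, tx] = -mv_y[y, x]
    # Remaining NaN cells (holes) would be filled by interpolation.
    return inv_x, inv_y
```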

FIG. 17 is a block diagram 1700 illustrating an image reprojection transformation based on motion vectors. A warping engine 1705 is illustrated, which can be part of the image reprojection engine 215. The warping engine 1705 uses the inverted motion vectors 1730 (e.g., the inverted MVs of FIGS. 15-16) rather than the initially determined motion vectors (the MVs of FIGS. 15-16). This is because the inverted motion vectors 1730 are outside-in motion vectors, whereas the initially determined motion vectors (MVs) are inside-out motion vectors. Outside-in motion vector transformations are computationally less expensive than inside-out motion vector transformations. Specifically, if the warping engine 1705 produces the reprojected image 1715 using outside-in motion vectors such as the inverted motion vectors 1730, the warping engine 1705 can produce the reprojected image 1715 pixel by pixel, in the raster order of the reprojected image (or in reverse raster order, or in any preferred order). For each pixel of the reprojected image 1715, the outside-in inverted motion vectors 1730 indicate to the warping engine 1705 to extract pixel data from a particular position in the captured image 1710 and to fill that pixel of the reprojected image 1715 with that pixel data from the captured image 1710. For example, for a given pixel in the reprojected image 1715, the warping engine 1705 can read the outside-in inverted motion vectors 1730 to determine that the pixel's value should be taken from the pixel 4 pixels to the left in the captured image 1710, and so on.

An inside-out motion vector can represent a motion vector indicating the motion of a pixel from an initial image of a scene (from an initial perspective) to a target image of the scene (from a target perspective). The initially determined motion vectors (e.g., the MVs of FIGS. 15-16) can be examples of inside-out motion vectors. An outside-in motion vector can represent a motion vector indicating the motion of a pixel from the target image of the scene (from the target perspective) to the initial image of the scene (from the initial perspective). The inverted MVs 1730 can be examples of outside-in motion vectors.

When the warping engine 1705 performs the warp (e.g., from the captured image 1710 to the reprojected image 1715), using outside-in motion vectors (e.g., the inverted motion vectors 1730) for the warp, rather than inside-out motion vectors (e.g., the MVs of FIGS. 15-16), can provide a reduction in consumption of computational resources. Inside-out motion vectors (e.g., the MVs of FIGS. 15-16) are organized based on the captured image 1710 rather than based on the reprojected image 1715. Outside-in motion vectors (e.g., the inverted motion vectors 1730), on the other hand, are instead organized based on the reprojected image 1715. When the warping engine 1705 performs the warp to produce the reprojected image 1715, it is optimal to produce the reprojected image 1715 in a pixel order based on the reprojected image 1715 (e.g., in raster order according to the reprojected image 1715) rather than in a pixel order based on the captured image 1710 (e.g., in raster order according to the captured image 1710). Using outside-in motion vectors (e.g., the inverted motion vectors 1730) for the warp allows the warping engine 1705 to produce the reprojected image 1715 in a pixel order based on the reprojected image 1715 (e.g., in raster order according to the reprojected image 1715). For example, using the inverted motion vectors 1730, the warping engine 1705 can produce each pixel of the reprojected image 1715 with any conflicts or missing regions already resolved, as discussed with respect to FIG. 5. On the other hand, for the warping engine 1705 to produce the reprojected image 1715 in the raster order of the pixels of the reprojected image 1715 using inside-out motion vectors, the warping engine 1705 would have to search repeatedly through the motion vectors, pixel by pixel for each particular pixel of the reprojected image 1715, to find the data in the captured image 1710 that should end up at that particular pixel of the reprojected image 1715. Repeatedly searching through the captured image 1710 and the inside-out motion vectors is computationally expensive and uses significant power. In some cases, the warping engine 1705 may also need to resolve conflicts or fill missing regions, and if these searches produce motion vectors in the wrong order (e.g., erroneously prioritizing farther objects over nearer objects rather than prioritizing nearer objects over farther objects), conflicts may be resolved incorrectly or missing regions may be filled incorrectly. Thus, even though producing the outside-in motion vectors (e.g., the inverted motion vectors 1730) from the inside-out motion vectors (e.g., the motion vectors of FIGS. 15-16) incurs some computational cost, the end result of warping using the outside-in motion vectors (e.g., the inverted motion vectors 1730) still saves computational resources and improves accuracy.
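A minimal sketch of the outside-in warp follows: the output is produced in its own raster order, and each output pixel gathers its value from the position in the captured image indicated by the inverted MV (here expressed as a bilinear gather via OpenCV's remap; the names are illustrative):

```python
# Minimal sketch of warping with outside-in (inverted) motion vectors.
import cv2
import numpy as np

def warp_outside_in(captured, inv_mv_x, inv_mv_y):
    h, w = inv_mv_x.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # For each output pixel (x, y), read the source position that the
    # inverted MV points to in the captured image, and sample it.
    map_x = xs + inv_mv_x.astype(np.float32)
    map_y = ys + inv_mv_y.astype(np.float32)
    return cv2.remap(captured, map_x, map_y, cv2.INTER_LINEAR)
```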

In some examples, the inside-out MVs (the existing MVs) are determined at a low resolution, for example at ¼ of the resolution of the captured image, because determining the inside-out MVs can be expensive. Producing the outside-in MVs (the desired MVs) by applying grid inversion to the inside-out MVs is not computationally expensive. Furthermore, reprojecting using the outside-in MVs (the desired MVs) is not computationally expensive. The computationally inexpensive nature of these operations allows the grid inversion and/or the reprojection using the outside-in MVs (the desired MVs) to be performed efficiently even at higher resolutions, such as the full resolution of the captured image. Thus, even though the inside-out MVs (the existing MVs) are determined at a lower resolution, the warping engine 1705 can produce the reprojected image as a full reprojection of the captured image. This allows further savings of computational resources and power.

The grid inversion engine 225 includes several mechanisms for handling missing data and/or conflicts in the inverted MV grid. As explained previously, the grid inversion engine changes the positions of the MVs to correlate with the pixels' positions in the target image (e.g., the reprojected image 1715). In some cases, some pixels have no MVs in the input grid pointing to them, so inversion alone will not place MVs at those positions. The grid inversion engine fills in these cells of the inverted MV grid by interpolation during its process. Referring again to FIG. 5, the inverted MV grid 520 is produced via grid inversion and includes missing cells marked with asterisks. For example, cell 1 in the inverted MV grid 520 has no corresponding motion vector from the MV grid 505, and is instead filled in using interpolation. One interpolation option is to interpolate the value for cell 1 using the values in its neighboring cells 0 and 2. For example, the interpolation can be weighted by distance, so based on the value 0 in cell 0 and the value -1 in cell 2, the interpolated value for cell 1 can be -1/2. Similar types of interpolation can be performed for cells 3, 5, 6, and 7.

The grid inversion engine 225 also includes mechanisms for handling conflicts in the inverted MV grid. In some cases, multiple MVs in the MV grid 505 can point to the same pixel in the second image (e.g., the second image Img2 515, the reprojected image 1715), producing an MV conflict in the inverted MV grid 520 and requiring the grid inversion engine to select one of the conflicting values for a given cell of the inverted MV grid 520. An example of such a conflict is shown in cell 8 of the inverted MV grid 520. According to the motion vectors extending from cells 7 and 8 of the MV grid 505, the car in cell 7 of the first image Img1 510 and the tree in cell 8 of the first image Img1 510 both end up in the same pixel, corresponding to cell 8 of the second image Img2 515. As a result, it may be unclear which value the grid inversion engine should select to place in cell 8 of the inverted MV grid 520.

To resolve a conflict, the grid inversion engine 225 can select one value or the other. In some examples, a weighted average of the conflicting values can be used. If the grid inversion engine 225 has depth information corresponding to the two objects (e.g., from the depth data 620), the grid inversion engine 225 can select the value corresponding to the object that is closer to the sensors 205. This is because, in many cases, nearer objects cover, block, or occlude the view of farther objects. If the grid inversion engine 225 lacks depth information corresponding to the two objects, the grid inversion engine 225 can select the value based on other heuristics or techniques, for example selecting the value corresponding to the larger movement or to the object that appears larger. Regardless of object size, an object undergoing larger movement is more likely to be closer to the sensors 205, because movement of a nearer object appears to cover more of the field of view of the sensors 205 than movement of a farther object, even at the same movement speed. In some examples, an object that appears larger may also be closer to the sensors 205.

In some examples, referring to FIG. 5, the car that moves from cell 7 of the first image Img1 510 to cell 8 of the second image Img2 515 is closer to the sensors 205 than the tree, in which case the grid inversion engine 225 can select the value -1 for cell 8 of the inverted MV grid 520 (as the inverse of the corresponding value 1 in cell 7 of the MV grid 505). In some examples, in FIG. 5, the tree is closer to the sensors 205 than the car, in which case the grid inversion engine 225 can select the value 0 for cell 8 of the inverted MV grid 520 (based on the corresponding value 0 in cell 8 of the MV grid 505). In some examples, the grid inversion engine 225 may lack information about the relative depth of the car with respect to the tree. In that case, because the car is undergoing the larger movement (its value is 1 in the MV grid 505, while the tree's value is 0), the value -1 is selected for cell 8 of the inverted MV grid 520, since the car is likely closer to the sensors 205 than the tree. In some examples, if the car appears larger than the tree in the images, the value -1 is selected for cell 8 of the inverted MV grid 520, since the car is likely closer to the sensors 205 than the tree. In some examples, the value for cell 8 of the inverted MV grid 520 is selected to be -1/2, as the average of the inverses of the values in cells 7 and 8 of the MV grid 505.

Different types of interpolation can be performed. In one example, the interpolation can weight the values based on distance from the neighboring cells. In another example, the interpolation can weight the values based on the depths of the neighbors. Other approaches can be applied. For example, for larger gaps, as in cells 5, 6, and 7 of the inverted MV grid 520, the interpolation can weight information from closer cells more heavily than information from farther cells. For example, the value for cell 6 of the inverted MV grid 520 can be the average between the value (2) in cell 4 of the inverted MV grid 520 and the value in cell 8 of the inverted MV grid 520. The value in cell 8 of the inverted MV grid 520 can depend on how the conflict in cell 8 is resolved, as discussed above. Assuming the value in cell 8 of the inverted MV grid 520 is -1, the value for cell 6 of the inverted MV grid 520 can be ½. In its interpolation, the value for cell 5 of the inverted MV grid 520 can weight the value (2) in cell 4 of the inverted MV grid 520 more heavily than the value in cell 8 of the inverted MV grid 520, for example as the average of the value in cell 4 of the inverted MV grid 520 and the interpolated value in cell 6 of the inverted MV grid 520. Similarly, in its interpolation, the value for cell 7 of the inverted MV grid 520 can weight the value (2) in cell 4 of the inverted MV grid 520 less heavily than the value in cell 8 of the inverted MV grid 520, for example as the average of the value in cell 8 of the inverted MV grid 520 and the interpolated value in cell 6 of the inverted MV grid 520. For example, assuming the value in cell 8 of the inverted MV grid is -1, the value in cell 5 of the inverted MV grid can be set to 1.25, while the value in cell 7 of the inverted MV grid can be set to -0.25.
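A minimal one-dimensional sketch of this hole filling follows, using distance-weighted interpolation between the nearest valid neighbors (NaN marks missing cells; the helper is illustrative, and on the FIG. 5 examples it reproduces the values discussed above, e.g., -1/2 for cell 1):

```python
# Minimal 1D sketch of hole filling in an inverted MV grid by
# distance-weighted interpolation between the nearest valid neighbors.
import numpy as np

def fill_mv_holes_1d(row):
    out = row.copy()
    valid = np.where(~np.isnan(row))[0]
    for i in np.where(np.isnan(row))[0]:
        left = valid[valid < i]
        right = valid[valid > i]
        if left.size and right.size:
            l, r = left[-1], right[0]
            wl = (r - i) / (r - l)          # the nearer neighbor weighs more
            out[i] = wl * row[l] + (1 - wl) * row[r]
        elif left.size:                      # edge case: extend the edge value
            out[i] = row[left[-1]]
        elif right.size:
            out[i] = row[right[0]]
    return out

# Example: cells 0..2 with values [0, missing, -1] -> cell 1 becomes -0.5.
print(fill_mv_holes_1d(np.array([0.0, np.nan, -1.0])))
```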

FIG. 18 is a conceptual diagram 1800 illustrating an example of inpainting to address occlusions. Some regions in some reprojected images may lack appropriate data from the input image, and may thus represent gaps or occlusions in such reprojected images. In the reprojected image 1805, the occluded regions appear as black areas. For example, occluded regions are visible to the left of each chair (particularly the leftmost chair), to the left of the toolbox, and to the left of the table. These occluded regions can occur when objects close to the sensors 205 move from one side to the other. The occlusion map 1810 for the reprojected image 1805 shows the occluded regions in white, with all non-occluded regions shown in black. The imaging system 200 modifies the reprojected image 1805 to fill in the occluded regions using inpainting to produce an inpainted image 1815. In some examples, deep-learning-based inpainting is used, which can provide high-quality, intelligent inpainting based on the training of the deep learning model used for the deep-learning-based inpainting, where the deep learning model may have been trained on training data that includes original copies of images and second copies of the images with occlusions added, the occlusions being similar to those shown in the reprojected image 1805 and the occlusion map 1810. An example of the deep-learning-based inpainting is illustrated in the inpainted image 1815.

In some examples, based on the imaging system 200's available computational bandwidth and/or power headroom for the inpainting operation, a computationally less expensive form of inpainting can be used, such as interpolation or line-based or nearest-value inpainting. An example of interpolation-based inpainting, for example using interpolation and/or line-based or nearest-value inpainting, is illustrated at the bottom of FIG. 18 using a 3D depth-based zoom example. A 3D depth-based zoomed image 1825 is illustrated in FIG. 18, in which an occluded region 1835 is visible between the skateboarder's legs at the skateboard's previous position. An inpainted image 1830 is shown in which that occluded region 1835 is repaired using interpolation-based inpainting, for example using interpolation or line-based or nearest-value inpainting.
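As one concrete possibility for the computationally cheaper path, classical (non-deep-learning) inpainting can be applied to the occluded regions; the sketch below uses OpenCV's fast-marching inpainting as a stand-in, which is an assumption rather than the specific repair method described above:

```python
# Minimal sketch of classical inpainting of occluded regions: the
# occlusion mask is nonzero where the reprojected image has holes.
import cv2
import numpy as np

def fill_occlusions(reprojected, occlusion_mask):
    """reprojected: HxWx3 uint8 image; occlusion_mask: HxW uint8, 255 = hole."""
    # Telea's fast marching method propagates surrounding image data
    # into the masked (occluded) regions.
    return cv2.inpaint(reprojected, occlusion_mask, inpaintRadius=3,
                       flags=cv2.INPAINT_TELEA)
```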

FIG. 19 is a block diagram 1900 illustrating an architecture of a reprojection and grid inversion system 1905. The reprojection and grid inversion system 1905 can read data in raster order. In some examples, the reprojection and grid inversion system 1905 reads an MV grid 1910 in raster order, and/or reads depth data in raster order (e.g., from a depth sensor) and obtains a 3D matrix (e.g., the first option 1915). For each pixel of the input, for each motion vector and/or depth value of the input, the reprojection and grid inversion system 1905 places the pixel into its position in the output. Each tile number represents a group of pixels in the output. In raster order, the pixel indicated by arrow 1930 goes into tile 1, while the pixel indicated by arrow 1935 goes into tile 2. Pixels that are not near each other in the input grid can be closer together in the output grid. Based on this, it can be useful to keep a tile in a cache in case the reprojection and grid inversion system 1905 needs to write more data to the tile. For example, if the reprojection and grid inversion system 1905 starts with tile 1 and then moves on to tile 2, the reprojection and grid inversion system 1905 may need tile 1 again later. Keeping tiles in the cache for as long as the reprojection and grid inversion system 1905 can (e.g., based on a least recently used (LRU) caching scheme) allows the reprojection and grid inversion system 1905 to quickly modify a tile again, rather than reading it from DRAM.

In some cases, with depth-based reprojection, nearer objects move more than farther objects. Thus, objects from different regions of the input image can end up in the same region of the reprojected image. The pixels indicated by arrows 1930 and 1940 are an example of this situation: they originate from different positions in the input (e.g., the MV grid 1910) but land in the same region of the output, for example in tile 1. The reprojection and grid inversion system 1905 can therefore keep tile 1 in memory so that it can modify tile 1 (e.g., overwrite tile 1 with the pixel value indicated by arrow 1940). Keeping the entire output buffer in memory hardware may be excessive, so the reprojection and grid inversion system 1905 can include a caching mechanism to keep tiles in memory hardware.

If the reprojection and grid inversion system 1905 starts at the beginning of the raster order, and this is the first time the reprojection and grid inversion system 1905 wants to write to a tile (e.g., to write the pixel value indicated by arrow 1930 to tile 1), the reprojection and grid inversion system 1905 simply resets tile 1 and writes the value in question to tile 1, without first reading the tile from DRAM. In some examples, the values from tile 1 can be moved from the cache to DRAM. The reprojection and grid inversion system 1905 uses the cache so that it does not need to perform too many read/modify/write operations, but the reprojection and grid inversion system 1905 does have the ability to perform read/modify/write operations when necessary. As long as tiles are in the cache, the reprojection and grid inversion system 1905 can access them immediately. At some point in time, the cache may become full, and the reprojection and grid inversion system 1905 can send a tile from the cache to DRAM to make room for another tile (based on LRU). At some other point in time, the reprojection and grid inversion system 1905 may again need the tile that was sent from the cache to DRAM, in which case the reprojection and grid inversion system 1905 can read the tile back from DRAM into the cache in order to modify it, and at some other point in time the tile can be written to DRAM.
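The caching behavior described here can be sketched in software as an LRU tile cache with write-back, where a plain dictionary stands in for DRAM (the class and its policies are an illustrative model of the described behavior, not the hardware design):

```python
# Minimal sketch of an LRU tile cache with write-back to DRAM.
from collections import OrderedDict
import numpy as np

class TileCache:
    def __init__(self, capacity, tile_shape=(16, 16)):
        self.capacity = capacity
        self.tile_shape = tile_shape
        self.cache = OrderedDict()   # tile_id -> tile array, LRU order
        self.dram = {}               # stand-in for DRAM backing store

    def get_tile(self, tile_id):
        if tile_id in self.cache:
            self.cache.move_to_end(tile_id)        # mark most recently used
        else:
            if len(self.cache) >= self.capacity:   # evict the LRU tile to DRAM
                old_id, old_tile = self.cache.popitem(last=False)
                self.dram[old_id] = old_tile
            # First touch: reset the tile; later touches: read back from DRAM.
            tile = self.dram.pop(tile_id, None)
            if tile is None:
                tile = np.zeros(self.tile_shape)
            self.cache[tile_id] = tile
        return self.cache[tile_id]

    def write_pixel(self, tile_id, y, x, value):
        self.get_tile(tile_id)[y, x] = value       # read/modify in the cache
```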

Additionally, the reprojection and grid inversion system 1905 has a prefetch mechanism that allows the reprojection and grid inversion system 1905 to produce the needed tiles ahead of time, before processing, to avoid latency problems from reading tiles out of DRAM. The reprojection and grid inversion system 1905 works in an ordered manner, and the prefetch mechanism can ensure that the reprojection and grid inversion system 1905 always has what it needs in the cache. The reprojection and grid inversion system 1905 can alternate between prefetching and processing in lockstep, rather than switching randomly, to ensure that the reprojection and grid inversion system 1905 can process all of the data in an ordered manner, with everything it needs to process already placed in the cache.

In the first option 1915, the reprojection and grid inversion system 1905 can receive depth data and a 3D matrix. In some examples, the reprojection and grid inversion system 1905 can produce the MV grid 1910 from the depth data and the 3D matrix. In the second option 1920, the reprojection and grid inversion system 1905 can receive an MV grid with depth data and a 2D matrix. In some examples, the reprojection and grid inversion system 1905 can produce the MV grid 1910 from the MV grid with depth data and the 2D matrix. Whether the reprojection and grid inversion system 1905 receives depth and a 3D matrix (the first option 1915), or the reprojection and grid inversion system 1905 receives an MV grid and/or a 2D matrix (the second option 1920), the reprojection and grid inversion system 1905 uses its coordinate computation system to compute output coordinates (outCoord) and output data (outData). In some examples, the output data can include output motion vectors (outMV) and output depths (outDepth). The reprojection and grid inversion system 1905 can also output additional output data (as part of outData), such as confidences (outConf) and/or occlusions (outOcc), to determine the positions of occluded regions. The output from the reprojection and grid inversion system 1905 can be output, as output data, to one or more buffers, caches, or other memories. In one illustrative example, the output buffers (or caches or other memories) shown on the right side of FIG. 19 include an output buffer (or cache or other memory) for depth, an output buffer (or cache or other memory) for the MV grid (e.g., with depth and/or confidence), and an output buffer (or cache or other memory) for occlusion. These output buffers (or caches or other memories) can be output as multiple output images. The prefetch and cache mechanisms can handle the three buffers simultaneously. Because each output buffer can store a different number of bits per tile, the prefetch and cache mechanisms can handle synchronization among all the different levels of bits, and the different tile sizes, at each stage.

In some examples, the reprojection and grid inversion system 1905 uses dedicated hardware specifically designed to be efficient at motion vector manipulation, coordinate computation, caching, prefetching, and producing the output buffers. In some aspects, certain operations can be performed using a processor such as a CPU or GPU.

In some examples, the output confidence (outConf) is not generated specifically for reprojection, but is a by-product of the depth measurement from the depth sensor. In some examples, the acquired depth may suffer from measurement inaccuracies and/or other issues that can be represented by a confidence map. It can be beneficial to improve the depth based on the confidence map and/or the visual (RGB) image. The reprojection and grid inversion system 1905 can reproject the depth and confidence to match the visual (RGB) image, allowing the confidence to be used in the correct domain of the reprojected image. Once the depth matches the RGB image, the reprojection and grid inversion system 1905 can use the confidence to improve the depth.

In some examples, the imaging system can use a "triangle walking" operation to determine where a given pixel from an input image (e.g., the first image Img1 510, the captured image 1710) should be moved to in the reprojected image (e.g., the second image Img2 515, the reprojected image 1715).

FIG. 20 is a conceptual diagram 2000 illustrating an example of a triangle walking operation. In some examples, different pixels from the input image can move to different positions in the reprojected image. The system can process X inputs at a time, where X equals any integer value (e.g., 3, 4, 5, 6, 10, etc.). The system can produce Y output triangles (e.g., per set of inputs), where Y equals any integer value (e.g., 6, 7, 8, 9, 10, 15, etc.). The pixels in the input include pixel a, pixel b, pixel c, and so on. In some examples, pixel data from pixel a in the input image can move to a first position in the reprojected image, pixel data from pixel b in the input image can move to a second position in the reprojected image, pixel data from pixel c in the input image can move to a third position in the reprojected image, and so forth. Using a map (e.g., the MV grid 505 or the inverted MV grid 520), the system finds where each pixel of the input image should land in the reprojected image. Thus, in the illustrative example, pixel a of the input image ends up at output pixel 2010, input pixel b ends up at output pixel 2015, input pixel c ends up at output pixel 2020, and so on. For each input pixel, the imaging system computes the position at which the input pixel's value is configured to end up in the output. For a region between specific pixels in the output (e.g., the shaded triangular region between pixels 2010, 2015, and 2020), the imaging system fills the region using interpolation. To perform the interpolation, the imaging system can have a processor (e.g., a GPU or other processor) walk through each triangle separately and interpolate each output pixel individually, one after another.
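A minimal sketch of the per-triangle interpolation step, assuming each output triangle carries the values of its three source pixels and standard barycentric weights (the function name and raster loop are illustrative assumptions):

```python
import numpy as np

def fill_triangle(out_img, verts, values):
    """Fill one output triangle (e.g., pixels 2010, 2015, 2020) by
    barycentric interpolation of the three vertex values."""
    (x0, y0), (x1, y1), (x2, y2) = verts
    h, w = out_img.shape[:2]
    xmin = max(int(min(x0, x1, x2)), 0)
    xmax = min(int(np.ceil(max(x0, x1, x2))), w - 1)
    ymin = max(int(min(y0, y1, y2)), 0)
    ymax = min(int(np.ceil(max(y0, y1, y2))), h - 1)
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        return  # degenerate triangle, nothing to fill
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            # Barycentric weights of (x, y) relative to the three vertices.
            w0 = ((x1 - x) * (y2 - y) - (x2 - x) * (y1 - y)) / area
            w1 = ((x2 - x) * (y0 - y) - (x0 - x) * (y2 - y)) / area
            w2 = 1.0 - w0 - w1
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # point lies inside the triangle
                out_img[y, x] = w0 * values[0] + w1 * values[1] + w2 * values[2]
```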

However, to improve efficiency, the imaging system can join the triangles together to form a large polygon, that is, a polygon composed of all of the triangles on the output side of FIG. 20 (including the triangle with pixels 2010, 2015, and 2020) combined. The imaging system can have a dedicated hardware processor specifically designed for efficient interpolation, or can have another processor (e.g., a GPU or other processor) perform the interpolation. Having a processor (e.g., a GPU) walk each triangle separately and interpolate each output pixel separately can be inefficient, because many of these triangles include image data that is close together and similar. To improve efficiency, the imaging system can merge the triangles into a polygon, and can have the processor (e.g., the GPU) walk the entire polygon at once, performing the interpolation over the pixels of the entire polygon.

The imaging system includes a main walking engine 2025, N triangle control engines 2030 (where N can equal any integer value, such as 6, 8, 10, or another value), and M pixel interpolation engines 2035 (where M can equal any integer value, such as 6, 8, 10, or another value, and in some implementations may equal N). The main walking engine 2025 (shown as a box with a dashed outline and white shading) walks the entire polygon at once. Of the N triangle control engines 2030, two are shown as boxes with dashed outlines and light shading, and each triangle control engine is responsible for one of the triangles. The main walking engine 2025 walks the entire polygon, effectively pre-scanning the output positions and/or regions to be used by the imaging system for image reprojection, allowing the imaging system to prefetch and/or retrieve data (e.g., tiles) from DRAM early into cache, thereby reducing or eliminating latency that might otherwise be caused by retrieving the data from DRAM (e.g., latency in filling, interpolation, or other image processing operations).

FIG. 21 is a conceptual diagram 2100 illustrating an example of occlusion masking. An occlusion region is a region of the reprojected image in which the image reprojection engine 215 has no image data available. As previously mentioned, the image reprojection engine 215 performs interpolation over regions that do not have specific values from the originally captured image. This interpolation is performed even for occlusion regions, for example to avoid those regions being filled with unreliable data (e.g., whatever data happens to be in DRAM). Image 2110 can be an example of filling with such unreliable data. To perform the reprojection, certain objects (such as the toolbox) may be stretched slightly in certain directions (e.g., horizontally), but such stretching is generally not enough to have a negative effect, and in some cases can enhance the appearance of the new perspective in the reprojected image. However, in certain regions, holes or gaps exceed a threshold size beyond which interpolation may be unreliable, and the image reprojection engine 215 can determine such a region to be an occlusion region.

In some examples, the image reprojection engine 215 can determine whether an occlusion region exists based on corner depths. For example, if the difference between the depths at the corners of a region exceeds a threshold difference, the image reprojection engine 215 can determine that an occlusion region exists in that region (e.g., a triangle or other shape of FIG. 20). The threshold difference can vary based on the minimum of the depths.
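A minimal sketch of this corner-depth test, assuming a relative threshold that scales with the minimum corner depth (the scale factor is an assumption, not from the source):

```python
def is_occlusion_region(corner_depths, relative_threshold=0.2):
    """Flag a region (e.g., a triangle of FIG. 20) as occluded when its
    corner depths differ by more than a depth-dependent threshold."""
    d_min, d_max = min(corner_depths), max(corner_depths)
    threshold = relative_threshold * d_min  # threshold varies with the minimum depth
    return (d_max - d_min) > threshold
```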

Once the image reprojection engine 215 determines that an occlusion region exists (e.g., based on the difference between the depths at the corners of the region exceeding a threshold difference), the image reprojection engine 215 can perform inpainting to fill the occlusion region of the reprojected image with image data. The "unreliable residue" in image 2110 can represent one form of inpainting performed using portions of the toolbox image data in the occlusion region. In some cases, this type of inpainting can work well, even if it looks unusual in image 2110. In some examples, the inpainting can be performed using deep learning (e.g., using one or more trained ML models).

FIG. 22 is a conceptual diagram 2200 illustrating an example of hole filling. Hole filling refers to interpolating over gaps where no motion vector data exists. Flow 2220 illustrates that, with hole filling turned off, the reprojected image has many visual artifacts, such as black and white dots in a pattern of visual artifacts that is particularly noticeable on the toolbox and other objects near the camera. With hole filling turned on, the holes in the reprojected image are filled using interpolation, and the image looks clean and free of such visual artifacts or visual artifact patterns. In some examples, hole filling can use inpainting (such as deep-learning-based inpainting) as an alternative or supplement to interpolation.
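One possible interpolation scheme for hole filling is sketched below, assuming holes in a motion-vector component are marked with NaN and filled from valid 4-connected neighbors (the neighborhood rule is an assumption):

```python
import numpy as np

def fill_holes(mv_component):
    """Fill NaN holes in one motion-vector component by averaging valid
    4-connected neighbors, sweeping until every hole is filled."""
    mv = mv_component.astype(float).copy()
    assert not np.isnan(mv).all(), "need at least one valid value"
    while np.isnan(mv).any():
        padded = np.pad(mv, 1, constant_values=np.nan)
        neighbors = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                              padded[1:-1, :-2], padded[1:-1, 2:]])
        counts = (~np.isnan(neighbors)).sum(axis=0)
        sums = np.nansum(neighbors, axis=0)
        avg = sums / np.maximum(counts, 1)      # average of valid neighbors
        holes = np.isnan(mv) & (counts > 0)     # holes fillable this sweep
        mv[holes] = avg[holes]
    return mv
```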

FIG. 23 is a conceptual diagram 2300 illustrating additional examples of the time warp 705 performed by the time warp engine 230. The time warp engine 230 computes dense optical flow, here between frame n+1 and frame n and between frame n and frame n-1, respectively. The input frame rate (in frames per second (FPS)) equals Fin, which can be 30 FPS, 60 FPS, 120 FPS, 240 FPS, or another frame rate. The output frame rate equals Fout, which can be 60 FPS, 120 FPS, 240 FPS, 480 FPS, or another frame rate. These dense optical flows are computed at high quality, but may be computationally expensive and/or use a large amount of power. Similar to the time warp 705 of FIG. 7, the time warp engine 230 divides the dense optical flow to produce smaller partial optical flows between other frames (e.g., between frames n-1 and n, or between frames n and n+1). For example, the time warp engine 230 divides the dense optical flow to produce smaller partial optical flows for frames n + ¾, n + ½, n + ¼, n − ¼, n − ½, and n − ¾. These partial optical flows can serve as substitutes for optical flow, as if each partial optical flow had been computed directly using an optical flow computation. The partial optical flows can be divided into quarters, as in this example, or into other similar fractions. These partial optical flows can be used to improve existing frames where frames exist at n+¾, n+½, n+¼, n−¼, n−½, and n−¾. These partial optical flows can be used to generate new interpolated frames at frames n+¾, n+½, n+¼, n−¼, n−½, and n−¾. In some examples, the time warp 705 can be used to produce dense optical flow for video at a high frame rate (e.g., 90, 120, 240, 480, or 960 fps) by first computing the dense optical flow at a lower frame rate (e.g., 30 or 60 fps) and using the time warp 705 to divide the computed dense optical flow into optical flows for the intermediate frames.
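The division of a dense flow into partial flows can be sketched as follows, assuming motion is linear between the two frames and a hypothetical `warp` helper that resamples an image along a flow field (both assumptions, not from the source):

```python
def partial_flows(dense_flow, fractions=(0.25, 0.5, 0.75)):
    """Divide one dense optical flow (frame n -> n+1) into partial flows
    at n + 1/4, n + 1/2, n + 3/4, assuming linear motion between frames."""
    return {f: f * dense_flow for f in fractions}

def interpolated_frames(frame_n, dense_flow, warp):
    """Generate interpolated frames by warping frame n along each partial
    flow; `warp` is a hypothetical image-resampling helper."""
    return {f: warp(frame_n, flow)
            for f, flow in partial_flows(dense_flow).items()}
```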

In some examples, the time warp engine 230 can take the motion vectors of the optical flow, combine the motion vectors with a global matrix, and, after combining, divide the result into partial optical flows or motion vectors, as in the time warp 705.

Additional examples of the image sharpening benefit are illustrated for images without the time warp 705 and with the time warp 705. Detail is recovered using the time warp 705, as indicated by the regions pointed to by the arrows (e.g., the boy's hair, ears, and T-shirt in the middle image, and the marked regions in the right image). Specifically, edges and/or regions that appear blurry are rendered using dashed lines, while edges and/or regions that appear clear and sharp are rendered using solid lines.

FIG. 24 is a block diagram 2400 illustrating an exemplary architecture for the time warp engine 230 in some examples of a reprojection engine 2435. The optical flow engine 2420 receives frame n and frame n-M from a camera 2405 having an image sensor 2410 and dynamic random access memory (DRAM) 2415. The optical flow engine 2420 generates motion information. In some examples, the motion information includes two types of motion information, including global motion and local motion. For example, a matrix (e.g., a global matrix) can, in some cases, represent the global motion. The optical flow engine can generate a dense motion vector grid to indicate local motion and 3D motion. In other examples, the dense motion vector grid can also indicate global motion and/or combinations of local motion, 3D motion, and global motion.

The grid inversion engine 2425 receives the motion information (e.g., the dense motion vector grid and, in some cases, a matrix representing global motion) from the optical flow engine 2420. The grid inversion engine 2425 runs multiple (M) times, with each run dividing the motion vectors and outputting a different portion of the motion vectors. The grid inversion engine 2425 outputs M sets of motion vectors. In some cases, the motion vectors can be multiplied by a factor. A warp engine 2430 can be used to scale down the motion vectors to provide different resolutions. The warp engine 2430 can receive the motion vectors from the dense grid and perform warping, scaling, and/or other operations on the dense motion grid. In some examples, the warp engine 2430 can also obtain a transform matrix and warp the dense grid based on it. In other examples, the warp engine 2430 can obtain the transform matrix and combine it with the dense grid. The inverted motion vectors output by the grid inversion engine 2425 and/or the warp engine 2430 are output to the image processing engine 2440 to generate a reprojected image based on the inverted motion vectors.
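A minimal sketch of the M runs described above, assuming a hypothetical `invert_grid` routine (e.g., the grid inversion of FIG. 5) and a linear division of the motion vectors on each run; the fraction schedule is an assumption:

```python
def run_grid_inversion(dense_mv, invert_grid, m):
    """Run grid inversion M times, each run dividing the dense motion
    vectors into a different fraction before inverting (FIG. 24)."""
    outputs = []
    for k in range(1, m + 1):
        fraction = k / (m + 1)            # e.g., 1/4, 2/4, 3/4 for m == 3
        outputs.append(invert_grid(fraction * dense_mv))
    return outputs                        # M sets of inverted motion vectors
```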

FIG. 25 is a block diagram 2500 illustrating an exemplary architecture for the time warp engine 230 with temporal blur in some examples of a reprojection engine 2535 with temporal blur. The architecture in FIG. 25 is similar to the architecture in FIG. 24, but the system's temporal deblurring engine 2505 determines (e.g., based on motion detection and/or image analysis) which of the M frames are blurry and uses the partial motion vectors generated by the grid inversion engine 2425 to deblur and/or sharpen the blurry frames. In some examples, a temporal deep learning algorithm of the reprojection engine 2535 analyzes pose sensor data and looks at how much distance was moved (and therefore how much blur there is) during the capture of each frame. In some examples, the original motion vectors are provided from the optical flow engine 2420 to the image processing engine 2440, in some cases after a further transformation 2520 (e.g., shrinking).

FIG. 26 is a block diagram 2600 illustrating an exemplary architecture of the depth sensor support engine 235. A time-of-flight (ToF) sensor is an example of a depth sensor, but the depth sensor support engine 235 can, in some examples, use different types of depth sensors as described herein. Post-processing can be applied (e.g., by filtering out outliers and/or normalizing noise) to clean up the depth values from the depth sensor to provide higher-quality depth values. In some cases, the post-processing can also receive a confidence map along with the depth, and the post-processing can then also clean up the confidence map and/or use the confidence map to assist the depth processing. The depth, and in some cases the confidence, is sent to a reprojection engine, which can reproject the depth image and the confidence map based on a 3D transform, for example to align with an image sensor (e.g., wide-angle or telephoto). The reprojection engine can produce reprojected depth and confidence values, which can be run through the depth post-processing again to clean up the depth and confidence values. The depth post-processing can also accept images from the wide-angle and telephoto sensors, and/or auxiliary depth sensor data (e.g., DFS depth) from an auxiliary depth sensor, and the depth post-processing can adjust the depth to further improve it and correct inaccuracies from the original depth. The 3D transform can be based on a 3D calibration between the image sensor and the depth sensor. If the depth sensor and the image sensor move relative to each other (e.g., due to focus changes, zoom, OIS, and/or other factors), the 3D calibration can take this into account and update the 3D transform. It should be understood that the auxiliary depth flow at the bottom of FIG. 26 (i.e., DFS with the wide-angle and telephoto images) is an illustrative example. In other examples, the auxiliary depth can come from another depth sensor, a deep-learning depth engine, and/or any other depth source. In some examples, the depth post-processing will have no auxiliary depth. In some examples, the depth post-processing can have more than two depth sources.
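One possible form of the outlier-filtering step is sketched below, assuming a median-based rejection rule and a confidence map scaled to [0, 1]; the window size and thresholds are assumptions, not from the source:

```python
import numpy as np
from scipy.ndimage import median_filter

def clean_depth(depth, confidence, window=5, outlier_ratio=0.25, conf_floor=0.1):
    """One possible depth post-processing step (FIG. 26): replace outliers
    with a local median and mark low-confidence samples as invalid."""
    depth = depth.astype(float)
    local_median = median_filter(depth, size=window)
    outlier = np.abs(depth - local_median) > outlier_ratio * local_median
    cleaned = np.where(outlier, local_median, depth)
    cleaned[confidence < conf_floor] = np.nan  # unreliable per the confidence map
    return cleaned
```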

FIG. 27 is a conceptual diagram 2700 illustrating additional examples of the depth sensor support 805 performed by the depth sensor support engine 235. In these additional examples, a main image sensor (e.g., RGB3) and a depth sensor (e.g., a ToF system) are shown on a circuit board. A depth map and an image are illustrated. In the example on the left (projection alignment 2705), some elements are aligned, but other objects at different distances from the camera (such as the teddy bear or the person's head) are misaligned between the image data and the depth data. For example, the bear's depth data (e.g., shown using dashed lines) is shifted to the right (a parallax shift) compared to the bear's image data. Similarly, the person's depth data (e.g., shown using dashed lines) is shifted to the right (a parallax shift) compared to the person's image data. In contrast, in the example on the right (depth-based alignment 2710), the parallax is corrected and the depth data and image data for each object are aligned.

FIG. 28 is a block diagram 2800 illustrating an exemplary architecture of an imaging system including the image reprojection engine 215 and/or the 3D stabilization engine 240. The imaging system accepts input and reprojects the perspective to a new position in the environment. In the case of 3D stabilization, this reprojection can be performed to reduce or eliminate camera shake, and/or to simulate a stabilized camera, such that any movement has no (or very little) shake or jitter. For example, the 3D stabilization engine 240 of the imaging system can construct a virtual path, as if the video had been captured along a virtual path that includes little or no jitter and/or shake. The imaging system can also be used for at least some of the other applications of image reprojection described herein, such as time warping, head pose correction, sensor support, and so forth. The imaging system receives image data and/or depth data as input, stabilizes the data or otherwise corrects any distortion in the data, and then provides the data to the reprojection engine. For 3D stabilization, the 3D stabilization engine 240 of the imaging system can construct matrices indicating a stable, smooth virtual path. The imaging system can construct a 3D transform to change the perspective of an image. For example, for 3D stabilization, the 3D transform can change the respective perspectives of a series of images such that the respective perspectives of the images have origins along the virtual path (e.g., the stable, smooth virtual path). The 3D transform, and in some cases the virtual path, can be fed to the reprojection engine. The reprojection engine can generate motion vectors (an MV grid) to warp the image to the identified perspective (e.g., so that the capture perspective follows the virtual path). In some examples, the imaging system can perform lens distortion correction (LDC) and/or rolling shutter correction (RSC) on the image using another motion vector grid to reduce any distortion from the lens and/or rolling shutter. In other examples, motion vectors and/or matrices can also be used to correct other distortions and/or transformation errors. As shown in FIG. 30, in some examples the 3D stabilization grid and the grid for LDC and RSC are combined by combining the motion vectors from both, and the image is warped once using the combined grid. The new set of MVs can perform both the 3D stabilization and the LDC and RSC. In some examples, the LDC and RSC MV grid can be sparser than the 3D stabilization MV grid, in which case the LDC and RSC MV grid can be upscaled before combining. In some examples, the 3D stabilization MV grid can be sparser than the LDC and RSC MV grids, in which case the 3D stabilization MV grid can be upscaled before combining. The combined MV grid can be sent to a warp engine that performs the warp. The resulting image is illustrated, with 3D stabilization (via reprojection), LDC, and RSC applied.
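Combining the two MV grids can be sketched as follows, assuming per-component 2D grids and bilinear upscaling of the sparser grid before summing; the grid shapes and the `zoom` call are assumptions, not from the source:

```python
import numpy as np
from scipy.ndimage import zoom

def combine_mv_grids(stab_mv, ldc_rsc_mv):
    """Upscale the sparser MV grid to match the denser one, then sum,
    so a single warp applies 3D stabilization plus LDC and RSC."""
    if stab_mv.shape != ldc_rsc_mv.shape:
        if stab_mv.shape[0] < ldc_rsc_mv.shape[0]:
            factors = np.array(ldc_rsc_mv.shape) / np.array(stab_mv.shape)
            stab_mv = zoom(stab_mv, factors, order=1)      # bilinear upscale
        else:
            factors = np.array(stab_mv.shape) / np.array(ldc_rsc_mv.shape)
            ldc_rsc_mv = zoom(ldc_rsc_mv, factors, order=1)
    return stab_mv + ldc_rsc_mv  # combined grid sent to the warp engine
```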

Because reprojection is used for the 3D stabilization, occlusion regions may remain in the resulting image. The depth reprojection, an occlusion map, a low-resolution copy of the image (e.g., with the full field of view (FoV)), and/or Q high-resolution patches from the image (e.g., 500 patches of size 64x64, or another number of patches of any suitable size) can be sent to a deep learning engine (NSP) to perform inpainting. For example, the 3D stabilization engine 240 can take patches from one region without needing to read another region. Because of the occlusion map, the 3D stabilization engine 240 knows which regions to focus on with the high-resolution patches. In some examples, the patches and the occlusion map are small (e.g., the occlusion map is binary or can include a small number of bits, such as 3 bits, 4 bits, 6 bits, etc.), making the patches an inexpensive input for the deep learning engine (NSP) that performs the inpainting. The depth reprojection can help ensure that the correct type of material is used for the inpainting. For example, the deep learning engine (NSP) will not use nearby objects, like the toolbox, to inpaint a background region; the only data that should be used to inpaint a background region is image data at the background region's depth. This smart inpainting is highly efficient and uses reduced power.
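Occlusion-guided patch selection can be sketched as follows, assuming a binary occlusion map and fixed-size patches on a regular grid; the selection rule is an assumption:

```python
import numpy as np

def select_patches(image, occlusion_map, patch=64, q=500):
    """Pick up to Q high-resolution patches covering occluded regions,
    guided by the occlusion map, for the inpainting engine."""
    h, w = occlusion_map.shape
    candidates = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            occluded = int(occlusion_map[y:y + patch, x:x + patch].sum())
            if occluded > 0:
                candidates.append((occluded, y, x))
    candidates.sort(reverse=True)  # most-occluded patches first
    return [(y, x, image[y:y + patch, x:x + patch])
            for _, y, x in candidates[:q]]
```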

In some examples, the inpainting can use temporal filtering, for example using previous images in the video to produce image content for a particular region. For example, if a previous image has clear image content for the scene region that is depicted in the occlusion region of the current image frame, the image data from the previous image can be used for the inpainting and/or for the 3D stabilization to smooth out any shake. The patches can be aligned with compression tiles so that the inpainted patches output by the deep learning engine (NSP) can be moved into memory (e.g., directly into DRAM) for the relevant portions of the resulting image.

FIG. 29 is a conceptual diagram 2900 illustrating additional examples of the time warp 705 performed using the time warp engine 230, compared to images processed without the time warp engine 230. Especially at and around the edges and corners of the image, the examples with the time warp engine 230 look clearer and sharper than the images without the time warp engine 230. For example, edges that appear blurry are reproduced using dashed lines in FIG. 29, while edges that appear clear and sharp are reproduced using solid lines in FIG. 29.

FIG. 30 is a conceptual diagram 3000 illustrating an additional example 3005 of the 3D stabilization 905 performed by the 3D stabilization engine 240. The additional example 3005 includes four video frames of a video, shown in original (unstabilized) and stabilized forms. As previously discussed, the reprojection is used to eliminate shake and/or parallax movement.

FIG. 31 is a conceptual diagram 3100 illustrating an additional example of the 3D zoom 1005 performed by the 3D zoom engine 245. Digital zoom 3105 crops and enlarges, as shown by the dashed box and dashed lines on the left side of the figure. A depth image of the skateboarder is shown together with the 3D depth-based zoom. The 3D depth-based zoom uses depth-based image reprojection to simulate moving the camera closer to the skateboarder, as shown in illustration 3110 of moving the phone closer to the man.

FIG. 32 is a conceptual diagram 3200 illustrating an additional example of the reprojection 1105 performed by the reprojection SAT engine 250. The reprojection 1105 shifts the perspective by an offset, using reprojection from the perspective of one sensor to the perspective of a different sensor.

FIG. 33 is a conceptual diagram 3300 illustrating an additional example of the head pose correction 1205 performed by the head pose correction engine 255. A depth image 3315 of the woman's head, serving as a basis for the reprojection, is illustrated. An occlusion map 3320 for the reprojected image 1215 is also illustrated. Below the input image 1210 is an illustration of the person's position relative to the camera, indicating that the camera is taking the photo from an angle slightly below the user's face, pointing slightly upward. Below the reprojected image 1215 is an illustration of the person's simulated position relative to the camera, indicating that the simulated camera position takes the photo from an elevation or height matching the elevation or height of the user's face, from a position separated by an offset distance 3305 from the position at which the input image 1210 was captured, and from an angle separated by an offset angle 3310 from the angle at which the input image 1210 was captured. The capture angle of the reprojected image 1215 is perpendicular to the person's face and/or body, and/or perpendicular to gravity.

FIG. 34 is a conceptual diagram 3400 illustrating additional examples of grid inversion. The original MV grid and the inverted MV grid are shown for a target image with a sun and a cloud. Asterisks show an example of filling in missing content (via interpolation and/or inpainting), for example where a portion of the sun is blocked by the cloud in the input image but is not blocked in the reprojected image. Circles show an example of conflicting values, for example where both the cloud and the sun have data, and the cloud data ultimately wins because the cloud is in front of the sun.
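The conflict rule in FIG. 34 (the nearer surface wins) corresponds to a z-buffer test during grid inversion. A minimal sketch, assuming forward motion vectors and per-pixel depth in a simple array layout (the layout and rounding are assumptions):

```python
import numpy as np

def invert_mv_grid(forward_mv, depth):
    """Invert a forward MV grid: each source pixel lands at a target
    cell; conflicts are resolved by keeping the nearest (smallest-depth)
    source, and unclaimed cells are left as NaN holes to fill later."""
    h, w, _ = forward_mv.shape
    inverted = np.full((h, w, 2), np.nan)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            dx, dy = forward_mv[y, x]
            tx, ty = int(round(x + dx)), int(round(y + dy))
            if 0 <= tx < w and 0 <= ty < h and depth[y, x] < zbuf[ty, tx]:
                zbuf[ty, tx] = depth[y, x]     # nearer surface wins the conflict
                inverted[ty, tx] = (-dx, -dy)  # vector back to the source pixel
    return inverted
```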

FIG. 35 is a conceptual diagram 3500 illustrating an example of the use of deep-learning-based inpainting. Multiple sets of images are illustrated, each of which includes an occlusion region 3505 in one of the images of the set. The occlusion regions are shown as blank before being filled in using a trained deep learning inpainting engine, such as the neural network 3900.

FIG. 36 is a conceptual diagram 3600 illustrating an example of the use of inpainting without deep learning. Multiple sets of images are shown arranged in columns. The first column includes images output by the grid inversion engine (RGE), which include occlusion regions 3605 shown as blank. The second column includes images output by the grid inversion engine (RGE) in which inpainting has been performed to fill the occlusion regions 3605. For example, the inpainting of FIG. 36 can use interpolation and/or in-line or nearest-value inpainting. As shown, patches can be selected for the inpainting based on similarity and/or priority. The third column includes images output by the grid inversion engine (RGE) without the occlusion regions 3605. The images in the third column include blur or visual "smearing" around some edges of the occlusion regions 3605 from the first-column images, which can look similar to motion blur and can be caused by depictions of objects in the originally captured image that are at other positions and/or transformed using the grid inversion engine (RGE).

FIG. 37 is a conceptual diagram 3700 illustrating an example of the use of an edge filter and a depth filter on edges. In some examples, the edge filter can be used to smooth blocky edges in depth data and/or image data, which can reduce visual artifacts in the image reprojection. Although the filters are shown as having a size of 3x3, the filters can in some cases be larger (e.g., 4x4, 6x6, etc.). The edge filter can detect edges in the depth map. The depth filter on the edges can reduce interpolated depth values that do not belong to any object.
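One way the two filters could work together is sketched below; the Sobel edge detector, the snapping rule, and the thresholds are all assumptions, not taken from the source:

```python
import numpy as np
from scipy.ndimage import sobel, minimum_filter, maximum_filter

def filter_depth_edges(depth, edge_ratio=0.1):
    """Detect edges in the depth map with 3x3 Sobel filters, then snap
    on-edge samples to the nearer of the local near/far depths, so that
    interpolated values do not float between objects."""
    depth = depth.astype(float)
    grad = np.hypot(sobel(depth, axis=0), sobel(depth, axis=1))
    on_edge = grad > edge_ratio * depth.max()
    near = minimum_filter(depth, size=3)   # local foreground depth
    far = maximum_filter(depth, size=3)    # local background depth
    snapped = np.where(np.abs(depth - near) <= np.abs(depth - far), near, far)
    return np.where(on_edge, snapped, depth)
```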

FIG. 38 is a conceptual diagram 3800 illustrating an example of reprojection. The sensors 205 include a camera cam1 that captures an image of a 3D scene and depth data (cam1 depth). An inter-camera 3D translation is used to reproject, in 3D space, the 3D scene depicted in the image so as to use the perspective of a camera cam2. The forward mapping (e.g., the motion vector grid) is shown using dashed lines. The backward mapping (e.g., the inverted motion vector grid) is shown using solid arrows from cam2 back to cam1.

FIG. 39 is a block diagram illustrating an example of a neural network (NN) 3900 that can be used for media processing operations. The neural network 3900 can include any type of deep network, such as a convolutional neural network (CNN), an autoencoder, a deep belief network (DBN), a recurrent neural network (RNN), a generative adversarial network (GAN), and/or another type of neural network. The neural network 3900 can be an example of one of the one or more trained neural networks of the imaging system 200, such as a neural network of any of the application engines 210, such as the image reprojection engine 215, the motion vector engine 220, the grid inversion engine 225, the time warp engine 230, the depth sensor support engine 235, the 3D stabilization engine 240, the 3D zoom engine 245, the reprojection SAT engine 250, the head pose correction engine 255, the XR late-stage reprojection engine 260, the special effects engine 265, or a combination thereof.

The input layer 3910 of the neural network 3900 includes input data. The input data of the input layer 3910 can include data representing the pixels of one or more input image frames, such as the media data 285, sensor data from the sensors 205, virtual content from the virtual content generator 207, or a combination thereof. The input data of the input layer 3910 can include depth data from a depth sensor. The input data of the input layer 3910 can include motion vectors and/or optical flow. The input data of the input layer 3910 can include matrices. The input data of the input layer 3910 can include occlusion maps.

The images can include image data from an image sensor, including raw pixel data (including a single color per pixel based on, for example, a Bayer filter) or processed pixel values (e.g., the RGB pixels of an RGB image). The neural network 3900 includes multiple hidden layers 3912A, 3912B, through 3912N. The hidden layers 3912A, 3912B, through 3912N include "N" hidden layers, where "N" is an integer greater than or equal to one. The number of hidden layers can include as many layers as needed for a given application. The neural network 3900 also includes an output layer 3914 that provides the output resulting from the processing performed by the hidden layers 3912A, 3912B, through 3912N.

In some examples, the output layer 3914 can provide an output image or a portion thereof, such as the modified media data 290, any of the reprojected images discussed herein, any of the reprojected depth data discussed herein, any of the motion vectors or optical flows discussed herein, any of the inpainted image data discussed herein, or a combination thereof.

The neural network 3900 is a multi-layer neural network of interconnected filters. Each filter can be trained to learn features representative of the input data. Information associated with the filters is shared among the different layers, and each layer retains information as the information is processed. In some cases, the neural network 3900 can include a feed-forward network, in which case there are no feedback connections where outputs of the network are fed back into itself. In some cases, the network 3900 can include a recurrent neural network, which can have loops that allow information to be carried across nodes while input is read in.

In some cases, information can be exchanged between the layers through node-to-node interconnections between the various layers. In some cases, the network can include a convolutional neural network, which may not link every node in one layer to every other node in the next layer. In networks where information is exchanged between layers, nodes of the input layer 3910 can activate a set of nodes in the first hidden layer 3912A. For example, as shown, each of the input nodes of the input layer 3910 can be connected to each of the nodes of the first hidden layer 3912A. The nodes of a hidden layer can transform the information of each input node by applying activation functions (e.g., filters) to the information. The information derived from the transformation can then be passed to, and can activate, the nodes of the next hidden layer 3912B, which can perform their own designated functions. Exemplary functions include convolution functions, downscaling, upscaling, data transformation, and/or any other suitable functions. The output of the hidden layer 3912B can then activate nodes of the next hidden layer, and so on. The output of the last hidden layer 3912N can activate one or more nodes of the output layer 3914, providing a processed output image. In some cases, while nodes (e.g., node 3916) in the neural network 3900 are shown as having multiple output lines, a node has a single output, and all lines shown as being output from a node represent the same output value.

In some cases, each node or interconnection between nodes can have a weight, which is a set of parameters derived from the training of the neural network 3900. For example, an interconnection between nodes can represent a piece of information learned about the interconnected nodes. The interconnection can have a tunable numeric weight that can be tuned (e.g., based on a training dataset), allowing the neural network 3900 to be adaptive to inputs and able to learn as more and more data is processed.

The neural network 3900 is pre-trained to process the features from the data in the input layer 3910 using the different hidden layers 3912A, 3912B, through 3912N, in order to provide the output through the output layer 3914.

FIG. 40 is a flow diagram illustrating a process 4000 for media processing operations. The process 4000 can be performed by a media processing system. In some examples, the media processing system can include, for example, the image capture and processing system 100, the image capture device 105A, the image processing device 105B, the image processor 150, the ISP 154, the host processor 152, the imaging system 200, the HMD 310, the mobile handset 410, the reprojection and grid inversion system 2490, the system of FIG. 25, the system of FIG. 26, the system of FIG. 27, the system of FIG. 28, the neural network 3900, the computing system 4100, the processor 4110, or a combination thereof.

At operation 4005, the media processing system is configured to, and can, receive depth data that includes depth information corresponding to an environment. In some examples, the depth information can include depth measurements of a representation of the environment from a first perspective. In some examples, the depth information includes a point cloud corresponding to the environment. In some examples, the depth data can be captured using one or more depth sensors, such as one or more light detection and ranging (LIDAR) sensors, radio detection and ranging (RADAR) sensors, sound detection and ranging (SODAR) sensors, sound navigation and ranging (SONAR) sensors, time-of-flight (ToF) sensors, structured light sensors, or a combination thereof. In some examples, the depth data can be captured using one or more cameras and/or image sensors, for example based on stereoscopic depth sensing using a stereoscopic camera arrangement. In some examples, the depth data can be captured using the image capture and processing system 100, the sensors 205, the cameras 330A-330B, the cameras 430A-430D, the image sensor 810, the depth sensor 815, the telephoto sensor 1110, the wide-angle sensor 1115, the sensor 1125, the image sensor 2610, cam1 of FIG. 38, cam2 of FIG. 38, any other sensor described herein, or a combination thereof. Examples of the depth data include the media data 285, the depth data 620, the depth data 1020, the depth data 1160, the depth data 1220, the depth data of FIG. 15, the depth map 1610, the depth data associated with the first option 1915, the depth input 2402, the depth of FIG. 26, the depth data of FIG. 27, the depth data of FIG. 28, the depth data 3315, the depth image 3410, the depth map of FIG. 37, the cam1 depth of FIG. 38, any other depth data described herein, or a combination thereof.

At operation 4010, the media processing system is configured to, and can, receive first image data captured by an image sensor, the first image data including a depiction of the environment. In some examples, the first image data can be captured using the image capture and processing system 100, the sensors 205, the cameras 330A-330B, the cameras 430A-430D, the image sensor 810, the depth sensor 815, the telephoto sensor 1110, the wide-angle sensor 1115, the sensor 1125, the image sensor 2610, cam1 of FIG. 38, cam2 of FIG. 38, any other sensor described herein, or a combination thereof. Examples of the first image data include the media data 285, the first image Img1 510, the camera image 610, the image 710, the "orig" image of FIG. 9, the original non-zoomed image of FIG. 10 (before zooming), the telephoto image 1130, the input image 1210, the input image 1310, the input image 1410, the captured image 1510, the captured image 1710, the input image Image1 of the flow 2310, the input image of the flow 2320, the input image without the time warp 705 in FIG. 25, the frames n and n-M of FIGS. 24-25, the M blurred frames of FIG. 25, the wide-angle and telephoto images of FIG. 26, the input image of FIG. 27, the "orig" image of FIG. 30, the un-zoomed input image of FIG. 31, the input image of FIG. 34, the input image of FIG. 35, the input image of FIG. 36, the original pixels of FIG. 38, images provided to the input layer 3910, other image data described herein, or a combination thereof.

At operation 4015, the media processing system is configured to, and can, generate, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data. Examples of the first plurality of motion vectors include the motion vectors in the MV grid 505, the motion vectors of FIG. 15 (e.g., MV in, MV x, MV y), the MV 1620, the dense MVs of FIG. 23, the motion vectors associated with the optical flow engine 2420, the MV grid of FIG. 28, the original MVs and MV grid of FIG. 34, the forward mapping of FIG. 38, other motion vectors described herein, or a combination thereof.

At operation 4020, the media processing system is configured to, and can, generate, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors that indicate respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective. Examples of the second plurality of motion vectors include the motion vectors in the inverted MV grid 520, the inverted MVs 1630, the inverted MVs 1730, the inverted motion vectors associated with the grid inversion engine 2425, the MV grid of FIG. 28, the inverted MVs and MV grid of FIG. 24, the backward mapping of FIG. 38, other inverted motion vectors described herein, or a combination thereof.

At operation 4025, the media processing system is configured to, and can, generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, where the second image data includes a second depiction of the environment from a different perspective than the first image data. Examples of the second image data include the modified media data 290, the second image Img2 515, the reprojected image 615, the image 715, the "stabilized" image of FIG. 9, the 3D zoom image of FIG. 10, the modified telephoto image 1140, the reprojected image 1215, the input image 1315, the reprojected image 1415, the reprojected image 1515, the reprojected image 1715, the reprojected image 1805, the inpainted image 1815, the reprojected image 2110, the reprojected image 2115, the reprojected image of the flow 2210, the reprojected image of the flow 2220, the reprojected image with the time warp 705 in FIG. 23, images output using the image processing engine 2440, the depth-based alignment 2710 image of FIG. 27, the time-warped image of FIG. 29, the "stabilized" image of FIG. 30, the depth-based 3D zoom image of FIG. 31, the output image of FIG. 34, the output image of FIG. 35, the output image of FIG. 36, the reprojected image of FIG. 38, images output using the output layer 3914, other image data described herein, or a combination thereof.
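A minimal end-to-end sketch of operations 4005 through 4025, reusing the hypothetical helpers sketched earlier (`compute_out_coords`, `invert_mv_grid`) and a simple gather step for the final modification; all names and the nearest-neighbor sampling are illustrative assumptions:

```python
import numpy as np

def reproject(image, depth, K, transform):
    """Operations 4005-4025 end to end: depth data and a first image in,
    a second image from the new perspective out."""
    h, w = depth.shape
    out_coord, _ = compute_out_coords(depth, K, transform)        # op 4015
    ys, xs = np.mgrid[0:h, 0:w]
    forward_mv = np.stack([out_coord[0] - xs, out_coord[1] - ys], axis=-1)
    inverted = invert_mv_grid(forward_mv, depth)                  # op 4020
    second = np.zeros_like(image)
    for y in range(h):                                            # op 4025
        for x in range(w):
            dx, dy = inverted[y, x]
            if not np.isnan(dx):  # holes stay for interpolation/inpainting
                sx = int(np.clip(round(x + dx), 0, w - 1))
                sy = int(np.clip(round(y + dy), 0, h - 1))
                second[y, x] = image[sy, sx]
    return second
```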

In some examples, the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time. In such examples, the first image data includes at least one image depicting the environment at least at one of the first time or the third time. Examples of such image interpolation can be performed using the time warp 705 as in FIG. 7 and/or FIG. 23. In some examples, the imaging system can generate the interpolated image without using the depth data.

In some examples, the first image data includes a plurality of frames of video data that include parallax movement, and the second image data includes a stabilized variant of the plurality of frames of video data in which the parallax movement is reduced. For example, the 3D stabilization 905 can stabilize, reduce, and/or eliminate parallax movement, rotation, or a combination thereof, as illustrated in FIG. 9 and/or FIG. 30.

In some examples, the first image data includes a person viewing the image sensor from a first angle, and the second image data includes the person viewing the image sensor from a second angle different from the first angle. Examples of this include the head pose correction 1205, as illustrated in FIG. 12 and/or FIG. 33.

In some examples, the change in perspective includes a rotation of the perspective by an angle and about an axis. In some examples, the change in perspective includes a translation of the perspective by a direction and a distance. In some examples, the change in perspective includes a transformation. In some examples, the change in perspective includes movement along an axis between an original perspective of the depiction of the environment in the first image data and a position of an object in the environment, where at least a portion of the object is depicted in the first image data. In some examples, the rotation, translation, transformation, and/or movement can be identified based on what is needed to perform any of the types of reprojection and/or warping described herein, for example in any of the examples of FIGS. 7-14. In some examples, the rotation, translation, transformation, and/or movement can be identified using a user interface. For example, in some examples, the change in perspective includes at least one of a parallax movement of the perspective or a rotation of the perspective about an axis, and the method further includes receiving, via a user interface, one of: an indication of a distance of the parallax movement of the perspective, or an indication of the angle or the axis of the rotation of the perspective.

At operation 4030, the media processing system is configured to, and can, output the second image data (e.g., using the output device 270). For example, the media processing system can display the second image data, output the second image data for further processing, store the second image data, any combination thereof, and/or otherwise output the second image data.

In some examples, outputting the second image data includes displaying the second image data using at least a display. In some examples, outputting the second image data includes causing the second image data to be sent to at least one recipient device using at least one communication interface.

In some examples, the media processing system is configured to, and can, identify one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors, and, before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using interpolation. In some examples, the media processing system is configured to, and can, identify, based on one or more gaps in respective endpoints of the first plurality of motion vectors, that one or more gaps in the second plurality of motion vectors produce one or more gaps in the second image data, and, before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using interpolation. Examples of the gaps include the gaps in the inverted MV grid 520 (and/or in the second image Img2 515) indicated by asterisks in FIG. 5.

In some examples, the media processing system is configured to, and can, identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors, and, before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using inpainting. The inpainting can use interpolation, machine learning, neural networks, or a combination thereof. Examples of inpainting are illustrated in FIGS. 18, 21, 22, 28, 33, 34, 35, 36, and/or 37.
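The inpainting engine itself is left open (interpolation, machine learning, neural networks, or a combination); as one concrete stand-in, the occluded cells could be handed to OpenCV's diffusion-based inpainting, as sketched below with assumed placeholder inputs.

```python
import cv2
import numpy as np

# Assumed placeholders: an 8-bit reprojected image and a mask that is
# nonzero wherever the second plurality of motion vectors left a gap.
reprojected = np.zeros((480, 640, 3), dtype=np.uint8)
occlusion_mask = np.zeros((480, 640), dtype=np.uint8)

# Fill the occluded regions before the second image data is output.
inpainted = cv2.inpaint(reprojected, occlusion_mask, 3, cv2.INPAINT_TELEA)
```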

In some examples, the media processing system is configured to, and can, identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors, and, before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using inpainting with one or more trained machine learning models. The inpainting can use interpolation, machine learning, neural networks, or a combination thereof. Examples of inpainting are illustrated in FIGS. 18, 21, 22, 28, 33, 34, 35, 36, and/or 37.

In some examples, the media processing system is configured to, and can, identify one or more conflicts in the second image data based on one or more conflicting values from the first image data in the second plurality of motion vectors, and select one of the one or more conflicting values from the first image data based on movement data associated with the second plurality of motion vectors. An example of one or more conflicts includes the conflict at cell 8 of the inverted MV grid 520.
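One way to picture both the inversion and the conflict handling is sketched below: each forward vector is scattered to the cell its endpoint lands in, and when two source pixels collide in one cell (as at cell 8 of the inverted MV grid 520), the sketch keeps the vector with the larger magnitude. Treating larger motion as "closer to the camera" is an illustrative heuristic standing in for the movement-data-based selection described here, not the disclosed rule itself.

```python
import numpy as np

def invert_mv_grid(fwd_mv):
    """Grid inversion with simple conflict resolution (sketch).

    fwd_mv: (H, W, 2) forward motion vectors per pixel.
    Returns (inv_mv, filled): inverse vectors that point back at the
    first image data, plus a mask of cells that received a vector;
    unfilled cells are the gaps discussed above.
    """
    h, w, _ = fwd_mv.shape
    inv_mv = np.zeros_like(fwd_mv)
    best = np.full((h, w), -1.0)      # winning magnitude per cell
    for y in range(h):
        for x in range(w):
            dx, dy = fwd_mv[y, x]
            tx, ty = int(round(x + dx)), int(round(y + dy))
            mag = float(np.hypot(dx, dy))
            if 0 <= tx < w and 0 <= ty < h and mag > best[ty, tx]:
                best[ty, tx] = mag
                inv_mv[ty, tx] = (-dx, -dy)   # point back at the source
    return inv_mv, best >= 0.0
```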

In some examples, the depiction of the environment in the first image data depicts the environment from a first perspective, and the change in perspective is a change between the first perspective and a different perspective corresponding to a second depiction of the environment in the second image data. In some examples, the first plurality of motion vectors points from the first perspective toward the different perspective, while the second plurality of motion vectors points from the different perspective back toward the first perspective.
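Because the second plurality of motion vectors points back toward the first perspective, producing the second image data reduces to a gather: each output pixel samples the first image at the location its inverse vector indicates. A minimal sketch, with nearest-neighbor rounding assumed for simplicity:

```python
import numpy as np

def apply_inverse_vectors(img1, inv_mv):
    """Gather pixels of the first image data through the inverse
    motion vectors to form the second depiction of the environment."""
    h, w, _ = img1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + inv_mv[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys + inv_mv[..., 1]).astype(int), 0, h - 1)
    return img1[sy, sx]
```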

In some examples, the processes described herein (e.g., process 4000 and/or other processes described herein) may be performed by a computing device or apparatus. In some examples, the processes described herein may be performed by the image capture and processing system 100, the image capture device 105A, the image processing device 105B, the image processor 150, the ISP 154, the host processor 152, the imaging system 200, the HMD 310, the mobile handset 410, the reprojection and grid inversion system 2490, the system of FIG. 23, the system of FIG. 24, the system of FIG. 25, the system of FIG. 26, the system of FIG. 28, the system of FIG. 29, the neural network 3900, the computing system 4100, the processor 4110, or a combination thereof.

The computing device can include any suitable device, such as a mobile device (e.g., a mobile phone), a desktop computing device, a tablet computing device, a wearable device (e.g., a VR headset, an AR headset, AR glasses, a network-connected watch or smartwatch, or another wearable device), a server computer, an autonomous vehicle or a computing device of an autonomous vehicle, a robotic device, a television, and/or any other computing device with the resource capabilities to perform the processes described herein. In some cases, the computing device or apparatus may include various components, such as one or more input devices, one or more output devices, one or more processors, one or more microprocessors, one or more microcomputers, one or more cameras, one or more sensors, and/or other components configured to carry out the steps of the processes described herein. In some examples, the computing device may include a display, a network interface configured to transmit and/or receive data, any combination thereof, and/or other components. The network interface may be configured to transmit and/or receive Internet Protocol (IP)-based data or other types of data.

The components of the computing device can be implemented in circuitry. For example, the components can include and/or can be implemented using electronic circuits or other electronic hardware, which can include one or more programmable electronic circuits (e.g., microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), central processing units (CPUs), and/or other suitable electronic circuits), and/or can include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein.

The processes described herein are illustrated as logic flow diagrams, block diagrams, or conceptual diagrams, the operations of which represent sequences of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the processes described herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

FIG. 41 is a diagram illustrating an example of a system for implementing certain aspects of the present technology. In particular, FIG. 41 illustrates an example of computing system 4100, which can be, for example, any computing device making up an internal computing system, a remote computing system, a camera, or any component thereof in which the components of the system are in communication with each other using connection 4105. Connection 4105 can be a physical connection using a bus, or a direct connection into processor 4110, such as in a chipset architecture. Connection 4105 can also be a virtual connection, a networked connection, or a logical connection.

In some embodiments, computing system 4100 is a distributed system in which the functions described in this disclosure can be distributed within a datacenter, multiple datacenters, a peer-to-peer network, and the like. In some embodiments, one or more of the described system components represents many such components, each performing some or all of the functions for which the component is described. In some embodiments, the components can be physical or virtual devices.

Example system 4100 includes at least one processing unit (CPU or processor) 4110 and connection 4105 that couples various system components, including system memory 4115 such as read-only memory (ROM) 4120 and random access memory (RAM) 4125, to processor 4110. Computing system 4100 can include a cache 4112 of high-speed memory connected directly with, in close proximity to, or integrated as part of processor 4110.

Processor 4110 can include any general-purpose processor and a hardware service or software service (such as services 4132, 4134, and 4136 stored in storage device 4130) configured to control processor 4110, as well as a special-purpose processor where software instructions are incorporated into the actual processor design. Processor 4110 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, a memory controller, a cache, and so on. A multi-core processor may be symmetric or asymmetric.

To enable user interaction, computing system 4100 includes an input device 4145, which can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, speech, and so on. Computing system 4100 can also include output device 4135, which can be one or more of a number of output mechanisms. In some instances, multimodal systems can enable a user to provide multiple types of input/output to communicate with computing system 4100. Computing system 4100 can include communications interface 4140, which can generally govern and manage the user input and system output. The communication interface may perform or facilitate receipt and/or transmission of wired or wireless communications using wired and/or wireless transceivers, including those making use of an audio jack/plug, a microphone jack/plug, a universal serial bus (USB) port/plug, an Apple® Lightning® port/plug, an Ethernet port/plug, a fiber optic port/plug, a proprietary wired port/plug, BLUETOOTH® wireless signal transfer, BLUETOOTH® low energy (BLE) wireless signal transfer, IBEACON® wireless signal transfer, radio-frequency identification (RFID) wireless signal transfer, near-field communications (NFC) wireless signal transfer, dedicated short range communication (DSRC) wireless signal transfer, 802.11 Wi-Fi wireless signal transfer, wireless local area network (WLAN) signal transfer, Visible Light Communication (VLC), Worldwide Interoperability for Microwave Access (WiMAX), Infrared (IR) communication wireless signal transfer, Public Switched Telephone Network (PSTN) signal transfer, Integrated Services Digital Network (ISDN) signal transfer, 3G/4G/5G/LTE cellular data network wireless signal transfer, ad-hoc network signal transfer, radio wave signal transfer, microwave signal transfer, infrared signal transfer, visible light signal transfer, ultraviolet light signal transfer, wireless signal transfer along the electromagnetic spectrum, or some combination thereof. Communications interface 4140 may also include one or more Global Navigation Satellite System (GNSS) receivers or transceivers that are used to determine a location of computing system 4100 based on receipt of one or more signals from one or more satellites associated with one or more GNSS systems. GNSS systems include, but are not limited to, the US-based Global Positioning System (GPS), the Russia-based Global Navigation Satellite System (GLONASS), the China-based BeiDou Navigation Satellite System (BDS), and the Europe-based Galileo GNSS. There is no restriction on operating on any particular hardware arrangement, and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

Storage device 4130 can be a non-volatile and/or non-transitory and/or computer-readable memory device and can be a hard disk or other types of computer-readable media that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, solid state memory devices, digital versatile disks, cartridges, a floppy disk, a flexible disk, a hard disk, magnetic tape, a magnetic strip/stripe, any other magnetic storage medium, flash memory, memristor memory, any other solid-state memory, a compact disc read-only memory (CD-ROM) optical disc, a rewritable compact disc (CD) optical disc, a digital video disk (DVD) optical disc, a Blu-ray disc (BDD) optical disc, a holographic optical disk, another optical medium, a secure digital (SD) card, a micro secure digital (microSD) card, a Memory Stick® card, a smartcard chip, an EMV chip, a subscriber identity module (SIM) card, a mini/micro/nano/pico SIM card, another integrated circuit (IC) chip/card, random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash EPROM (FLASHEPROM), cache memory (L1/L2/L3/L4/L5/L#), resistive random-access memory (RRAM/ReRAM), phase change memory (PCM), spin transfer torque RAM (STT-RAM), another memory chip or cartridge, and/or a combination thereof.

Storage device 4130 can include software services, servers, services, and the like, such that when the code that defines such software is executed by processor 4110, it causes the system to perform a function. In some embodiments, a hardware service that performs a particular function can include a software component stored in a computer-readable medium in connection with the necessary hardware components, such as processor 4110, connection 4105, output device 4135, and so on, to carry out the function.

As used herein, the term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as a compact disc (CD) or a digital versatile disc (DVD), flash memory, memory, or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like may be passed, forwarded, or transmitted via any suitable means, including memory sharing, message passing, token passing, network transmission, and the like.

In some embodiments, the computer-readable storage devices, media, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

Specific details are provided in the description above to provide a thorough understanding of the embodiments and examples provided herein. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks comprising devices, device components, steps, or routines in a method embodied in software, or in combinations of hardware and software. Additional components may be used other than those shown in the figures and/or described herein. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Individual embodiments may be described above as a process or method depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

Processes and methods according to the above-described examples can be implemented using computer-executable instructions that are stored on or otherwise available from computer-readable media. Such instructions can include, for example, instructions and data that cause or otherwise configure a general-purpose computer, a special-purpose computer, or a processing device to perform a certain function or group of functions. Portions of the computer resources used can be accessible over a network. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, source code, and so on. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to the described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.

Devices implementing processes and methods according to these disclosures can include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and can take any of a variety of form factors. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer program product) may be stored in a computer-readable or machine-readable medium. A processor may perform the necessary tasks. Typical examples of form factors include laptops, smartphones, mobile phones, tablet devices or other small form factor personal computers, personal digital assistants, rackmount devices, standalone devices, and so on. The functionality described herein can also be embodied in peripherals or add-in cards. By way of further example, such functionality can also be implemented on a circuit board among different chips, or in different processes executing on a single chip.

Instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are example means for providing the functions described in this disclosure.

In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the application is not limited thereto. Thus, while illustrative embodiments of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the above-described application may be used individually or jointly. Further, embodiments can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For the purposes of illustration, methods were described in a particular order. It should be appreciated that in alternate embodiments, the methods may be performed in an order different from that described.

One of ordinary skill will appreciate that the less than ("<") and greater than (">") symbols or terminology used herein can be replaced with less than or equal to ("≦") and greater than or equal to ("≧") symbols, respectively, without departing from the scope of this description.

Where components are described as being "configured to" perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors, or other suitable electronic circuits) to perform the operation, or any combination thereof.

The phrase "coupled to" refers to any component that is physically connected to another component either directly or indirectly, and/or any component that is in communication with another component (e.g., connected to the other component over a wired or wireless connection, and/or other suitable communication interface) either directly or indirectly.

Claim language or other language reciting "at least one of" a set and/or "one or more" of a set indicates that one member of the set or multiple members of the set (in any combination) satisfy the claim. For example, claim language reciting "at least one of A and B" means A, B, or A and B. In another example, claim language reciting "at least one of A, B, and C" means A, B, C, or A and B, or A and C, or B and C, or A and B and C. The language "at least one of" a set and/or "one or more" of a set does not limit the set to the items listed in the set. For example, claim language reciting "at least one of A and B" can mean A, B, or A and B, and can additionally include items not listed in the set of A and B.

The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices, such as general-purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium (such as a propagated signal or wave) that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer or other processors.

The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. A general-purpose processor may be a microprocessor; in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).

Illustrative aspects of the disclosure include:

Aspect 1A. An apparatus for image processing, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: receive depth data including depth information corresponding to an environment; receive first image data captured by an image sensor, the first image data including a depiction of the environment; generate, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generate, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective; generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and output the second image data.

Aspect 2A. The apparatus of Aspect 1A, wherein the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes at least one image depicting the environment at at least one of the first time or the third time.

Aspect 3A. The apparatus of any of Aspects 1A to 2A, wherein the first image data includes a plurality of frames of video data that include a parallax movement, wherein the second image data includes a stabilized variant of the plurality of frames of video data that reduces the parallax movement.

Aspect 4A. The apparatus of any of Aspects 1A to 3A, wherein the first image data depicts a person looking toward the image sensor from a first angle, wherein the second image data depicts the person looking toward the image sensor from a second angle that is different from the first angle.

Aspect 5A. The apparatus of any of Aspects 1A to 4A, wherein the change in perspective includes a rotation of the perspective about an axis by an angle.

Aspect 6A. The apparatus of any of Aspects 1A to 5A, wherein the change in perspective includes a translation of the perspective in a direction and by a distance.

Aspect 7A. The apparatus of any of Aspects 1A to 6A, wherein the change in perspective includes a transformation.

Aspect 8A. The apparatus of any of Aspects 1A to 7A, wherein the change in perspective includes movement along an axis between an original perspective of the depiction of the environment in the first image data and a position of an object in the environment, wherein at least a portion of the object is depicted in the first image data.

Aspect 9A. The apparatus of any of Aspects 1A to 8A, wherein the at least one processor is configured to: identify one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using interpolation.

Aspect 10A. The apparatus of any of Aspects 1A to 9A, wherein the at least one processor is configured to: identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using inpainting.

Aspect 11A. The apparatus of any of Aspects 1A to 10A, wherein the at least one processor is configured to: identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using inpainting with one or more trained machine learning models.

Aspect 12A. The apparatus of any of Aspects 1A to 11A, wherein the at least one processor is configured to: identify one or more conflicts in the second image data based on one or more conflicting values from the first image data in the second plurality of motion vectors; and select one of the one or more conflicting values from the first image data based on movement data associated with the second plurality of motion vectors.

Aspect 13A. The apparatus of any of Aspects 1A to 12A, wherein the depth information includes a three-dimensional representation of the environment from a first perspective.

Aspect 14A. The apparatus of any of Aspects 1A to 13A, wherein the depth data is received from at least one depth sensor.

Aspect 15A. The apparatus of any of Aspects 1A to 14A, further comprising: a display, wherein, to output the second image data, the at least one processor is configured to display the second image data using at least the display.

Aspect 16A. The apparatus of any of Aspects 1A to 15A, further comprising: a communication interface, wherein, to output the second image data, the at least one processor is configured to send at least the second image data to at least a recipient device using at least the communication interface.

Aspect 17A. The apparatus of any of Aspects 1A to 16A, wherein the apparatus includes at least one of a head-mounted display (HMD), a mobile handset, or a wireless communication device.

Aspect 18A. The apparatus of any of Aspects 1A to 17A, wherein the depiction of the environment in the first image data depicts the environment from a first perspective, wherein the change in perspective is a change between the first perspective and a different perspective corresponding to a second depiction of the environment in the second image data.

Aspect 19A. The apparatus of any of Aspects 1A to 18A, wherein the change in perspective includes at least one of a parallax movement of the perspective or a rotation of the perspective about an axis, wherein the at least one processor is configured to: receive, via a user interface, one of: an indication of a distance of the parallax movement of the perspective, or an indication of an angle or an axis of the rotation of the perspective.

Aspect 20A. The apparatus of any of Aspects 1A to 19A, wherein the at least one processor is configured to: identify, based on one or more gaps in respective endpoints of the first plurality of motion vectors, one or more gaps in the second plurality of motion vectors that produce one or more gaps in the second image data; and before outputting the second image data, modify the second image data at least in part by filling the one or more gaps in the second image data using interpolation.

Aspect 21A. A method of image processing, the method comprising: receiving depth data including depth information corresponding to an environment; receiving first image data captured by an image sensor, the first image data including a depiction of the environment; generating, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generating, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective; generating second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and outputting the second image data.
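Read end to end, the steps recited in Aspect 21A line up with the hypothetical helpers sketched earlier (forward_motion_vectors, invert_mv_grid, apply_inverse_vectors, fill_gaps_by_interpolation); under those same assumptions, the flow might be exercised as below. This is a sketch of one possible arrangement, not the disclosed implementation.

```python
import numpy as np

# Assumed inputs: depth (H, W), first image img1 (H, W, 3), intrinsics
# K (3, 3), and the 5-degree yaw rotation R_yaw from the earlier sketch.
fwd = forward_motion_vectors(depth, K, R_yaw, t=np.zeros(3))  # first MVs
inv_mv, filled = invert_mv_grid(fwd)                          # grid inversion
img2 = apply_inverse_vectors(img1, inv_mv)                    # modify image
img2 = fill_gaps_by_interpolation(img2, gap_mask=~filled)     # fill gaps
# img2 is the second image data, depicting the environment from the
# changed perspective, ready to be output (operation 4030).
```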

Aspect 22A. The method of Aspect 21A, wherein the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes at least one image depicting the environment at at least one of the first time or the third time.

Aspect 23A. The method of any of Aspects 21A to 22A, wherein the first image data includes a plurality of frames of video data that include a parallax movement, wherein the second image data includes a stabilized variant of the plurality of frames of video data that reduces the parallax movement.

Aspect 24A. The method of any of Aspects 21A to 23A, wherein the first image data depicts a person looking toward the image sensor from a first angle, wherein the second image data depicts the person looking toward the image sensor from a second angle that is different from the first angle.

Aspect 25A. The method of any of Aspects 21A to 24A, wherein the change in perspective includes a rotation of the perspective about an axis by an angle.

Aspect 26A. The method of any of Aspects 21A to 25A, wherein the change in perspective includes a translation of the perspective in a direction and by a distance.

Aspect 27A. The method of any of Aspects 21A to 26A, wherein the change in perspective includes a transformation.

Aspect 28A. The method of any of Aspects 21A to 27A, wherein the change in perspective includes movement along an axis between an original perspective of the depiction of the environment in the first image data and a position of an object in the environment, wherein at least a portion of the object is depicted in the first image data.

Aspect 29A. The method of any of Aspects 21A to 28A, further comprising: identifying one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and before outputting the second image data, modifying the second image data at least in part by filling the one or more gaps in the second image data using interpolation.

Aspect 30A. The method of any of Aspects 21A to 29A, further comprising: identifying one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and before outputting the second image data, modifying the second image data at least in part by filling the one or more gaps in the second image data using inpainting.

Aspect 31A. The method of any of Aspects 21A to 30A, further comprising: identifying one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and before outputting the second image data, modifying the second image data at least in part by filling the one or more gaps in the second image data using inpainting with one or more trained machine learning models.

Aspect 32A. The method of any of Aspects 21A to 31A, further comprising: identifying one or more conflicts in the second image data based on one or more conflicting values from the first image data in the second plurality of motion vectors; and selecting one of the one or more conflicting values from the first image data based on movement data associated with the second plurality of motion vectors.

Aspect 33A. The method of any of Aspects 21A to 32A, wherein the depth information includes a three-dimensional representation of the environment from a first perspective.

Aspect 34A. The method of any of Aspects 21A to 33A, wherein the depth data is received from at least one depth sensor.

Aspect 35A. The method of any of Aspects 21A to 34A, wherein outputting the second image data includes displaying the second image data using at least a display.

Aspect 36A. The method of any of Aspects 21A to 35A, wherein outputting the second image data includes causing the second image data to be sent to at least one recipient device using at least one communication interface.

Aspect 37A. The method of any of Aspects 21A to 36A, wherein the method is performed using an apparatus that includes at least one of a head-mounted display (HMD), a mobile handset, or a wireless communication device.

Aspect 38A. The method of any of Aspects 21A to 37A, wherein the depiction of the environment in the first image data depicts the environment from a first perspective, wherein the change in perspective is a change between the first perspective and a different perspective corresponding to a second depiction of the environment in the second image data.

Aspect 39A. The method of any of Aspects 21A to 38A, wherein the change in perspective includes at least one of a parallax movement of the perspective or a rotation of the perspective about an axis, the method further comprising: receiving, via a user interface, one of: an indication of a distance of the parallax movement of the perspective, or an indication of an angle or an axis of the rotation of the perspective.

Aspect 40A. The method of any of Aspects 21A to 39A, further comprising: identifying, based on one or more gaps in respective endpoints of the first plurality of motion vectors, one or more gaps in the second plurality of motion vectors that produce one or more gaps in the second image data; and before outputting the second image data, modifying the second image data at least in part by filling the one or more gaps in the second image data using interpolation.

Aspect 41A: A non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: receive depth data including depth information corresponding to an environment; receive first image data captured by an image sensor, the first image data including a depiction of the environment; generate, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generate, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective; generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and output the second image data.

Aspect 42A: The non-transitory computer-readable medium of Aspect 41A, further comprising operations according to any of Aspects 2A to 20A and/or any of Aspects 22A to 40A.

Aspect 43A: An apparatus for image processing, the apparatus comprising: means for receiving first image data captured by an image sensor, the first image data including a depiction of an environment; means for generating, based on at least depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; means for generating, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective; means for generating second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and means for outputting the second image data.

Aspect 44A: The apparatus of Aspect 43A, further comprising means for performing operations according to any of Aspects 2A to 20A and/or any of Aspects 22A to 40A.

Aspect 1B. An apparatus for image processing, the apparatus comprising: at least one memory; and one or more processors coupled to the at least one memory, the one or more processors configured to: receive depth data captured by a depth sensor, the depth data including a three-dimensional representation of an environment from a first perspective; determine, based on at least the depth data, a first plurality of motion vectors corresponding to a change from the first perspective to a second perspective; receive first image data captured by an image sensor, the first image data depicting the environment from a third perspective; determine, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors corresponding to a change from the third perspective to a fourth perspective; generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, the second image data depicting the environment from the fourth perspective; and output the second image data.

Aspect 2B. The apparatus of Aspect 1B, wherein the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes a first image depicting the environment at the first time and a second image depicting the environment at the third time.

Aspect 3B. The apparatus of any of Aspects 1B to 2B, wherein the first image data includes video data that includes a parallax movement, wherein the second image data includes a stabilized variant of the video data without the parallax movement.

Aspect 4B. The apparatus of any of Aspects 1B to 3B, wherein the first image data includes a depiction of a person looking toward the image sensor from a first angle, wherein the second image data includes a depiction of the person looking toward the image sensor from a second angle that is different from the first angle.

Aspect 5B. The apparatus of any of Aspects 1B to 4B, wherein the fourth perspective is the first perspective.

Aspect 6B. The apparatus of any of Aspects 1B to 5B, wherein the fourth perspective is the second perspective.

Aspect 7B. The apparatus of any of Aspects 1B to 6B, wherein the change from the first perspective to the second perspective includes a rotation of perspective by an angle, wherein the change from the third perspective to the fourth perspective includes a rotation of perspective by the angle.

Aspect 8B. The apparatus of any of Aspects 1B to 7B, wherein the change from the first perspective to the second perspective includes a translation of perspective in a direction and by a distance, wherein the change from the third perspective to the fourth perspective includes the translation of perspective in the direction and by the distance.

Aspect 9B. The apparatus of any of Aspects 1B to 8B, wherein the change from the first perspective to the second perspective includes a transformation, wherein the change from the third perspective to the fourth perspective includes the transformation.

Aspect 10B. The apparatus of any of Aspects 1B to 9B, wherein the one or more processors are configured to: identify one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and modify the second image data, before outputting the second image data, at least in part by filling the one or more gaps in the second image data using interpolation.
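
For the gap filling of Aspect 10B, a simple interpolation strategy (one of many, shown as an assumed sketch rather than the claimed technique) is to average valid neighboring pixels into each gap and iterate until the gaps close:

```python
import numpy as np

def fill_gaps(image, valid):
    """Sketch: fill gap pixels by iteratively averaging valid 4-neighbors.

    image: (H, W) single-channel image with undefined values at gaps.
    valid: (H, W) boolean mask, False where no motion vector landed.
    Assumes at least one valid pixel exists; loops until all gaps close.
    """
    img, valid = image.astype(np.float64).copy(), valid.copy()
    H, W = img.shape
    while not valid.all():
        for y, x in zip(*np.where(~valid)):
            acc, n = 0.0, 0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W and valid[yy, xx]:
                    acc, n = acc + img[yy, xx], n + 1
            if n:
                img[y, x], valid[y, x] = acc / n, True
    return img
```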

Aspect 11B. The apparatus of any of Aspects 1B to 10B, wherein the one or more processors are configured to: identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and modify the second image data, before outputting the second image data, at least in part by filling one or more gaps in the second image data using inpainting.
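
Aspect 11B instead fills disocclusions with inpainting. Where a classical (non-machine-learning) inpainting routine suffices, OpenCV's `cv2.inpaint` is one readily available option; treating it as a stand-in for the aspect's inpainting step is an assumption of this sketch:

```python
import cv2
import numpy as np

def inpaint_occlusions(image_bgr, occlusion_mask):
    """Sketch: fill occluded regions with OpenCV's Telea inpainting.

    image_bgr:      (H, W, 3) uint8 reprojected image.
    occlusion_mask: (H, W) uint8 mask, nonzero where no source pixel mapped
                    (e.g., derived from gaps in the inverted motion grid).
    """
    return cv2.inpaint(image_bgr, occlusion_mask, 3, cv2.INPAINT_TELEA)
```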

Aspect 12B. A method of image processing, the method comprising: receiving depth data captured by a depth sensor, the depth data including a three-dimensional representation of an environment as viewed from a first perspective; determining, based on at least the depth data, a first plurality of motion vectors corresponding to a change from the first perspective to a second perspective; receiving first image data captured by an image sensor, the first image data depicting the environment from a third perspective; determining, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors corresponding to a change from the third perspective to a fourth perspective; generating second image data at least in part by modifying the first image data according to the second plurality of motion vectors, the second image data depicting the environment from the fourth perspective; and outputting the second image data.

Aspect 13B. The method of Aspect 12B, wherein the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes a first image depicting the environment at the first time and a second image depicting the environment at the third time.

Aspect 14B. The method of any of Aspects 12B to 13B, wherein the first image data includes video data that includes a parallax movement, wherein the second image data includes a stabilized variant of the video data without the parallax movement.

Aspect 15B. The method of any of Aspects 12B to 14B, wherein the first image data depicts a person looking toward the image sensor from a first angle, wherein the second image data depicts the person looking toward the image sensor from a second angle different from the first angle.

Aspect 16B. The method of any of Aspects 12B to 15B, wherein the fourth perspective is the first perspective.

Aspect 17B. The method of any of Aspects 12B to 16B, wherein the fourth perspective is the second perspective.

Aspect 18B. The method of any of Aspects 12B to 17B, wherein the change from the first perspective to the second perspective includes a perspective rotation according to an angle, wherein the change from the third perspective to the fourth perspective includes a perspective rotation according to the angle.

Aspect 19B. The method of any of Aspects 12B to 18B, wherein the change from the first perspective to the second perspective includes a perspective translation according to a direction and a distance, wherein the change from the third perspective to the fourth perspective includes the perspective translation according to the direction and the distance.

Aspect 20B. The method of any of Aspects 12B to 19B, wherein the change from the first perspective to the second perspective includes a transformation, wherein the change from the third perspective to the fourth perspective includes the transformation.

Aspect 21B. The method of any of Aspects 12B to 20B, further comprising: identifying one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and modifying the second image data, before outputting the second image data, at least in part by filling the one or more gaps in the second image data using interpolation.

Aspect 22B. The method of any of Aspects 12B to 21B, further comprising: identifying one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and modifying the second image data, before outputting the second image data, at least in part by filling one or more gaps in the second image data using inpainting.

Aspect 23B. A non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform operations according to any of Aspects 1B to 22B.

Aspect 24B. An apparatus for image processing, the apparatus comprising one or more means for performing operations according to any of Aspects 1B to 22B.

100: image capture and processing system; 105A: image capture device; 105B: image processing device; 110: scene; 115: lens; 120: control mechanisms; 125A: exposure control mechanism; 125B: focus control mechanism; 125C: zoom control mechanism; 130: image sensor; 140: random access memory (RAM); 145: read-only memory (ROM); 150: image processor; 152: host processor; 154: ISP; 156: input/output (I/O) ports; 160: input/output (I/O) devices; 200: imaging system; 205: sensors; 207: virtual content generator; 210: application engine; 215: image reprojection engine; 220: motion vector engine; 225: grid inversion engine; 230: time warp engine; 235: depth sensor support engine; 240: 3D stabilization engine; 245: 3D zoom engine; 250: reprojection SAT engine; 255: head pose correction engine; 260: extended reality (XR) late-stage reprojection engine; 265: special effects engine; 270: output device; 275: transceiver; 280: feedback engine; 285: media data; 290: modified media data; 300: perspective diagram; 310: HMD; 320: user; 330A: first camera; 330B: second camera; 330C: third camera; 330D: fourth camera; 335: earpieces; 340: display; 350: perspective diagram; 400: perspective diagram; 410: mobile handset; 420: front surface; 430A: first camera; 430B: second camera; 430C: third camera; 430D: fourth camera; 435A: speaker; 435B: speaker; 440: display; 450: perspective diagram; 460: rear surface; 505: motion vector (MV) grid; 510: first image Img1; 515: second image Img2; 520: inverted MV grid; 600: conceptual diagram; 605: world scene; 610: camera image; 615: reprojected image; 620: depth data; 700: conceptual diagram; 705: time warp; 710: first image; 715: second image; 720: motion vector map; 800: conceptual diagram; 810: set of image sensors; 815: set of depth sensors; 820: offset; 900: conceptual diagram; 905: 3D stabilization; 1000: conceptual diagram; 1005: 3D zoom; 1020: depth data; 1100: conceptual diagram; 1105: reprojection; 1110: telephoto sensor; 1115: wide-angle sensor; 1120: offset; 1125: sensor; 1130: telephoto image; 1135: wide-angle image; 1140: telephoto image; 1160: depth data; 1200: conceptual diagram; 1205: head pose correction; 1210: input image; 1215: reprojected image; 1220: depth data; 1300: conceptual diagram; 1305: XR late-stage reprojection; 1310: input image; 1315: input image; 1320: HMD; 1400: conceptual diagram; 1405: special effects; 1410: input image; 1415: image; 1500: conceptual diagram; 1510: captured image; 1515: reprojected image; 1600: block diagram; 1605: 3D transformation; 1610: block diagram; 1615: MV computation; 1620: motion vectors (MV); 1625: grid inversion; 1630: inverted motion vectors; 1700: block diagram; 1705: warp engine; 1710: captured image; 1715: reprojected image; 1730: inverted MVs; 1800: conceptual diagram; 1805: reprojected image; 1810: occlusion map; 1815: inpainted image; 1825: zoomed image; 1830: inpainted image; 1835: occluded region; 1900: block diagram; 1905: reprojection and grid inversion system; 1910: MV grid; 1915: first option; 1920: second option; 1930: arrow; 1935: arrow; 1940: pixel/arrow; 2000: conceptual diagram; 2010: pixel; 2015: pixel; 2020: pixel; 2025: main walk engine; 2030: triangle control engine; 2035: pixel interpolation engine; 2100: conceptual diagram; 2110: image; 2115: reprojected image; 2200: conceptual diagram; 2220: process; 2300: conceptual diagram; 2400: block diagram; 2405: camera; 2410: image sensor; 2415: dynamic random access memory (DRAM); 2420: optical flow engine; 2425: grid inversion engine; 2430: warp engine; 2440: image processing engine; 2500: block diagram; 2505: temporal deblurring engine; 2520: further transformation; 2535: reprojection engine; 2600: block diagram; 2700: conceptual diagram; 2705: projection alignment; 2710: alignment; 2800: block diagram; 2900: conceptual diagram; 3000: conceptual diagram; 3005: additional example; 3100: conceptual diagram; 3105: digital zoom; 3110: illustration; 3200: conceptual diagram; 3300: conceptual diagram; 3305: offset distance; 3310: offset angle; 3315: depth data; 3320: occlusion map; 3400: conceptual diagram; 3500: conceptual diagram; 3505: occluded region; 3605: occluded region; 3700: conceptual diagram; 3800: conceptual diagram; 3900: neural network; 3910: input layer; 3912A: hidden layer; 3912B: hidden layer; 3912N: hidden layer; 3914: output layer; 3916: node; 4000: process; 4005: operation; 4010: operation; 4015: operation; 4020: operation; 4025: operation; 4030: operation; 4100: computing system; 4105: connection; 4110: processor; 4112: cache memory; 4115: system memory; 4120: read-only memory (ROM); 4125: random access memory (RAM); 4130: storage device; 4132: service; 4134: service; 4135: output device; 4136: service; 4140: communication interface; 4145: other input devices; RGB: visual; RGB3: main image sensor

Illustrative embodiments of the present application are described in detail below with reference to the following figures:

FIG. 1 is a block diagram illustrating an example architecture of an image capture and processing system, according to some examples;

FIG. 2 is a block diagram illustrating an example architecture of an imaging system for performing reprojection operations for various applications, according to some examples;

FIG. 3A is a perspective diagram illustrating a head-mounted display (HMD) used as an extended reality (XR) system, according to some examples;

FIG. 3B is a perspective diagram illustrating the head-mounted display (HMD) of FIG. 3A being worn by a user, according to some examples;

FIG. 4A is a perspective diagram illustrating the front surface of a mobile handset that includes front-facing cameras and can be used as an extended reality (XR) system, according to some examples;

FIG. 4B is a perspective diagram illustrating the rear surface of a mobile handset that includes rear-facing cameras and can be used as an extended reality (XR) system, according to some examples;

FIG. 5 is a block diagram illustrating an example of grid inversion, according to some examples;

FIG. 6 is a conceptual diagram illustrating an example of depth-based reprojection, according to some examples;

FIG. 7 is a conceptual diagram illustrating an example of time warping performed by a time warp engine, according to some examples;

FIG. 8 is a conceptual diagram illustrating an example of depth sensor assistance performed by a depth sensor support engine, according to some examples;

FIG. 9 is a conceptual diagram illustrating an example of 3D stabilization performed by a 3D stabilization engine, according to some examples;

FIG. 10 is a conceptual diagram illustrating an example of 3D zoom (or cinematic zoom) performed by a 3D zoom engine, according to some examples;

FIG. 11 is a conceptual diagram illustrating an example of reprojection performed by a reprojection SAT engine, according to some examples;

FIG. 12 is a conceptual diagram illustrating an example of head pose correction performed by a head pose correction engine, according to some examples;

FIG. 13 is a conceptual diagram illustrating an example of XR late-stage reprojection performed by an XR late-stage reprojection engine, according to some examples;

FIG. 14 is a conceptual diagram illustrating an example of special effects performed by a special effects engine, according to some examples;

FIG. 15 is a conceptual diagram illustrating an image reprojection transformation based on matrix operations, according to some examples;

FIG. 16 is a block diagram illustrating a 3D transformation and a grid inversion transformation based on depth data, according to some examples;

FIG. 17 is a block diagram illustrating an image reprojection transformation based on motion vectors, according to some examples;

FIG. 18 is a conceptual diagram illustrating an example of inpainting to resolve occlusions, according to some examples;

FIG. 19 is a block diagram illustrating the architecture of a reprojection and grid inversion system, according to some examples;

FIG. 20 is a conceptual diagram illustrating an example of a triangle walking operation, according to some examples;

FIG. 21 is a conceptual diagram illustrating an example of occlusion masking, according to some examples;

FIG. 22 is a conceptual diagram illustrating an example of hole filling, according to some examples;

FIG. 23 is a conceptual diagram illustrating additional examples of time warping performed by a time warp engine, according to some examples;

FIG. 24 is a block diagram illustrating an example architecture of a reprojection engine, used in some examples for a time warp engine, according to some examples;

FIG. 25 is a block diagram illustrating an example architecture of a reprojection engine with temporal deblurring, used in some examples for a time warp engine with temporal deblurring, according to some examples;

FIG. 26 is a block diagram illustrating an example architecture of a depth sensor support engine for a time-of-flight (ToF) sensor, according to some examples;

FIG. 27 is a conceptual diagram illustrating additional examples of depth sensor assistance performed by a depth sensor support engine, according to some examples;

FIG. 28 is a block diagram illustrating an example architecture of an imaging system that includes an image reprojection engine and/or a 3D stabilization engine, according to some examples;

FIG. 29 is a conceptual diagram illustrating additional examples of time warping performed using a time warp engine, compared with images processed without the time warp engine, according to some examples;

FIG. 30 is a conceptual diagram illustrating additional examples of 3D stabilization performed by a 3D stabilization engine, according to some examples;

FIG. 31 is a conceptual diagram illustrating additional examples of 3D zoom (or cinematic zoom) performed by a 3D zoom engine, according to some examples;

FIG. 32 is a conceptual diagram illustrating additional examples of reprojection performed by a reprojection SAT engine, according to some examples;

FIG. 33 is a conceptual diagram illustrating additional examples of head pose correction performed by a head pose correction engine, according to some examples;

FIG. 34 is a conceptual diagram illustrating additional examples of grid inversion, according to some examples;

FIG. 35 is a conceptual diagram illustrating an example of the use of deep-learning-based inpainting, according to some examples;

FIG. 36 is a conceptual diagram illustrating an example of the use of inpainting without deep learning, according to some examples;

FIG. 37 is a conceptual diagram illustrating an example of the use of an edge filter and a depth filter on edges, according to some examples;

FIG. 38 is a conceptual diagram illustrating an example of reprojection, according to some examples;

FIG. 39 is a block diagram illustrating an example of a neural network that can be used for media processing operations, according to some examples;

FIG. 40 is a flow diagram illustrating a media processing process, according to some examples; and

FIG. 41 is a diagram illustrating an example of a computing system for implementing certain aspects described herein.

Domestic deposit information (noted in order of depository institution, date, number): None
Foreign deposit information (noted in order of depository country, institution, date, number): None

Claims (30)

1. An apparatus for image processing, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: receive depth data including depth information corresponding to an environment; receive first image data captured by an image sensor, the first image data including a depiction of the environment; generate, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generate, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective; generate second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and output the second image data.

2. The apparatus of claim 1, wherein the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes at least one image depicting the environment at at least one of the first time or the third time.

3. The apparatus of claim 1, wherein the first image data includes a plurality of video frames that include a parallax movement, wherein the second image data includes a stabilized variant of the plurality of video frames in which the parallax movement is reduced.

4. The apparatus of claim 1, wherein the first image data depicts a person looking toward the image sensor from a first angle, wherein the second image data depicts the person looking toward the image sensor from a second angle different from the first angle.

5. The apparatus of claim 1, wherein a change in perspective includes a perspective rotation according to an angle and about an axis.

6. The apparatus of claim 1, wherein a change in perspective includes a perspective translation according to a direction and a distance.

7. The apparatus of claim 1, wherein a change in perspective includes a transformation.

8. The apparatus of claim 1, wherein the change in perspective includes a movement along an axis between an original perspective of the depiction of the environment in the first image data and a position of an object in the environment, wherein at least a portion of the object is depicted in the first image data.
9. The apparatus of claim 1, wherein the at least one processor is configured to: identify one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and modify the second image data, before outputting the second image data, at least in part by filling the one or more gaps in the second image data using interpolation.

10. The apparatus of claim 1, wherein the at least one processor is configured to: identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and modify the second image data, before outputting the second image data, at least in part by filling one or more gaps in the second image data using inpainting.

11. The apparatus of claim 1, wherein the at least one processor is configured to: identify one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and modify the second image data, before outputting the second image data, at least in part by filling one or more gaps in the second image data using inpainting with one or more trained machine learning models.

12. The apparatus of claim 1, wherein the at least one processor is configured to: identify one or more conflicts in the second image data based on one or more conflicting values from the first image data in the second plurality of motion vectors; and select one of the one or more conflicting values from the first image data based on movement data associated with the second plurality of motion vectors.

13. The apparatus of claim 1, wherein the depth information includes a three-dimensional representation of the environment as viewed from a first perspective.

14. The apparatus of claim 1, wherein the depth data is received from at least one depth sensor.

15. The apparatus of claim 1, further comprising: a display, wherein, to output the second image data, the at least one processor is configured to display the second image data using at least the display.

16. The apparatus of claim 1, further comprising: a communication interface, wherein, to output the second image data, the at least one processor is configured to send at least the second image data to at least a recipient device using at least the communication interface.

17. The apparatus of claim 1, wherein the apparatus includes at least one of a head-mounted display (HMD), a mobile handset, or a wireless communication device.
18. A method of image processing, the method comprising: receiving depth data including depth information corresponding to an environment; receiving first image data captured by an image sensor, the first image data including a depiction of the environment; generating, based on at least the depth data, a first plurality of motion vectors corresponding to a change in perspective of the depiction of the environment in the first image data; generating, using grid inversion based on the first plurality of motion vectors, a second plurality of motion vectors indicating respective distances moved by respective pixels of the depiction of the environment in the first image data for the change in perspective; generating second image data at least in part by modifying the first image data according to the second plurality of motion vectors, wherein the second image data includes a second depiction of the environment from a different perspective than the first image data; and outputting the second image data.

19. The method of claim 18, wherein the second image data includes an interpolated image configured to depict the environment at a second time between a first time and a third time, wherein the first image data includes at least one image depicting the environment at at least one of the first time or the third time.

20. The method of claim 18, wherein the first image data includes a plurality of video frames that include a parallax movement, wherein the second image data includes a stabilized variant of the plurality of video frames in which the parallax movement is reduced.

21. The method of claim 18, wherein the first image data depicts a person looking toward the image sensor from a first angle, wherein the second image data depicts the person looking toward the image sensor from a second angle different from the first angle.

22. The method of claim 18, wherein a change in perspective includes a perspective rotation according to an angle and about an axis.

23. The method of claim 18, wherein a change in perspective includes a perspective translation according to a direction and a distance.

24. The method of claim 18, wherein a change in perspective includes a transformation.

25. The method of claim 18, wherein the change in perspective includes a movement along an axis between an original perspective of the depiction of the environment in the first image data and a position of an object in the environment, wherein at least a portion of the object is depicted in the first image data.
26. The method of claim 18, further comprising: identifying one or more gaps in the second image data based on one or more gaps in the second plurality of motion vectors; and modifying the second image data, before outputting the second image data, at least in part by filling the one or more gaps in the second image data using interpolation.

27. The method of claim 18, further comprising: identifying one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and modifying the second image data, before outputting the second image data, at least in part by filling one or more gaps in the second image data using inpainting.

28. The method of claim 18, further comprising: identifying one or more occluded regions in the second image data based on one or more gaps in the second plurality of motion vectors; and modifying the second image data, before outputting the second image data, at least in part by filling one or more gaps in the second image data using inpainting with one or more trained machine learning models.

29. The method of claim 18, further comprising: identifying one or more conflicts in the second image data based on one or more conflicting values from the first image data in the second plurality of motion vectors; and selecting one of the one or more conflicting values from the first image data based on movement data associated with the second plurality of motion vectors.

30. The method of claim 18, wherein outputting the second image data includes displaying the second image data using at least one display.
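
The grid inversion recited in claims 1 and 18 turns a forward motion-vector grid (each source pixel says where it moves to) into a per-pixel backward lookup (each destination pixel says where it came from). A minimal sketch follows, using nearest-pixel splatting with a depth test to resolve the colliding-source conflicts addressed in claims 12 and 29; the depth tie-break is one plausible reading of the movement-data selection, not a statement of the patented method:

```python
import numpy as np

def invert_mv_grid(fwd_mv, depth):
    """Sketch: invert a forward motion-vector grid by splatting.

    fwd_mv: (2, H, W) forward vectors (du, dv) per source pixel.
    depth:  (H, W) source depth, used to keep the nearest source when
            several sources land on the same destination pixel.
    Returns a (2, H, W) backward grid; destinations that no source
    reaches stay NaN (gaps/occlusions for interpolation or inpainting).
    """
    H, W = depth.shape
    inv = np.full((2, H, W), np.nan)
    zbuf = np.full((H, W), np.inf)
    for y in range(H):
        for x in range(W):
            du, dv = fwd_mv[0, y, x], fwd_mv[1, y, x]
            xd, yd = int(round(x + du)), int(round(y + dv))
            if 0 <= xd < W and 0 <= yd < H and depth[y, x] < zbuf[yd, xd]:
                zbuf[yd, xd] = depth[y, x]
                inv[0, yd, xd] = -du  # backward vector points to the source
                inv[1, yd, xd] = -dv
    return inv
```
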
TW111149249A 2021-12-31 2022-12-21 Systems and methods for image reprojection TW202334902A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163266316P 2021-12-31 2021-12-31
US63/266,316 2021-12-31
US17/931,063 US20230216999A1 (en) 2021-12-31 2022-09-09 Systems and methods for image reprojection
US17/931,063 2022-09-09

Publications (1)

Publication Number Publication Date
TW202334902A (en)

Family

ID=85172458

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111149249A TW202334902A (en) 2021-12-31 2022-12-21 Systems and methods for image reprojection

Country Status (2)

Country Link
TW (1) TW202334902A (en)
WO (1) WO2023129855A1 (en)

Also Published As

Publication number Publication date
WO2023129855A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
JP6643357B2 (en) Full spherical capture method
KR102358932B1 (en) Stabilization plane determination based on gaze location
US20150235408A1 (en) Parallax Depth Rendering
EP3997662A1 (en) Depth-aware photo editing
JP2018522429A (en) Capture and render panoramic virtual reality content
JP2016537903A (en) Connecting and recognizing virtual reality content
TW202320019A (en) Image modification techniques
TW202334899A (en) Systems and methods for generating synthetic depth of field effects
US20220408019A1 (en) Viewpoint path modeling
US20220217301A1 (en) Systems and methods for interpolative three-dimensional imaging within the viewing zone of a display
US20230216999A1 (en) Systems and methods for image reprojection
TW202334902A (en) Systems and methods for image reprojection
WO2022266656A1 (en) Viewpoint path modeling and stabilization
US20210037230A1 (en) Multiview interactive digital media representation inventory verification
US11044464B2 (en) Dynamic content modification of image and video based multi-view interactive digital media representations
US11615582B2 (en) Enclosed multi-view visual media representation
US20230222757A1 (en) Systems and methods of media processing
US11798204B2 (en) Systems and methods of image processing based on gaze detection
US20240087232A1 (en) Systems and methods of three-dimensional modeling based on object tracking
US20230137141A1 (en) Systems and methods for device interoperability for extended reality
US11889196B2 (en) Systems and methods for determining image capture settings
US20230401673A1 (en) Systems and methods of automated imaging domain transfer
US20230342487A1 (en) Systems and methods of image processing for privacy management
TW202414341A (en) Systems and methods of automated imaging domain transfer
WO2023081573A1 (en) Systems and methods for device interoperability for extended reality