CN116645463A - Method for generating three-dimensional model and data processing device for executing the method

Method for generating three-dimensional model and data processing device for executing the method

Info

Publication number
CN116645463A
Authority
CN
China
Prior art keywords
image
input image
registered
generating
input
Prior art date
Legal status
Pending
Application number
CN202310147079.XA
Other languages
Chinese (zh)
Inventor
成昌勋
金炳德
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Priority claimed from KR1020220074844A (published as KR20230126622A)
Application filed by Samsung Electronics Co Ltd
Publication of CN116645463A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects


Abstract

A method and a data processing apparatus for generating a three-dimensional model are provided. Input images are obtained such that each input image includes color data and depth data of a target. An image chart is generated based on the color data of the input images such that the image chart includes correlation values between the input images. A registration order is determined based on the image chart. Pose information of each input image is sequentially generated based on the registration order and the depth data of the input images. The input images are sequentially registered as registered images such that each registered image includes the pose information. A three-dimensional model of the target is reconstructed based on the registered images. By determining the registration order based on the color data and sequentially generating the pose information of each input image together with the registered images according to the registration order, accurate pose information can be generated and a detailed three-dimensional model can be reconstructed.

Description

Method for generating three-dimensional model and data processing device for executing the method
Cross Reference to Related Applications
The present application claims priority from Korean Patent Application No. 10-2022-0023364, filed with the Korean Intellectual Property Office (KIPO) in February 2022, and Korean Patent Application No. 10-2022-0074044, filed with KIPO in 2022, the disclosures of which are incorporated herein by reference in their entireties.
Technical Field
Example embodiments relate generally to semiconductor integrated circuits and, more particularly, relate to a method of generating a three-dimensional model and a data processing apparatus performing the method.
Background
Recently, reconstructing the three-dimensional real world has attracted much attention in the field of mobile systems. Mixed reality systems such as augmented reality (AR), virtual reality (VR), and the like may combine virtual objects and the real world. The growing markets for digital mapping and the metaverse will require more advanced technology. It is not easy to reconstruct a detailed and reliable three-dimensional model from images taken by an image sensor.
Disclosure of Invention
Some example embodiments provide a method of generating a three-dimensional model and a data processing apparatus capable of efficiently reconstructing a detailed three-dimensional model.
According to some example embodiments, a method of generating a three-dimensional model includes: obtaining a plurality of input images such that each input image of the plurality of input images includes color data and depth data of a target; generating an image graph based on color data of the plurality of input images such that the image graph includes correlation values between the plurality of input images; determining a registration order of the plurality of input images based on the image graph; generating pose information of each input image in order with respect to the plurality of input images based on the registration order and depth data of the plurality of input images; registering the plurality of input images in order as registered images such that each registered image includes the pose information; and reconstructing a three-dimensional model of the target based on the registered images.
According to some example embodiments, a method of generating a three-dimensional model includes: obtaining a plurality of input images such that each input image of the plurality of input images includes color data and depth data of a target; extracting two-dimensional feature points included in each input image based on the color data of each input image; generating an image graph based on matching information of the two-dimensional feature points such that the image graph includes correlation values between the plurality of input images; determining a registration order of the plurality of input images based on the image graph; generating virtual depth data based on depth data of registered input images; generating pose information of a current input image to be registered next to the registered input images based on the virtual depth data and the depth data of the current input image; registering the plurality of input images in order as registered images such that each registered image includes the pose information; and reconstructing a three-dimensional model of the target based on the registered images.
According to some example embodiments, a data processing apparatus includes processing circuitry configured to: receive a plurality of input images such that each input image of the plurality of input images includes color data and depth data of a target; generate an image graph based on the color data of the plurality of input images such that the image graph includes correlation values between the plurality of input images; determine a registration order of the plurality of input images based on the image graph; generate pose information of each input image in order with respect to the plurality of input images based on the registration order of the plurality of input images and the depth data; register the plurality of input images in order as registered images such that each registered image includes the pose information; and reconstruct a three-dimensional model of the target based on the registered images.
Methods and data processing apparatuses according to some example embodiments may generate exact or near-exact pose information and reconstruct a detailed three-dimensional model by determining a registration order based on color data and sequentially generating the pose information of each input image together with the registered images according to the registration order.
Additionally, methods and data processing apparatus according to some example embodiments may estimate exact or near exact pose information relative to both large and small motions by utilizing both color data and depth data.
Further, the method and data processing apparatus according to some example embodiments may conveniently supplement a missing image and remove a noisy image by determining the registration order based on an image chart representing the correlation between input images.
Drawings
Example embodiments of the present disclosure will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is a flowchart illustrating a method of generating a three-dimensional model according to some example embodiments.
FIG. 2 is a block diagram illustrating a data processing apparatus performing a method of generating a three-dimensional model according to some example embodiments.
Fig. 3 is a block diagram illustrating a system according to some example embodiments.
Fig. 4A and 4B are diagrams illustrating example embodiments of providing input images to be combined to generate a three-dimensional model, according to some example embodiments.
FIG. 5 is a diagram illustrating a tracking-based image sequence and image set in a method of generating a three-dimensional model according to some example embodiments.
FIG. 6 is a flowchart illustrating an example embodiment of generating an image chart in a method of generating a three-dimensional model according to some example embodiments.
Fig. 7 is a diagram illustrating an example of extracting feature points in a method of generating a three-dimensional model according to some example embodiments.
Fig. 8 and 9 are diagrams illustrating example embodiments of image charts in a method of generating a three-dimensional model according to some example embodiments.
Fig. 10 is a flowchart illustrating an example embodiment of determining a registration order in a method of generating a three-dimensional model according to some example embodiments.
Fig. 11, 12 and 13 are diagrams for describing an example embodiment of determining the registration order of fig. 10.
Fig. 14A and 14B are diagrams for describing pose information in a method of generating a three-dimensional model according to some example embodiments.
Fig. 15 is a flowchart illustrating an example embodiment of generating pose information in a method of generating a three-dimensional model according to some example embodiments.
FIG. 16 is a flowchart illustrating an example embodiment of generating depth data in a method of generating a three-dimensional model according to some example embodiments.
Fig. 17 and 18 are flowcharts illustrating example embodiments of generating pose information of a current input image in a method of generating a three-dimensional model according to some example embodiments.
FIG. 19 is a flowchart illustrating an example embodiment of reconstructing a three-dimensional model in a method of generating a three-dimensional model according to some example embodiments.
Fig. 20A, 20B, 21, and 22 are diagrams illustrating effects of a method of generating a three-dimensional model according to some example embodiments.
Fig. 23 is a block diagram illustrating an example of an image capturing apparatus included in a system apparatus according to some example embodiments.
Fig. 24 is a diagram illustrating an example embodiment of a sensing unit included in the image photographing device of fig. 23.
Fig. 25 is a diagram illustrating an example embodiment of a pixel array included in the sensing unit of fig. 24.
Fig. 26 is a diagram illustrating an example embodiment of a sensing unit included in the image photographing device of fig. 23.
Fig. 27A and 27B are diagrams illustrating some example embodiments of a pixel array included in the sensing unit of fig. 26.
Fig. 28A, 28B, 28C, and 28D are circuit diagrams illustrating some example embodiments of unit pixels included in a pixel array.
Fig. 29 is a diagram showing an example embodiment of a pixel array included in a depth sensor.
Fig. 30 is a circuit diagram illustrating an example embodiment of a depth pixel included in the pixel array of fig. 29.
Fig. 31 is a timing diagram illustrating a time-of-flight (ToF) operation of the depth pixel of fig. 30.
Fig. 32 is a block diagram illustrating a camera system according to some example embodiments.
FIG. 33 is a block diagram illustrating a computer system according to some example embodiments.
Detailed Description
Various example embodiments will be described more fully hereinafter with reference to the accompanying drawings, in which some example embodiments are shown. In the drawings, like numbers refer to like elements throughout. Duplicate descriptions may be omitted.
FIG. 1 is a flowchart illustrating a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 1, a plurality of input images may be obtained such that each of the plurality of input images includes color data and depth data of a target (S100). Different ones of the plurality of input images may include data common to at least a portion of the target. The target may be a single object or a collection of objects such as an indoor structure. As will be described below with reference to fig. 23 to 31, the color data may represent two-dimensional color information of the target, and the depth data may represent distance information between the target and the camera or the image sensor.
In some example embodiments, the plurality of input images may be captured by a plurality of cameras as will be described below with reference to fig. 4A, or may be captured in sequence by a single camera as will be described below with reference to fig. 4B.
An image chart may be generated based on color data of the plurality of input images such that the image chart includes correlation values between the plurality of input images (S200). In some example embodiments, the two-dimensional feature points included in each input image may be extracted based on the color data of each input image, and the correlation value may be determined based on the mapping relationship of the two-dimensional feature points. Extraction of the two-dimensional feature points and generation of the matching information indicating the mapping relationship may be performed using various schemes. Some example embodiments of generating an image chart based on two-dimensional feature points will be described below with reference to fig. 6 to 9.
A registration order of the plurality of input images may be determined based on the image chart (S300). The registration order may be determined such that input images that are more strongly associated with the other input images are registered earlier. Some example embodiments of determining the registration order will be described below with reference to fig. 10 to 13.
Pose information for each input image may be sequentially generated with respect to the plurality of input images based on a registration order and depth data of the plurality of input images (S400). The plurality of input images may be sequentially registered as registered images such that each registered image includes pose information (S500).
As will be described below with reference to fig. 14A and 14B, pose information may include a position and an orientation of a camera capturing a corresponding input image. A registered image refers to an input image for which the pose information has been generated, and may include the color data, the depth data, and the pose information.
In some example embodiments, virtual depth data may be generated based on depth data of a registered input image, and pose information of a current input image may be generated based on the virtual depth data and depth data of the current input image to be registered next to the registered input image. Some example embodiments of generating pose information will be described below with reference to fig. 15 to 18.
A three-dimensional model of the target may be reconstructed based on the registered image (S600). As will be described below with reference to fig. 19, the reconstruction of the three-dimensional model may include a sparse reconstruction corresponding to a three-dimensional point cloud and a dense reconstruction corresponding to a surface reconstruction of the three-dimensional model.
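The overall flow of S100 to S600 may be summarized by the following minimal sketch in Python. The helper functions named here (load_rgbd_images, build_image_graph, determine_registration_order, estimate_pose, register, reconstruct_model) are hypothetical placeholders for the steps described above and are not part of the example embodiments.

def generate_3d_model(image_paths):
    # S100: obtain input images, each including color data and depth data of the target
    inputs = load_rgbd_images(image_paths)

    # S200: generate an image chart (graph) of correlation values from the color data
    graph = build_image_graph([img.color for img in inputs])

    # S300: determine the registration order based on the image chart
    order = determine_registration_order(graph)

    # S400/S500: sequentially generate pose information and register each input image
    registered = []
    for idx in order:
        pose = estimate_pose(inputs[idx], registered, graph)
        registered.append(register(inputs[idx], pose))

    # S600: reconstruct the three-dimensional model of the target from the registered images
    return reconstruct_model(registered)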
FIG. 2 is a block diagram illustrating a data processing apparatus performing a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 2, the data processing apparatus 500 may include a controller CTRL 10, an input circuit INP 20, an image graph generator IGG 30, a pose estimator PE 40, a registration agent REG50, a model generator MDG 60, and a memory device MEM 70.
The input circuit 20 may receive a plurality of input images (e.g., m input images I1 to Im) such that each of the plurality of input images includes color data and depth data of the target. The input circuit 20 may have various configurations for communicating with an external device such as a camera. The received input images IIMG may be stored in the memory device 70.
The image chart generator 30 may generate an image chart based on the color data of the plurality of input images such that the image chart includes the correlation values between the plurality of input images IIMG. The image chart generator 30 may access the memory device 70 to read the input images IIMG and store the generated image chart IMGR.
The controller 10 may control the overall operation of the data processing apparatus 500. Additionally, the controller 10 may determine the registration order of the plurality of input images IIMG based on the image chart IMGR. In some example embodiments, the controller 10 may load the image chart IMGR inside the controller 10 and control the operation of the data processing apparatus 500 using the loaded image chart IMGR. According to the registration order, the controller 10 may provide the registered input images and the current input image to be registered next to the pose estimator 40.
The pose estimator 40 may sequentially generate pose information PINF of each input image with respect to the plurality of input images IIMG based on the registration order and the depth data of the plurality of input images IIMG.
The registration agent 50 may register the plurality of input images IIMG in order as registered images RIMG such that each registered image includes the pose information PINF. In some example embodiments, the registration agent 50 may generate each registered image RIMG by adding the pose information PINF to the corresponding input image. In some example embodiments, the registration agent 50 may generate each registered image by converting each input image relative to a reference coordinate system based on the pose information PINF. The generated registered images RIMG may be stored in the memory device 70 in order.
The model generator 60 may reconstruct a three-dimensional model 3DM of the target based on the registered images RIMG, and the generated three-dimensional model 3DM may be stored in the memory device 70. In addition, the three-dimensional model 3DM may be provided to an external device through an interface in the input circuit 20.
The memory device 70 may store the plurality of input images IIMG, the image chart IMGR, the registered images RIMG, the three-dimensional model 3DM, and the like. The memory device 70 may be a memory device dedicated to the data processing device 500 or a common memory device of a system including the data processing device 500.
As such, the method and the data processing apparatus 500 according to some example embodiments may generate exact pose information PINF and reconstruct a detailed three-dimensional model 3DM by determining the registration order based on the color data and sequentially generating the pose information PINF of each input image using the registered images RIMG according to the registration order.
Fig. 3 is a block diagram illustrating a system according to some example embodiments.
Referring to fig. 3, the system 1000 may include a camera module (CAM) 1114, a Transceiver (TRX) 1140, a control unit 1160, and a user interface 1150.
The camera module 1114 may include at least one camera or image sensor configured to capture and provide an input image. The camera module 1114 may include a plurality of cameras that respectively provide one or more input images. Alternatively, the camera module 1114 may include a single camera that provides the input image.
The transceiver 1140 may provide any connectivity required or desired by the system 1000. The connectivity may include wired and/or wireless connections to other networks, such as the internet, cellular networks, and the like.
The user interface 1150 may include an input device (KPD) 1152 such as a keyboard, keypad, etc., and an output Device (DSP) 1112 such as a display device capable of displaying images captured by the camera module 1114. If appropriate for the particular design, a virtual keyboard may be integrated into display device 1112 with touch screen/sensor technology to omit input device 1152.
The control unit 1160 may include a general purpose Processor (PRC) 1161, a hardware device (HW) 1162, a firmware device (FW) 1163, a memory (MEM) 1164, an Image Signal Processor (ISP) 1166, a Graphics Engine (GENG) 1167, and a bus 1177. The control unit 1160 may perform methods of generating a three-dimensional model according to some example embodiments. That is, the control unit 1160 may be configured to perform the functions of the data processing apparatus 500 described with reference to fig. 2, for example.
Here, it should be noted that some example embodiments may be implemented differently in hardware, firmware, and/or software.
In some example embodiments, a method of generating a three-dimensional model according to some example embodiments may be performed using the image signal processor 1166. In some example embodiments, the method of generating a three-dimensional model may be performed according to program instructions executed by a processing device. Program instructions may be stored in the memory 1164 as software SW 1165 and executed by the general-purpose processor 1161 and/or the image signal processor 1166.
To execute program instructions, for example, general purpose processor 1161 may retrieve or fetch program instructions from internal registers, internal caches, or memory 1164, and decode and execute the program instructions. During or after execution of the program instructions, the general purpose processor 1161 may write one or more results (which may be intermediate or final results) of the program instructions to an internal register, internal cache, or memory 1164.
The system 1000 may be a computer system in one of many possible forms. For example, the system 1000 may be an embedded computer system, a system on a chip (SOC), a single board computer system (SBC) such as, for example, a computer on a module (COM) or a system on a module (SOM), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a network of computer systems, a mobile telephone, a Personal Digital Assistant (PDA), a server, a tablet computer system, or a combination of two or more of these.
Program instructions for implementing a method of generating a three-dimensional model according to some example embodiments may be stored in one or more non-transitory storage media readable by a computer. The computer-readable non-transitory storage medium may optionally include one or more semiconductor-based or other Integrated Circuits (ICs), such as a Field Programmable Gate Array (FPGA) or Application Specific Integrated Circuit (ASIC), a Hard Disk Drive (HDD), a hybrid hard disk drive (HHD), an Optical Disk Drive (ODD), a magneto-optical disk drive, a Floppy Disk Drive (FDD), a magnetic tape, a Solid State Drive (SSD), a RAM drive, a Secure Digital card or drive, any other suitable computer-readable non-transitory storage medium, or any suitable combination of two or more of these. The computer-readable non-transitory storage medium may optionally be volatile, nonvolatile, or a combination of volatile and nonvolatile.
Fig. 4A and 4B are diagrams illustrating some example embodiments of providing input images to be combined to generate a three-dimensional model, according to some example embodiments.
Fig. 4A shows an example in which an array of cameras including the first camera CAM1 and the second camera CAM2 is disposed along an axis AX. The input images may include images I1 and I2 captured by the cameras CAM1 and CAM2, respectively. For ease of illustration, only two cameras CAM1 and CAM2 are shown in fig. 4A. However, one skilled in the art will recognize that three or more cameras may be used in other example embodiments.
In some example embodiments, each of cameras CAM1 and CAM2 (or alternatively, at least one of them) may include an image sensor configured to capture a separate image or a series of images (e.g., video). For example, cameras CAM1 and CAM2 may include Charge Coupled Device (CCD) image sensors or Complementary Metal Oxide Semiconductor (CMOS) active pixel image sensors.
Each camera in the camera array has a particular field of view (FOV) according to a variety of factors, such as: relative camera position, focal length, magnification used, camera size, etc. As shown in fig. 4A, the first camera CAM1 has a first field of view FOV1, and the second camera CAM2 has a second field of view FOV2 different from the first field of view FOV 1.
In this regard, the field of view of a camera may refer to a horizontal, vertical, or diagonal range of a particular scene imaged by the camera. Objects within the field of view of the camera may be captured by the image sensor of the camera and objects outside the field of view may not appear on the image sensor.
The camera may have an orientation that represents an angle or direction in which the camera points. As shown in fig. 4A, camera CAM1 may have a first orientation ORT1 and camera CAM2 may have a second orientation ORT2 that is different from the first orientation ORT 1.
The overlap of the first image I1 and the second image I2 taken by the cameras CAM1 and CAM2, respectively, may vary according to the inter-camera spacing ICS, the fields of view FOV1 and FOV2, and the orientations ORT1 and ORT2. Therefore, the images I1 and I2 need to be synchronized (or coordinate-synchronized) to the same two-dimensional plane in advance in order to merge the images I1 and I2 efficiently.
As shown in fig. 4B, the input images to be combined according to some example embodiments of the inventive concepts may include a first image I1 and a second image I2 taken in sequence by a single camera CAM. For example, the images I1 and I2 may be images taken in a serial shooting mode or images that are oversampled to enhance image quality. In these cases, a time interval may occur between the images I1 and I2, and the relative positions of the images I1 and I2 may vary due to hand movement of the user, etc. As in the case of fig. 4A, the images I1 and I2 need to be synchronized (or coordinate-synchronized) to the same two-dimensional plane in advance in order to merge the images I1 and I2 efficiently.
FIG. 5 is a diagram illustrating a tracking-based image sequence and image set in a method of generating a three-dimensional model according to some example embodiments.
A tracking-based method of generating a three-dimensional model is performed based on an image sequence. In the tracking-based method, pose information is generated by processing two temporally ordered images. Thus, if an input image is degraded or omitted, the acquisition of the image sequence must be restarted; otherwise, the quality of the generated three-dimensional model is degraded.
In contrast, according to some example embodiments, any image set as shown in fig. 5 may be used to generate a three-dimensional model because the processing order or registration order is determined based on the correlation between the input images, regardless of the time at which the input images were taken. In this way, the method and the data processing apparatus according to some example embodiments can conveniently supplement a missing image and remove a noisy image by determining the registration order based on an image chart representing the correlation between the input images.
As one of the conventional approaches, simultaneous localization and mapping (SLAM) focuses on real-time tracking and reconstruction from a continuous image set. SLAM systems are optimized or improved to use sparse point clouds for camera tracking, so they can only produce sparse reconstructions and cannot produce dense reconstructions. The KinectFusion algorithm provides a volumetric dense reconstruction of a small-sized scene by combining a large amount of individual depth information into a single volumetric reconstruction. However, the KinectFusion algorithm uses only depth information to estimate pose information through frame-to-model alignment, so pose estimation can easily fail when there is a large movement between consecutive frames. Relatively exact pose information can be estimated by a conventional method combining color data and an iterative closest point (ICP) scheme, but the accuracy of the pose information is reduced because the accuracy of the three-dimensional point cloud is reduced when the camera motion is small. As other conventional approaches, incremental structure from motion (SfM) or visual SfM may exhibit better performance, but SfM approaches are limited in implementing small-scale reconstruction systems, such as hand-held three-dimensional scanning systems, because a large baseline between image pairs is needed or desired for the accuracy of three-dimensional point and pose estimation.
According to some example embodiments, the robust reconstruction based on color data may be combined with an exact reconstruction based on color data and depth data. In particular, some example embodiments may be applied to handheld scanning systems.
As such, methods and data processing apparatus according to some example embodiments may estimate exact pose information relative to both large and small motions by utilizing both color data and depth data.
FIG. 6 is a flowchart illustrating an example embodiment of generating an image chart in a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 2 and 6, the image chart generator 30 may extract two-dimensional feature points included in each input image based on color data of each input image (S210). The extraction of the two-dimensional feature points will be described below with reference to fig. 7. The image chart generator 30 may generate matching information indicating a mapping relationship of two-dimensional feature points included in different input images of the plurality of input images (S220), and determine a correlation value based on the matching information (S230). The determination of the correlation value will be described below with reference to fig. 8 and 9.
Fig. 7 is a diagram illustrating an example of extracting feature points in a method of generating a three-dimensional model according to some example embodiments.
For tracking and/or identifying objects, image matching may be achieved by extracting feature points in the images to be combined. Feature points may be understood as key points or points of interest.
In order to match the corresponding image portions, it is necessary to extract appropriate feature points that can be easily identified (or detected) and distinguished from the image background. For example, the conditions for an appropriate feature point may include a high degree of distinctiveness even if the shape and/or position of the object, the camera parameters, the lighting, etc. change. One example of a suitable feature point is a corner point, but many different methods may be used. However, most feature point extraction methods are based on corner point extraction, such as Harris corners and scale-invariant feature transform (SIFT) corners, as also shown in fig. 7.
In some example embodiments, feature point detection and point matching may be performed on a grayscale version of an input image, and a specific contrast may be applied to the input image in a separate operation or through a lookup table. In some example embodiments, feature point detection may be performed globally on an image using local contrast enhancement. Local contrast enhancement increases the "local" contrast while preventing or impeding an increase in the "global" contrast, thereby protecting large-scale shadow/highlight detail. For example, local contrast gradients may indicate edges, corners, or "blobs" corresponding to features. Feature detection algorithms such as scale-invariant feature transform (SIFT), speeded-up robust features (SURF), or oriented FAST and rotated BRIEF (ORB) may be used to detect features of the image, where FAST stands for "features from accelerated segment test" and BRIEF stands for "binary robust independent elementary features". In some example embodiments, the feature point detection process may detect one or more feature points. For example, the feature points may be detected by taking the difference of a plurality of Gaussian smoothing operations. Further, the position of the feature points of each search area and the contrast value of each feature point may be stored.
In some example embodiments, the size of the region for matching feature points in different images may be set according to the size of the image. In some example embodiments, the geometry of the camera system may be known, and based on the known camera system geometry, the approximate number of pixels of the search area and the overlap area of the image may be known in advance. For example, the position and orientation of the cameras of the camera system may be fixed relative to each other, and the overlap between the cameras may be known.
In some example embodiments, the determination of corresponding pairs of feature points respectively included in different input images may be performed using a nearest neighbor search algorithm. For example, the nearest neighbor search algorithm may identify patterns of feature points within each search area of the overlapping region of one image that match corresponding patterns of feature points within each search area of the overlapping region of another image. In some example embodiments, the nearest neighbor algorithm may use a search radius around each feature point to determine the corresponding feature point pair. For example, the search area may have a radius of 32 pixels, 64 pixels, or any suitable radius, or the search area may have 32 pixels by 32 pixels, 64 pixels by 64 pixels, or any suitable size. In some example embodiments, a secondary refinement may be used to realign the corresponding pairs of feature points prior to the final homography calculation.
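As a concrete illustration of the feature point extraction and nearest-neighbor matching described above, the following Python sketch uses OpenCV's ORB detector and a brute-force matcher with a ratio test. The patent text only names SIFT, SURF, ORB and nearest-neighbor search in general; the specific OpenCV calls, the number of features and the ratio threshold are illustrative assumptions.

import cv2

def match_feature_points(color_a, color_b, ratio=0.75):
    # Detect two-dimensional feature points on grayscale versions of the color data
    gray_a = cv2.cvtColor(color_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(color_b, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(gray_a, None)
    kp_b, des_b = orb.detectAndCompute(gray_b, None)

    # Nearest-neighbor search over binary descriptors (two nearest neighbors per point)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des_a, des_b, k=2)

    # Keep only matches that are clearly better than the second-best candidate
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return kp_a, kp_b, good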
Fig. 8 and 9 are diagrams illustrating some example embodiments of image charts in a method of generating a three-dimensional model according to some example embodiments.
FIG. 8 shows an example of matching information MINF for the feature points Xij respectively included in the first image I0 to the fifth image I4. Here, the feature points in the same row may correspond to the same position in the photographed scene. For example, it can be seen from the matching information MINF that the feature point X02 of the first image I0, the feature point X11 of the second image I1, the feature point X22 of the third image I2, and the feature point X32 of the fourth image I3 correspond to one another, and the fifth image I4 has no corresponding feature point. The correlation values between the different input images can be determined based on the matching information MINF.
FIG. 9 shows an example of an image graph IMGR including correlation values between the first input image I1 to the ninth input image I9 to be merged for a three-dimensional model.
The image graph generator 30 of fig. 2 may determine the correlation values based on the matching information described with reference to fig. 8. In some example embodiments, the image graph generator 30 may determine the number of matched pairs of two-dimensional feature points included in two input images Ii and Ij as the correlation value Mi,j between the two input images Ii and Ij. Because the correlation value represents the relation between the two input images Ii and Ij, the correlation value Mi,j is equal to the correlation value Mj,i. For example, in the example of FIG. 9, M3,4 = M4,3 = 145.
In some example embodiments, the controller 10 may determine a noise input image among the plurality of input images based on the image graph IMGR such that the correlation values between the noise input image and the other input images are less than a threshold value, and discard the noise input image. In the example of FIG. 9, the threshold value may be set to 25, and the first input image I1 may be determined as a noise input image because the correlation values 2, 15, 13, 11, 19, 7, 11 and 5 between the first input image I1 and the other input images I2 to I9 are smaller than the threshold value 25.
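A possible realization of the image graph of FIG. 9 and of the removal of noise input images is sketched below, assuming the match_feature_points() helper from the previous sketch; the representation of the graph as a symmetric matrix and the threshold of 25 follow the example of FIG. 9 and are not prescribed by the embodiments.

import numpy as np

def build_image_graph(color_images, threshold=25):
    n = len(color_images)
    graph = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            _, _, matches = match_feature_points(color_images[i], color_images[j])
            # correlation value M(i,j) = number of matched feature point pairs
            graph[i, j] = graph[j, i] = len(matches)

    # Discard a noise input image whose correlation values with all other
    # input images are below the threshold (e.g., I1 in FIG. 9)
    keep = [i for i in range(n) if np.any(np.delete(graph[i], i) >= threshold)]
    return graph[np.ix_(keep, keep)], keep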
Fig. 10 is a flowchart illustrating an example embodiment of determining a registration order in a method of generating a three-dimensional model according to some example embodiments, and fig. 11, 12 and 13 are diagrams for describing the example embodiment of determining the registration order of fig. 10.
Referring to fig. 2 and 10, the controller 10 may determine two input images corresponding to the maximum value or the highest correlation value among the correlation values to be registered first (S310) (e.g., determine the two input images to be registered first based on the correlation values).
For example, as shown in fig. 11, the correlation value M5,6 between the fifth input image I5 and the sixth input image I6 may be the maximum or highest value among all of the correlation values. The controller 10 may determine that the fifth input image I5 and the sixth input image I6 are to be registered first.
The controller 10 may determine a sum of correlation values between each input image that has not been registered and the registered input image (S320) and determine an input image corresponding to a maximum value or a highest value of the sum as a current input image to be registered next to the registered input image (S330).
For example, fig. 12 shows, for the case where the fifth input image I5 and the sixth input image I6 have been registered, the sum CSM of the correlation values between each of the images I1, I2, I3, I4, I7, I8 and I9 and the registered input images I5 and I6. For example, the sum CSM corresponding to the fourth input image I4, which has not yet been registered, may be 62+171=233. The sum CSM of 289 corresponding to the eighth input image I8 is the maximum or highest value among the sums CSM, and the controller 10 may determine to register the eighth input image I8 after the already registered fifth input image I5 and sixth input image I6.
Then, fig. 13 shows, for the case where the fifth input image I5, the sixth input image I6 and the eighth input image I8 have been registered, the sum CSM of the correlation values between each of the images I1, I2, I3, I4, I7 and I9 and the registered input images I5, I6 and I8. For example, the sum CSM corresponding to the fourth input image I4, which has not yet been registered, may be 62+171+121=354. The sum CSM of 364 corresponding to the seventh input image I7 is the maximum or highest value among the sums CSM, and the controller 10 may determine to register the seventh input image I7 after the registered fifth input image I5, sixth input image I6 and eighth input image I8.
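The greedy selection of the registration order illustrated in figs. 11 to 13 may be sketched as follows; the code assumes the correlation matrix returned by the build_image_graph() sketch above and is only an illustration of S310 to S330.

import numpy as np

def determine_registration_order(graph):
    n = graph.shape[0]
    # S310: register first the two input images with the maximum correlation value
    i, j = np.unravel_index(np.argmax(graph), graph.shape)
    order = [i, j]
    remaining = set(range(n)) - {i, j}

    while remaining:
        # S320/S330: the next image is the one with the largest sum CSM of
        # correlation values to the already registered images
        best = max(remaining, key=lambda k: graph[k, order].sum())
        order.append(best)
        remaining.remove(best)
    return order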
The order of the input images used to estimate the pose information PINF can greatly affect the accuracy of the pose estimation. According to some example embodiments, error accumulation of the pose estimation due to an improper order of pose estimation may be prevented and/or reduced by determining the registration order or the order of pose estimation based on the color data. In this way, by determining the registration order based on the color data and sequentially generating the pose information of each input image together with the registered images according to the registration order, exact or nearly exact pose information can be generated and a detailed three-dimensional model can be reconstructed.
Fig. 14A and 14B are diagrams for describing pose information in a method of generating a three-dimensional model according to some example embodiments.
Fig. 14A shows an example of a reference coordinate system (or world coordinate system) WCS and a camera coordinate system CCS corresponding to one input image. The position or coordinate value of a point P can be expressed as (Xw, Yw, Zw) in the reference coordinate system WCS and as (Xc, Yc, Zc) in the camera coordinate system CCS. The coordinate values with respect to the different coordinate systems can be converted using a geometric transformation matrix Tcw as shown in fig. 14B. In fig. 14B, r11, r12, r13, r21, r22, r23, r31, r32 and r33 represent the rotation of the camera coordinate system CCS with respect to the reference coordinate system WCS, and tx, ty and tz represent the translation of the camera coordinate system CCS with respect to the reference coordinate system WCS. The rotation and translation correspond to the orientation and position of the camera when the camera takes the corresponding input image. The pose information PINF can be represented by the geometric transformation matrix Tcw.
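For illustration, the pose information PINF of fig. 14B can be represented as a 4x4 homogeneous transformation built from the rotation entries r11 to r33 and the translation entries tx, ty, tz. The sketch below assumes that Tcw maps camera coordinates (Xc, Yc, Zc) to reference (world) coordinates (Xw, Yw, Zw); the direction of the mapping is a convention and is not specified in the text.

import numpy as np

def camera_to_world(point_c, rotation, translation):
    # rotation: 3x3 matrix of r11 ... r33, translation: vector (tx, ty, tz)
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    # Homogeneous coordinates: append 1, transform, drop the last component
    return (T @ np.append(np.asarray(point_c, dtype=float), 1.0))[:3]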
Fig. 15 is a flowchart illustrating an example embodiment of generating pose information in a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 2 and 15, the pose estimator 40 may generate virtual depth data based on the depth data of the registered input image (S410). The pose estimator 40 may generate pose information of a current input image to be registered next to the registered input image based on the virtual depth data and the depth data of the current input image (S420).
In the example of fig. 13, the fifth input image I5, the sixth input image I6 and the eighth input image I8 correspond to the registered input images that have already been registered, and the seventh input image I7 corresponds to the current input image to be registered by generating its pose information PINF. In this case, the pose estimator 40 may generate the virtual depth data based on the depth data of the registered input images I5, I6 and I8.
In some example embodiments, the pose estimator 40 may perform an iterative closest point (ICP) algorithm based on the virtual depth data and the depth data of the current input image. The ICP algorithm plays an important role in matching and in error compensation. The ICP algorithm can match two point clouds by repeatedly and alternately performing corresponding-point searches and pose estimation. Basically, the ICP algorithm can be performed by calculating a solution such that a cost function is minimized or reduced. The example embodiments are not limited to a particular ICP algorithm, and various ICP algorithms may be applied to some example embodiments.
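A single ICP alignment step may look like the following sketch, which uses the Open3D library as one possible implementation; neither the library nor the point-to-point cost and the correspondence distance threshold used here are prescribed by the example embodiments.

import numpy as np
import open3d as o3d

def to_point_cloud(points):
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(np.asarray(points, dtype=float))
    return cloud

def icp_pose(source_points, target_points, init=None, max_dist=0.05):
    init = np.eye(4) if init is None else init
    result = o3d.pipelines.registration.registration_icp(
        to_point_cloud(source_points), to_point_cloud(target_points),
        max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 pose estimate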
In tracking-based pose estimation, the pose of a current input image is estimated based on depth data of one input image having the estimated pose and temporally adjacent to the current input image. In contrast, according to some example embodiments, the pose of the current input image is estimated based on virtual depth data generated based on depth data of a registered image having pose information PINF. As described above, the registered input image may have exact or near exact pose information PINF generated according to the registration order based on the color data. The depth data of the registered input image may be used to enhance the accuracy of the pose estimation of the current input image. ICP based on such virtual depth data may be referred to as "bundled ICP".
FIG. 16 is a flowchart illustrating an example embodiment of generating depth data in a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 2 and 16, the pose estimator 40 may convert the registered input image into a three-dimensional image with respect to the same coordinate system (S411), and generate per-pixel depth values of the virtual depth data based on average values of depth values respectively corresponding to the three-dimensional image (S412).
The generation of such virtual depth data can be represented by Expression 1.
Expression 1
Ivir^d(Xw) = Σ Wi · D(Xw^i), where the sum is taken over the registered input images Ii included in Ireg.
In Expression 1, Ireg refers to the set of registered input images, Ii refers to the i-th input image, Xw^i refers to a pixel of the i-th input image, Xw refers to a pixel of the virtual depth data, D(Xw^i) refers to the per-pixel depth value of the corresponding three-dimensional image, Ivir^d(Xw) refers to the per-pixel depth value of the virtual depth data, and Wi indicates a weighting value corresponding to the i-th input image.
In some example embodiments, the same coordinate system may be a reference coordinate system. In some example embodiments, the same coordinate system may be a coordinate system corresponding to initial pose information as will be described below with reference to fig. 17.
In some example embodiments, the average of the depth values may be obtained by arithmetic averaging. In other words, the weighting value Wi in expression 1 may be 1/N, where N is the number of registered input images included in the calculation of expression 1.
In some example embodiments, the average value of the depth values may be obtained by weighted average using a weighted value corresponding to a correlation value between the current input image and the registered input image. In other words, the weighting value Wi in expression 1 may be a ratio of the correlation value of each registered input image to the sum of correlation values between the current input image and the registered input images.
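A minimal sketch of Expression 1 is given below: the per-pixel depth value of the virtual depth data is computed as an arithmetic or weighted average of the per-pixel depth values of the registered input images. For simplicity the sketch assumes that the registered depth maps have already been warped onto a common pixel grid of the same coordinate system, which abstracts away the conversion step of S411.

import numpy as np

def virtual_depth(depth_maps, weights=None):
    depth_maps = np.asarray(depth_maps, dtype=float)      # shape (N, H, W)
    if weights is None:
        # Arithmetic average: Wi = 1/N
        weights = np.full(len(depth_maps), 1.0 / len(depth_maps))
    else:
        # Weighted average, e.g., Wi proportional to the correlation value
        # between the current input image and the i-th registered input image
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
    # Ivir^d(Xw) = sum over registered images of Wi * D(Xw^i)
    return np.tensordot(weights, depth_maps, axes=1)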
Fig. 17 and 18 are flowcharts illustrating example embodiments of generating pose information of a current input image in a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 2 and 17, the pose estimator 40 may determine the most relevant input image among the registered input images based on the image chart such that a correlation value between the most relevant input image and the current input image is the largest among the registered input images (S421). The pose estimator 40 may determine initial pose information of the current input image based on the depth data of the most relevant input image and the depth data of the current input image (S422).
In the example of fig. 13, the fifth input image I5, the sixth input image I6 and the eighth input image I8 correspond to the registered input images that have already been registered, and the seventh input image I7 corresponds to the current input image to be registered by generating its pose information PINF. In this case, the correlation values between the current input image I7 and the registered input images I5, I6 and I8 are M7,5=160, M7,6=83 and M7,8=121, so the pose estimator 40 may determine the fifth input image I5 as the most relevant input image. In some example embodiments, initial pose information PINF of the current input image I7 may be generated by performing the ICP algorithm based on the depth data of the current input image I7 and the depth data of the most relevant input image I5. In some example embodiments, the initial pose information PINF of the current input image I7 may be generated by performing a perspective-n-point (PnP) algorithm based on the depth data of the current input image I7 and the depth data of the most relevant input image I5.
The pose estimator 40 may correct the initial pose information based on the virtual depth data and the depth data of the current input image to generate pose information of the current input image (S423). In some example embodiments, the pose estimator 40 may correct the initial pose information by performing an ICP algorithm based on the virtual depth data, the depth data of the current input image, and the initial pose information of the current input image.
In this way, by determining the initial pose information using the registered input image that is most relevant to the current input image, the probability of the ICP algorithm converging to wrong pose information can be reduced, and the accuracy of the pose estimation can be improved.
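Steps S421 to S423 may be combined as in the following sketch, which reuses the icp_pose() and virtual_depth() helpers above; the attributes (index, points, depth) of the image objects and the depth_to_points() helper are hypothetical, and the PnP branch mentioned above is omitted.

def estimate_pose(current, registered, graph):
    # S421: most relevant registered input image = largest correlation value
    best = max(registered, key=lambda r: graph[current.index, r.index])

    # S422: initial pose information from the most relevant image's depth data
    init = icp_pose(current.points, best.points)

    # S423: correct the initial pose against the virtual depth data ("bundled ICP")
    vdepth = virtual_depth([r.depth for r in registered])
    return icp_pose(current.points, depth_to_points(vdepth), init=init)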
Referring to fig. 2 and 18, the pose estimator 40 may determine an uncorrelated input image from among registered input images based on the image graph such that a correlation value between the uncorrelated input image and a current input image is less than a threshold value (S415), and exclude the uncorrelated input image in generating the virtual depth data (S416). Here, excluding the uncorrelated input image may indicate that the per-pixel depth value of the uncorrelated input image is excluded in the calculation of expression 1. The accuracy of the initial pose information may be improved by excluding registered input images that are relatively less relevant to the current input image.
FIG. 19 is a flowchart illustrating an example embodiment of reconstructing a three-dimensional model in a method of generating a three-dimensional model according to some example embodiments.
Referring to fig. 2 and 19, the model generator 60 may generate a three-dimensional point cloud based on the registered image (S610), and reconstruct a three-dimensional model corresponding to the target based on the three-dimensional point cloud (S620).
In some example embodiments, the three-dimensional point cloud may be optimized or improved by performing three-dimensional bundle adjustment with respect to the registered image. The three-dimensional bundle adjustment may be performed by various methods known to those skilled in the art.
In some example embodiments, a truncated signed distance function (TSDF) may be utilized to reconstruct the surface of the three-dimensional model.
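The dense (surface) reconstruction of S620 based on a TSDF may be sketched as below using an Open3D TSDF volume; the library choice, the voxel length, the truncation distance and the assumption that each registered image carries a camera-to-world pose are illustrative only.

import numpy as np
import open3d as o3d

def reconstruct_surface(registered, intrinsic):
    # intrinsic: o3d.camera.PinholeCameraIntrinsic of the capturing camera
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.005, sdf_trunc=0.02,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for r in registered:
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            o3d.geometry.Image(r.color), o3d.geometry.Image(r.depth),
            convert_rgb_to_intensity=False)
        # integrate() expects the world-to-camera extrinsic for each view
        volume.integrate(rgbd, intrinsic, np.linalg.inv(r.pose))
    return volume.extract_triangle_mesh()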
Fig. 20A, 20B, 21, and 22 are diagrams illustrating effects of a method of generating a three-dimensional model according to some example embodiments.
Fig. 20A illustrates a translation error TERR in meters of the pose information, and fig. 20B illustrates a rotation error RERR in radians of the pose information, which are estimated with respect to a plurality of image samples according to various methods. In fig. 20A and 20B, SCc refers to the results of the conventional method, and SCp refers to the results of some example embodiments. Fig. 21 shows detailed numerical results of the translation error Tran and the rotation error Rot with respect to several image samples ISET1 to ISET5 according to the conventional methods SC1 to SC3 and methods according to some example embodiments. As shown in fig. 20A and 20B, methods according to some example embodiments may provide more consistent results and reduced errors compared to the conventional methods, regardless of the type of image.
Fig. 22 shows an overlapping view of two adjacent images. As shown in fig. 22, the degree of overlap and the accuracy of pose information may be improved by the method SCp according to some example embodiments, as compared to the conventional methods SC1 and SC 2.
Fig. 23 is a block diagram illustrating an example of an image capturing apparatus included in a system apparatus according to some example embodiments.
Referring to fig. 23, the image photographing device 100 may include a Light Source (LS) 110, a sensing unit 130, and a timing control unit 150 (or a timing controller). The light source 110 generates modulated transmitted light TX to illuminate the object with the modulated transmitted light TX. The timing control unit 150 generates control signals SYNC and CTRL to control the operations of the light source 110 and the sensing unit 130. The sensing unit 130 may include depth pixels that convert the received light RX into an electrical signal. In addition, the sensing unit 130 may include color pixels that convert the visible light VL into an electrical signal.
The light source 110 may emit modulated transmitted light TX having a given, desired or predetermined wavelength. For example, the light source 110 may emit infrared light or near infrared light. The transmitted light TX generated by the light source 110 may be focused on the object 90 through the lens 81. The received light RX reflected by the object 90 may be focused on the sensing unit 130 through the lens 83.
The light source 110 may be controlled by the control signal SYNC to output the modulated transmitted light TX such that the intensity of the modulated transmitted light TX periodically varies. For example, the light source 110 may be implemented by a Light Emitting Diode (LED), a laser diode, or the like.
The control signals SYNC from the timing control unit 150 may include a reset signal RS and a transfer control signal TG as will be described with reference to fig. 28A to 28D and demodulation signals TG1 to TG4 as will be described with reference to fig. 30 and 31. The control signal SYNC supplied to the light source 110 may include a signal for synchronizing the modulated transmission light TX and the demodulation signals TG1 to TG4.
The sensing unit 130 may include a pixel array PX in which depth pixels and/or color pixels are arranged. The sensing unit 130 may further include an analog-to-digital conversion unit ADC and selection circuits ROW and COL for selecting a specific pixel in the pixel array PX.
In some example embodiments, the image photographing device 100 may be a three-dimensional image sensor including depth pixels for providing distance information and color pixels for providing image information. In this case, the sensing unit 130 may include a pixel array px_cz in which a plurality of depth pixels and a plurality of color pixels are alternately arranged, as will be described with reference to fig. 25.
In some example embodiments, the image photographing device 100 may include a depth sensor and a two-dimensional image sensor that are different from each other. In this case, the sensing unit 130 may include a pixel array px_c in which a plurality of color pixels are arranged and a pixel array px_z in which a plurality of depth pixels are arranged, as will be described with reference to fig. 27A and 27B.
In some example embodiments, the analog-to-digital conversion unit ADC may perform column analog-to-digital conversion of analog signals in parallel using a plurality of analog-to-digital converters respectively connected to a plurality of column lines, or may perform single analog-to-digital conversion of analog signals in series using a single analog-to-digital converter.
In some example embodiments, the analog-to-digital conversion unit ADC may include a Correlated Double Sampling (CDS) unit for extracting an effective signal component (effective voltage) based on a voltage sampled by a pixel.
In some example embodiments, the CDS unit may perform Analog Double Sampling (ADS) extracting the effective signal component based on an analog reset signal representing the reset component and an analog data signal representing the signal component.
In some example embodiments, the CDS unit may perform Digital Double Sampling (DDS) of converting an analog reset signal and an analog data signal into two digital signals to extract a difference between the two digital signals as an effective signal component.
In some example embodiments, the CDS unit may perform dual correlated double sampling for performing both analog double sampling and digital double sampling.
Fig. 24 is a diagram illustrating an example embodiment of a sensing unit included in the image photographing device of fig. 23. Fig. 24 shows an example embodiment of the sensing unit 130a in the case where the image capturing apparatus 100 of fig. 23 is a three-dimensional image sensor.
Referring to fig. 24, the sensing unit 130a may include a pixel array px_cz in which a plurality of color pixels and a plurality of depth pixels are arranged, color pixel selection circuits CROW and CCOL, depth pixel selection circuits ZROW and ZCOL, color pixel converters CADC, and depth pixel converters ZADC. The color pixel selection circuits CROW and CCOL and the color pixel converter CADC may provide color information RCDATA by controlling the color pixels included in the pixel array px_cz, and the depth pixel selection circuits ZROW and ZCOL and the depth pixel converter ZADC may provide depth information RZDATA by controlling the depth pixels included in the pixel array px_cz.
In this way, in the three-dimensional image sensor as shown in fig. 24, the component for controlling the color pixels and the component for controlling the depth pixels are independently operable to provide the color information RCDATA and the depth information RZDATA of a photographed image.
Fig. 25 is a diagram illustrating an example embodiment of a pixel array included in the sensing unit of fig. 24.
Referring to fig. 25, the pixel array px_cz may include color pixels R, G and B for providing image information and a depth pixel Z for providing depth information. For example, the pixel pattern 101 including the red, green, and blue pixels R, G, and B and the depth pixel Z may be repeatedly arranged in the pixel array px_cz.
Each of the color pixels R, G and B (or alternatively, at least one of them) may include a photodetection region for collecting photoelectrons generated by incident visible light, and the depth pixel Z may include a photodetection region for collecting photoelectrons generated by received light RX (that is, incident infrared light or near infrared light). For example, since the wavelength of infrared light is longer than that of visible light, in order to improve quantum efficiency, the depth pixel Z may include a photodiode formed deeper than the color pixels R, G and B.
A color filter may be formed over the color pixels R, G and B, and an infrared light pass filter may be formed over the depth pixel Z. For example, the red pixel R may be defined by a red filter, the green pixel G may be defined by a green filter, the blue pixel B may be defined by a blue filter, and the depth pixel Z may be defined by an infrared light pass filter. In addition, an infrared light cut filter may be further formed over the color pixels R, G and B.
Fig. 25 shows a non-limiting example of the pixel pattern 101, and the pixel pattern 101 may be variously changed. For example, the area ratio of one color pixel to one depth pixel may be variously changed and/or the number ratio of color pixels to depth pixels in the pixel array px_cz may be variously changed.
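Purely as an illustration outside the patent text, one possible mosaic layout of pixel pattern 101 can be tiled as below; the 2x2 tile arrangement is an assumption, since the patent leaves the area and number ratios of color and depth pixels open.

```python
import numpy as np

# Assumed 2x2 layout of pixel pattern 101: three color pixels and one depth pixel.
PATTERN_101 = np.array([["R", "G"],
                        ["B", "Z"]])

def tile_pixel_array(pattern_rows, pattern_cols):
    """Repeat the unit pattern to build a mosaic map of the pixel array PX_CZ."""
    return np.tile(PATTERN_101, (pattern_rows, pattern_cols))

print(tile_pixel_array(2, 3))  # a 4x6 mosaic of R, G, B and Z pixels
```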
Fig. 26 is a diagram illustrating an example embodiment of a sensing unit included in the image photographing device of fig. 23. Fig. 26 illustrates an example embodiment of the sensing unit 130b in the case where the image photographing device 100 of fig. 23 includes a depth sensor and a two-dimensional image sensor that are different from each other.
Referring to fig. 26, the sensing unit 130b may include a pixel array px_c in which a plurality of color pixels are arranged and a pixel array px_z in which a plurality of depth pixels are arranged. The visible light VL for color information and the receiving light RX for depth information may be separated by the beam splitter 55 and then irradiated to the respective pixel arrays px_c and px_z.
The color pixel selection circuits CROW and CCOL, the depth pixel selection circuits ZROW and ZCOL, the color pixel converter CADC, and the depth pixel converter ZADC may be disposed adjacent to the respective pixel arrays px_c and px_z. The color pixel selection circuits CROW and CCOL and the color pixel converter CADC may provide color information RCDATA by controlling the color pixels included in the pixel array px_c, and the depth pixel selection circuits ZROW and ZCOL and the depth pixel converter ZADC may provide depth information RZDATA by controlling the depth pixels included in the pixel array px_z.
In this way, the sensing unit 130b may include a depth sensor and a two-dimensional image sensor that are different from each other, so that the component for controlling the color pixels and the component for controlling the depth pixels may be implemented to provide color information RCDATA and depth information RZDATA, respectively.
Fig. 27A and 27B are diagrams illustrating some example embodiments of a pixel array included in the sensing unit of fig. 26.
Referring to fig. 27A, the first pixel array px_c includes color pixels R, G and B for providing image information. For example, the pixel pattern 102 including the green pixel G, the red pixel R, the blue pixel B, and the green pixel G may be repeatedly arranged in the first pixel array px_c. Each of the color pixels R, G and B (or alternatively, at least one of them) may include a photo detection region for collecting photoelectrons generated by incident visible light. A color filter may be formed over the color pixels R, G and B. For example, the red pixel R may be defined by a red filter, the green pixel G may be defined by a green filter and/or the blue pixel B may be defined by a blue filter.
Referring to fig. 27B, the second pixel array px_z includes a depth pixel Z for providing depth information. For example, the same depth pixels Z may be repeatedly arranged in the second pixel array px_z. Each (or alternatively, at least one of) the depth pixels Z may include a photo detection region for collecting photoelectrons generated by the received light RX (i.e., incident infrared light or near infrared light). An infrared light pass filter may be formed over each depth pixel Z.
Fig. 28A, 28B, 28C, and 28D are circuit diagrams illustrating some example embodiments of unit pixels included in a pixel array.
The unit pixels 200a, 200b, 200c, and 200d illustrated in fig. 28A, 28B, 28C, and 28D may be color pixels including color photodiodes or depth pixels including depth photodiodes.
Referring to fig. 28A, the unit pixel 200a may include a photosensitive element such as a photodiode PD, and a readout circuit including a transfer transistor TX, a reset transistor RX, a driving transistor DX, and a selection transistor SX.
For example, the photodiode PD may include an n-type region in a p-type substrate such that the n-type region and the p-type substrate form a p-n junction diode. The photodiode PD receives incident light and generates photo-charges based on the incident light. In some example embodiments, the unit pixel 200a may include a phototransistor, a pinned photodiode, or the like instead of or in addition to the photodiode PD.
The photo-charges generated in the photodiode PD may be transferred to the floating diffusion node FD through the transfer transistor TX turned on in response to the transfer control signal TG. The driving transistor DX functions as a source follower amplifier that amplifies a signal corresponding to the charge on the floating diffusion node FD. The selection transistor SX may transfer the amplified signal to the column line COL in response to the selection signal SEL. The floating diffusion node FD may be reset by a reset transistor RX. For example, the reset transistor RX may discharge the floating diffusion node FD in response to a reset signal RS for Correlated Double Sampling (CDS).
Fig. 28A shows a unit pixel 200a of a four-transistor configuration including four transistors TX, RX, DX, and SX. The configuration of the unit pixels may be variously changed as shown in fig. 28B, 28C, and 28D. Power is supplied via the voltage supply terminal VDD and ground.
Referring to fig. 28B, the unit pixel 200b may have a three-transistor configuration including a photosensitive element such as a photodiode PD and a readout circuit including a reset transistor RX, a driving transistor DX, and a selection transistor SX. In comparison with the unit pixel 200a of fig. 28A, the transfer transistor TX is omitted in the unit pixel 200b of fig. 28B.
Referring to fig. 28C, the unit pixel 200c may have a five-transistor configuration including a photosensitive element such as a photodiode PD and a readout circuit including a transfer transistor TX, a gate transistor GX, a reset transistor RX, a driving transistor DX, and a selection transistor SX. The gate transistor GX may selectively apply a transfer control signal TG to the transfer transistor TX in response to a selection signal SEL. The unit pixel 200c of fig. 28C further includes a gate transistor GX, as compared with the unit pixel 200a of fig. 28A.
Referring to fig. 28D, the unit pixel 200d may have a five-transistor configuration including a photosensitive element such as a photodiode PD and a readout circuit including a phototransistor PX, a transfer transistor TX, a reset transistor RX, a driving transistor DX, and a selection transistor SX. The phototransistor PX may be turned on or off in response to the photogate signal PG. The unit pixel 200d may be enabled when the phototransistor PX is turned on and disabled when the phototransistor PX is turned off. In comparison with the unit pixel 200a of fig. 28A, the phototransistor PX is further included in the unit pixel 200d of fig. 28D. In addition, the unit pixel may have a six-transistor configuration including the gate transistor GX (or bias transistor) of fig. 28C in addition to the configuration of fig. 28D.
Fig. 29 is a diagram showing an example embodiment of a pixel array included in a depth sensor.
Referring to fig. 29, the pixel array px_z includes a plurality of depth pixels Z1, Z2, Z3, and Z4. Depth pixels Z1, Z2, Z3, and Z4 may be time-of-flight (TOF) depth pixels that operate in response to a plurality of demodulation signals having different phases from one another. For example, the depth pixel Z1 may operate in response to a demodulation signal having a phase difference of 0 degrees with respect to the transmitted light TX radiated from the image photographing device 100 of fig. 4. In other words, the depth pixel Z1 may operate in response to a demodulation signal having the same phase as the transmitted light TX. The depth pixel Z2 may operate in response to a demodulation signal having a phase difference of 90 degrees with respect to the transmitted light TX, the depth pixel Z3 may operate in response to a demodulation signal having a phase difference of 180 degrees with respect to the transmitted light TX, and the depth pixel Z4 may operate in response to a demodulation signal having a phase difference of 270 degrees with respect to the transmitted light TX. For example, the pixel pattern 103 including the depth pixels Z1, Z2, Z3, and Z4 respectively operating in response to demodulation signals of different phases may be repeatedly arranged in the pixel array px_z.
Fig. 30 is a circuit diagram illustrating an example embodiment of a depth pixel included in the pixel array of fig. 29. Fig. 30 shows one pixel pattern 103 in the pixel array px_z of fig. 29.
Compared with the unit pixels of the single tap structure in fig. 28A, 28B, 28C, and 28D, the first to fourth pixels Z1, Z2, Z3, and Z4 in fig. 30 have a two tap structure for measuring distances according to the TOF scheme.
Referring to fig. 30, the first pixel Z1 and the third pixel Z3 may share a photosensitive element such as a photodiode PD. The first pixel Z1 may include a first readout circuit including a first transfer transistor TX1, a first reset transistor RX1, a first driving transistor DX1, and a first selection transistor SX1. The third pixel Z3 may include a third readout circuit including a third transfer transistor TX3, a third reset transistor RX3, a third driving transistor DX3, and a third selection transistor SX3. In the same manner, the second pixel Z2 and the fourth pixel Z4 may share a photosensitive element such as a photodiode PD. The second pixel Z2 may include a second readout circuit including a second transfer transistor TX2, a second reset transistor RX2, a second driving transistor DX2, and a second selection transistor SX2. The fourth pixel Z4 may include a fourth readout circuit including a fourth transfer transistor TX4, a fourth reset transistor RX4, a fourth drive transistor DX4, and a fourth select transistor SX4.
For example, the photodiode PD may include an n-type region in a p-type substrate such that the n-type region and the p-type substrate form a p-n junction diode. The photodiode PD receives incident light and generates photo-charges based on the incident light. In some example embodiments, the unit pixel 200e shown in fig. 30 may include a phototransistor, a pinned photodiode, or the like instead of the photodiode PD, or the unit pixel 200e shown in fig. 30 may include a phototransistor, a pinned photodiode, or the like in addition to the photodiode PD.
The photo charges generated in the photodiode PD may be transferred to the floating diffusion nodes FD1, FD2, FD3, and FD4 through the transfer transistors TX1, TX2, TX3, and TX4, respectively. The transfer control signals TG1, TG2, TG3, and TG4 may be demodulation signals whose phase differences with respect to the transmitted light TX are 0 degrees, 90 degrees, 180 degrees, and 270 degrees, respectively, as described above. In this way, the photo-charges generated in the photodiode PD may be split in response to the demodulation signals TG1, TG2, TG3, and TG4 to determine the round trip TOF of light, and the distance to the object may be calculated based on the round trip TOF.
The driving transistors DX1, DX2, DX3, and DX4 function as source follower amplifiers that amplify signals corresponding to the respective charges on the floating diffusion nodes FD1, FD2, FD3, and FD4. The selection transistors SX1, SX2, SX3, and SX4 may transfer the amplified signals to the column lines COL1 and COL2 in response to the selection signals SEL1, SEL2, SEL3, and SEL4, respectively. The floating diffusion nodes FD1, FD2, FD3, and FD4 may be reset by reset transistors RX1, RX2, RX3, and RX4, respectively. For example, for Correlated Double Sampling (CDS), the reset transistors RX1, RX2, RX3, and RX4 may discharge the floating diffusion nodes FD1, FD2, FD3, and FD4 in response to the reset signals RS1, RS2, RS3, and RS4, respectively.
Fig. 30 shows a non-limiting example of a two-tap configured depth pixel, which may have various configurations such as a single-tap configuration, a four-tap configuration, and the like. The timing of the control signals may be appropriately determined according to the configuration of the depth pixels.
Fig. 31 is a timing diagram illustrating a time-of-flight (ToF) operation of the depth pixel of fig. 30.
Referring to fig. 31, during the integration time interval TINT, the object is illuminated by modulating the transmitted light TX. As described with reference to fig. 23, the image photographing device 100 may include a light source 110 or a light emitting device to generate modulated transmitted light TX having periodically varying intensity. For example, the image photographing device 100 may repeatedly modulate the transmission and non-transmission of the transmitted light TX by turning on or off the light emitting device in a frequency range from about 10 MHz to about 200 MHz. Even though fig. 31 shows the modulated transmitted light TX as a pulse train, any periodic optical signal such as a sinusoidal signal may be used as the modulated transmitted light TX and the demodulation signals TG1, TG2, TG3, and TG4.
The modulated transmission light TX is reflected by the subject and returns to the image capturing apparatus 100 as reception light RX. The received light RX is delayed by a time of flight (TOF) relative to the modulated transmitted light TX. Photo-charges are generated in the photo-detection region of the depth pixel by receiving light RX.
The demodulation signals TG1, TG2, TG3, and TG4 may have a given, desired, or predetermined phase with respect to the modulated transmission light TX. If the photo-charges Q1, Q2, Q3, and Q4 integrated in the activation intervals of the demodulation signals TG1, TG2, TG3, and TG4 are obtained, TOF can be calculated based on the photo-charges Q1, Q2, Q3, and Q4.
When the distance from the photo-sensing device to the object is 'D' and the speed of light is 'c', the distance may be calculated using the relationship D = (TOF × c) / 2. Even though fig. 31 shows four demodulation signals TG1, TG2, TG3, and TG4 of different phases, different combinations of demodulation signals can be used to obtain TOF. For example, the image photographing device 100 may use only the first demodulation signal TG1 having a phase equal to the phase of the modulated transmission light TX and the third demodulation signal TG3 having a phase opposite to the phase of the modulated transmission light TX. Even though not shown in fig. 31, the photo detection region PD and the floating diffusion region FD may be initialized by activating the reset signal RS or the like before the integration time interval TINT.
During the read-out time interval TRD, data bits D1, D2, D3 and D4 corresponding to the integrated photo-charges Q1, Q2, Q3 and Q4 are provided through the column lines COL1 and COL 2.
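The distance calculation described above can be sketched numerically as follows; this is an illustration only, and the arctangent phase estimator is a common continuous-wave ToF convention assumed here, since the patent text only states that TOF is obtained from the integrated charges Q1 to Q4.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def tof_distance(q1, q2, q3, q4, f_mod):
    """Estimate the distance D from charges integrated at 0/90/180/270 degrees.

    q1..q4: photo-charges Q1..Q4 for demodulation phases 0, 90, 180 and 270 degrees.
    f_mod:  modulation frequency of the transmitted light TX in Hz.
    """
    # One common CW-ToF phase estimator (an assumption, not stated in the patent text).
    phase = math.atan2(q4 - q2, q1 - q3) % (2 * math.pi)
    tof = phase / (2 * math.pi * f_mod)  # round-trip time of flight
    return tof * C / 2                   # D = (TOF x c) / 2

# Example: charges in arbitrary units, 20 MHz modulation.
print(tof_distance(q1=100, q2=70, q3=60, q4=90, f_mod=20e6))  # roughly 0.55 m
```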
Fig. 32 is a block diagram illustrating a camera system according to some example embodiments.
Referring to fig. 32, the camera 800 includes a light receiving lens 810, a three-dimensional image sensor 900, and an engine unit 840. The three-dimensional image sensor 900 may include a three-dimensional image sensor chip 820 and a light source module 830. According to some example embodiments, the three-dimensional image sensor chip 820 and the light source module 830 may be implemented by separate devices, or at least a portion of the light source module 830 may be included in the three-dimensional image sensor chip 820. In some example embodiments, the light receiving lens 810 may be included in the three-dimensional image sensor chip 820.
The light receiving lens 810 may focus incident light on a light receiving region (e.g., depth pixels and/or color pixels included in a pixel array) of the three-dimensional image sensor chip 820. The three-dimensional image sensor chip 820 may generate DATA1 including depth information and/or color image information based on incident light passing through the light receiving lens 810. For example, the DATA1 generated by the three-dimensional image sensor chip 820 may include depth DATA generated using infrared light or near infrared light emitted from the light source module 830 and red, green, blue (RGB) DATA of a Bayer pattern generated using external visible light. The three-dimensional image sensor chip 820 may provide the DATA1 to the engine unit 840 based on the clock signal CLK. In some example embodiments, the three-dimensional image sensor chip 820 may interface with the engine unit 840 via a Mobile Industry Processor Interface (MIPI) and/or a Camera Serial Interface (CSI).
The engine unit 840 controls the three-dimensional image sensor 900. The engine unit 840 may process the DATA1 received from the three-dimensional image sensor chip 820. According to some example embodiments, to perform the above-described method of identifying motion, the engine unit 840 may include a motion zone tracker and/or a motion analyzer. In addition to motion recognition, the engine unit 840 may also perform data processing. For example, the engine unit 840 may generate three-dimensional color DATA based on the DATA1 received from the three-dimensional image sensor chip 820. As another example, the engine unit 840 may generate luminance-chrominance (YUV) DATA, including the luminance component Y, the blue luminance difference component U, and the red luminance difference component V, or compressed DATA such as Joint Photographic Experts Group (JPEG) DATA, based on the RGB DATA included in the DATA1. The engine unit 840 may be connected to the host/application 850, and may provide the DATA2 to the host/application 850 based on the master clock MCLK. Furthermore, the engine unit 840 may interface with the host/application 850 via a Serial Peripheral Interface (SPI) and/or an Inter-Integrated Circuit (I2C) interface.
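As an illustrative sketch only, the RGB-to-YUV conversion mentioned above can be written as below; the BT.601 coefficients are an assumption, since the patent text does not state which YUV variant the engine unit 840 would use.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB sample to YUV (luminance Y, blue difference U, red difference V).

    BT.601 coefficients are assumed here for illustration only.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)   # blue luminance difference component U
    v = 0.877 * (r - y)   # red luminance difference component V
    return y, u, v

print(rgb_to_yuv(200, 120, 40))  # approximately (134.8, -46.6, 57.2)
```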
FIG. 33 is a block diagram illustrating a computer system according to some example embodiments.
With reference to fig. 33, a computing system 2000 may include a processor 2010, a memory device 2020, a storage device 2030, an input/output device 2040, a power supply 2050, and a three-dimensional image sensor 900. Although not shown in fig. 33, computing system 2000 may also include ports to communicate with video cards, sound cards, memory cards, universal Serial Bus (USB) devices, and/or other electronic devices.
The processor 2010 may perform various computations or tasks. According to some example embodiments, the processor 2010 may be a microprocessor or Central Processing Unit (CPU). The processor 2010 may communicate with the memory device 2020, the storage device 2030 and the input/output device 2040 via an address bus, a control bus and/or a data bus. In some example embodiments, the processor 2010 may be coupled to an expansion bus, such as a Peripheral Component Interconnect (PCI) bus. The memory device 2020 may store data for operating the computing system 2000. For example, the memory device 2020 may be implemented with a Dynamic Random Access Memory (DRAM) device, a mobile DRAM device, a Static Random Access Memory (SRAM) device, a Phase-change Random Access Memory (PRAM) device, a Ferroelectric Random Access Memory (FRAM) device, a Resistive Random Access Memory (RRAM) device, and/or a Magnetic Random Access Memory (MRAM) device. The storage device 2030 may include a Solid State Drive (SSD), a Hard Disk Drive (HDD), a compact disk read only memory (CD-ROM), and so on. The input/output device 2040 may include input devices (e.g., keyboard, keypad, mouse, etc.) and output devices (e.g., printer, 3D printer, display device, etc.). The power supply 2050 supplies operating voltages to the computing system 2000.
The three-dimensional image sensor 900 may be in communication with the processor 2010 via a bus or other communication line. The three-dimensional image sensor 900 may be integrated with the processor 2010 in one chip, or the three-dimensional image sensor 900 and the processor 2010 may be implemented as separate chips.
The computing system 2000 may be any computing system that utilizes a three-dimensional image sensor. For example, the computing system 2000 may include a digital camera, a mobile phone, a smart phone, a Portable Multimedia Player (PMP), a Personal Digital Assistant (PDA), and so on.
As described above, the method and data processing apparatus according to some example embodiments may generate exact or near-exact pose information and reconstruct an exhaustive three-dimensional model by determining a registration order based on color data and sequentially generating pose information of an input image using the registered images according to the registration order. Additionally, methods and data processing apparatus according to some example embodiments may estimate exact or near-exact pose information with respect to both large and small motions by utilizing both color data and depth data. Further, the method and data processing apparatus according to some example embodiments may conveniently supplement absent images and remove noise images by determining the registration order based on an image chart representing the correlation between the input images. The reconstructed three-dimensional model may be displayed on a display device of the input/output device 2040 or on the display device 1112. The three-dimensional model may also be printed by a 3D printer of the input/output device 2040, or a two-dimensional rendering of the three-dimensional model may be printed by a printer of the input/output device 2040. The three-dimensional model may be utilized in a variety of ways, for example, by inserting it into a video game or into a virtual or augmented reality environment. The three-dimensional model may also be used to create instructions for generating a mold for producing a copy of the modeled object. The mold may be created by engraving it using a computer-operated machine or by 3D printing the mold.
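To tie this summary to something concrete, the following is a minimal sketch, outside the patent text, of the image chart and registration-order idea; the greedy selection loosely follows the rules stated in claims 4 and 5, while the matrix values and function names are illustrative assumptions.

```python
import numpy as np

def registration_order(corr):
    """Greedy registration order from an image chart.

    corr[i][j] is the correlation value between input images i and j,
    e.g. the number of matched two-dimensional feature points. The first
    two images are the most correlated pair; each following image is the
    unregistered one whose correlation sum to the registered set is largest.
    """
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    i, j = np.unravel_index(np.argmax(np.triu(corr, k=1)), corr.shape)
    order = [int(i), int(j)]
    remaining = set(range(n)) - set(order)
    while remaining:
        best = max(remaining, key=lambda k: corr[k, order].sum())
        order.append(best)
        remaining.remove(best)
    return order

# Four input images; image 3 barely overlaps the others (a noise-image candidate).
corr = [[0, 50, 40, 2],
        [50, 0, 60, 1],
        [40, 60, 0, 3],
        [2, 1, 3, 0]]
print(registration_order(corr))  # [1, 2, 0, 3]
```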
Some example embodiments may be applied to any electronic device and system. For example, the inventive concept may be applied to systems such as memory cards, solid State Drives (SSDs), embedded multimedia cards (eMMC), universal flash memory (UFS), mobile phones, smart phones, personal Digital Assistants (PDAs), portable Multimedia Players (PMPs), digital cameras, video cameras, personal Computers (PCs), server computers, workstations, notebook computers, digital TVs, set-top boxes, portable game consoles, navigation systems, wearable devices, internet of things (IoT) devices, internet of everything (IoE) devices, electronic books, virtual Reality (VR) devices, augmented Reality (AR) devices, three-dimensional scanners, three-dimensional printers, motion tracking devices, and the like.
Any of the elements and/or functional blocks disclosed above may include or be implemented in the following: processing circuitry such as hardware including logic circuitry; a hardware/software combination such as a processor executing software; or a combination thereof. For example, the controller 10, the timing control unit 150, the image signal processor 1166, the general-purpose processor 1161, and the processor 2010 may be implemented as processing circuits. More specifically, the processing circuitry may include, but is not limited to, a Central Processing Unit (CPU), an Arithmetic Logic Unit (ALU), a digital signal processor, a microcomputer, a Field Programmable Gate Array (FPGA), a system on a chip (SoC), a programmable logic unit, a microprocessor, an Application Specific Integrated Circuit (ASIC), etc. The processing circuitry may include electrical components such as at least one of transistors, resistors, capacitors, and the like. The processing circuitry may include electrical components such as logic gates including at least one of an AND gate, an OR gate, a NAND gate, a NOR gate, and the like.
The foregoing is illustrative of some example embodiments and is not to be construed as limiting thereof. Although a few example embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible in the detailed example embodiments without materially departing from the novel teachings.

Claims (20)

1. A method of generating a three-dimensional model, comprising the steps of:
obtaining, by a data processing apparatus, a plurality of input images such that each of the plurality of input images includes color data and depth data of a target;
generating, by the data processing apparatus, an image chart based on the color data of the plurality of input images such that the image chart includes correlation values between the plurality of input images;
determining, by the data processing apparatus, a registration order of the plurality of input images based on the image chart;
generating, by the data processing apparatus, pose information of each input image in order with respect to the plurality of input images based on a registration order of the plurality of input images and depth data;
registering, by the data processing apparatus, the plurality of input images as registered images in order such that each registered image includes the pose information; and
reconstructing, by the data processing apparatus, the three-dimensional model of the object based on the registered image.
2. The method of claim 1, wherein generating the image chart comprises:
extracting two-dimensional feature points included in each input image based on the color data of each input image;
generating matching information indicating a mapping relationship of the two-dimensional feature points included in different input images among the plurality of input images; and
the correlation value is determined based on the matching information.
3. The method of claim 2, wherein determining the correlation value comprises:
the number of matching pairs of the two-dimensional feature points included in two input images is determined as the correlation value between the two input images among the correlation values.
4. The method of claim 1, wherein determining the registration order comprises:
two input images corresponding to the maximum value or the highest correlation value among the correlation values are determined as two input images to be registered first.
5. The method of claim 1, wherein determining the registration order comprises:
determining a sum of correlation values between each input image that has not been registered and the registered input image; and
an input image corresponding to the maximum value or the highest value of the sums among input images that have not been registered is determined as a current input image to be registered next to the input image that has been registered.
6. The method of claim 1, wherein generating the pose information in order comprises:
generating virtual depth data based on depth data of the registered input image; and
pose information of a current input image to be registered next to the registered input image is generated based on the virtual depth data and depth data of the current input image.
7. The method of claim 6, wherein generating the virtual depth data comprises:
converting the registered input image into a three-dimensional image with respect to the same coordinate system; and
generating a depth value per pixel of the virtual depth data based on an average value of depth values respectively corresponding to the three-dimensional images.
8. The method of claim 7, wherein the average of the depth values is obtained by arithmetic averaging.
9. The method of claim 7, wherein the average value of the depth values is obtained by a weighted average using weight values corresponding to correlation values between the current input image and the registered input images.
10. The method of claim 6, wherein generating pose information of the current input image to be registered next comprises:
determining a most relevant input image of the registered input images based on the image chart such that a correlation value, among the correlation values, between the most relevant input image and the current input image is the largest among the registered input images; and
initial pose information of the current input image is determined based on depth data of the most relevant input image and depth data of the current input image.
11. The method of claim 10, wherein generating pose information for the current input image further comprises:
correcting the initial pose information based on the virtual depth data and the depth data of the current input image to generate pose information of the current input image.
12. The method of claim 10, wherein generating the virtual depth data comprises:
converting the registered input image into a three-dimensional image with respect to the same coordinate system corresponding to the initial pose information; and
generating a depth value per pixel of the virtual depth data based on an average value of depth values respectively corresponding to the three-dimensional images.
13. The method of claim 6, wherein generating the pose information of the current input image comprises:
an iterative closest point algorithm is performed based on the virtual depth data and the depth data of the current input image.
14. The method of claim 6, wherein generating the virtual depth data comprises:
determining an uncorrelated input image of the registered input images based on the image chart such that a correlation value, among the correlation values, between the uncorrelated input image and the current input image is less than a threshold value; and
the uncorrelated input image is excluded from the registered input images in the step of generating the virtual depth data.
15. The method of claim 1, further comprising the step of:
determining a noise input image of the plurality of input images based on the image chart such that a correlation value between the noise input image and other input images is less than a threshold; and
the noise input image is discarded.
16. The method of claim 1, wherein reconstructing the three-dimensional model comprises:
generating a three-dimensional point cloud based on the registered image; and
reconstructing the three-dimensional model based on the three-dimensional point cloud.
17. The method of claim 16, wherein,
the step of generating the three-dimensional point cloud comprises the following steps: improving the three-dimensional point cloud by performing three-dimensional bundle adjustment with respect to the registered image, and
the step of reconstructing the three-dimensional model based on the three-dimensional point cloud comprises: reconstructing a surface of the three-dimensional model using a truncated signed distance function.
18. A method of generating a three-dimensional model, comprising the steps of:
obtaining, by a data processing apparatus, a plurality of input images such that each of the plurality of input images includes color data and depth data of a target;
extracting, by the data processing apparatus, two-dimensional feature points included in each input image based on the color data of each input image;
generating, by the data processing apparatus, an image chart based on the matching information of the two-dimensional feature points such that the image chart includes correlation values between the plurality of input images;
determining, by the data processing apparatus, a registration order of the plurality of input images based on the image chart;
generating, by the data processing apparatus, virtual depth data based on depth data of the registered input image;
generating, by the data processing apparatus, pose information of a current input image to be registered next to the registered input image based on the virtual depth data and depth data of the current input image;
registering, by the data processing apparatus, the plurality of input images as registered images in order such that each registered image includes the pose information; and
reconstructing, by the data processing apparatus, the three-dimensional model of the object based on the registered image.
19. The method of claim 18, wherein generating pose information of the current input image to be registered next comprises:
determining a most relevant input image of the registered input images based on the image chart such that a correlation value, among the correlation values, between the most relevant input image and the current input image is the largest among the registered input images;
determining initial pose information of the current input image based on the depth data of the most relevant input image and the depth data of the current input image; and
correcting the initial pose information based on the virtual depth data and the depth data of the current input image to generate pose information of the current input image.
20. A data processing apparatus comprising:
processing circuitry configured to:
receiving a plurality of input images, such that each input image of the plurality of input images includes color data and depth data of a target,
generating an image chart based on the color data of the plurality of input images, such that the image chart includes correlation values between the plurality of input images,
determining a registration order of the plurality of input images based on the image chart,
generating pose information of each input image in order with respect to the plurality of input images based on a registration order of the plurality of input images and the depth data,
registering the plurality of input images in order as registered images such that each registered image includes the pose information; and
reconstructing a three-dimensional model of the object based on the registered image.