US20110211749A1 - System And Method For Processing Video Using Depth Sensor Information - Google Patents
- Publication number: US20110211749A1 (application US 12/714,514)
- Authority: US (United States)
- Prior art keywords: image, depth, matte, bin, method recited
- Legal status: Abandoned (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06T7/174: Segmentation; Edge detection involving the use of two or more images
- G06T7/11: Region-based segmentation
- G06T7/136: Segmentation; Edge detection involving thresholding
- G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
- G06T2207/10016: Video; Image sequence
- G06T2207/10028: Range image; Depth image; 3D point clouds
- G06T2207/20028: Bilateral filtering
- G06T2207/20036: Morphological image processing
- FIG. 10 illustrates a computer system 1000, which may be employed to perform various functions described herein, according to one embodiment of the present invention. The computer system 1000 may be used as a platform for executing one or more of the functions described hereinabove.
- The computer system 1000 includes a microprocessor 1002 that may be used to execute some or all of the steps described in the method shown in FIG. 1. Commands and data from the processor 1002 are communicated over a communication bus 1004. The computer system 1000 also includes a main memory 1006, such as a random access memory (RAM), where the program code may be executed during runtime, and a secondary memory 1008. The secondary memory 1008 includes, for example, one or more hard disk drives 1010 and/or a removable storage drive 1012, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., where a copy of the program code may be stored. The removable storage drive 1012 may read from and/or write to a removable storage unit 1014.
- User input and output devices may include, for instance, a keyboard 1016, a mouse 1018, and a display 1020. A display adaptor 1022 may interface with the communication bus 1004 and the display 1020, and may receive display data from the processor 1002 and convert the display data into display commands for the display 1020. The processor 1002 may communicate over a network, for instance, the Internet, a LAN, etc., through a network adaptor.
- The embodiment shown in FIG. 10 is for purposes of illustration. It will be apparent to one of ordinary skill in the art that other known electronic components may be added or substituted in the computer system 1000.
Abstract
A method for processing video using depth sensor information, comprising the steps of: dividing the image area into a number of bins roughly equal to the depth sensor resolution, with each bin corresponding to a number of adjacent image pixels; adding each depth measurement to the bin representing the portion of the image area to which the depth measurement corresponds; averaging the value of the depth measurement for each bin to determine a single average value for each bin; and applying a threshold to each bin of the registered depth map to produce a threshold image.
Description
- Video conferencing in informal settings, for example in mobile or in desktop to desktop environments, is becoming increasingly common. Unlike formal video conference settings which typically have carefully chosen backdrops, informal settings often have visually cluttered or very different backgrounds. These backgrounds can be a distraction that degrades the user experience. It is desirable to replace these undesirable backdrops with a common esthetically pleasing background.
- Background subtraction (or foreground segmentation) is the problem of delineating foreground objects in the view of a camera so that the background can be modified, replaced or removed. Some methods for background subtraction use depth data from a depth camera to distinguish between background and foreground. One method uses a two-step process to segregate collected video into foreground and background information. First, a trimap is produced using only data that has a high probability of being background or foreground information. Second, pixels that do not have a high probability of being background or foreground information are filtered using a bilateral filter to generate an estimate of the alpha matte. Because many of the computations in this process are performed in the high resolution color image domain, the video processing computational load is high and video processing may not run in real time.
- A process for background subtraction that is computationally efficient enough to meet the needs of mobile and desktop settings is needed.
- The figures depict implementations/embodiments of the invention and not the invention itself. Some embodiments of the invention are described, by way of example, with respect to the following Figures:
- FIG. 1 shows a flow diagram of the method of image processing a video image using depth sensor information according to an embodiment of the present invention.
- FIG. 2 shows an image of a scene typically captured by an image capture system with a depth sensor according to an embodiment of the present invention.
- FIG. 3 shows the depth sensor data after registration to the visible image shown in FIG. 2 and after the thresholding step according to one embodiment of the invention.
- FIG. 4 shows the image in FIG. 3 after the application of a morphological operation according to an embodiment of the invention.
- FIG. 5 shows the image of FIG. 4 after the application of a temporal filtering step according to an embodiment of the invention.
- FIG. 6 shows the image of FIG. 2 after the temporally filtered matte shown in FIG. 5 is applied to remove the background shown in FIG. 2 according to an embodiment of the invention.
- FIG. 7 shows the matte image of FIG. 6 after it is superimposed onto a grayscale image according to an embodiment of the invention.
- FIG. 8 shows the matte image of FIG. 5 after application of cross bilateral filtering according to one embodiment of the invention.
- FIG. 9 is the image resulting after application of the method for image processing shown in FIG. 1 and described in the present invention.
- FIG. 10 is a computer system for implementing the method according to FIG. 1 in one embodiment of the invention.
- We describe an efficient method for processing video that uses a conventional video camera that includes a depth sensor. A depth sensor produces a 2D array of pixels where each pixel corresponds to the distance from the camera to an opaque object in the scene. Depth sensor information can be useful in distinguishing the background from the foreground in images, and thus is useful in background subtraction methods that can be used to remove distracting background from video images.
- Current depth sensors do not have the resolution of image capture sensors; the resolution of depth sensor output is typically at least an order of magnitude lower than that of the image sensor used in a video camera. We take advantage of the low resolution of the depth sensor data and apply the computationally intensive steps in low resolution before applying an efficient bilateral filtering operation in high resolution. The described method produces high quality video with a low computational load.
- Current depth cameras include two separate sensors: an image capture sensor and a depth sensor. Because these two sensors are not optically co-located, we need to register the depth data points to the image data points. Although one can perform the registration at the full resolution of the image, this is inefficient because the relatively low number of depth measurements must, in some form, be duplicated across the large number of image pixels. Generating a foreground segmentation from this sparse set of points is possible but is relatively computationally intensive. Instead we choose to perform this registration at the same resolution as the depth map.
- FIG. 1 shows a flow diagram of the method of image processing a video image using depth sensor information according to an embodiment of the present invention. FIG. 2 shows an example of an image that would be captured by an image capture system with a depth sensor according to an embodiment of the present invention. In one embodiment, the method of FIG. 1 is applied to the image of FIG. 2 to produce the resultant images shown in FIGS. 3-9.
- Referring to FIG. 1, the method includes the steps of: creating a registered depth map that registers depth pixels in a depth coordinate system to image pixels in an image coordinate system; and applying a threshold to each bin of the registered depth map to produce a threshold image. FIG. 1 shows the step of creating a registered depth map that registers the low resolution depth pixels to the high resolution color image pixels (step 110). In the embodiment described here, the registered depth map is created according to the following steps: mapping each depth pixel from the depth sensor coordinate system to the image coordinate system; dividing the image area into a number of bins roughly equal to the depth sensor resolution, with each bin corresponding to a number of adjacent image pixels; and adding each depth measurement to the bin representing the portion of the image area to which the depth measurement corresponds. After all of the depth measurements are binned, each bin contains zero, one, or several depth measurements corresponding to that portion of the image area. For each depth pixel bin, the average depth value is computed.
- Both image sensor data and depth sensor data are captured by the video camera. As previously stated, because the two sensors are not co-located, we are essentially capturing data from two different points and thus two different coordinate systems. Each image pixel captured corresponds to a pixel in an image coordinate system. Similarly, each depth pixel captured corresponds to a pixel in a depth coordinate system. Because the depth resolution is lower than the image resolution, each depth measurement corresponds to a number of image pixels. A depth measurement is roughly the average of the depth values of all of the corresponding image pixels. A first step in creating a registered depth map is to map each depth pixel from the depth pixel coordinate system to an image pixel in the image coordinate system.
- Mapping ensures that when we refer to a point in the video, the image pixel and its corresponding depth pixel refer to the same point in the same coordinate system. In one embodiment, we take depth sensor data and map it into the coordinate space of the RGB image. Camera calibration allows us to determine how the geometry of the depth sensor and the image camera are related. The calibration, plus the depth recorded for a depth pixel, allows us to identify the 3D point in the scene corresponding to the depth pixel. The calibration then allows us to map the 3D point into an image pixel. It is through this process that the depth pixels are mapped to image pixels.
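The calibration-based mapping just described can be sketched with a standard pinhole camera model. This is not the patent's implementation; the intrinsic matrices `K_depth` and `K_rgb` and the extrinsics `R`, `t` are assumed placeholders that would come from an actual calibration procedure:

```python
import numpy as np

def depth_pixel_to_image_pixel(u, v, z, K_depth, K_rgb, R, t):
    """Map one depth pixel (u, v) with measured depth z into RGB image
    coordinates.  K_depth and K_rgb are 3x3 pinhole intrinsic matrices;
    R (3x3) and t (3,) transform depth-camera coordinates into
    RGB-camera coordinates."""
    # Back-project the depth pixel along its viewing ray to a 3D point
    # in the depth camera's coordinate frame.
    ray = np.linalg.inv(K_depth) @ np.array([u, v, 1.0])
    point = z * ray
    # Transform the 3D point into the RGB camera frame, then project it
    # with the RGB intrinsics to get homogeneous image coordinates.
    p = K_rgb @ (R @ point + t)
    return p[0] / p[2], p[1] / p[2]
```

With co-located, identically calibrated cameras (identity rotation, zero translation), a depth pixel maps back to the same coordinates, which is a quick sanity check for the geometry.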
- Another difference between depth sensor and image sensor data (besides the original coordinate systems) is resolution. Currently, the resolution of depth sensors has not reached the levels available in video camera systems. For example, a depth camera typically has a resolution on the order of 160×120 pixels while the resolution of an RGB image captured by video is typically on the order of 1024×768 pixels. This is unfortunate, since ideally we would like to know the depth at every pixel. Instead, a block of RGB pixels is associated with each depth pixel.
- We map the depth pixels to the RGB image by coordinate transformation. Because it is computationally more expensive to perform these computations in high resolution, we choose to remain in the lower resolution domain of the depth sensor. To do this, we divide the image into a number of bins such that the bins have the resolution of the depth sensor and each bin corresponds to a number of adjacent image pixels. Because the resolution of the depth sensor is typically less than the resolution of the RGB image sensor, a single depth pixel will typically correspond to a block of image pixels. The grouping will typically be related to the binning groups chosen.
- Typically, the last step in the creation of the registered depth map is computing a single average depth value for the depth values found in each bin. Depending on the mapping, the number of depth values associated with a particular bin varies. In one embodiment, the value of the bin is computed by finding the average depth value of the measurements in that bin. In the case where there is just one depth measurement, the average is simply that single depth value.
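The binning and averaging steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the depth measurements have already been mapped into image coordinates (as in step 110), and the function name and array layout are invented for the example:

```python
import numpy as np

def build_registered_depth_map(depth_uv, depth_values, image_shape, bin_shape):
    """Bin depth measurements (already mapped into image coordinates) and
    average the measurements that land in each bin.

    depth_uv     : (N, 2) integer (row, col) image coordinates per measurement
    depth_values : (N,) depth measurements
    image_shape  : (H, W) of the high resolution image
    bin_shape    : (Hb, Wb) of the bin grid, roughly the depth sensor resolution
    """
    H, W = image_shape
    Hb, Wb = bin_shape
    sums = np.zeros((Hb, Wb))
    counts = np.zeros((Hb, Wb))
    # Each bin covers a block of adjacent image pixels.
    rows = (depth_uv[:, 0] * Hb // H).clip(0, Hb - 1)
    cols = (depth_uv[:, 1] * Wb // W).clip(0, Wb - 1)
    # Accumulate every measurement into its bin; np.add.at handles
    # repeated indices correctly (unlike plain fancy-index assignment).
    np.add.at(sums, (rows, cols), depth_values)
    np.add.at(counts, (rows, cols), 1)
    # Average per bin; bins that received no measurements stay at 0.
    return np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
```

A bin that collects two measurements ends up with their mean; a bin with a single measurement keeps that value, matching the description above.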
- After the registered depth map is created, a threshold value is applied to the single computed depth value in each bin (step 120). The threshold is used to determine which depth values in the image are in the foreground and which are in the background. In one embodiment, a value of 1 is assigned if the depth value is below the threshold and a value of zero is assigned if the depth value is equal to or greater than the threshold value. Creating a registered depth map for the image shown in FIG. 2 and applying the threshold to each bin results in the low resolution thresholded image shown in FIG. 3.
- In one embodiment, the threshold is manually set. For example, if it is known that the person in the video is sitting in front of a desktop computer screen in a video conference, the threshold might be determined and manually set based on a likely distance that a person would be sitting from the computer screen. Alternatively, the threshold value might be automatically determined using face detection or histogram analysis. For a video conferencing system, detection of a face would indicate that the depth of the face is the depth of the foreground. Similarly, for a desktop to desktop video conference, a histogram of the depth values should show two peaks: one where the person is sitting (the foreground) and the other indicating the background location.
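The thresholding step, and one rough reading of the histogram-based automatic threshold mentioned above, can be sketched as below. The histogram heuristic (midpoint between the two strongest peaks) is our own illustrative assumption; the patent does not specify how the peaks are converted into a threshold:

```python
import numpy as np

def threshold_depth_map(registered_depth, threshold):
    """Assign 1 (foreground) where depth is below the threshold, else 0."""
    return (registered_depth < threshold).astype(np.uint8)

def auto_threshold_from_histogram(depths, bins=32):
    """Very rough automatic threshold: the midpoint between the two
    strongest histogram peaks (the seated person and the background).
    Zero-depth entries (empty bins / no return) are ignored."""
    hist, edges = np.histogram(depths[depths > 0], bins=bins)
    top2 = np.sort(np.argsort(hist)[-2:])       # indices of the two tallest bins
    return 0.5 * (edges[top2[0]] + edges[top2[1] + 1])
```

For a scene with a subject around 1 m and a wall around 3 m, the automatic threshold lands between the two clusters, and the resulting mask is the low resolution binary image of FIG. 3.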
- After the thresholding step, a denoising operator is applied. In one embodiment, the denoising operator is a sequence of one or more morphological operators applied to the thresholded image to produce the coarse matte (step 130) shown in FIG. 4. As shown in FIG. 3, the thresholded image is an extremely noisy binary mask. Morphological operators are used to minimize the noise, producing the result shown in FIG. 4. It is important to note that we can do this efficiently because we are operating in low resolution.
- After application of the morphological operation, a temporal filter is applied (step 140). Temporal filtering is used primarily to minimize flickering along the boundary between the foreground and background. In one embodiment, a temporal exponential filter, shown by the function below, is applied for each time step t. For this embodiment, the function describing the filtering is:
- matte(t) = beta × coarse matte(t) + (1 − beta) × matte(t − 1)
- Matte can generally be thought of as a reflection of the confidence level as to whether a pixel is in the foreground or background. Beta is some value between 0 and 1. The value of beta can be varied to control the amount of temporal filtering, possibly based on observed motion. In one embodiment, temporal filtering is applied adaptively, using a short window when the matte is changing rapidly and a long window when the matte is stationary. This reduces the appearance of latency between the matte and the RGB image while producing pleasing, low flicker (or flicker free) mattes.
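The morphological denoising (step 130) and the exponential temporal filter (step 140) can be sketched together. This is a minimal illustration using a 3×3 structuring element and plain numpy shifts; the patent does not specify which morphological operators are used, so the opening-then-closing sequence here is an assumption:

```python
import numpy as np

def _erode(mask):
    """3x3 binary erosion via shifted logical ANDs (no SciPy needed)."""
    h, w = mask.shape
    p = np.pad(mask, 1, constant_values=1)
    out = np.ones((h, w), dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out &= p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w].astype(bool)
    return out

def _dilate(mask):
    """3x3 binary dilation via shifted logical ORs."""
    h, w = mask.shape
    p = np.pad(mask, 1, constant_values=0)
    out = np.zeros((h, w), dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out |= p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w].astype(bool)
    return out

def denoise(mask):
    """Opening (erode, dilate) removes isolated speckle; closing
    (dilate, erode) fills small holes.  Cheap because the mask is at
    the depth sensor's low resolution."""
    opened = _dilate(_erode(mask))
    return _erode(_dilate(opened))

def temporal_filter(coarse_matte, prev_matte, beta=0.5):
    """matte(t) = beta * coarse_matte(t) + (1 - beta) * matte(t - 1)."""
    return beta * coarse_matte + (1.0 - beta) * prev_matte
```

A small beta gives a long effective window (heavy smoothing, low flicker); beta near 1 tracks rapid matte changes, which is the adaptive trade-off described above.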
- Applying exponential temporal filtering results in the matte shown in
FIG. 5 .FIG. 6 shows the image ofFIG. 2 after the temporally filtered matte shown inFIG. 5 is applied to remove the background shown inFIG. 2 . Although this temporally filtered matte can be used for background subtraction, it produces jagged boundaries as is shown inFIG. 6 . Optionally, the matte shown inFIG. 6 can be additionally enhanced by applying additional image processing such as face detection or hair color detection to improve the results. - After application of the temporal features (and application of optional enhancements), the temporally filtered matte is upsampled (step 150). When we upsample the temporally filtered matte, the resultant image has the same resolution as the high resolution image. Although in theory upsampling could occur at an earlier point in the process described in
FIG. 1 (for example, after the threshold step 120, the morphological operation step 130, or the temporal filter step 140), applying the upsampling step earlier would make the process less efficient computationally. - Although various upsampling methods exist, in one embodiment nearest neighbor upsampling is used.
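Nearest neighbor upsampling of a low resolution matte can be sketched as follows (an illustration only; `factor` stands in for the ratio between the high resolution image and the low resolution matte):

```python
import numpy as np

def upsample_nearest(matte, factor):
    """Nearest neighbor upsampling: replicate each low resolution matte
    value over a factor-by-factor block of high resolution pixels."""
    return np.repeat(np.repeat(matte, factor, axis=0), factor, axis=1)

low = np.array([[0, 1],
                [1, 0]], dtype=np.uint8)
high = upsample_nearest(low, 2)
# high is 4x4: each matte value now covers a 2x2 block.
```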
FIG. 7 shows the upsampled matte superimposed on a high resolution image. - After upsampling the matte, an edge preserving filter is applied (step 160). Filtering removes the jagged edges that can be seen in the matte shown in
FIG. 6. The edge preserving feature of the new filter forces the new, smoother edge to follow the foreground/background edge that is visible in the image shown in FIG. 2. In one embodiment, the edge preserving filter is a cross bilateral filter. The cross bilateral filter is applied using the intensity image as the range image. This produces the high quality matte image shown in FIG. 8. The edge preserved matte image shown in FIG. 8 can be used to perform background subtraction (step 170). Performing background subtraction using this image results in the image shown in FIG. 9. - Some or all of the operations set forth in the method shown in
FIG. 1 may be contained as a utility, program or subprogram, in any desired computer accessible medium. In addition, the method 100 may be embodied by a computer program, which may exist in a variety of forms, both active and inactive. For example, it may exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats. Certain processes and operations of various embodiments of the present invention are realized, in one embodiment, as a series of instructions (e.g., a software program) that reside within computer readable storage memory of a computer system and are executed by the processor of the computer system. When executed, the instructions cause the computer system to implement the functionality of the various embodiments of the present invention. Any of the above can be embodied on a computer readable medium, which includes storage devices and signals, in compressed or uncompressed form. - The computer readable storage medium can be any kind of memory that instructions can be stored on. Examples of the computer readable storage medium include, but are not limited to, a disk, a compact disk (CD), a digital versatile disc (DVD), read only memory (ROM), flash memory, and so on. Exemplary computer readable storage signals, whether modulated using a carrier or not, are signals that a computer system hosting or running the computer program can be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general. It is therefore understood that any electronic device capable of executing the above-described functions may perform those functions enumerated above.
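As one hypothetical illustration of such a software embodiment, the edge preserving step of FIG. 1 (a cross bilateral filter whose range weights come from the intensity image rather than from the matte itself) might be sketched as below. This is a deliberately naive per-pixel loop for clarity; a production embodiment would be vectorized or run on dedicated hardware, and all parameter values here are invented for the example.

```python
import numpy as np

def cross_bilateral(matte, intensity, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Cross (joint) bilateral filter: smooth `matte` with spatial
    Gaussian weights multiplied by range weights computed from the
    `intensity` image, so matte edges lock onto intensity edges."""
    h, w = matte.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))
    m = np.pad(matte.astype(np.float64), radius, mode='edge')
    g = np.pad(intensity.astype(np.float64), radius, mode='edge')
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            mw = m[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            gw = g[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            rng = np.exp(-(gw - g[y + radius, x + radius]) ** 2
                         / (2.0 * sigma_r ** 2))
            wgt = spatial * rng
            out[y, x] = (wgt * mw).sum() / wgt.sum()
    return out

# Toy guide image: left half dark, right half bright.
intensity = np.zeros((6, 6))
intensity[:, 3:] = 1.0
matte = (intensity > 0.5).astype(np.float64)
matte[2, 2] = 1.0            # stray matte pixel on the dark side
smooth = cross_bilateral(matte, intensity)
```

Because the stray pixel's intensity matches its dark neighbors, the range weights pull its matte value back toward the background, while matte values on the bright side are preserved.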
-
FIG. 10 illustrates a computer system 1000, which may be employed to perform various functions described herein, according to one embodiment of the present invention. In this respect, the computer system 1000 may be used as a platform for executing one or more of the functions described hereinabove. - The
computer system 1000 includes a microprocessor 1002 that may be used to execute some or all of the steps described in the method shown in FIG. 1. Commands and data from the processor 1002 are communicated over a communication bus 1004. The computer system 1000 also includes a main memory 1006, such as a random access memory (RAM), where the program code may be executed during runtime, and a secondary memory 1008. The secondary memory 1008 includes, for example, one or more hard disk drives 1010 and/or a removable storage drive 1012, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., where a copy of the program code may be stored. - The
removable storage drive 1012 may read from and/or write to a removable storage unit 1014. User input and output devices may include, for instance, a keyboard 1016, a mouse 1018, and a display 1020. A display adaptor 1022 may interface with the communication bus 1004 and the display 1020, and may receive display data from the processor 1002 and convert the display data into display commands for the display 1020. In addition, the processor 1002 may communicate over a network, for instance, the Internet, a LAN, etc., through a network adaptor. The embodiment shown in FIG. 10 is for purposes of illustration. It will be apparent to one of ordinary skill in the art that other known electronic components may be added or substituted in the computer system 1000. - The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive of or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in view of the above teachings. For example, if an image capture sensor had co-located depth and image sensors, the coordinate transformation steps would not be required for this invention.
In this case, a method for processing video comprised of both image and depth sensor information would comprise the steps of: dividing the image area into a number of bins roughly equal to the depth sensor resolution, with each bin corresponding to a number of adjacent image pixels; adding each depth measurement to the bin representing the portion of the image area to which the depth measurement corresponds; averaging the value of the depth measurements for each bin to determine a single average value for each bin; and applying a threshold to each bin to produce a threshold image.
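These steps might be sketched as follows (a hedged illustration; the sample coordinates, bin size, and threshold are invented for the example, and the depth measurements are assumed to be already expressed in image coordinates):

```python
import numpy as np

def registered_threshold(samples, image_shape, bin_size, threshold):
    """Bin depth measurements over the image area, average each bin,
    and threshold to mark near (foreground) bins with 1.

    samples: iterable of (x, y, depth), with x, y in image coordinates.
    Each bin covers bin_size x bin_size adjacent image pixels."""
    h, w = image_shape
    bh = (h + bin_size - 1) // bin_size
    bw = (w + bin_size - 1) // bin_size
    acc = np.zeros((bh, bw))
    cnt = np.zeros((bh, bw))
    for x, y, d in samples:
        acc[y // bin_size, x // bin_size] += d
        cnt[y // bin_size, x // bin_size] += 1
    # Empty bins average to +inf so they threshold to background.
    avg = np.divide(acc, cnt, out=np.full_like(acc, np.inf), where=cnt > 0)
    return (avg < threshold).astype(np.uint8)

samples = [(0, 0, 0.5), (1, 1, 0.7), (4, 4, 3.0)]
fg = registered_threshold(samples, image_shape=(8, 8), bin_size=4,
                          threshold=1.0)
# fg is 2x2: only the bin averaging (0.5 + 0.7) / 2 = 0.6 is foreground.
```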
- The embodiments are shown and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents:
Claims (20)
1. A method, executed on a computer, for processing a video using depth sensor information, comprising the steps of:
creating a registered depth map that registers depth pixels in a depth coordinate system to image pixels in an image coordinate system,
wherein the registered depth map is created from video image information comprised of depth pixels corresponding to a depth coordinate system and image pixels corresponding to an image coordinate system, wherein the image coordinate system is divided into a number of bins such that each image pixel location is represented with a resolution comparable to the depth sensor; and
applying a threshold to each bin of the registered depth map to produce a threshold image.
2. The method recited in claim 1 wherein creating a registered depth map includes the steps of: mapping each depth pixel from the depth coordinate system to the image coordinate system; dividing the image area into a number of bins roughly equal to the depth sensor resolution, with each bin corresponding to a number of adjacent image pixels; and adding each depth measurement to the bin representing the portion of the image area to which the depth measurement corresponds.
3. The method recited in claim 2 wherein the step of creating a registered depth map further includes the step of: for each bin, computing the average depth measurement value for that bin.
4. The method recited in claim 1 further including the step of applying a morphological operator to the threshold image to create a rough matte image.
5. The method recited in claim 4 further including the step of applying a morphological operator to the thresholded image to create a rough matte.
6. The method recited in claim 5 further including the step of applying a temporal filter to produce a temporally filtered matte.
7. The method recited in claim 6 wherein the temporal filter is an exponential filter.
8. The method recited in claim 7 further including the step of applying a further image enhancing technique to produce an enhanced temporally filtered matte, wherein the image enhancing techniques are directed towards reducing the jaggedness of the boundary between the foreground and the background.
9. The method recited in claim 8 wherein the image enhancing technique uses face detection.
10. The method recited in claim 9 wherein the image enhancing technique uses hair color detection.
11. The method recited in claim 8 further including the step of upsampling the enhanced temporally filtered matte to create an upsampled matte.
12. The method recited in claim 6 further including the step of upsampling the temporally filtered matte to create an upsampled matte.
13. The method recited in claim 11 further including the step of applying an edge preserving filter to produce an edge preserved matte.
14. The method recited in claim 13 wherein the edge preserving filter is a cross bilateral filter.
15. The method recited in claim 14 further including the step of using the edge preserved matte to perform background subtraction.
16. A tangible computer readable storage medium having instructions for causing a computer to execute a method comprising the steps of:
creating a registered depth map that registers depth pixels in a depth coordinate system to image pixels in an image coordinate system,
wherein the registered depth map is created from video information comprised of depth pixels corresponding to a depth pixel coordinate system and image pixels corresponding to an image coordinate system, wherein the image coordinate system is divided into a number of bins such that each image pixel location is represented with a resolution comparable to the depth sensor; and
applying a threshold to each bin of the registered depth map to produce a threshold image.
17. A method, executed on a computer, for processing a video image comprised of both image and depth sensor information, comprising the steps of:
dividing the image area into a number of bins roughly equal to the depth sensor resolution, with each bin corresponding to a number of adjacent image pixels;
adding each depth measurement to the bin representing the portion of the image area to which the depth measurement corresponds;
averaging the value of the depth measurement for each bin to determine a single average value for each bin; and
applying a threshold to each bin to produce a threshold image.
18. The method recited in claim 17 further including the step of applying a morphological operator to the thresholded image to create a rough matte.
19. The method recited in claim 18 further including the step of applying a temporal filter to produce a temporally filtered matte.
20. The method recited in claim 19 further including the step of upsampling the temporally filtered matte to create an upsampled matte.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/714,514 US20110211749A1 (en) | 2010-02-28 | 2010-02-28 | System And Method For Processing Video Using Depth Sensor Information |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110211749A1 true US20110211749A1 (en) | 2011-09-01 |
Family
ID=44505284
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/714,514 Abandoned US20110211749A1 (en) | 2010-02-28 | 2010-02-28 | System And Method For Processing Video Using Depth Sensor Information |
Country Status (1)
Country | Link |
---|---|
US (1) | US20110211749A1 (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100278450A1 (en) * | 2005-06-08 | 2010-11-04 | Mike Arthur Derrenberger | Method, Apparatus And System For Alternate Image/Video Insertion |
US20110235939A1 (en) * | 2010-03-23 | 2011-09-29 | Raytheon Company | System and Method for Enhancing Registered Images Using Edge Overlays |
US20120183238A1 (en) * | 2010-07-19 | 2012-07-19 | Carnegie Mellon University | Rapid 3D Face Reconstruction From a 2D Image and Methods Using Such Rapid 3D Face Reconstruction |
US20120182394A1 (en) * | 2011-01-19 | 2012-07-19 | Samsung Electronics Co., Ltd. | 3d image signal processing method for removing pixel noise from depth information and 3d image signal processor therefor |
US20120249468A1 (en) * | 2011-04-04 | 2012-10-04 | Microsoft Corporation | Virtual Touchpad Using a Depth Camera |
US20120281905A1 (en) * | 2011-05-05 | 2012-11-08 | Mstar Semiconductor, Inc. | Method of image processing and associated apparatus |
US20130010066A1 (en) * | 2011-07-05 | 2013-01-10 | Microsoft Corporation | Night vision |
US20130329075A1 (en) * | 2012-06-08 | 2013-12-12 | Apple Inc. | Dynamic camera mode switching |
US20140112574A1 (en) * | 2012-10-23 | 2014-04-24 | Electronics And Telecommunications Research Institute | Apparatus and method for calibrating depth image based on relationship between depth sensor and color camera |
WO2014053837A3 (en) * | 2012-10-03 | 2014-07-31 | Holition Limited | Image processing |
US20140253688A1 (en) * | 2013-03-11 | 2014-09-11 | Texas Instruments Incorporated | Time of Flight Sensor Binning |
US8917270B2 (en) | 2012-05-31 | 2014-12-23 | Microsoft Corporation | Video generation using three-dimensional hulls |
US8976224B2 (en) | 2012-10-10 | 2015-03-10 | Microsoft Technology Licensing, Llc | Controlled three-dimensional communication endpoint |
US9332218B2 (en) | 2012-05-31 | 2016-05-03 | Microsoft Technology Licensing, Llc | Perspective-correct communication window with motion parallax |
US20160247536A1 (en) * | 2013-02-20 | 2016-08-25 | Intel Corporation | Techniques for adding interactive features to videos |
US20170091957A1 (en) * | 2015-09-25 | 2017-03-30 | Logical Turn Services Inc. | Dimensional acquisition of packages |
US20170109872A1 (en) * | 2010-08-30 | 2017-04-20 | The Board Of Trustees Of The University Of Illinois | System for background subtraction with 3d camera |
WO2017152529A1 (en) * | 2016-03-09 | 2017-09-14 | 京东方科技集团股份有限公司 | Determination method and determination system for reference plane |
US9767598B2 (en) | 2012-05-31 | 2017-09-19 | Microsoft Technology Licensing, Llc | Smoothing and robust normal estimation for 3D point clouds |
US9792491B1 (en) * | 2014-03-19 | 2017-10-17 | Amazon Technologies, Inc. | Approaches for object tracking |
US20190080498A1 (en) * | 2017-09-08 | 2019-03-14 | Apple Inc. | Creating augmented reality self-portraits using machine learning |
US20190228504A1 (en) * | 2018-01-24 | 2019-07-25 | GM Global Technology Operations LLC | Method and system for generating a range image using sparse depth data |
EP3537378A1 (en) * | 2018-03-06 | 2019-09-11 | Sony Corporation | Image processing apparatus and method for object boundary stabilization in an image of a sequence of images |
WO2019202511A1 (en) * | 2018-04-20 | 2019-10-24 | Sony Corporation | Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling |
US10475187B2 (en) * | 2016-03-30 | 2019-11-12 | Canon Kabushiki Kaisha | Apparatus and method for dividing image into regions |
US10515463B2 (en) * | 2018-04-20 | 2019-12-24 | Sony Corporation | Object segmentation in a sequence of color image frames by background image and background depth correction |
US10769806B2 (en) * | 2015-09-25 | 2020-09-08 | Logical Turn Services, Inc. | Dimensional acquisition of packages |
CN113128430A (en) * | 2021-04-25 | 2021-07-16 | 科大讯飞股份有限公司 | Crowd gathering detection method and device, electronic equipment and storage medium |
US11394898B2 (en) | 2017-09-08 | 2022-07-19 | Apple Inc. | Augmented reality self-portraits |
CN114862923A (en) * | 2022-07-06 | 2022-08-05 | 武汉市聚芯微电子有限责任公司 | Image registration method and device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100034457A1 (en) * | 2006-05-11 | 2010-02-11 | Tamir Berliner | Modeling of humanoid forms from depth maps |
US20100098157A1 (en) * | 2007-03-23 | 2010-04-22 | Jeong Hyu Yang | method and an apparatus for processing a video signal |
US20100310155A1 (en) * | 2007-12-20 | 2010-12-09 | Koninklijke Philips Electronics N.V. | Image encoding method for stereoscopic rendering |
US20110080336A1 (en) * | 2009-10-07 | 2011-04-07 | Microsoft Corporation | Human Tracking System |
US20110085084A1 (en) * | 2009-10-10 | 2011-04-14 | Chirag Jain | Robust spatiotemporal combining system and method for video enhancement |
US20110273529A1 (en) * | 2009-01-30 | 2011-11-10 | Thomson Licensing | Coding of depth maps |
-
2010
- 2010-02-28 US US12/714,514 patent/US20110211749A1/en not_active Abandoned
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8768099B2 (en) * | 2005-06-08 | 2014-07-01 | Thomson Licensing | Method, apparatus and system for alternate image/video insertion |
US20100278450A1 (en) * | 2005-06-08 | 2010-11-04 | Mike Arthur Derrenberger | Method, Apparatus And System For Alternate Image/Video Insertion |
US20110235939A1 (en) * | 2010-03-23 | 2011-09-29 | Raytheon Company | System and Method for Enhancing Registered Images Using Edge Overlays |
US8457437B2 (en) * | 2010-03-23 | 2013-06-04 | Raytheon Company | System and method for enhancing registered images using edge overlays |
US20120183238A1 (en) * | 2010-07-19 | 2012-07-19 | Carnegie Mellon University | Rapid 3D Face Reconstruction From a 2D Image and Methods Using Such Rapid 3D Face Reconstruction |
US8861800B2 (en) * | 2010-07-19 | 2014-10-14 | Carnegie Mellon University | Rapid 3D face reconstruction from a 2D image and methods using such rapid 3D face reconstruction |
US9792676B2 (en) * | 2010-08-30 | 2017-10-17 | The Board Of Trustees Of The University Of Illinois | System for background subtraction with 3D camera |
US20170109872A1 (en) * | 2010-08-30 | 2017-04-20 | The Board Of Trustees Of The University Of Illinois | System for background subtraction with 3d camera |
US20120182394A1 (en) * | 2011-01-19 | 2012-07-19 | Samsung Electronics Co., Ltd. | 3d image signal processing method for removing pixel noise from depth information and 3d image signal processor therefor |
US20120249468A1 (en) * | 2011-04-04 | 2012-10-04 | Microsoft Corporation | Virtual Touchpad Using a Depth Camera |
US20120281905A1 (en) * | 2011-05-05 | 2012-11-08 | Mstar Semiconductor, Inc. | Method of image processing and associated apparatus |
US8903162B2 (en) * | 2011-05-05 | 2014-12-02 | Mstar Semiconductor, Inc. | Method and apparatus for separating an image object from an image using three-dimensional (3D) image depth |
US20130010066A1 (en) * | 2011-07-05 | 2013-01-10 | Microsoft Corporation | Night vision |
US9001190B2 (en) * | 2011-07-05 | 2015-04-07 | Microsoft Technology Licensing, Llc | Computer vision system and method using a depth sensor |
US9846960B2 (en) | 2012-05-31 | 2017-12-19 | Microsoft Technology Licensing, Llc | Automated camera array calibration |
US9836870B2 (en) | 2012-05-31 | 2017-12-05 | Microsoft Technology Licensing, Llc | Geometric proxy for a participant in an online meeting |
US10325400B2 (en) | 2012-05-31 | 2019-06-18 | Microsoft Technology Licensing, Llc | Virtual viewpoint for a participant in an online communication |
US9767598B2 (en) | 2012-05-31 | 2017-09-19 | Microsoft Technology Licensing, Llc | Smoothing and robust normal estimation for 3D point clouds |
US8917270B2 (en) | 2012-05-31 | 2014-12-23 | Microsoft Corporation | Video generation using three-dimensional hulls |
US9251623B2 (en) | 2012-05-31 | 2016-02-02 | Microsoft Technology Licensing, Llc | Glancing angle exclusion |
US9332218B2 (en) | 2012-05-31 | 2016-05-03 | Microsoft Technology Licensing, Llc | Perspective-correct communication window with motion parallax |
US9256980B2 (en) | 2012-05-31 | 2016-02-09 | Microsoft Technology Licensing, Llc | Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds |
US9756247B2 (en) | 2012-06-08 | 2017-09-05 | Apple Inc. | Dynamic camera mode switching |
US9160912B2 (en) * | 2012-06-08 | 2015-10-13 | Apple Inc. | System and method for automatic image capture control in digital imaging |
US20130329075A1 (en) * | 2012-06-08 | 2013-12-12 | Apple Inc. | Dynamic camera mode switching |
WO2014053837A3 (en) * | 2012-10-03 | 2014-07-31 | Holition Limited | Image processing |
US9552655B2 (en) | 2012-10-03 | 2017-01-24 | Holition Limited | Image processing via color replacement |
GB2506707B (en) * | 2012-10-03 | 2020-01-08 | Holition Ltd | Image processing |
US9332222B2 (en) | 2012-10-10 | 2016-05-03 | Microsoft Technology Licensing, Llc | Controlled three-dimensional communication endpoint |
US8976224B2 (en) | 2012-10-10 | 2015-03-10 | Microsoft Technology Licensing, Llc | Controlled three-dimensional communication endpoint |
US20140112574A1 (en) * | 2012-10-23 | 2014-04-24 | Electronics And Telecommunications Research Institute | Apparatus and method for calibrating depth image based on relationship between depth sensor and color camera |
US9147249B2 (en) * | 2012-10-23 | 2015-09-29 | Electronics And Telecommunications Research Institute | Apparatus and method for calibrating depth image based on relationship between depth sensor and color camera |
US20160247536A1 (en) * | 2013-02-20 | 2016-08-25 | Intel Corporation | Techniques for adding interactive features to videos |
US9922681B2 (en) * | 2013-02-20 | 2018-03-20 | Intel Corporation | Techniques for adding interactive features to videos |
US9784822B2 (en) * | 2013-03-11 | 2017-10-10 | Texas Instruments Incorporated | Time of flight sensor binning |
US20140253688A1 (en) * | 2013-03-11 | 2014-09-11 | Texas Instruments Incorporated | Time of Flight Sensor Binning |
US20160003937A1 (en) * | 2013-03-11 | 2016-01-07 | Texas Instruments Incorporated | Time of flight sensor binning |
US9134114B2 (en) * | 2013-03-11 | 2015-09-15 | Texas Instruments Incorporated | Time of flight sensor binning |
US9792491B1 (en) * | 2014-03-19 | 2017-10-17 | Amazon Technologies, Inc. | Approaches for object tracking |
US10096131B2 (en) * | 2015-09-25 | 2018-10-09 | Logical Turn Services Inc. | Dimensional acquisition of packages |
US10769806B2 (en) * | 2015-09-25 | 2020-09-08 | Logical Turn Services, Inc. | Dimensional acquisition of packages |
US20170091957A1 (en) * | 2015-09-25 | 2017-03-30 | Logical Turn Services Inc. | Dimensional acquisition of packages |
US10319104B2 (en) | 2016-03-09 | 2019-06-11 | Boe Technology Group Co., Ltd. | Method and system for determining datum plane |
WO2017152529A1 (en) * | 2016-03-09 | 2017-09-14 | 京东方科技集团股份有限公司 | Determination method and determination system for reference plane |
US10475187B2 (en) * | 2016-03-30 | 2019-11-12 | Canon Kabushiki Kaisha | Apparatus and method for dividing image into regions |
US20190080498A1 (en) * | 2017-09-08 | 2019-03-14 | Apple Inc. | Creating augmented reality self-portraits using machine learning |
US11394898B2 (en) | 2017-09-08 | 2022-07-19 | Apple Inc. | Augmented reality self-portraits |
US10839577B2 (en) * | 2017-09-08 | 2020-11-17 | Apple Inc. | Creating augmented reality self-portraits using machine learning |
US20190228504A1 (en) * | 2018-01-24 | 2019-07-25 | GM Global Technology Operations LLC | Method and system for generating a range image using sparse depth data |
US10706505B2 (en) * | 2018-01-24 | 2020-07-07 | GM Global Technology Operations LLC | Method and system for generating a range image using sparse depth data |
KR20190106698A (en) * | 2018-03-06 | 2019-09-18 | 소니 주식회사 | Image processing apparatus and method for object boundary stabilization in an image of a sequence of images |
JP2019160298A (en) * | 2018-03-06 | 2019-09-19 | ソニー株式会社 | Image processing apparatus and method for object boundary stabilization in image of sequence of images |
US10643336B2 (en) | 2018-03-06 | 2020-05-05 | Sony Corporation | Image processing apparatus and method for object boundary stabilization in an image of a sequence of images |
KR102169431B1 (en) * | 2018-03-06 | 2020-10-23 | 소니 주식회사 | Image processing apparatus and method for object boundary stabilization in an image of a sequence of images |
CN110248085A (en) * | 2018-03-06 | 2019-09-17 | 索尼公司 | For the stabilized device and method of object bounds in the image of image sequence |
EP3537378A1 (en) * | 2018-03-06 | 2019-09-11 | Sony Corporation | Image processing apparatus and method for object boundary stabilization in an image of a sequence of images |
US10515463B2 (en) * | 2018-04-20 | 2019-12-24 | Sony Corporation | Object segmentation in a sequence of color image frames by background image and background depth correction |
US10477220B1 (en) | 2018-04-20 | 2019-11-12 | Sony Corporation | Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling |
WO2019202511A1 (en) * | 2018-04-20 | 2019-10-24 | Sony Corporation | Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling |
CN111989711A (en) * | 2018-04-20 | 2020-11-24 | 索尼公司 | Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling |
JP2021521542A (en) * | 2018-04-20 | 2021-08-26 | ソニーグループ株式会社 | Object segmentation of a series of color image frames based on adaptive foreground mask-up sampling |
CN113128430A (en) * | 2021-04-25 | 2021-07-16 | 科大讯飞股份有限公司 | Crowd gathering detection method and device, electronic equipment and storage medium |
CN114862923A (en) * | 2022-07-06 | 2022-08-05 | 武汉市聚芯微电子有限责任公司 | Image registration method and device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20110211749A1 (en) | System And Method For Processing Video Using Depth Sensor Information | |
Chen et al. | Robust image and video dehazing with visual artifact suppression via gradient residual minimization | |
Liu et al. | Single image dehazing via large sky region segmentation and multiscale opening dark channel model | |
Banterle et al. | Inverse tone mapping | |
US9311901B2 (en) | Variable blend width compositing | |
Xu et al. | Shadow removal from a single image | |
US20150302592A1 (en) | Generation of a depth map for an image | |
WO2016159884A1 (en) | Method and device for image haze removal | |
KR100846513B1 (en) | Method and apparatus for processing an image | |
EP1987491A2 (en) | Perceptual image preview | |
CN105323497A (en) | Constant bracket for high dynamic range (cHDR) operations | |
KR101051459B1 (en) | Apparatus and method for extracting edges of an image | |
JP2010525486A (en) | Image segmentation and image enhancement | |
Kim et al. | Low-light image enhancement based on maximal diffusion values | |
KR20110011356A (en) | Method and apparatus for image processing | |
CN108234826B (en) | Image processing method and device | |
Garcia et al. | Unified multi-lateral filter for real-time depth map enhancement | |
Dai et al. | Adaptive sky detection and preservation in dehazing algorithm | |
KR20140109801A (en) | Method and apparatus for enhancing quality of 3D image | |
Hui et al. | Depth enhancement using RGB-D guided filtering | |
Liu et al. | Automatic objects segmentation with RGB-D cameras | |
Tallón et al. | Upsampling and denoising of depth maps via joint-segmentation | |
WO2019200785A1 (en) | Fast hand tracking method, device, terminal, and storage medium | |
CN116563172B (en) | VR globalization online education interaction optimization enhancement method and device | |
Jyothirmai et al. | Enhancing shadow area using RGB color space |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L. P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TAN, KAR HAN;CULBERTSON, W BRUCE;REEL/FRAME:025070/0562 Effective date: 20100301 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |