MXPA00005355A - Improved image conversion and encoding techniques - Google Patents

Improved image conversion and encoding techniques

Info

Publication number
MXPA00005355A
MXPA00005355A MXPA/A/2000/005355A
Authority
MX
Mexico
Prior art keywords
depth
depth map
contour
images
data
Prior art date
Application number
MXPA/A/2000/005355A
Other languages
Spanish (es)
Inventor
Victor Harman Philip
Original Assignee
Dynamic Digital Depth Research Pty Ltd
Victor Harman Philip
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dynamic Digital Depth Research Pty Ltd, Victor Harman Philip
Publication of MXPA00005355A

Abstract

A method of producing a depth map for use in the conversion of 2D images into stereoscopic images, including the steps of: identifying at least one object within a 2D image; allocating the or each object an identifying tag; allocating the or each object a depth tag; and determining and defining an outline of the or each object.

Description

METHOD FOR PRODUCING A DEPTH MAP FOR USE IN THE CONVERSION OF TWO-DIMENSIONAL IMAGES INTO STEREOSCOPIC IMAGES

DESCRIPTION OF THE INVENTION

The present invention is generally directed to stereoscopic image synthesis, and more particularly to an improved method of converting two-dimensional (2D) images for subsequent encoding, transmission and decoding for the purpose of displaying a stereoscopic image. The applicant has previously described, in PCT/AU96/00820, a method of producing left and right eye images for a stereoscopic display from an original 2D image, which includes the steps of: a. identifying at least one object within the original image; b. outlining each object; c. defining a depth characteristic for each object; and d. respectively displacing selected areas of each object by a determined amount in a lateral direction, as a function of the depth characteristic of each object, to form two stretched images for viewing by the left and right eyes of the observer.
REF.: 119300

These steps may be referred to individually or collectively as dynamic depth cueing, or DDC. The present invention further improves upon the operation of the applicant's prior system. In one aspect, the present invention provides a method of producing a depth map for use in the conversion of 2D images into stereoscopic images, including the steps of: identifying at least one object within a 2D image; assigning each object an identification tag; assigning each object a depth tag; and determining and defining a contour for each object. In a further aspect, the present invention provides a method of encoding a depth map for use in converting 2D images into stereoscopic images, including: assigning an object identifier to an object; assigning the object a depth tag; and defining the contour of the object.
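These two aspects suggest a simple data representation. The following sketch is purely illustrative (the class and field names are assumptions, not part of the specification) and models an object carrying an identification tag, a depth tag and a contour:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DepthMapObject:
    """One identified object in a 2D frame: tag, depth tag and contour."""
    object_id: int    # unique identification tag
    depth: int        # depth tag, e.g. 0 (far) .. 255 (near)
    contour: List[Tuple[int, int]] = field(default_factory=list)  # x,y outline marks

# Example: object number 1 (a person) at depth 20 with a rough outline
person = DepthMapObject(object_id=1, depth=20,
                        contour=[(100, 40), (120, 60), (110, 150), (90, 150)])
```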
The contour of the object may be defined by a series of coordinates, curves and/or geometric shapes. Conveniently, the identification tag may be a unique number. In another aspect, the present invention provides the use of Bézier curves to generate a contour of an object in a 2D to 3D conversion process. In a further aspect, the present invention provides the use of curves to define an object in a 2D to 3D conversion process. In another aspect, the present invention provides the use of geometric shapes to define a contour of an object in a 2D to 3D conversion process. In another aspect, the present invention provides a method of transmitting depth map information wherein the information is included in the vertical blanking interval or the MPEG data stream. In a further aspect, the present invention provides the use of generic libraries to aid in 2D to 3D conversion processes. To provide a better understanding of the present invention, reference is made to the accompanying drawings, which illustrate a preferred embodiment of the present invention. In the drawings: Figures 1 and 2 show a preferred method of converting depth map data into a distortion grid. Figures 3, 4, 5 and 6 show various techniques for determining the contour of an object, as described by the present invention. Figure 7 shows a sample distortion grid. Figure 8 shows a hardware block diagram of an alternative decoder. Figure 9 shows a sample flow diagram of the decoding process of an alternative decoder. Figure 10 shows an example of an undistorted mesh. Figure 11 shows a sample depth map of a cone. Figure 12 shows a sample mesh modified with a depth map. Figures 13 to 16 show a method of translating the Z elevations of depth maps into X displacements. Figure 17 shows an original picture on an undistorted mesh.
Figure 18 shows a sample mesh modified with an X displacement map. Figure 19 shows a sample combination of an original frame and a displaced mesh. Figure 20 shows a sample resulting stretched image for the alternative eye. Figure 21 shows a flow diagram of the simplified displacement process.
Identification of the object

Objects in the 2D image to be converted can be identified by a human operator using visual inspection. The operator will typically tag each object, or group of objects, in the image using a computer mouse, stylus, pointer or other device to assign a unique number to the object. The number can be created manually by the operator or generated automatically in a particular sequence by a computer. Objects can also be identified fully automatically by a computer, or semiautomatically, whereby the operator assists the computer in determining the position of an object or objects.
To automatically identify an object, the computer may use characteristics such as object size, color, speed of movement, shading, texture, brightness, darkness and focus, as well as differences between previous, current and future images. Neural networks and expert systems can also be used to aid in the identification of objects. In semiautomatic identification of objects, an operator can provide assistance to the computer by advising the computer of the nature of the image in which objects are to be found. For example, the operator can advise the computer that the scene is of a generic "News Reader" format, in which case the computer will try to locate the head and shoulders of the news reader, the desk, the background, etc. The operator can choose from a menu of possible generic scenes. The operator can manually override and/or correct and adjust any selection of objects made by the computer. The computer program can learn from these corrections, using neural networks or expert systems, for example, so as to continuously improve the accuracy of object identification and numbering. Once an object has been identified and numbered, the object can be tracked either manually, automatically or semiautomatically as it moves within the image over successive frames. An operator can also use identification information produced by another operator working on either the same sequence or a previous conversion of similar scenes.
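As a hedged illustration of the fully automatic case, the following sketch segments candidate objects by differencing successive frames and labelling the connected regions that changed; the threshold value and the use of SciPy's labelling routine are assumptions for illustration, not requirements of the method described:

```python
import numpy as np
from scipy import ndimage  # assumed available for connected-region labelling

def identify_moving_objects(prev_frame, curr_frame, threshold=25):
    """Label candidate objects from differences between successive frames.

    prev_frame, curr_frame: 2D uint8 luminance arrays of equal shape.
    Returns (labels, count): an integer label per pixel, and the object count.
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > threshold               # pixels that changed between frames
    labels, count = ndimage.label(mask)   # number each connected changed region
    return labels, count
```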
Contouring the object

The contour of an object or objects can be determined either manually, automatically or semiautomatically. In manual contouring, the operator can trace the outline of the object or objects using a computer mouse, light pen, pointer or other device. The operator can select the outline of the object on a pixel-by-pixel basis, using straight-line or curved approximations, Bézier curves, or a best fit from a library of curves or generic shapes. The operator can also choose from a library of generic shapes one that approximates the correct shape, and scale or dynamically adjust the shape to match. For example, the operator may wish to select the outline of a man, in which case the generic outline of a man can be retrieved from the library and adjusted accordingly, either manually, semiautomatically or automatically. The operator can also select from a library of geometric shapes such as circles, ellipses, triangles, squares, etc. In automatic contouring, the computer can use characteristics such as size, color, speed of movement, shading, brightness and darkness, as well as differences between previous, current and future images. Neural networks and expert systems can also be used to determine the contour of objects. In semiautomatic contouring, an operator can provide assistance to the computer by advising the computer of the nature of the image in which objects are to be found. For example, the operator can advise the computer that the scene is of the generic "news reader" format, in which case the computer will try to locate the head and shoulders of the news reader, the desk, the background, etc. The operator can choose from a menu of possible generic objects. The operator can manually override and/or correct and adjust any contouring performed by the computer. The computer program can learn from these corrections, using neural networks or expert systems, for example, so as to continuously improve the accuracy of the contouring. Once the object has been contoured, the object can be tracked either manually, automatically or semiautomatically as it moves within the image over successive frames. An operator can also use object outline information produced by another operator working on either the same sequence or a previous conversion of similar scenes. The operator can also choose from a library of previously defined contours, which can include geometric shapes such as circles, ellipses, triangles, squares, etc., and manually, semiautomatically or automatically adjust the library outline so that it matches the selected object. The library can be indexed by individual contours, for example news readers, or can be based on a particular family of objects, for example horse races, evening news, etc.
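Where Bézier curves are used, each contour segment can be evaluated from its anchor and control points. A minimal sketch of a cubic segment follows; the function and point names are illustrative only:

```python
def cubic_bezier(p0, p1, p2, p3, steps=20):
    """Points along one cubic Bézier contour segment.

    p0 and p3 are anchor points (x, y) on the contour; p1 and p2 are
    the control points that shape the curve between them.
    """
    pts = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
        pts.append((x, y))
    return pts

# One segment of an outline: two anchors with their respective control points
segment = cubic_bezier((100, 40), (115, 45), (125, 55), (120, 60))
```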
Depth definition

The depth of an object or objects can be determined manually, automatically or semiautomatically. The depth of objects can be assigned using any alphanumeric, visual, audible or tactile information. In the preferred embodiment, the depth of the object is indicated by shading the object with a particular color. Typically this will be white for objects that should appear, once converted, at a 3D position closest to the observer, and black for objects at the greatest 3D distance from the observer. Obviously, this convention can be altered, for example by inverting the colors used to indicate relative or absolute depth. In another embodiment, a numeric value can be assigned to the depth of the object. This value can be positive or negative, in a linear or non-linear series, and contain single or multiple digits. In a preferred embodiment, this value will range from 0 to 255, to allow the value to be encoded in a single octet, where 255 represents objects that should appear, once converted, at a 3D position closest to the observer, and 0 objects at the greatest 3D distance from the observer. Obviously, this convention can be altered, for example by inverting it, or another range can be used. In manual depth definition, the operator can assign the depth of the object or objects using a computer mouse, stylus, pointer or other device. The operator can assign the object depth by placing the pointing device within the object's contour and entering a depth value. The depth can be entered by the operator as an alphanumeric or graphical value, and can be assigned by the operator or assigned automatically by the computer from a predetermined range of allowable values. The operator can also select the depth of the object from a library or menu of allowable depths. The operator may also assign a range of depths within an object, or a depth that varies with time, object location or movement, or any combination of these factors. For example, the object may be a table whose near edge is closest to the observer and whose far edge is farthest from the observer. When converted to 3D, the apparent depth of the table should vary along its length. To achieve this, the operator can divide the table into several segments and assign each segment an individual depth. Alternatively, the operator can assign a continuously variable depth within the object by shading the object so that the amount of shading represents the depth at that particular position on the table. In this example, a light shade can represent a near object and a dark shade a distant object. For the example of the table, the nearest edge would be lightly shaded, and the shading would become progressively darker until it reached the far edge. The variation of depth within an object can be linear or non-linear and can vary with time, object location or movement, or any combination of these factors. The variation of depth within an object can be in the form of a ramp. A linear ramp has a start point (A) and an end point (B). A color is defined at points A and B, and a gradient is applied from point A to point B along the perpendicular. A radial ramp defines a ramp similar to a linear ramp, but uses the distance from a central point (A) out to a radius (B). A simple extension of the radial ramp is to taper it toward the outer edge, to allow a central point of variable size.
A linear extension uses the distance from a line segment, as opposed to the distance from a perpendicular. In this example, one color is defined for the line segment and another for the "outside"; the color along the line segment is defined and tapers out to the "outside" color. A variety of such ramps can easily be encoded. Ramps can also be based on more complex curves, equations, variable transparency, etc. In another example, an object may move from the front of the image to the back over a number of frames. The operator can assign a depth to the object in the first frame and in the last or a subsequent frame, and the computer can then interpolate the depth of the object over the intervening frames in a linear or other predetermined manner. This process can also be fully automated, whereby the computer assigns the variation in object depth based on the change in the size of the object as it moves over time. In automatic depth definition, the computer can use characteristics such as size, color, speed of movement, shading, focus, and differences between previous, current and future images. Neural networks and expert systems can also be used to determine the depth of objects. In semiautomatic depth definition, an operator can provide assistance to the computer by advising the computer of the nature of the image in which depths are to be assigned. For example, the operator can advise the computer that the scene is of a generic "news reader" format, in which case the computer will try to locate the head and shoulders of the news reader, the desk, the background, etc., and place these in a logical depth sequence. The operator can choose from a menu of possible generic objects and depths. The operator can manually override and/or correct and adjust any object depth decision made by the computer. The computer program can learn from these corrections, using neural networks or expert systems, for example, so as to continuously improve the accuracy of depth assignment. Once an object has been assigned a specific depth, the object can be tracked either manually, automatically or semiautomatically as it moves within the image over successive frames. An operator can also use depth definitions produced by another operator working on either the same sequence or a previous conversion of similar scenes.
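The linear and radial ramps described above can be sketched as grayscale fills. The following is an illustrative rendering, assuming the 0-255 depth convention described earlier; the function names and the hold behaviour beyond the endpoints are assumptions:

```python
import numpy as np

def linear_ramp(shape, a, b, depth_a, depth_b):
    """Depth varies linearly from depth_a at point a to depth_b at point b;
    points on a line perpendicular to a->b share the same depth."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    t = ((xs - ax) * dx + (ys - ay) * dy) / float(dx * dx + dy * dy)
    t = np.clip(t, 0.0, 1.0)            # hold the end values beyond A and B
    return (depth_a + t * (depth_b - depth_a)).astype(np.uint8)

def radial_ramp(shape, center, radius, depth_center, depth_edge):
    """Depth varies from depth_center at the central point out to depth_edge
    at the given radius (and holds that value beyond it)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(xs - center[0], ys - center[1]) / float(radius)
    r = np.clip(r, 0.0, 1.0)
    return (depth_center + r * (depth_edge - depth_center)).astype(np.uint8)
```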
Multiple operators

In order to convert a video sequence in a timely manner, it may be necessary for many operators to work on the 2D source material. While these operators could be located at the same facility, through the use of online computer services, for example the Internet, operators located anywhere in the world can be used. In such an arrangement, to ensure the security of the source material, it may be necessary to remove the audio and modify the colors of the images. This will have no effect on the ability of operators to determine the contour of an object, but will prevent pirating of the original source material. Since the actual selection of an object's contour is a relatively simple process, it can be undertaken most cost-effectively in countries with low labor costs. Using this arrangement, the conversion procedure may conveniently be as follows: 1. A supervising operator identifies a video sequence to be converted to 3D and numbers each frame in the sequence. 2. The supervisor applies security measures, if necessary. 3. The supervisor identifies the object or objects in the scenes that require contouring and uniquely tags each, as previously described. 4. The video sequence is then converted into a suitable digital format and transmitted via the online service to a remote destination or destinations. For long video sequences this may not be economical, in which case CD-ROM or other transport media may be preferable. 5. The sequence is received at the remote location, where the operator or operators carry out the contouring of the objects. 6. Since the result of this work is a set of identified object contours, the data for which can be substantially compressed, the file size will generally be much smaller than that of the original images. In this case, the object information can conveniently be returned to the supervisor using online email services. 7. The supervisor performs quality control on the received object contours and matches the frame numbers with the original video source material. 8. The supervisor then passes the object contours and the original source material to a subsequent operator, who applies the necessary depth information to each object. Since the application of depth information is an artistic and creative process, it is considered desirable, though not essential, that it be carried out at a central location by a small group of operators. This will also ensure consistency of object depths over a long sequence.
Definition of complex depth

In order to produce a more realistic 3D appearance, it is sometimes desirable to use depth definitions that are more complex than simple ramps or linear variations. This is particularly desirable for objects that have a complex internal structure with many variations in depth, for example a tree. The depth map for such objects can be produced by adding a surface texture highlight map to the object. For example, considering a tree, the first step would be to trace around the contour of the tree and then assign the tree a depth. A surface texture highlight map can then be added to provide each leaf on the tree with its own individual depth. Such texture maps have been found useful in the present invention for adding detail to relatively simple objects. However, for fine detail, such as the leaves on a tree or other complex objects, this method is not preferred, since it becomes difficult to follow the fine structure of the tree, or similar object, as it moves in the wind or as the camera angle changes from one frame to the next. A further, more preferred method is to use the luminance (the black and white component) of the original object to create the necessary surface highlight map. In general, the elements of an object that are closest to the observer will be lighter, and those that are farther away will be darker. Therefore, by assigning light luminance values to near elements and dark luminance values to distant elements, a surface highlight map can be created automatically. The advantage of this technique is that the object itself can be used to create its own surface highlight map, and any movement of the object from one frame to the next is automatically followed. Other attributes of an object can also be used to create a surface highlight map; these include, but are not limited to, chrominance, saturation, color grouping, reflections, shadows, focus, sharpness, etc. The surface highlight map values obtained from object attributes will preferably also be scaled so that the range of depth variation within the object is consistent with the overall range of depths in the image.
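A brief sketch of the luminance technique, under the stated convention that lighter elements are nearer; the scaling scheme shown (a symmetric offset about the object's depth tag) is one possible interpretation, not the specification's:

```python
import numpy as np

def luminance_highlight_map(luma, base_depth, depth_range):
    """Derive per-pixel depth detail from an object's own luminance.

    luma: 2D uint8 luminance of the object (light = near, dark = far).
    base_depth: the depth tag assigned to the object as a whole.
    depth_range: the allowed depth variation within the object, kept small
                 so it stays consistent with the overall depths of the image.
    """
    detail = (luma.astype(np.float32) / 255.0 - 0.5) * depth_range
    return np.clip(base_depth + detail, 0, 255).astype(np.uint8)
```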
Depth maps

The process of detecting objects, determining their contours and assigning depths will be referred to as the creation of depth maps. In a preferred embodiment, the depth maps may consist of grayscale images with a resolution of 80 x 60 x 8 bits, to allow objects within the associated 2D images to be defined at one of 256 individual depths.
Alternatively, the shape of a curve can be defined as the ratio between the distance separating sequential x,y coordinates and the displacement of the curve from a straight line between those points. Consider two points x1,y1 and x2,y2 lying on a line A and linked by a curve. The curve between these points has a maximum displacement B, measured from line A to the midpoint of the curve. The curve can therefore be defined as: curve = B/A, which preferably will have a value from -128 to +128, where 0 indicates a straight line between the two points. It should be noted that, since the value assigned to the curve is the ratio of two measurements, the same curve value can be assigned to other curves that have the same B/A ratio.
Coding of depth maps

Depth maps can be encoded in many ways. In a preferred embodiment, the object number, the depth and the contour of the object can be coded as follows. Consider the contour of a person shown in Figure 3. The person is assigned object number 1 with a depth of 20. The outline of the object has been determined as explained previously, and at specific x,y positions, typically where a change of direction of the object's contour occurs, a mark is made. This mark may be an alphanumeric character, a shape, a color or another form of visual indication. Each of these marks will have a specific x,y position, which in the preferred embodiment will be within the range 0 to 255. Between each pair of x,y positions there will be a curve. Each curve can be determined by selection from a library of all possible curve shapes. In the preferred embodiment, each curve will be given a value, typically within the range -127 to +128, to allow the curve to be defined using one octet. Curves that advance clockwise from one x,y position to the next can be assigned positive values, while those that advance counterclockwise can be assigned negative values. Other assignments may apply.
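The octet layout just described might be packed as follows. This is a hypothetical sketch: the field order and the curve-value convention are taken from the text above, but the packing itself is an assumption:

```python
import struct

def encode_object(object_number, depth, marks):
    """Pack <object number><object depth><x,y,curve,...> into octets.

    marks: list of (x, y, curve) tuples with 0 <= x, y <= 255 and curve a
    signed octet value (clockwise positive, counterclockwise negative).
    """
    data = struct.pack("BB", object_number, depth)
    for x, y, curve in marks:
        data += struct.pack("BBb", x, y, curve)
    return data

# Object number 1, depth 20, with three contour marks
payload = encode_object(1, 20, [(100, 40, 12), (120, 60, -30), (110, 150, 0)])
print(len(payload))   # 2 + 3*3 = 11 octets
```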
Determination of depth threshold

Adding a depth threshold to the conversion algorithm ensures that objects in front of the threshold do not get distorted. This is done to avoid minor distortions that may occur at the edges of foreground objects where they intersect an object in the background. In the preferred conversion algorithm, the depth map is used to create a continuous depth map that forms the 3D profile of the final scene. When a threshold is applied to this process, the depth map is processed to detect threshold transitions, and the depths above and below the transition are processed independently. The depth map data for an object can therefore be defined as follows: <object number><object depth><x1,y1,curve1,x2,y2,curve2,...xn,yn>. The object depth information contains the data required to generate the depth of the current object. As previously mentioned, this depth data can be a single value, a ramp (linear, radial or other), or another method of describing the depth of a single object. The following methods demonstrate possible means of encoding the depth data of a single object.
The depth data can be encoded as follows for a single depth value: <depth flag 1><depth value>. The depth data can be encoded as follows for an object with a linear ramp as its depth value: <depth flag 2><x1,y1,depth value 1,x2,y2,depth value 2>, where the depth of the object varies linearly from depth value 1 at x1,y1 to depth value 2 at x2,y2. The depth data can be encoded as follows for an object with a non-linear ramp as its depth value: <depth flag 3><x1,y1,depth value 1,x2,y2,depth value 2,gamma>, where gamma is a value describing the non-linear variation of depth over the interval from x1,y1 to x2,y2.
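A small worked sketch of how a decoder might evaluate the flag 2 and flag 3 records; interpreting gamma as a power-law exponent is an assumption, since the text does not fix the non-linear form:

```python
def ramp_depth(t, depth1, depth2, gamma=1.0):
    """Depth at fractional position t (0 at x1,y1 through 1 at x2,y2).

    gamma = 1.0 reproduces the linear ramp of flag 2; other values give
    the non-linear ramp of flag 3.
    """
    return depth1 + (depth2 - depth1) * (t ** gamma)

print(ramp_depth(0.5, 0, 255))             # linear midpoint: 127.5
print(ramp_depth(0.5, 0, 255, gamma=2.2))  # non-linear midpoint: about 55.5
```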
The depth data can be encoded as follows for an object with a radial ramp as its depth value: <depth flag 4><x1,y1,depth value 1,radius,depth value 2>, where the object has depth value 1 at x1,y1 and the depth varies, linearly or in some other manner, to depth value 2 at all pixels a radius distance away from x1,y1. It will be understood that once the depth data of an object has been transmitted, it is not necessary to transmit the depth map again until the object moves or changes shape. Should an object change position, the new position can be transmitted by defining a displacement of the object's position, as follows: <object number><x displacement,y displacement>. Similarly, objects that change depth but not position or size can be transmitted in the following way: <object number><depth>. It will also be understood that adjacent, touching objects will share x,y coordinates, and that there is therefore redundancy in the x,y coordinates that need to be transmitted in order to uniquely define the depth maps of each object in the scene. In order to minimize the amount of additional data required to be transmitted or stored, it is desirable to compress the data comprising the depth maps. The compression can use any form of data compression algorithm, and many are known to those skilled in the art. Examples include, but are not limited to, run length coding and Huffman coding. Since objects may not move from one frame to the next, it is only necessary to transmit the difference in the depth maps between frames. Techniques that allow the differences between frames to be measured and processed are also known to those skilled in the art. It will be appreciated that the depth map information may be included in the vertical blanking interval (VBI) of an analog television signal, or in the MPEG or other digital transmission stream of a digital television signal, as previously described for the transmission of distortion meshes. Similarly, the depth map data can be added to the VOB file on a DVD. Methods of including data in the VBI and the MPEG data stream are known, and the preferred embodiment uses the techniques currently employed to include Closed Captioning and Teletext with standard television images. In another preferred embodiment, the data may be included within the user data area of the MPEG data stream. In order to include this data in the VBI or MPEG2 stream, the following calculations indicate the likely size of the data requirements. Assume that: the VBI specification allows 32 octets per video line; the maximum number of objects per image is 20; the maximum number of x,y coordinates per object is 20; and the object number, object depth, x, y and shape data each require 1 octet. Then octets per object = 1 + 1 + 3(20) = 62 octets. Therefore, for 20 objects, the VBI data = 20 x 62 = 1240 octets per frame. It should be noted that this is the worst case; in practice a typical scene requires around 200 octets per frame. This value will decrease significantly with the application of suitable data compression and by taking redundancy into account, etc. Regarding the inclusion of this information within an MPEG data stream, the MPEG standard allows the provision of a data stream to the receiving location. The techniques for providing a data stream within an MPEG stream can be used to supply depth map data to the receiving decoder. It is also possible to include this information in one of the sound channels of the MPEG signal.
When the MPEG signal is recorded on a medium such as CD-ROM or DVD, the information may be contained within a digital audio file, as a separate digital or analog file, or it may be recorded on the disc by other means. Other techniques will be apparent to those skilled in the art. It is also possible to transmit the original depth map as part of the MPEG data stream. In a preferred embodiment, the resolution of the depth map can typically be reduced to 640x480x8 pixels before noticeable errors in the depth of objects in the resulting 3D images become visible. This resolution corresponds to the DCT block structure of an MPEG encoded video signal. Therefore, the depth map information can be included in the MPEG signal by adding to each DCT block additional information that defines the depth of that block when converted to 3D. The depth map may also be included in the MPEG data stream as previously described, for example in an audio channel, or by other methods familiar to those skilled in the art. The reduced-resolution depth map can also be compressed, before inclusion in the MPEG stream, using standard image compression techniques including, but not limited to, JPEG, MJPEG, MPEG, etc. In a further preferred embodiment, the contour of the object is defined using Bézier curves. Consider the outline of a person shown in Figure 4. Bézier curves are applied to the contour, resulting in the x,y coordinates shown. The depth map for the object can therefore be defined as: <object number><object depth><x1,y1,x1a,y1a,x2b,y2b,x2,y2,...x1b,y1b>. Bézier curves can also be generated so that they require only 3 x,y coordinates, as illustrated in Figure 5, and can be defined as follows: <object number><object depth><x1,y1,x1a,y1a,x2,y2,...x8a,y8a>. This method is preferable, since it requires a smaller number of elements to define the curve. In a further preferred embodiment, the contour of the object is defined using geometric shapes. Consider the outline of a person shown in Figure 6. Geometric shapes are applied to the contour, resulting in the construction shown. The circle that forms the head will have a center defined by x1,y1 and a radius r1. Triangles can be described as x2a,y2a,x2b,y2b,x2c,y2c, and similarly for other polygons. Each geometric shape can have the general form <shape><parameters>. The depth map for the object can therefore be defined as: <object number><object depth><shape 1><parameters>...<shape n><parameters>. It will also be appreciated that contours and/or depth maps created using any of these methods, whether compressed or uncompressed, can be stored in any suitable format and medium, analog or digital, with or without their associated 2D images. Storage may include, but is not limited to, floppy disk, hard disk, CD-ROM, laser disc, DVD, RAM, ROM, magnetic recording tape, video tape, video cassette, etc. The stored contours and/or depth maps can be retrieved at a later time and/or place to allow the reconstruction of depth maps for the generation of distortion meshes for 3D image generation, or for further adjustment and fine tuning.
Decoder

It has previously been described that a distortion mesh can be used to convert a 2D image to 3D. It is now possible to generate the necessary distortion grid from a depth map, the depth map itself being regenerated from information transmitted within the 2D video. The generation of a distortion grid from a depth map can be carried out in real time, semi real time or offline, and can be carried out locally or, via any suitable transmission medium, at a remote location. The generation can be implemented in software or hardware. Therefore, instead of transmitting the subpixel points of the distortion mesh as part of the 2D image, the information necessary to recreate the depth map can be transmitted. The depth map can then be reconstructed in the decoder and converted into a distortion grid. These conversions can be carried out in real time, semi real time or offline at the receiving location, and can be implemented in software or hardware. The preferred method of conversion from depth data to a depth map and then to a distortion grid is shown as a software flow diagram in Figure 1 and in hardware in Figure 2. The individual elements of the software conversion process work as follows. Image sequence source - 2D film or video, or some other image sequence source. Area and depth source - this is the information sent with the image sequence; in the preferred embodiment it is contained in the VBI or MPEG data stream, and contains information on the position, shape and depth of each object. Application of areas with depth to the depth map - this takes each object's "area" and fills or shades the region within the object according to the depth information. The entire area outside the shaded region is left unaltered. This process results in the reconstruction of the original depth maps. Defocusing of the depth map - the sharp depth map is then blurred (Gaussian, fast or other) to remove any sharp edges. Defocusing provides a smooth transition between objects, in order to eliminate image overlap. The defocus is weighted slightly in the horizontal direction. Vertical defocusing helps to prevent tearing and the bleeding of colors into the image regions above and below, thereby providing a smoother transition between near and far objects. Image processing using the depth map - the defocused depth map is then used as the source of displacement for the distortion grid: white means maximum displacement and black means no displacement. The amount of distortion along the horizontal axis is graded according to the depth given by the depth map at any particular pixel position. In the preferred implementation, the displacement for the left image is to the right, and the displacement for the right image is to the left. An overall forced parallax can be applied to the image so that the displaced white objects (foreground) converge at screen level. The black areas (background) have a forced parallax equal to that of an undisplaced image. The direction of displacement and the forced parallax may be varied to suit the particular requirements of the 3D display system on which the converted images are to be displayed. Once a distortion grid has been generated, the conversion of the 2D image to 3D is carried out as previously described.
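That pipeline can be sketched in software as follows: defocus the reconstructed map with a horizontally weighted Gaussian, grade the horizontal displacement by depth, and apply an overall forced parallax. All parameter values here are illustrative, not taken from the specification:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def depth_to_displacement(depth_map, max_shift=8.0, forced_parallax=2.0,
                          sigma=(1.0, 3.0)):
    """Turn an 8-bit depth map into per-pixel horizontal displacements.

    sigma is (vertical, horizontal): the blur is weighted toward the
    horizontal direction. White (255) gives maximum displacement, black (0)
    none; the forced parallax offsets the whole range so that foreground
    objects converge at screen level.
    """
    smooth = gaussian_filter(depth_map.astype(np.float32), sigma=sigma)
    shift = (smooth / 255.0) * max_shift - forced_parallax
    return shift, -shift   # left-image (to the right) and right-image shifts
```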
A preferred embodiment of a hardware converter for generating separate left and right images from a distortion grid is shown in Figure 2, and can be completely digital. A method of implementing this process is shown in Figure 2a and operates as follows. The system uses two line memories, which have multiple ports to allow simultaneous access. A video line is written into one of the line memories while the other line memory is read to generate the output video signal. At the end of the current line, the line memories are exchanged. The depth information is extracted from the video signal to regenerate the depth map for the current image. For each output pixel, the depth map is translated into a pixel shift (of the distortion grid). The pixel offset is added to the pixel counter as the video line is read from the line memory. The pixel shift is a fractional value, so it is necessary to read the pixel values on each side of the desired pixel and interpolate the intermediate value. The odd or even field signal from the video decoder is used to control the field-sequential video output and to synchronize the observers' shutter glasses to the output video signal. The basic circuits can be duplicated to generate left and right video signals for 3D displays that require this video format. A functional block diagram of the DDC decoder is shown in Figure 2b. The first process is to extract the object data, which may be inserted in the VBI or MPEG data stream, from the incoming video. The extracted data will be in compressed format and is subsequently decompressed using a microprocessor. The microprocessor output is the original object outline information, which is processed further to produce depth information for each object. These data are passed to a set of three rotating field buffers controlled by a microprocessor. In the first buffer, the original depth maps are regenerated. The depth maps then move to the next buffer, where horizontal and vertical defocusing is applied. Once defocusing has been applied, the resulting data moves to a final buffer, from which it is passed to the depth-to-pixel-offset converter shown in Figure 2a. Once the data has been transferred to the offset converter, the final buffer is cleared, ready to receive the next depth map. The process of the DDC decoder is illustrated in Figure 2c. This shows the process as a timing diagram and assumes that current microprocessors are not fast enough to carry out the entire decoding process simultaneously; the decoding process must therefore be carried out sequentially, as a pipeline. As microprocessor performance improves, it is expected that a number, if not all, of these processes will be carried out simultaneously. In Figure 2c, at (1) four video frames are shown, each frame comprising odd and even fields. At (2) the object list for frame 4 is generated, while at (3) the depth map for frame 4 is generated. At (4) the horizontal and vertical defocusing is applied, and at (5) the depth map for frame 4 is transferred and the buffer is cleared, ready for the next object list. Therefore, at (5) the depth map for frame 4 and the 2D image are concurrently available to allow 3D conversion. It should be noted that Figure 2c illustrates the process for an individual frame; in practice, at any time, the depth maps for four different frames are being generated by different sections of the hardware.
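The fractional read-out described above amounts to linear interpolation between the two pixels on either side of the desired position. A hypothetical software model of the line-memory behaviour:

```python
import numpy as np

def shift_line(line, offsets):
    """Read one video line with a fractional pixel shift per output pixel.

    line: 1D array of pixel values (the line memory being read).
    offsets: per-pixel fractional shifts derived from the depth map.
    """
    n = len(line)
    out = np.empty(n, dtype=np.float32)
    for x in range(n):
        pos = min(max(x + offsets[x], 0.0), n - 1.0)
        i = int(pos)
        j = min(i + 1, n - 1)
        frac = pos - i
        # interpolate between the pixel values on each side of the position
        out[x] = (1.0 - frac) * line[i] + frac * line[j]
    return out
```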
Alternative decoders

As previously stated, currently available microprocessors are not fast enough to carry out the entire decoding process simultaneously. An alternative preferred embodiment of a decoder will therefore be described that does not require the use of a fast microprocessor. This alternative decoder makes use of integrated circuits that have been developed for the processing of 2D and 3D computer graphics. Such dedicated graphics processors are capable of generating more than 500,000 polygons per second. Since these integrated circuits are manufactured in large quantities, and are therefore inexpensive, the production of a low-cost DDC decoder is feasible. The decoder uses the simplest polygon generation capability of a graphics processor: texture-mapped polygons with no shading. The decoding process can most easily be understood by explaining how it would be carried out manually.
This is illustrated by the flow diagram in Figure 9 and in the subsequent drawings. The process begins with the production of an undistorted mesh, using as many polygons in the x,y plane as are necessary to obtain a relatively uniform deformation. In the preferred embodiment, typically 10,000 polygons per field can be used. An example of a section of an undistorted mesh is shown in Figure 10. The depth map for the object to be converted to 3D (in this example, a cone whose tip faces the viewer, as seen in Figure 11) is applied to the mesh, which is modified so that the z-axis elevation of each polygon of the mesh depends on the value of the corresponding pixel in the depth map. This is illustrated in Figure 12. The next step in the process is to translate the z-axis elevation of each polygon into an equivalent x displacement. This is illustrated in Figures 13 to 16. Figure 13 shows a section along the x axis through the z-elevation mesh. In Figure 14, a row of points along the x axis is selected and rotated 90° about the point y = 0. Figure 15 shows the effect of the rotation at the 45° point, and Figure 16 after a rotation of 90°. This process is repeated for all x rows, which effectively translates the z-axis elevations of the depth map into x offsets.
The next step in the process is to map the original video frame onto an undistorted mesh, as indicated in Figure 17. The undistorted mesh is then transformed using the previously generated x-offset map, as indicated in Figure 18. The resulting video image is then stretched according to the mesh displacement, Figure 19. This has the same effect as stretching the image as described in our previous application PCT/AU96/00820. The stretched image can be used to form one image of a stereo pair; the other can be formed by rotating the points in Figure 13 by -90°, which will produce a corresponding mesh and image, as shown in Figure 20. When this process is implemented in hardware, using a 2D/3D graphics processor, it is possible to eliminate the step of translating the z-axis elevations into equivalent x displacements. Since it is known that polygons closer to the observer require more lateral displacement than polygons farther from the observer, the displacement mesh of Figure 18 can be produced directly from the depth map of Figure 11. This is possible because there is a direct relationship between the grayscale value of the depth map and the displacement of each corresponding polygon. This simplified process is illustrated as a flow chart in Figure 21.
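The simplified process can be sketched directly: the x displacement of each mesh vertex is produced from the grayscale value of the depth map, with the sign reversed for the other eye's mesh. The scale factor and sampling scheme are illustrative assumptions:

```python
import numpy as np

def mesh_x_offsets(depth_map, mesh_cols, mesh_rows, max_offset=8.0, sign=1):
    """Sample the depth map at each mesh vertex and convert its gray value
    directly into an x displacement; use sign=-1 for the other eye's mesh."""
    h, w = depth_map.shape
    ys = np.linspace(0, h - 1, mesh_rows).astype(int)
    xs = np.linspace(0, w - 1, mesh_cols).astype(int)
    gray = depth_map[np.ix_(ys, xs)].astype(np.float32) / 255.0
    return sign * gray * max_offset   # one x offset per mesh vertex
```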
Alternative hardware decoder

Figure 8 shows a hardware block diagram of a DDC decoder based on a 2D/3D graphics processor. The extraction and generation of the depth map from the DDC data remains as previously described and is illustrated in Figure 2b. The operation of the decoder can be as follows. The incoming video is passed to the DDC data decoder, which extracts the DDC information from the video stream and recovers the depth map for each video field. The video is also converted to RGB, YUV or another standard video format and placed in a double field memory. This allows one video field to be read by the 2D/3D graphics processor while a new field is being loaded. The depth map output from the DDC data decoder is passed to the depth-map-to-polygon-mesh converter, which defines the shape of the polygons to be processed by the 2D/3D graphics processor. The other input to the graphics processor is the original 2D video image, which is used as a texture map applied to the polygons. The output of the graphics processor is passed to a field memory that allows the video to be read out in an interlaced format. This is subsequently passed to a PAL/NTSC encoder, whose output will be a standard field-sequential 3D video signal.
Reuse of depth maps

It will also be appreciated that it is not necessary to transmit the entire depth map to the receiver, since the same depth maps will be reused when the same or similar scenes are displayed again. It is therefore desirable for the decoder to retain in memory a sequence of previously transmitted depth maps for reuse, rather than requiring the reprocessing of a depth map that has previously been sent. Either the depth map or the resulting distortion mesh can be retained in decoder memory, which can be volatile or non-volatile and includes, but is not limited to, RAM, EEPROM, flash memory, magnetic or optical storage, etc. It is also intended that generic depth maps and/or distortion grids be stored in the decoder. This will allow frequently presented scenes to be converted without the need to transmit or reconstruct their depth maps. The correct depth map can be selected by including data in the video signal that uniquely identifies to the decoder which default depth map is to be applied. It is also intended that the decoder be able to receive new or altered depth maps, so that a library of depth maps and/or distortion grids can be maintained within the decoder. This library can be kept in media including, but not limited to, RAM, EEPROM, flash memory, magnetic or optical storage, etc. It is intended that the library be updated by the transmission of specific depth maps or distortion grids included in the video signal. It is also intended that the library may be maintained by means of external or internal plug-in modules containing such depth maps or distortion grids, or by downloading to the decoder via the video signal, a modem or the Internet. Other means of maintaining the library will be apparent to those skilled in the art. The general format of the DDC data included in the video signal may, in the preferred embodiment, include a header flag that indicates to the decoder the nature of the data that follows. Many existing standards can be used for this format, which in general will be: <flag #><data to be acted upon by the decoder>. Examples of flags include, but are not limited to, the following. Flag 1 - the following data is a depth map. Flag 2 - the following data relates to the relocation of an existing object. Flag 3 - the following data relates to a change in the depth of an object. Flag 4 - the following data relates to the reuse of a previously transmitted depth map. Flag 5 - the following data relates to the use of a depth map within the library. Flag 6 - the following data relates to the modification of a depth map within the library. Flag 7 - the following data relates to the addition of a new depth map to the library. Flag 8 - the following data relates to the deletion of an existing depth map from the library. Flag 9 - the following data relates to the use of motion parallax delays.
Flag 10 - the following data relates to the use of a forced parallax. Flag 11 - the following data relates to the use of a mathematical algorithm. Flag 12 - the following data relates to the use of a library of mathematical algorithms. Alternatively, each data packet can be of a different length that uniquely defines the packet, eliminating the need for a flag. In the preceding description, the same process can be applied to distortion grids. It is also intended that the decoder be able to determine the most suitable depth map to apply to the associated 3D image by automatically making a selection from a nominated set within the library. For example, the DDC data can direct the decoder to search the library of depth maps between specific index points or by generic category, i.e., evening news, horse racing. The decoder would then select the appropriate map based on object size, shape, speed, direction, color, shading, darkness, etc. As a further product of this process, the original depth map created during the encoding process can be arranged in a format suitable for use with 3D display systems that require a 2D image plus object depth information. Such displays may be autostereoscopic and/or volumetric in nature.
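The flag scheme above might be modelled as follows; the numeric values follow the list, but the representation is otherwise a hypothetical sketch:

```python
from enum import IntEnum

class DDCFlag(IntEnum):
    DEPTH_MAP = 1            # data is a depth map
    RELOCATE_OBJECT = 2      # relocation of an existing object
    CHANGE_DEPTH = 3         # change in the depth of an object
    REUSE_MAP = 4            # reuse of a previously transmitted depth map
    USE_LIBRARY_MAP = 5      # use of a depth map within the library
    MODIFY_LIBRARY_MAP = 6   # modification of a library depth map
    ADD_LIBRARY_MAP = 7      # addition of a new depth map to the library
    DELETE_LIBRARY_MAP = 8   # deletion of a depth map from the library
    MOTION_PARALLAX = 9      # use of motion parallax delays
    FORCED_PARALLAX = 10     # use of a forced parallax
    ALGORITHM = 11           # use of a mathematical algorithm
    ALGORITHM_LIBRARY = 12   # use of a library of mathematical algorithms

record_type = DDCFlag(9)     # e.g. a motion parallax record follows
```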
Alternative approaches

Alternatively, the mesh distortion process can be defined by a mathematical algorithm. This algorithm can be stored in the decoder, with the DDC data then comprising the parameters to which the algorithm is applied. For example, consider the general formula: f(x,y) = [1 - exp(-|(|x| - rx)·dx|)]·sin((π·x)/rx + π/2) · [1 - exp(-|(|y| - ry)·dy|)]·sin((π·y)/ry + π/2), where π is the constant 3.14159..., |x| is the absolute value of x, rx is the range of x (-rx <= x <= rx), ry is the range of y (-ry <= y <= ry), dx is the damping factor for x, and dy is the damping factor for y. If the following values are passed to the equation via the DDC data, then the distortion grid of Figure 7 is produced: rx = ry = 50, dx = dy = 0.1. In terms of DDC data, the following could be transmitted: <flag 11><50,50,0.1,0.1>. Additionally, these parameters can be stored in memory within the decoder in the form of a library and recalled by sending the library index within the DDC data. In terms of DDC data, the following would be transmitted: <flag 12><library index>. (A sketch implementing this formula is given at the end of this section.) A further example considers the use of flag 9, motion parallax. The prior art has shown that a 2D image having movement in the horizontal direction can be converted to 3D by the use of motion parallax. It is preferable that the movement in the image is due to horizontal movement of the camera, that is, a camera pan. In this technique, one of the observer's eyes receives the current video field while the other eye receives a previous field; that is, there is a delay between the images presented to each eye. The choice of which eye receives the delayed image, and the amount of delay, depends on the direction and speed of the horizontal movement in the 2D image. The delay can typically be in the range of 1 to 4 fields. The choice of direction and delay can be made by considering an overall motion vector within the 2D image and selecting these parameters based on the size, direction and stability of the vector. In the prior art it has been necessary to perform these calculations in real time at the viewing position, which requires substantial processing capability. It has been found that a preferred method is to calculate the motion vectors, and hence the field delay amount and direction, at the transmission position, and then transmit these values as part of the video signal. Therefore, in a preferred embodiment, the transmitted data would be as follows: <flag 9><direction and delay>, where <direction and delay> would typically be in the range -4 to +4. The DDC decoder can then recover this data and use it to insert the correct field delay amount and direction into the processed images. The distortion mesh can also be obtained in real time by adding a camera to an existing 2D video or film camera which, using a variable focus lens and a sharpness detection algorithm, determines the depth of the objects in the image being observed by the camera. The depth of an object can also be obtained from a pair of stereo cameras, whereby the correlation between pixels in each image indicates the depth of the object. The output of these configurations, before processing to provide distortion mesh data, can be used to generate depth maps. This is achieved by processing the original 2D image and applying shading, or other indications, to indicate object depth, as explained in this description.
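The general formula, with the example parameters rx = ry = 50 and dx = dy = 0.1 transmitted via <flag 11>, can be evaluated directly. A sketch (the sampling grid over -rx..rx, -ry..ry is an assumption):

```python
import numpy as np

def distortion_grid(rx=50.0, ry=50.0, dx=0.1, dy=0.1):
    """Evaluate f(x,y) over -rx..rx, -ry..ry, as for the Figure 7 grid."""
    x = np.arange(-rx, rx + 1)
    y = np.arange(-ry, ry + 1)
    xx, yy = np.meshgrid(x, y)
    fx = (1 - np.exp(-np.abs((np.abs(xx) - rx) * dx))) * np.sin(np.pi * xx / rx + np.pi / 2)
    fy = (1 - np.exp(-np.abs((np.abs(yy) - ry) * dy))) * np.sin(np.pi * yy / ry + np.pi / 2)
    return fx * fy

grid = distortion_grid()   # parameters as sent via <flag 11><50,50,0.1,0.1>
```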
The outline of each object can be obtained from characteristics of the object such as size, color, speed of movement, shading, texture, brightness and darkness, as well as differences between previous, current and future images. Neural networks and expert systems can also be used to aid in the identification of objects. It is also proposed to displace the image within the camera so as to obtain a physical displacement of successive images on the camera's image sensor. This displacement can be produced optically, electro-optically, mechanically, electromechanically, electronically or by other methods known to those skilled in the art. The displacement can be in one direction, i.e. x, or in multiple directions, either sequentially or randomly. The movement of objects on the camera sensor will be greater for those objects that are closer to the camera; by correlating the pixels in successive images, the depth of each object can be determined. Alternatively, a plurality of cameras may be used. Other techniques can be used to determine the depth of objects within a scene. These include, but are not limited to, the use of range finders operating on laser, ultrasonic or microwave principles, or the projection of grids onto objects within the scene and the determination of object depth from the resulting distortion of the grid. Many computer-aided design (CAD) software packages are able to produce wireframe models of images.
These wireframe models, which are a projection of the facets of an object, can be used to determine the position of objects within a scene. Similarly, part of the process of rendering three-dimensional, non-stereoscopic images in packages such as 3D Studio allows the distance from the camera to each pixel to be determined. This facility can produce a grayscale image in which the closest object appears white and the point farthest from the camera appears black. This grayscale map can be used as a depth map compatible with stereoscopic 3D conversion. It is noted that, as of this date, the best method known to the applicant for carrying out the aforementioned invention is that which is clear from the present description of the invention.

Claims (42)

  1. Having described the invention as above, the content of the following claims is claimed as property: 1. A method for producing a depth map to be used in the conversion of two-dimensional images into stereoscopic images, characterized in that it includes the steps of: minus one object within a two-dimensional 2D image; assign each object with an identification tag; assign to each object with a depth label; and determine and define a contour for each object.
  2. 2. The method according to claim 1, characterized in that the contour of the object is defined by a series of coordinates, curves and / or geometric shapes.
  3. 3. The method according to any of the preceding claims, characterized in that the identification tag is a unique numerical number.
  4. 4. The method according to any of the preceding claims, characterized in that the identification of at least one object includes the step of comparing the 2D image with a library of generic scenes.
  5. 5. The method according to any of the preceding claims, characterized in that the step of determining the contour also includes tracing the object pixel by pixel.
  6. 6. The method according to any of claims 1 to 4, characterized in that the step of determining the contour also includes using straight lines to approximate the contour of the object.
  7. 7. The method according to any of claims 1 to 4, characterized in that the step of determining the contour further includes using curved approximations to approximate the contour of the object.
  8. 8. The method according to any of claims 1 to 4, characterized in that the step of determining the contour also includes using Bézier curves to approximate the contour of the object.
  9. 9. The method according to any of claims 1 to 4, characterized in that the step of determining the contour further includes comparing the object with a library of curves and / or generic geometric shapes to approximate the contour.
  10. 10. The method according to claim 9, characterized in that it includes graduation of the curve and / or the generic or geometric shape to better match the object.
  11. 11. The method according to any of the preceding claims, characterized in that the depth label includes a color code.
  12. 12. The method according to claim 11, characterized in that the target represents objects relatively close to the observer, and the black indicates objects relatively distant from the observer.
  13. 13. The method according to any of claims 1 to 10, characterized in that the depth label is a number value.
  14. 14. The method according to claim 13, characterized in that the numerical value varies from 0 to 255.
  15. 15. The method according to any of the preceding claims, characterized in that at least one object is further divided into a plurality of segments, each segment is assigned a depth label.
  16. 16. The method according to claim 15, characterized in that the variation in depth is defined by a ramp function.
  17. 17. The method according to claim 16, characterized in that the ramp function is a linear or radial ramp.
  18. 18. The method according to any of the preceding claims, characterized in that it also includes the tracking of each object in successive frames of the image, and determine and assign depth labels for the object in each respective frame.
  19. 19. The method according to any of the preceding claims, characterized in that it also includes adding a surface texture surface map to the object or to each object.
  20. 20. The method according to claim 19, characterized in that the map of surface texture ridges is defined by decomposing the object into a plurality of components and assigning each component a separate depth label.
21. The method according to claim 19, characterized in that the bump map is defined by the luminance values of individual components of the object.
22. The method according to claim 19, characterized in that the bump map is defined by the chrominance, saturation, color grouping, reflections, shadows, focus and/or sharpness of individual components of the object.
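Claims 19 to 22 describe modulating an object's depth with a bump map derived from attributes such as luminance. A minimal sketch follows, assuming 8-bit luminance values and a hypothetical modulation strength; it is illustrative only.

```python
# Illustrative sketch: modulating an object's base depth tag with a bump
# map derived from per-pixel luminance (claims 19 to 22). The modulation
# strength is a hypothetical parameter chosen for demonstration.

def apply_luminance_bump(base_depth, luminance, strength=0.25):
    """Offset each pixel's depth by its luminance deviation from mid-grey.

    base_depth -- the object's depth tag (0-255)
    luminance  -- 2D list of 8-bit luminance values for the object's pixels
    strength   -- fraction of the luminance deviation applied to depth
    """
    bumped = []
    for row in luminance:
        bumped.append([
            max(0, min(255, int(base_depth + strength * (lum - 128))))
            for lum in row
        ])
    return bumped

# A hypothetical 3x4 luminance patch for a single object.
patch = [[64, 128, 192, 255],
         [32, 128, 200, 230],
         [16, 100, 180, 210]]
print(apply_luminance_bump(base_depth=160, luminance=patch))
```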
23. The method according to any of the preceding claims, characterized in that it further includes producing a grayscale image at a resolution of 80x60x8 bits from each 2D image.
24. A method for producing a depth map for use in the conversion of 2D images in a video sequence into stereoscopic images, characterized in that it includes the steps of: identifying and numbering each frame of the video sequence; identifying at least one object within the video sequence; assigning each object an identification tag; dividing the video sequence into a plurality of partial sequences; transmitting the partial sequences to a plurality of operators, each operator determining and defining the contour of each object in the partial sequence previously assigned said identification tag; receiving the partial sequences from the plurality of operators; recombining the partial sequences to re-form the video sequence; and assigning each object a depth tag.
25. The method according to claim 24, characterized in that it further includes the step of adding security measures to the video sequence before it is divided into a plurality of partial sequences.
26. The method according to claim 25, characterized in that the security measures include removing the audio from, and/or modifying the colors of, the video sequence.
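The distributed workflow of claims 24 to 26 can be sketched as a simple partition of the numbered frames among operators, with the outlined partial sequences later reassembled in frame order. The contiguous-block partitioning below is one hypothetical choice among many.

```python
# Illustrative sketch: dividing a numbered video sequence into partial
# sequences for a plurality of operators, then reassembling the returned
# results in frame order (claims 24 to 26). Contiguous-block partitioning
# is a hypothetical choice made for demonstration.

def split_sequence(frame_numbers, operators):
    """Deal the numbered frames into one contiguous block per operator."""
    block = -(-len(frame_numbers) // operators)  # ceiling division
    return [frame_numbers[i:i + block]
            for i in range(0, len(frame_numbers), block)]

def reassemble(partial_sequences):
    """Merge the returned partial sequences back into frame order."""
    return sorted(frame for part in partial_sequences for frame in part)

frames = list(range(1, 13))          # frames are identified and numbered
parts = split_sequence(frames, 3)    # transmitted to three operators
print(parts)
print(reassemble(parts))
```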
27. A method for encoding a depth map for use in the conversion of 2D images into stereoscopic images, characterized in that it includes: assigning an object identifier to an object; assigning the object a depth tag; and defining the contour of the object.
28. The method according to claim 27, characterized in that the contour of the object is defined by a series of x,y coordinates, wherein each x,y coordinate is separated by a curve.
29. The method according to claim 28, characterized in that each curve is stored in a library and assigned a unique number.
30. The method according to claim 28 or claim 29, characterized in that the contour of the object further includes data on the orientation of each curve.
31. The method according to any of claims 28 to 30, characterized in that the curve is a Bézier curve.
32. The method according to claim 27, characterized in that the contour of the object is defined by at least one geometric shape.
33. The method according to claim 32, characterized in that the at least one geometric shape is defined by the shape of the contour and the contour parameters.
34. The method according to any of claims 27 to 33, characterized in that the encoding of the depth tag of the object includes: assigning a depth type; and assigning a depth to the object.
35. The method according to claim 34, characterized in that the depth type includes a single value, a linear ramp or a radial ramp.
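To illustrate how the encoding of claims 27 to 35 might serialize one object, the sketch below packs an object identifier, a depth type (single value, linear ramp or radial ramp), a depth value and the x,y contour coordinates into bytes. The byte layout and field widths are entirely hypothetical; the claims prescribe no particular format.

```python
# Illustrative sketch: one possible byte layout for an encoded object
# record (claims 27 to 35). The layout and field widths are hypothetical;
# no specific format is prescribed by the claims.
import struct

DEPTH_SINGLE, DEPTH_LINEAR_RAMP, DEPTH_RADIAL_RAMP = 0, 1, 2

def encode_object(object_id, depth_type, depth_value, contour_points):
    """Encode an object identifier, depth tag and x,y contour coordinates."""
    header = struct.pack(">HBBH", object_id, depth_type, depth_value,
                         len(contour_points))
    body = b"".join(struct.pack(">HH", x, y) for x, y in contour_points)
    return header + body

record = encode_object(object_id=7,
                       depth_type=DEPTH_LINEAR_RAMP,
                       depth_value=200,
                       contour_points=[(10, 40), (70, 40), (40, 5)])
print(record.hex())
```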
36. A method for transmitting 2D images and depth map data for viewing in a stereoscopic viewing system, characterized in that it includes: embedding the depth map data in the vertical blanking interval of an analog television signal.
37. A method for transmitting 2D images and depth map data for viewing in a stereoscopic viewing system, characterized in that it includes: embedding the depth map data in the MPEG data stream of a digital television signal.
38. A method for transmitting 2D images and depth map data for viewing in a stereoscopic viewing system, characterized in that it includes: embedding the depth map data in the VOB file of a DVD.
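Claims 36 to 38 place the depth map data in an existing carrier (the vertical blanking interval, the MPEG data stream, or a DVD VOB file). The sketch below shows only the generic chunking step, splitting serialized depth map data into fixed-size per-frame payloads; the 32-byte capacity is an assumption for illustration and is not a property of any of those carriers.

```python
# Illustrative sketch: chunking serialized depth map data into fixed-size
# per-frame payloads for carriage in a transport such as the vertical
# blanking interval or an MPEG data stream (claims 36 to 38). The 32-byte
# per-frame capacity is an assumption for illustration only.

def chunk_for_transport(depth_map_bytes, bytes_per_frame=32):
    """Split encoded depth map data into per-frame payloads, zero-padded."""
    frames = []
    for i in range(0, len(depth_map_bytes), bytes_per_frame):
        chunk = depth_map_bytes[i:i + bytes_per_frame]
        frames.append(chunk.ljust(bytes_per_frame, b"\x00"))
    return frames

payloads = chunk_for_transport(b"\x07\x01\xc8" * 30)
print(len(payloads), "frame payloads of", len(payloads[0]), "bytes each")
```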
39. A method for decoding depth map data, characterized in that it includes: receiving 2D images and depth map data corresponding to the 2D images; determining each object identified in the depth map data; determining the corresponding depth for each object; shading each object based on its depth; and processing the image to form a distortion grid in which the amount of distortion depends on the depth.
40. The method according to claim 39, characterized in that it further includes: blurring the depth map before forming the distortion grid, thereby providing a smoother transition between objects.
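The blurring step of claim 40 can be sketched as a simple mean (box) filter over the grayscale depth map; a real implementation might prefer a Gaussian filter, and the kernel radius here is a hypothetical choice.

```python
# Illustrative sketch: softening a grayscale depth map before the
# distortion grid is formed (claim 40), so that depth transitions between
# objects are smoother. A box blur stands in for whatever filter an
# implementation would actually use; the radius is hypothetical.

def box_blur(depth_map, radius=1):
    """Apply a (2*radius+1)^2 mean filter to a 2D list of depth values."""
    h, w = len(depth_map), len(depth_map[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += depth_map[ny][nx]
                        count += 1
            out[y][x] = total // count
    return out

# A hypothetical hard-edged depth map: a near object (255) on a far field (0).
dm = [[255 if 1 <= x <= 3 else 0 for x in range(6)] for _ in range(4)]
for row in box_blur(dm):
    print(row)
```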
41. A method for decoding depth map data, characterized in that it includes: producing an undistorted mesh from a plurality of polygons; applying the depth map to the mesh, wherein the elevation of the polygons within the mesh depends on the depth recorded in the depth map; converting the elevation of the polygons into translational displacements so as to create a distorted mesh; and applying the distorted mesh to a 2D image that corresponds to the depth map data.
42. A decoder for decoding depth map data, characterized in that it includes a library of depth maps, wherein incoming data is compared with the library, and when the data does not match a depth map in the depth map library, the decoder processes the incoming data using the method according to claim 41.
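A minimal sketch of the decoding path of claim 41: the depth map supplies per-vertex elevations, the elevations are converted into lateral X displacements (compare Figures 13 to 16), and the 2D image is resampled through the distorted grid. The displacement scale is a hypothetical parameter, and nearest-neighbour sampling is used only to keep the sketch short.

```python
# Illustrative sketch of claim 41: elevate mesh vertices by the depth map,
# convert the Z elevations into lateral X displacements to form a
# distorted mesh, and warp the 2D image through it. The displacement
# scale is a hypothetical parameter chosen for demonstration.

def distort_mesh(depth_map, scale=0.01):
    """Turn per-vertex depth (0-255) into per-vertex X displacements."""
    return [[depth * scale for depth in row] for row in depth_map]

def warp_image(image, x_offsets):
    """Resample each pixel from a laterally shifted source position."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_x = int(round(x - x_offsets[y][x]))  # shift against the offset
            if 0 <= src_x < w:
                out[y][x] = image[y][src_x]
    return out

# Hypothetical 4x8 image and matching depth map (a near stripe on a far field).
img = [[(x + 1) * 10 for x in range(8)] for _ in range(4)]
dm = [[255 if 2 <= x <= 4 else 0 for x in range(8)] for _ in range(4)]
left_eye = warp_image(img, distort_mesh(dm))
for row in left_eye:
    print(row)
```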
MXPA/A/2000/005355A 1997-12-05 2000-05-31 Improved image conversion and encoding techniques MXPA00005355A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PPPP0778 1997-12-05
PPPP2865 1998-04-08

Publications (1)

Publication Number Publication Date
MXPA00005355A true MXPA00005355A (en) 2001-07-31

Similar Documents

Publication Publication Date Title
CA2305735C (en) Improved image conversion and encoding techniques
EP3751857A1 (en) A method, an apparatus and a computer program product for volumetric video encoding and decoding
EP2005757B1 (en) Efficient encoding of multiple views
Zhang et al. 3D-TV content creation: automatic 2D-to-3D video conversion
EP1141893B1 (en) System and method for creating 3d models from 2d sequential image data
US6205241B1 (en) Compression of stereoscopic images
US20090219383A1 (en) Image depth augmentation system and method
JP2004505394A (en) Image conversion and coding technology
JP2000503177A (en) Method and apparatus for converting a 2D image into a 3D image
US20150379720A1 (en) Methods for converting two-dimensional images into three-dimensional images
JP2022533754A (en) Method, apparatus, and computer program product for volumetric video encoding and decoding
EP1668919A1 (en) Stereoscopic imaging
KR100335617B1 (en) Method for synthesizing three-dimensional image
US7439976B2 (en) Visual communication signal
AU738692B2 (en) Improved image conversion and encoding techniques
US11823323B2 (en) Apparatus and method of generating an image signal
MXPA00005355A (en) Improved image conversion and encoding techniques
Sommerer et al. Time-lapse: an immersive interactive environment based on historic stereo images