WO2012044308A1 - Projection matrix - Google Patents

Projection matrix Download PDF

Info

Publication number
WO2012044308A1
WO2012044308A1 (PCT/US2010/050944)
Authority
WO
WIPO (PCT)
Prior art keywords
image
projection matrix
scene
points
point
Prior art date
Application number
PCT/US2010/050944
Other languages
French (fr)
Inventor
Renato Keshet
Michal Aharon
Hadas Kogan
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2010/050944 priority Critical patent/WO2012044308A1/en
Publication of WO2012044308A1 publication Critical patent/WO2012044308A1/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/536 Depth or shape recovery from perspective effects, e.g. by using vanishing points
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Definitions

  • the field of digital image editing, manipulation, and enhancement has evolved to contain three-dimensional (3D) scene structure understanding.
  • 3D properties of a simple scene, such as a room, can be characterized using a camera projection matrix, which maps 3D "world" coordinates into two dimensional (2D) "image" coordinates, mimicking the image acquisition process.
  • the projection matrix can be computed from physical parameters related to intrinsic and extrinsic camera properties, such as its location, orientation and focal distance.
  • Determining a camera projection matrix for a given scene allows for an understanding of the 3D structure of the scene from a single 2D image.
  • This information may be used for a variety of vision tasks such as camera calibration, perspective rectification, scene reconstruction and more.
  • this knowledge may be used in applications that allow a user to insert and manipulate new objects such as furniture and decorations into a 2D image, such that they will appear geometrically correct.
  • Figure 1 is a schematic diagram of vanishing points in a room according to an example
  • Figure 2 is a schematic diagram of a scene and of selected points within the scene according to an example
  • Figure 3 is a schematic representation of a portion of a room according to an example
  • Figure 4 is a flowchart of a method for calculating a camera projection matrix according to an example
  • Figure 5 is a schematic block diagram of a system suitable for implementing examples as described herein;
  • Figure 6 is a block diagram of a method for calculating a projection matrix according to an example
  • Figure 7a is a schematic block diagram of a system according to an example.
  • Figure 7b is a schematic block diagram of a system according to an example.
  • the present specification relates to a method for estimating 3D structure of a scene from a 2D image. More particularly, the present specification relates to calculating a camera projection matrix that characterizes the 3D structure of a scene or structure in a 2D image using a processor, such as an image processor, that is capable of performing the methods described herein.
  • the calculation uses information that is computed or otherwise provided beforehand to determine the 3D characteristic points, which can be used to refine an initial estimate for a projection matrix.
  • An example of a scene in a 2D image from which the 3D structure may be estimated according to an example includes a man-made indoor scene, such as a room inside a building.
  • Another example of a 2D image may be an outdoor scene including buildings or other man-made objects which have corner points and vanishing points.
  • a camera projection matrix, or projection matrix is typically a 3 x 4 matrix which describes the mapping of points in a 3D world coordinate system to corresponding points on a 2D image, and is typically defined using a pinhole camera model which describes the mathematical relationship between the coordinates of the 3D point and its projection onto the image plane of an ideal pinhole camera, that is, where the camera aperture is described as a point and no lenses are used to focus light.
  • the model provides a first order mapping from the 3D scene to the 2D image, but in the vast majority of vision tasks this proves to be sufficient and so second order effects can be usefully disregarded.
  • the term "vanishing point” is used to broadly describe a point which is the intersection of perspective projections of 3D imaginary parallel lines which relate to parallel lines in a scene.
  • One example of such parallel lines may include the top line of a picture frame and the bottom of the same picture frame.
  • Another example may include the top line of a wall where the wall meets the ceiling in a room, and the bottom line of the wall where the wall meets the floor.
  • a "main line" describes the border between two main surfaces in the scene.
  • the previous examples of the top and bottom lines of a wall at the ceiling and floor are examples of main lines, and the intersection of two such lines suffices to define the position of a corner point.
  • Main lines also include the borders between walls.
  • a "corner point” is to be broadly understood as an intersection between main lines, such as between two or three main lines for example.
  • a corner point is the intersection between, for example, the top lines of two adjacent walls and the line between the two adjacent walls.
  • a room corner should be taken to be synonymous with a corner point in general, including, but not limited to a corner point in an outdoor scene or structure, such as the corner point of a building or other structure for example. That is to say, reference to a room corner herein is provided for the sake of simplifying the present description, and not intended to limit implementations in which data representing any other corner point other than a room corner is available. Accordingly, although certain examples are described with reference to a "room corner", this does not preclude the use of methods and systems as described herein being used to determine a projection matrix for an outdoor scene for example. Furthermore, references to a room, scene or structure can be used interchangeably.
  • One way of calculating a projection matrix from a given image is to gather correspondences between 3D world points and 2D image points, and then compute the matrix parameters that support these correspondences.
  • Particular choices of corresponding pairs can be specific points that characterize the 3D structure, such as for example the main vanishing points and room corners in a scene.
  • the main vanishing points in a room are those vanishing points associated with lines that are parallel to the intersections between pairs of the walls, the ceiling and the floor (since these are mutually orthogonal, there may be up to three main vanishing points in room scenes).
  • Corresponding pairs associated with the main vanishing points and room corners can be estimated automatically from a single image, and a projection matrix can be computed when the three main vanishing points for a scene are given, as well as two room corners. In fact, it is sufficient to have information representing the position of one room corner and the distance between two room corners, in addition to the 3 vanishing points. According to an example, it is thus possible to obtain an estimation of the projection matrix for a scene even with a subset of data. This is sufficient to support some practical applications.
  • a method for calculating a final projection matrix uses prior assumptions in the form of heuristic or initial data about how images of rooms and scenes or structures are captured in order to compensate for any missing or inaccurate correspondence points which would otherwise make the calculation of a camera projection matrix for the room or scene impossible or impractical. For example, assuming the use of a typical focal length for a camera of between 30 to 50 millimeters and that the camera is approximately parallel to the floor at a distance of 3-10m etc. The method can be extended to simultaneously compute vanishing points as well as the projection matrix.
  • Figure 1 shows a schematic diagram of a basic room 100 as seen from a single perspective inside the room 100. As shown, the perspective of the image is of a camera view directed towards the line of intersection between where the first and second walls 102, 104 join. Because figure 1 is a schematic of a simple room having no contents, the location of the corners 106, 108 and main lines 110, 112, 114, 116, 118 in the scene are clearly visible to the human eye; however, in real-life situations this may not always be the case.
  • the vanishing points 120, 122 are beyond the boundaries of the image, and may be determined using image processing techniques. In some examples, the vanishing points 120, 122 may be obtained by any method known in the art. According to one example, the vanishing points 120, 122 may be computed by comparing two columns of an image and finding an affine transformation that maps between similar features in the two columns. The location of the vanishing points can then be directly computed from the parameters of the affine transformation. In the present example, each of the main lines is extended using broken lines 124, with the exception of a first main line 110 where the first and second walls 102, 104 join.
  • the image has left and right vanishing points 120, 122.
  • the left vanishing point 120 is the point at which second and third main lines 116, 118 intersect when extended using the broken lines 124.
  • the right vanishing point 122 is the point at which fourth and fifth main lines 112, 114 intersect when extended. As shown, the vanishing points 120, 122 are located outside the image boundaries due to the angle of the main lines from the image perspective.
  • the vanishing points 120, 122 may then be used to obtain the corner points 106, 108.
  • the upper corner point 106 is located at the intersection of the first, second, and fourth main lines 110, 112, 116, which is where the ceiling 126 meets both the first and second walls 102, 104.
  • the lower corner point 108 is located at the intersection of the first, third, and fifth main lines 110, 114, 118, which is where the floor 128 meets both the first and second walls 102, 104.
  • Each of the corner and vanishing points is a characteristic point that is helpful to understanding the 3D structure of the image.
  • the schematic of figure 1 is thus a 2D representation of a 3D scene, in which there is a mapping of the coordinates of a 3D point in the scene to the 2D image coordinates of the point's projection onto the image plane of a camera (or other suitable imaging device) which captured the scene.
  • the diagram of figure 1 will generally be an image captured using a camera whose camera projection matrix is that which is desired.
  • the position of at least one point in a room can be used in order to derive a camera projection matrix, and user input can be sought to mark or otherwise derive such a position.
  • the location of the corner of a room is the simplest point in an image for a user to detect and mark.
  • any other point can be used.
  • a point on a wall that is 1 meter to the left of the corner and 1 meter higher (that is to say, a point at a specific place on the wall) can be used, for example.
  • Such a point gives a correspondence between the 2D image coordinates, and the 3D world coordinates.
  • a corner point is much easier to detect by eye, and therefore this is used as a correspondence point according to an example.
  • a corner point can be marked by a user, or derived from other markings which give lines whose intersection when extended is a corner point.
  • four points can be provided (eg marked by a user) such as points which are at the intersection between walls and the floor, two on the left of a room and two on the right of a room for example. Each pair of points on either side of the room defines a line, and the intersection of the two lines when projected defines a corner point.
  • vanishing points can be used for the calculation of the camera projection matrix, and can be extracted using an automatic procedure (without human intervention), such as using a process as described below.
  • the camera is located at a point C in the world coordinate system, and it is rotated with respect to that system by a rotation matrix R. Also, it is assumed that the focal distance of the camera is f and that the optical center (the point where the principal axis of the camera intersects the image plane) has the coordinates (px, py) on the image coordinate system.
  • the 3x4 matrix P is the camera projection matrix, and is given by P = K R [ I | -C ], where K = [ [f, 0, px], [0, f, py], [0, 0, 1] ] is the calibration matrix, I is the 3x3 identity matrix and C is the camera centre.
  • the projection matrix P thus has 9 free parameters: f, px, py, 3 coordinates for C, and 3 free parameters for R (R is a rotation matrix defined by 3 angles; its columns are unitary and mutually orthogonal).
  • An approach for inferring the projection matrix parameters from a given picture may include identifying at least five points of known real-world coordinates in the image. For example, let {(ui, vi)} be a set of N (equal to or larger than five) points in an image, and {(Xi, Yi, Zi)} be the corresponding set of coordinates in the real world, that is, the coordinates of the real-world points in the coordinate system of the 3D scene. Then, a system of 2N equations in nine variables can be obtained, given by ui = (p1 · X~i)/(p3 · X~i) and vi = (p2 · X~i)/(p3 · X~i), where p1, p2 and p3 are the rows of P and X~i = (Xi, Yi, Zi, 1).
  • a particular case of point correspondence is that of main vanishing points.
  • a main vanishing point is obtained by right multiplying P by one of the three vectors (1,0,0,0), (0,1,0,0), and (0,0,1,0), which are the homogeneous coordinates of the points where parallel lines intersect on each of the three main directions X, Y, and Z in a scene.
  • parameters f, px, py, and the rotation matrix R can be computed.
  • An additional point allows the computation of homogeneous coordinates for the camera position C, i.e., the parameters C1/C3 and C2/C3.
  • This provides a projection matrix with an undefined scaling parameter.
  • the parameters f, R, and the last vanishing point can be computed.
  • an additional correspondence pair gives a projection matrix with scaling uncertainty, and the identification of a known object in the image eliminates the uncertainty.
  • the vector provides an initial start point for a calculation of a camera projection matrix.
  • {s1, s2, s3} are the three angles that define the rotation matrix R, where s1 is associated with an angle of rotation around the person (or tripod) holding the camera, i.e. an angle of rotation in the horizontal plane such as yaw, s2 is associated with an angle of rotation around the camera view axis, such as roll, and s3 is associated with the pitch of the camera.
  • the parameters {s4, s5} are the image coordinates of the principal point, (px, py), s6 is the focal distance f, and {s7, s8, s9} are the world coordinates of the camera C, where - relative to the position of the camera - s7 is the distance to the left wall, s8 is the height from the floor, and s9 is the distance from the right wall in a scene being considered.
  • a measure Prob(S) is the probability that a given vector of parameters reflects the projection matrix associated with a real-life image of a room.
  • the measure is calculated using heuristics and a Gaussian model. That is, the desired likelihood is approximated by defining that Prob(S) ≈ ρ · exp( − Σn μn (sn − s̄n)² / σn² ), where ρ is a normalization constant
  • s̄n is the expected value of sn
  • σn are the standard deviations of sn that reflect a probable range of values for that variable
  • the {μn} are constants associated with each sn that transform all variables to a single, comparable scale, and which reflect the amount of confidence for each of the prior defined values s̄n. While actual values for these quantities can be estimated by gathering data over a large dataset, heuristic values for these variables are used according to an example.
  • the camera is likely to be parallel to the floor, that the image of a scene is uncropped (so that the principal point is at the center of the image for example), that the 35mm-equivalent of the focal distance is 40mm, that the camera is approximately 1.60m from the level of the floor of the scene, and is equidistant to the room walls at a distance of 5 meters.
  • the table below provides a listing of suitable heuristic values according to an example. It will be appreciated that other values can be chosen depending on the circumstances and on the nature of the scene being considered.
  • the vector S is a starting point for the estimation of a camera projection matrix, which according to an example, is refined using data gathered from an image of a scene, and more specifically data representing the correspondence between points in an image to certain known points in the 3D scene.
  • a vector of measurements M is provided according to an example which is a collection of correspondence points; i.e., a set of points in the image of a scene where each of the 3D world coordinates are known or otherwise defined.
  • the set M includes all, some or none of the main vanishing points and room corners of the scene.
  • data corresponding to one position in a room such as a room corner for example, but more generally, any point in a room where the world coordinates are known so that a correspondence into the 2D image coordinate system can be obtained is provided.
  • the position of a room corner is provided, with a floor corner being suitable when objects such as floor-standing furniture are to be inserted into an image of a room. If the actual room is not of height h, the computed scale factor of the resulting projection matrix will be affected. The more points {p'j} for which the world coordinates are known, the more accurate the estimation of the camera projection matrix. Accordingly, it is desirable to know the position of a room corner and at least one vanishing point. According to an example, user input is used in order to define or otherwise determine the position of a corner in a scene.
  • a user can mark a corner position in an image, and the mark can be used to determine the image coordinates corresponding to the room corner.
  • a user can mark points at the intersection of the walls and floor (or ceiling) of the room. The points can then be used to extend lines whose intersection is a room corner.
  • main vanishing points can be detected using a number of techniques which do not call for user input.
  • techniques can involve comparing multiple columns of an image and finding an affine transformation that maps between similar features in the two columns, using a Hough transform approach for example, where the similar features generally occur at the intersection of walls, floors and ceiling. The location of the vanishing points can then be directly computed.
  • the fact that man-made scenes typically include regular features or textures can be used.
  • the regularity is used in order to determine a measure of similarity from which scale and displacement between features can be determined and thus used to provide a measure for the vanishing points in a scene as will be described below.
  • it is possible to maximize Prob(S|M), i.e., to find the parameter vector S that is the most likely for a given set of measurements M.
  • a Bayesian approach is used in an example, such that maximization of Prob(S|M) is equivalent to the maximization of Prob(M|S)Prob(S).
  • the value of Prob(M|S) is defined according to a Gaussian model of the 2-norm reprojection distances, as described below.
  • the 2-norm provides a measure of the 'distance', or similarity, of a measured world point which has been transformed into image coordinates using a projection matrix based on a parameter set S to the measured value for a corresponding image point.
  • a numeric optimization function can be used to solve for e(S).
  • such functions operate on a scalar function of a number of variables - here the set of variables for the camera projection matrix - starting from an initial estimate.
  • the scalar function acts like a black box that returns a value for each selection of a set of variables.
  • the minimization technique samples the search space around the initial set to find the set of variables that result in a local minimum. Any such suitable unconstrained nonlinear optimization technique can be used.
  • the search space can be confined for each parameter within a range determined using the value for the standard deviation given above for example.
  • the first part of the expression for e(S) describes how the sn should seek to obey the input data, that is, the set of image points in the vector M.
  • the second part of the expression describes how the sn are favoured to be closer to the expected values s̄n.
  • the 2-norm function thus effectively derives a measure of similarity between the estimated projection matrix given the input data set, taking into account the set of provided correspondence points.
  • a search space for the sn includes a set of values which are used in order to determine a minimum value for e(S).
  • the search space can be minimized by taking into account some simple factors stemming from the geometry of a typical room and the use of a camera.
  • the search space for a camera pitch angle can be limited to angles within a restricted range.
  • pragmatic choices can be made for the other parameters, such as ranges corresponding to the standard deviations of the parameters given in the table above, for example.
  • vanishing points or room corners of a room can be determined either manually (such as by a user marking points on the walls in an image), or automatically using techniques such as those described above, and added as part of the initial data set M in order to then determine the parameter set S for the camera projection matrix.
  • Figure 2 is a schematic diagram of a scene and of manually selected points within the scene according to an example.
  • the scene of figure 2 is an internal room 200 in which the floor, ceiling and two walls of the room are within a field of view of a camera whose camera projection matrix is desired.
  • a user marks 6 points (shown by the black stars) along the main lines 201, 202 of the room, which form the intersection between the ceiling 203 and the two walls 204, 205, and on main lines 207, 208 between the floor 206 and the two walls 204, 205. Such marks are depicted by points 209-214.
  • Marks 210, 213 are provided at the visible corners of the room 200.
  • the position of two vanishing points and the two room corners can be determined by, for example, projecting lines along the marks so that the lines intersect - the points at which the projected lines intersect give the positions of the VPs and room corners.
  • Such line projections are shown as white lines in figure 2. Note that the exact position of the points 209-214 (which are not to scale) is not significant, and need not be at the exact positions specified - i.e. the position of the points can vary along their respective lines providing that they are positioned to enable the desired data to be obtained.
  • the determined VPs and room corner position are input to vector M.
  • the addition of data representing the position of one vanishing point (left or right) improves the measure.
  • the processes of calculating the position of the main vanishing points and extracting the projection matrix are combined.
  • the approach can enable noisy, inaccurate and even incorrect data to be dealt with.
  • the estimation of the main vanishing points can typically proceed by using similar patches detected in an image. This is driven by the assumption that such similar patches likely represent object edges or planes that are parallel to one of the three main axes (for example, edges of furniture parallel to the walls).
  • the lines linking pairs of similar patches are termed segments. Note that segments could be related to existing object edges within the image, but might also be related to perceptual lines that do not actually exist within the image.
  • FIG. 3 is a schematic representation of a portion of a room according to an example.
  • a wall 300 includes a repeating pattern, such as wallpaper for example. Automatic detection of vanishing points in such a scene can be provided by detecting pairs of similar image patches within the image portion and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
  • the wall 300 exhibits a global self-similarity property, which can be used to estimate a VP.
  • image patches are small neighbourhoods of pixels around a pixel of interest. This is illustrated in figure 3 by boxes 301 on the wall 300, each pair thereof being joined by a dotted projection line 302 which crosses with the other projection lines at the VP 303.
  • a set of straight virtual line segments that connect pairs of matching image patches are concurrent and converge at the VP.
  • the term 'virtual line' is a line constructed or projected through matching (or similar) patches in an image.
  • a virtual line may coincide with a true straight line in the image but, equally may not coincide with any discernable line, straight edge or linear feature in the image.
  • the process of obtaining a single VP from a global 2D self-similarity can be viewed as equivalent to clustering a large collection of VP candidates, each obtained from either the meeting point of virtual lines connecting matching points as described above, or equivalently obtained by a 1D-affine similarity between a pair of parallel 1D image profiles.
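  • the sketch below (Python with numpy; the names are illustrative and a simple median consensus stands in for the clustering just described) estimates a VP from the centre coordinates of matching patch pairs: each pair defines a virtual line, and the candidate intersections of those lines are pooled.

```python
import numpy as np
from itertools import combinations

def vp_from_matched_patches(pairs):
    """Estimate a vanishing point from pairs of matching image patches.

    pairs: list of ((u1, v1), (u2, v2)) patch-centre coordinates; each
    pair defines one virtual line in homogeneous coordinates."""
    lines = [np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
             for p, q in pairs]
    candidates = []
    for l1, l2 in combinations(lines, 2):
        x = np.cross(l1, l2)          # homogeneous intersection of two lines
        if abs(x[2]) > 1e-9:          # skip (near-)parallel virtual lines
            candidates.append(x[:2] / x[2])
    # Robust consensus over the VP candidates (a stand-in for clustering):
    return np.median(np.asarray(candidates), axis=0)
```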
  • the function C is a similarity function of the angle αk,d(S), the angle between the segment sk and the line that connects the d'th vanishing point (induced by S) and the middle point of the segment sk.
  • a numeric optimization function can be used to solve for e(S), as described above.
  • an iterative procedure that estimates both is used according to an example, as sketched below. Firstly, an initial assignment of segments is set. Such an assignment can be determined by using the proximity of segments to the VPs inferred from the default projection matrix. Assuming this assignment is fixed, the projection matrix is estimated using the above equation for e(S). Given the updated projection matrix, the segments are then reassigned as described above. This is repeated until convergence is reached.
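  • a compact sketch of that alternation follows (illustrative only; refit and vps_from stand in for the e(S) minimisation and the extraction of the main VPs from the current parameters, as in the other sketches in this document):

```python
import numpy as np

def angle_to_vp(segment, vp):
    """Angle between a segment and the line joining its midpoint to a VP."""
    p, q = (np.asarray(e, dtype=float) for e in segment)
    mid = (p + q) / 2.0
    d_seg, d_vp = q - p, np.asarray(vp, dtype=float) - mid
    c = np.dot(d_seg, d_vp) / (np.linalg.norm(d_seg) * np.linalg.norm(d_vp))
    return np.arccos(np.clip(abs(c), 0.0, 1.0))   # direction-insensitive

def alternate(segments, S0, refit, vps_from, max_iter=50):
    """Alternate between assigning each segment to its nearest main VP and
    re-estimating the projection-matrix parameters, until the assignment
    no longer changes (convergence)."""
    S, assignment = S0, None
    for _ in range(max_iter):
        vps = vps_from(S)   # the three main VPs induced by the current S
        new = [int(np.argmin([angle_to_vp(seg, vp) for vp in vps]))
               for seg in segments]
        if new == assignment:
            break
        assignment = new
        S = refit(S, segments, assignment)   # minimise e(S) for this split
    return S
```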
  • FIG 4 is a flowchart of a method for calculating a projection matrix according to an example.
  • An image of a room in the form of image data 400 is presented to a user in block 402, such as being presented on a display of a system for example.
  • the user is able to interact with the image using the system in order to select points on the image.
  • if a lower corner point of the room (or scene) is visible in block 403, a user marks a point on the image corresponding to the lower corner point of the room at block 404. If the lower corner point is not visible in block 403, the user can select an upper corner point at block 405.
  • the user can mark points on the image at the intersection of the walls and ceiling or at the walls and floor of the room. For example, a user can mark two points on the right side of the room at a wall and floor or wall and ceiling intersection, followed by two points on the left side of the room at a wall and floor or wall and ceiling intersection. Using the marks at the intersection, line intersections are calculated in block 407, and the intersection is the corner of the room (408).
  • the data relating to the room corner is used to calculate a camera projection matrix for the room, including using data 410 representing the position of vanishing points which have been determined automatically using a process as described above. If the calculation fails, or the results are not reasonable, the user can be prompted to input further data (if not already obtained), for example:
  • vanishing points can be computed by determining the intersection points of lines derived from the points (that is to say, each pair of points defines a line, with the first and third lines intersecting at one vanishing point and the second and fourth lines intersecting at another vanishing point).
  • the order of the points marked by a user is not significant, but knowing what each mark represents is - that is, whether it is a bottom-right point or an upper-left point etc. - so that it can be determined which two points should be used to draw the lines from which a corner point is derived. Whether a bottom or a top corner is favoured depends on the application.
  • the camera projection matrix calculated according to an example is used to virtually 'insert' or 'plant' things (like furniture) in a room. Those things are more likely to be inserted on the floor, and therefore if the bottom corner is marked or derived it gives an accurate match to the floor as mentioned above. If the application is for the virtual painting of a ceiling for example, the upper corner could be a more sensible choice.
  • FIG. 5 is a schematic block diagram of a system 500 that can implement any of the examples described herein.
  • the system 500 includes a processing unit 501 (CPU), which can be an image processor according to an example, a system memory 503, and a system bus 505 that couples the processing unit 501 to the various other components of the system 500.
  • the processing unit 501 typically includes one or multiple processors, each of which may be in the form of any one of various commercially available processors for example.
  • the system memory 503 typically includes a read only memory (ROM) that stores a basic input/output system (BIOS) that contains startup routines for the system 500 and a random access memory (RAM).
  • the system bus 505 may be a memory bus, a peripheral bus or a local bus, and may be compatible with any of a variety of bus protocols, including PCI(e), VESA, MicroChannel, ISA, and EISA.
  • the system 500 also includes a persistent storage memory 507 (e.g., a hard drive (HDD), a floppy disk drive, a CD-ROM drive, magnetic tape drives, flash memory devices, and digital video disks) that is connected to the system bus 505 and contains one or multiple computer-readable media disks that provide non-volatile or persistent storage for data, data structures and machine readable or computer-executable instructions.
  • digital image data 400 and initial values for parameters of a camera projection matrix 520 can be stored in memory 507.
  • a user may interact (e.g., enter commands or data) with system 500 using input devices 509 (e.g., a keyboard, a computer mouse, a microphone, joystick, and touch pad or touch sensitive display screen).
  • Information may be presented through a user interface that is displayed to a user on the display 511 (implemented by, e.g., a display monitor which can be touch sensitive, including a capacitive, resistive or inductive touch sensitive surface for example), and which is controlled by a display controller 513 (implemented by, e.g., a video graphics card).
  • a user can be presented with an image of a room representing image data 400, and can mark points in the image using an input device 509 of the system.
  • the system 500 can also typically include peripheral output devices, such as speakers and a printer for example.
  • a remote computer may be connected to the system 500 through a network interface card (NIC) 515.
  • the system 500 can include provision to send data to a remote location where certain items are prepared for a user (such as printed merchandise etc).
  • the system memory 503 also stores processing information 517 that includes input data, processing data, and output data.
  • the system can interface with a graphics driver to present a user interface on the display 511 for managing and controlling the operation of the system 500, such as for marking room positions for example.
  • the system can calculate a projection matrix in block 520 of system memory 503.
  • image data 400 is used by system 500 to calculate vanishing points using segments as described above.
  • Figure 6 is a block diagram of a method for calculating a projection matrix according to an example. More specifically, for image data 600 representing an image of a scene for which the projection matrix is desired, a position of a vanishing point in an image plane of the image is determined in block 601. In block 603 a set of initial input data representing multiple initial measures for at least one parameter of the projection matrix and a position in the image for a corner point of the scene are received. In block 605 a function dependent on the measures is minimised using the position, and a set of final parameters for the projection matrix is calculated in block 607. A sketch of this flow is given below.
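  • a thin orchestration of the four blocks might look as follows (a sketch only; the injected callables correspond to the pieces sketched elsewhere in this document and are not names from the patent):

```python
def calculate_projection_matrix(image, corner_uv, S0, sigma, mu,
                                detect_vps, build_m, minimise_e, p_from_s):
    """Figure 6 flow: determine a vanishing point position (block 601),
    receive initial measures and the corner position (block 603),
    minimise the function (block 605) and assemble the final projection
    matrix from the resulting parameters (block 607)."""
    vps = detect_vps(image)                              # block 601
    world_pts, image_pts = build_m(vps, corner_uv)       # block 603
    S = minimise_e(S0, world_pts, image_pts, sigma, mu)  # block 605
    return p_from_s(S)                                   # block 607
```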
  • Figure 7a is a schematic block diagram of a system according to an example.
  • An image processor 701 receives image data 700 representing an image of a scene or structure.
  • a projection matrix for the scene or structure is calculated in block 702. More specifically, the image processor 701, coupled to memory 703, calculates values for the parameters of the projection matrix 702 by minimizing the function 704 in response to the receipt of data representing a corner point 705 of the scene or structure.
  • Figure 7b is a schematic block diagram of a system according to an example.
  • An image processor 801 receives image data 800 representing an image of a scene or structure.
  • a projection matrix for the scene or structure is calculated in block 802. More specifically, the image processor 801, coupled to memory 803, calculates values for the parameters of the projection matrix 802 by minimizing the function 804 in response to the receipt of data representing a corner point 805 of the scene or structure.
  • a projection matrix 520, 702, 802 can be used to augment the scene for which it has been determined, positioning items so that the perspective of the scene is obeyed. For example, in a room, furniture can be added in order to determine a desired placement and/or orientation. Similarly, for an outdoor scene, items can be placed into the scene which obey its perspective.
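  • for instance, a floor-standing item's footprint can be mapped into the image with the computed matrix. The sketch below is illustrative only, taking the floor as the Z = 0 plane (consistent with the floor corner (0,0,0,1) used earlier), and returns a perspective-correct 2D polygon:

```python
import numpy as np

def project_floor_rectangle(P, origin, width, depth):
    """Project a width x depth rectangle lying on the floor plane (Z = 0)
    into image coordinates using the projection matrix P."""
    X0, Y0 = origin
    corners = [(X0, Y0, 0.0), (X0 + width, Y0, 0.0),
               (X0 + width, Y0 + depth, 0.0), (X0, Y0 + depth, 0.0)]
    polygon = []
    for X, Y, Z in corners:
        x = P @ np.array([X, Y, Z, 1.0])        # homogeneous image point
        polygon.append((x[0] / x[2], x[1] / x[2]))
    return polygon   # overlay this 2D polygon on the image
```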
  • an initial measure for a projection matrix of a room, scene or structure can be determined using a set of initial, heuristic, parameter values.
  • the initial measure can be refined using the provision of correspondence points in an image of the room, scene or structure and the corresponding 3D world points.
  • a corner point can be manually marked by a user, or derived using the intersection of lines from points on the image.
  • vanishing points, which can further be used to refine the measure, can be derived using the techniques described above and incorporated into the calculation of the parameters.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

An image processor to receive image data representing an image of a three-dimensional structure and to calculate a projection matrix from the image data, including by receiving a set of initial input data representing heuristic measures for parameters of the projection matrix, and a position in the image for a corner point of the structure, and to minimise a function dependent on the measures using the position to calculate the projection matrix.

Description

PROJECTION MATRIX
BACKGROUND
[0001] The field of digital image editing, manipulation, and enhancement has evolved to contain three-dimensional (3D) scene structure understanding. In a 3D structure of a man-made scene, there are several characteristic points, including vanishing points and corner points. The 3D properties of a simple scene, such as a room, can be characterized using a camera projection matrix, which maps 3D "world" coordinates into two dimensional (2D) "image" coordinates, mimicking the image acquisition process. The projection matrix can be computed from physical parameters related to intrinsic and extrinsic camera properties, such as its location, orientation and focal distance.
[0002] Determining a camera projection matrix for a given scene allows for an understanding of the 3D structure of the scene from a single 2D image. This information may be used for a variety of vision tasks such as camera calibration, perspective rectification, scene reconstruction and more. For example, this knowledge may be used in applications that allow a user to insert and manipulate new objects such as furniture and decorations into a 2D image, such that they will appear geometrically correct.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] Various features and advantages of the present disclosure will be apparent from the detailed description which follows, taken in conjunction with the accompanying drawings, which together illustrate, by way of example only, features of the present disclosure, and wherein:
[0004] Figure 1 is a schematic diagram of vanishing points in a room according to an example;
[0005] Figure 2 is a schematic diagram of a scene and of selected points within the scene according to an example;
[0006] Figure 3 is a schematic representation of a portion of a room according to an example;
[0007] Figure 4 is a flowchart of a method for calculating a camera projection matrix according to an example;
[0008] Figure 5 is a schematic block diagram of a system suitable for implementing examples as described herein;
[0009] Figure 6 is a block diagram of a method for calculating a projection matrix according to an example;
[0010] Figure 7a is a schematic block diagram of a system according to an example; and
[0011] Figure 7b is a schematic block diagram of a system according to an example.
DETAILED DESCRIPTION
[0012] The present specification relates to a method for estimating 3D structure of a scene from a 2D image. More particularly, the present specification relates to calculating a camera projection matrix that characterizes the 3D structure of a scene or structure in a 2D image using a processor, such as an image processor, that is capable of performing the methods described herein. The calculation uses information that is computed or otherwise provided beforehand to determine the 3D characteristic points, which can be used to refine an initial estimate for a projection matrix. An example of a scene in a 2D image from which the 3D structure may be estimated according to an example includes a man-made indoor scene, such as a room inside a building. Another example of a 2D image may be an outdoor scene including buildings or other man-made objects which have corner points and vanishing points.
[0013] A camera projection matrix, or projection matrix, is typically a 3 x 4 matrix which describes the mapping of points in a 3D world coordinate system to corresponding points on a 2D image, and is typically defined using a pinhole camera model which describes the mathematical relationship between the coordinates of the 3D point and its projection onto the image plane of an ideal pinhole camera, that is, where the camera aperture is described as a point and no lenses are used to focus light. The model provides a first order mapping from the 3D scene to the 2D image, but in the vast majority of vision tasks this proves to be sufficient and so second order effects can be usefully disregarded.
[0014] As used in the present specification and in the claims, the term "vanishing point" is used to broadly describe a point which is the intersection of perspective projections of 3D imaginary parallel lines which relate to parallel lines in a scene. One example of such parallel lines may include the top line of a picture frame and the bottom of the same picture frame. Another example may include the top line of a wall where the wall meets the ceiling in a room, and the bottom line of the wall where the wall meets the floor.
[0015] Also as used herein, the term "main line" describes the border between two main surfaces in the scene. The previous examples of the top and bottom lines of a wall at the ceiling and floor are examples of main lines, and the intersection of two such lines suffices to define the position of a corner point. Main lines also include the borders between walls. Accordingly, a "corner point" is to be broadly understood as an intersection between main lines, such as between two or three main lines for example. A corner point is the intersection between, for example, the top lines of two adjacent walls and the line between the two adjacent walls. Reference herein to a "room corner" should be taken to be synonymous with a corner point in general, including, but not limited to a corner point in an outdoor scene or structure, such as the corner point of a building or other structure for example. That is to say, reference to a room corner herein is provided for the sake of simplifying the present description, and not intended to limit implementations in which data representing any other corner point other than a room corner is available. Accordingly, although certain examples are described with reference to a "room corner", this does not preclude the use of methods and systems as described herein being used to determine a projection matrix for an outdoor scene for example. Furthermore, references to a room, scene or structure can be used interchangeably.
[0016] One way of calculating a projection matrix from a given image is to gather correspondences between 3D world points and 2D image points, and then compute the matrix parameters that support these correspondences. Particular choices of corresponding pairs can be specific points that characterize the 3D structure, such as for example the main vanishing points and room corners in a scene. The main vanishing points in a room are those vanishing points associated with lines that are parallel to the intersections between pairs of the walls, the ceiling and the floor (since these are mutually orthogonal, there may be up to three main vanishing points in room scenes). Corresponding pairs associated with the main vanishing points and room corners can be estimated automatically from a single image, and a projection matrix can be computed when the three main vanishing points for a scene are given, as well as two room corners. In fact, it is sufficient to have information representing the position of one room corner and the distance between two room corners, in addition to the 3 vanishing points. According to an example, it is thus possible to obtain an estimation of the projection matrix for a scene even with a subset of data. This is sufficient to support some practical applications.
[0017] According to an example, a method for calculating a final projection matrix uses prior assumptions in the form of heuristic or initial data about how images of rooms and scenes or structures are captured in order to compensate for any missing or inaccurate correspondence points which would otherwise make the calculation of a camera projection matrix for the room or scene impossible or impractical. For example, assuming the use of a typical focal length for a camera of between 30 to 50 millimeters and that the camera is approximately parallel to the floor at a distance of 3-10m etc. The method can be extended to simultaneously compute vanishing points as well as the projection matrix.
[0001] Reference will now be made in detail to certain implementations, examples of which are illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the implementations. Well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the implementations.
[0002] It will also be understood that, although the terms first, second, etc. can be used herein to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first item could be termed a second item, and, similarly, a second item could be termed a first item and so on.
[0018] The terminology used in the description herein is for the purpose of describing particular implementations and is not intended to be limiting. As used in the description and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of the associated listed items. It will be further understood that the terms "includes" and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, integers, steps, operations, elements, components, and/or groups thereof.
[0019] Figure 1 shows a schematic diagram of a basic room 100 as seen from a single perspective inside the room 100. As shown, the perspective of the image is of a camera view directed towards the line of intersection between where the first and second walls 102, 104 join. Because figure 1 is a schematic of a simple room having no contents, the location of the corners 106, 108 and main lines 110, 112, 114, 116, 118 in the scene are clearly visible to the human eye; however, in real-life situations this may not always be the case.
[0020] The vanishing points 120, 122 are beyond the boundaries of the image, and may be determined using image processing techniques. In some examples, the vanishing points 120, 122 may be obtained by any method known in the art. According to one example, the vanishing points 120, 122 may be computed by comparing two columns of an image and finding an affine transformation that maps between similar features in the two columns. The location of the vanishing points can then be directly computed from the parameters of the affine transformation. In the present example, each of the main lines is extended using broken lines 124, with the exception of a first main line 110 where the first and second walls 102, 104 join.
[0021] In the diagram of figure 1 , the image has left and right vanishing points 120, 122. The left vanishing point 120 is the point at which second and third main lines 116, 118 intersect when extended using the broken lines 124. The right vanishing point 122 is the point at which fourth and fifth main lines 112, 114 intersect when extended. As shown, the vanishing points 120, 122 are located outside the image boundaries due to the angle of the main lines from the image perspective.
[0022] The vanishing points 120, 122 may then be used to obtain the corner points 106, 108. The upper corner point 106 is located at the intersection of the first, second, and fourth main lines 110, 112, 116, which is where the ceiling 126 meets both the first and second walls 102, 104. The lower corner point 108 is located at the intersection of the first, third, and fifth main lines 110, 114, 118, which is where the floor 128 meets both the first and second walls 102, 104. Each of the corner and vanishing points is a characteristic point that is helpful to understanding the 3D structure of the image.
[0023] The schematic of figure 1 is thus a 2D representation of a 3D scene, in which there is a mapping of the coordinates of a 3D point in the scene to the 2D image coordinates of the point's projection onto the image plane of a camera (or other suitable imaging device) which captured the scene. As such, the diagram of figure 1 will generally be an image captured using a camera whose camera projection matrix is that which is desired. There are thus two coordinate systems being considered: one related to the real world (3D) scene and one for the captured image of the scene (2D). In general, it may not be possible to obtain data representing the position of the three vanishing points and two room corners in a scene such as that depicted in figure 1. According to an example, the position of at least one point in a room can be used in order to derive a camera projection matrix, and user input can be sought to mark or otherwise derive such a position. The location of the corner of a room is the simplest point in an image for a user to detect and mark. However, any other point can be used. For example, a point on a wall that is 1 meter to the left of the corner and 1 meter higher (that is to say, a point at a specific place on the wall) can be used. Such a point gives a correspondence between the 2D image coordinates and the 3D world coordinates. However, a corner point is much easier to detect by eye, and therefore this is used as a correspondence point according to an example.
[0024] A corner point can be marked by a user, or derived from other markings which give lines whose intersection when extended is a corner point. In order to detect a corner point, four points can be provided (e.g. marked by a user), such as points which are at the intersection between walls and the floor, two on the left of a room and two on the right of a room for example. Each pair of points on either side of the room defines a line, and the intersection of the two lines when projected defines a corner point, as sketched below. In order to augment the data representing a corner point, vanishing points can be used for the calculation of the camera projection matrix, and can be extracted using an automatic procedure (without human intervention), such as using a process as described below.
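The following minimal sketch (Python with numpy; the function names are this description's own, not the patent's) illustrates the derivation just described: each pair of marks defines a line in homogeneous coordinates, and the cross product of the two lines gives their intersection, i.e. the corner point.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points, via the cross product."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def corner_from_marks(left1, left2, right1, right2):
    """Corner point from four user marks: two on the left wall/floor
    intersection and two on the right wall/floor intersection. The two
    lines they define meet, when extended, at the room corner."""
    corner = np.cross(line_through(left1, left2),
                      line_through(right1, right2))
    return corner[:2] / corner[2]  # inhomogeneous (u, v) image coordinates

# Example (illustrative pixel coordinates only):
# corner_from_marks((50, 610), (260, 580), (700, 590), (520, 570))
```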
[0025] According to an example, it is assumed that, at the time an image of a scene was captured, the camera was located at a point C in the world coordinate system, and that it was rotated with respect to that system by a rotation matrix R. Also, it is assumed that the focal distance of the camera is f and that the optical center (the point where the principal axis of the camera intersects the image plane) has the coordinates (px, py) on the image coordinate system.
[0026] Under these assumptions, a point (X,Y,Z) in the real world is mapped by the camera to the point (u,v) in the image, according to the following homogeneous equation:
    λ · (u, v, 1)^T = P · (X, Y, Z, 1)^T    (equality up to a non-zero scale factor λ)
where the 3x4 matrix P is the camera projection matrix, and is given by:
    P = K R [ I | -C ],   with   K = [ [f, 0, px], [0, f, py], [0, 0, 1] ]
where K is the calibration matrix, I is the 3x3 identity matrix and C is the camera centre.
The projection matrix P thus has 9 free parameters: f, px, py, 3 coordinates for C, and 3 free parameters for R (R is a rotation matrix defined by 3 angles; its columns are unitary and mutually orthogonal). An approach for inferring the projection matrix parameters from a given picture may include identifying at least five points of known coordinates in the real world in the image. For example, let {(ui, vi)} be a set of N (equal to or larger than five) points in an image, and {(Xi, Yi, Zi)} be the corresponding set of coordinates in the real world, that is, the coordinates of the real-world points in the coordinate system of the 3D scene. Then, a system of 2N equations and nine variables can be obtained, given by:
    ui = (p1 · X~i) / (p3 · X~i),   vi = (p2 · X~i) / (p3 · X~i),   i = 1, ..., N
where:
    p1, p2 and p3 are the three rows of P, and X~i = (Xi, Yi, Zi, 1) is the homogeneous world point.
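As a concrete illustration, the sketch below (assuming numpy; all names are illustrative, not taken from the patent) assembles P from these parameters and maps world points to image points. It also reads off the three main vanishing points discussed in the next paragraph, which are simply the first three columns of P.

```python
import numpy as np

def projection_matrix(f, px, py, R, C):
    """P = K R [ I | -C ] for focal distance f, principal point (px, py),
    rotation R and camera centre C (world coordinates)."""
    K = np.array([[f, 0.0, px],
                  [0.0, f, py],
                  [0.0, 0.0, 1.0]])
    C = np.asarray(C, dtype=float).reshape(3, 1)
    return K @ np.hstack([R, -R @ C])

def project(P, X):
    """Map a world point X = (X, Y, Z) to image coordinates (u, v)."""
    x = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]

def main_vanishing_points(P):
    """The three main vanishing points: P right-multiplied by (1,0,0,0),
    (0,1,0,0) and (0,0,1,0), i.e. the first three columns of P."""
    return [P[:2, d] / P[2, d] for d in range(3)]
```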
[0027] A particular case of point correspondence is that of main vanishing points. A main vanishing point is obtained by right multiplying P by one of the three vectors (1,0,0,0), (0,1,0,0), and (0,0,1,0), which are the homogeneous coordinates of the points where parallel lines intersect on each of the three main directions X, Y, and Z in a scene.
[0028] When the three main vanishing points are provided, the parameters f, px, py, and the rotation matrix R can be computed. An additional point allows the computation of homogeneous coordinates for the camera position C, i.e., the parameters C1/C3 and C2/C3. This provides a projection matrix with an undefined scaling parameter. In order to obtain the full projection matrix without any scaling uncertainty, the dimensions of a known object in an image can be identified. Similarly, if two vanishing points are provided as well as the optical center (px, py), then the parameters f, R, and the last vanishing point can be computed. As before, an additional correspondence pair gives a projection matrix with scaling uncertainty, and the identification of a known object in the image eliminates the uncertainty.
[0029] According to an example, the available camera parameters from which a camera projection matrix can be computed are gathered in a vector, S = {s1, ..., s9}. The vector provides an initial start point for a calculation of a camera projection matrix. Here, {s1, s2, s3} are the three angles that define the rotation matrix R, where s1 is associated with an angle of rotation around the person (or tripod) holding the camera, i.e. an angle of rotation in the horizontal plane such as yaw, s2 is associated with an angle of rotation around the camera view axis, such as roll, and s3 is associated with the pitch of the camera. The parameters {s4, s5} are the image coordinates of the principal point, (px, py), s6 is the focal distance f, and {s7, s8, s9} are the world coordinates of the camera C, where - relative to the position of the camera - s7 is the distance to the left wall, s8 is the height from the floor, and s9 is the distance from the right wall in a scene being considered.
[0030] According to an example, a measure Prob(S) is the probability that a given vector of parameters reflects the projection matrix associated with a real-life image of a room. The measure is calculated using heuristics and a Gaussian model. That is, the desired likelihood is approximated by defining that:
    Prob(S) ≈ ρ · exp( − Σn μn (sn − s̄n)² / σn² )
where ρ is a normalization constant, s̄n is the expected value of sn, σn are the standard deviations of sn that reflect a probable range of values for that variable, and the {μn} are constants associated with each sn that transform all variables to a single, comparable scale, and which reflect the amount of confidence for each of the prior defined values s̄n. While actual values for these quantities can be estimated by gathering data over a large dataset, heuristic values for these variables are used according to an example. It is assumed that the camera is likely to be parallel to the floor, that the image of a scene is uncropped (so that the principal point is at the center of the image for example), that the 35mm-equivalent of the focal distance is 40mm, that the camera is approximately 1.60m from the level of the floor of the scene, and is equidistant to the room walls at a distance of 5 meters. The table below provides a listing of suitable heuristic values according to an example. It will be appreciated that other values can be chosen depending on the circumstances and on the nature of the scene being considered. Using the values given below, the vector S is a starting point for the estimation of a camera projection matrix, which according to an example, is refined using data gathered from an image of a scene, and more specifically data representing the correspondence between points in an image to certain known points in the 3D scene.
[Table of heuristic values: expected values \bar{s}_n with s_2 = s_3 = 0 (camera parallel to the floor), (s_4, s_5) at the image centre, s_6 equal to the 35mm-equivalent of a 40mm focal distance, s_7 = s_9 = 5m and s_8 = 1.60m, together with the corresponding standard deviations \sigma_n and confidence weights \mu_n.]
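To make the prior concrete, the following sketch evaluates -log Prob(S) for the Gaussian model above. The expected values follow the heuristics in the text; the image size, the yaw prior, the standard deviations and the confidence weights are illustrative assumptions rather than values from the table.

```python
import numpy as np

# Heuristic expected values from the text: camera parallel to the
# floor (zero roll and pitch; the yaw prior of 0 is an illustrative
# choice), principal point at the image centre, a 40mm
# 35mm-equivalent focal distance, the camera 1.60m above the floor
# and 5m from each side wall. The image size, standard deviations
# sigma and confidence weights mu below are assumptions.
W, H = 1000, 750
f_pix = 40.0 / 36.0 * W          # 40mm lens on a 36mm-wide frame
s_bar = np.array([0.0, 0.0, 0.0, W / 2, H / 2, f_pix, 5.0, 1.60, 5.0])
sigma = np.array([0.8, 0.2, 0.4, W / 8, H / 8, f_pix / 2, 3.0, 0.5, 3.0])
mu = np.ones(9)                  # equal confidence in every prior

def neg_log_prior(s):
    """-log Prob(S) up to a constant: a Gaussian penalty pulling each
    parameter toward its heuristic expected value."""
    return np.sum(mu * (s - s_bar) ** 2 / sigma ** 2)
```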
[0031] A vector of measurements M is provided according to an example, which is a collection of correspondence points; i.e., a set of points in the image of a scene for which the 3D world coordinates are known or otherwise defined. According to an example, the set M includes all, some or none of the main vanishing points and room corners of the scene. According to an example, data corresponding to one position in a room is provided, such as a room corner for example, but more generally any point in a room where the world coordinates are known, so that a correspondence into the 2D image coordinate system can be obtained. More specifically, the vector M includes image points {p^I_j}, j = 1, 2, ..., for which the world coordinates {p^w_j} are given. Any subset of the following may be included according to an example (provided in homogeneous coordinates):
- Vanishing points mapping (1,0,0,0), (0,1,0,0) and (0,0,1,0) to the estimated three vanishing points.
- The floor corner (0,0,0,1) mapping to the estimated floor corner in the image coordinate system.
- The ceiling corner of the room, (0,0,h,1), mapping to the estimated ceiling corner in the image coordinate system, where h is set to be a typical room height (e.g., 2.90m).
[0032] In general, to allow the scene and objects therein to be manipulated, the position of a room corner is provided, with a floor corner being suitable when objects such as floor-standing furniture are to be inserted into an image of a room. If the actual room is not of height h, the computed scale factor of the resulting projection matrix will be affected. The more points {p^I_j} for which the world coordinates are known, the more accurate the estimation of the camera projection matrix. Accordingly, it is desirable to know the position of a room corner and at least one vanishing point. According to an example, user input is used in order to define or otherwise determine the position of a corner in a scene. A user can mark a corner position in an image, and the mark can be used to determine the image coordinates corresponding to the room corner. Alternatively, if a corner is not visible, a user can mark points at the intersection of the walls and floor (or ceiling) of the room. The points can then be used to extend lines whose intersection is a room corner.
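A compact way to implement this line extension is with homogeneous coordinates, where both the line through two points and the intersection of two lines are cross products. A minimal sketch, assuming numpy and illustrative pixel coordinates:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross(np.append(np.asarray(p, float), 1.0),
                    np.append(np.asarray(q, float), 1.0))

def intersect(l1, l2):
    """Intersection of two homogeneous lines as an ordinary 2D point."""
    x = np.cross(l1, l2)
    return x[:2] / x[2]

# E.g. two marks on the left wall/floor line and two on the right
# wall/floor line (coordinates illustrative); their intersection is
# the floor corner, which pairs with the world point (0, 0, 0, 1).
corner = intersect(line_through((120, 610), (300, 655)),
                   line_through((880, 640), (700, 668)))
```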
[0033] According to an example, and in order to increase the accuracy of a calculated projection matrix, main vanishing points can be detected using a number of techniques which do not call for user input. As described, techniques can involve comparing multiple columns of an image and finding an affine transformation that maps between similar features in the two columns, using a Hough transform approach for example, where the similar features generally occur at the intersection of walls, floors and ceiling. The location of the vanishing points can then be directly computed. Alternatively, the fact that man-made scenes typically include regular features or textures can be used. Typically, the regularity is used in order to determine a measure of similarity from which scale and displacement between features can be determined and thus used to provide a measure for the vanishing points in a scene as will be described below.
[0034] Now, given the vector including the initial starting point for the parameters of a desired projection matrix, and the vector including data representing the correspondence between image points and real-world points, it is possible to maximize Prob(S|M), i.e., to find the parameter vector S that is the most likely for a given set of measurements M. A Bayesian approach is used in an example, such that maximization of Prob(S|M) is equivalent to the maximization of Prob(M|S)Prob(S). According to an example the value of Prob(M|S) is defined according to:
Prob(M|S) = \eta \exp\left( -\frac{1}{\sigma^2} \sum_j \left\| \mathcal{P}_S(p^w_j) - p^I_j \right\|^2 \right)
where \eta is a normalization constant, \sigma is a standard deviation due to measurement inaccuracy, and \mathcal{P}_S(p^w_j) is the perspective projection (w.r.t. the parameter vector S) of the point p^w_j. This is given by the relationship:
\mathcal{P}_S(p^w_j) = F(P_S \, p^w_j)
where P_S is the projection matrix associated with S, and F(p) is the transformation of the point p from homogeneous coordinates to normal ones, such that:
F\left( (x, y, z)^T \right) = (x/z, \; y/z)^T
The 2-norm provides a measure of the 'distance', or similarity, between a measured world point, transformed into image coordinates using a projection matrix based on a parameter set S, and the measured value for the corresponding image point. According to Bayesian derivation techniques, the maximization of Prob(S|M) is in fact equivalent to the minimization of the function e(S) = -log[Prob(M|S)Prob(S)], which is equal to:
e(S) = \frac{1}{\sigma^2} \sum_j \left\| \mathcal{P}_S(p^w_j) - p^I_j \right\|^2 + \sum_{n=1}^{9} \mu_n \frac{(s_n - \bar{s}_n)^2}{\sigma_n^2} + \text{const.}
[0035] According to an example, a numeric optimization function can be used to minimize e(S). Typically, such functions proceed by having a scalar function of a number of variables defined, which in the present example is the set of variables for the camera projection matrix, and which thus provides an initial estimate. The scalar function acts like a black box that returns a value for each selection of a set of variables. Given the initial set of variables, such as those given in the table above, the minimization technique samples the search space around the initial set to find the set of variables that result in a local minimum. Any suitable unconstrained nonlinear optimization technique can be used. The search space can be confined for each parameter within a range determined using the value of the standard deviation given above, for example.
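A sketch of such a minimization follows, assuming a conventional decomposition P = K R [I | -C] of the projection matrix (the exact composition order of the rotation angles is an assumption, not necessarily the one used here) and using the derivative-free Nelder-Mead method as the unconstrained nonlinear optimizer; neg_log_prior is the prior sketch given earlier.

```python
import numpy as np
from scipy.optimize import minimize

def projection_matrix(s):
    """Assemble P = K R [I | -C] from the parameter vector S.
    The yaw/pitch/roll composition order below is one common
    convention and is an assumption."""
    yaw, roll, pitch, px, py, f = s[0], s[1], s[2], s[3], s[4], s[5]
    C = np.array(s[6:9], dtype=float)            # camera world position
    cy, sy = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])    # yaw
    Rp = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])    # pitch
    Rr = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])    # roll
    R = Rr @ Rp @ Ry
    K = np.array([[f, 0, px], [0, f, py], [0, 0, 1]])
    return K @ R @ np.hstack([np.eye(3), -C[:, None]])

def project(s, pw):
    """F(P_S p): homogeneous projection, then division by the z entry."""
    x = projection_matrix(s) @ np.asarray(pw, dtype=float)
    return x[:2] / x[2]

def e(s, M, sigma_m=2.0):
    """e(S): reprojection term over the correspondences in M, plus the
    Gaussian prior penalty (neg_log_prior from the earlier sketch)."""
    err = sum(np.sum((project(s, pw) - pi) ** 2) for pw, pi in M)
    return err / sigma_m ** 2 + neg_log_prior(np.asarray(s))

# Derivative-free local search around the heuristic start point s_bar,
# e.g.: result = minimize(e, s_bar, args=(M,), method="Nelder-Mead")
```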
[0036] The first part of the expression above for e(S) describes how the s_n should seek to obey the input data, that is, the set of image points in vector M. The second part of the expression describes how the s_n are favoured to be closer to their expected values \bar{s}_n. The 2-norm function thus effectively derives a measure of similarity between the projections given by the estimated projection matrix and the set of provided correspondence points in the input data.
[0037] According to an example, a search space for the s_n includes a set of values which are used in order to determine a minimum value for e(S). The search space can be reduced by taking into account some simple factors stemming from the geometry of a typical room and the use of a camera. For example, the search space for the camera pitch angle can be limited to the range -\pi/2 < s_3 < \pi/2, so as to preclude the situation in which a camera is facing in a direction (towards the floor or ceiling) which would mean that it was unable to capture an image of the room or scene in question. Similarly pragmatic choices can be made for other ones of the parameters, such as those corresponding to the standard deviations given in the table above, for example.
[0038] According to an example, vanishing points or room corners of a room can be determined either manually (such as by a user marking points on the walls in an image) or automatically using techniques such as those described above, and added as part of the initial data set M in order to then determine the parameter set S for the camera projection matrix.
[0039] Figure 2 is a schematic diagram of a scene and of manually selected points within the scene according to an example. The scene of figure 2 is an internal room 200 in which the floor, ceiling and two walls of the room are within a field of view of a camera whose camera projection matrix is desired. According to an example, a user marks 6 points (shown by the black stars) along the main lines 201, 202 of the room, which form the intersection between the ceiling 203 and the two walls 204, 205, and on main lines 207, 208 between the floor 206 and the two walls 204, 205. Such marks are depicted by points 209-214. Marks 210, 213 are provided at the visible corners of the room 200. Using the marks, the position of two vanishing points and the two room corners can be determined by, for example, projecting lines along the marks so that the lines intersect; the points at which the projected lines intersect give the positions for the VPs and room corners. Such line projections are shown as white lines in figure 2. Note that the exact position of the points 209-214 (which are not to scale) is not significant, and need not be at the exact positions specified; i.e. the position of the points can vary along their respective lines providing that they are positioned to enable the desired data to be obtained.
[0040] The determined VPs and room corner positions are input to vector M. According to an example, it is possible to obtain a suitable measure for the camera projection matrix with data for the vector M including one corner point. The addition of data representing the position of one vanishing point (left or right) improves the measure.
[0041] Under certain circumstances, it may not be possible to provide an indication of the location of certain priors, such as the vanishing points or room corners for example. Therefore, according to an example, the processes of calculating the position of the main vanishing points and extracting the projection matrix are combined. The approach can enable noisy, inaccurate and even incorrect data to be dealt with. As mentioned above, the estimation of the main vanishing points can typically proceed by using similar patches detected in an image. This is driven by the assumption that such similar patches likely represent object edges or planes that are parallel to one of the three main axes (for example, edges of furniture parallel to the walls). The lines linking pairs of similar patches are termed segments. Note that segments could be related to existing object edges within the image, but might also be related to perceptual lines that do not actually exist within the image.
[0042] According to an example, data from similar patches can thus be used to jointly compute vanishing points in an image as well as the camera projection matrix. This can increase the accuracy of vanishing point detection, as well as reducing processing time. Figure 3 is a schematic representation of a portion of a room according to an example. A wall 300 includes a repeating pattern, such as wallpaper for example. Automatic detection of vanishing points in such a scene can be provided by detecting pairs of similar image patches within the image portion and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image. As can be appreciated from figure 3, the wall 300 exhibits a global self-similarity property, which can be used to estimate a VP. In this context, image patches are small neighbourhoods of pixels around a pixel of interest. This is illustrated in figure 3 by boxes 301 on the wall 300, each pair thereof being joined by a dotted projection line 302 which crosses with the other projection lines at the VP 303.
[0043] It will be appreciated that a set of straight virtual line segments that connect pairs of matching image patches are concurrent and converge at the VP. As used herein, the term 'virtual line' is a line constructed or projected through matching (or similar) patches in an image. A virtual line may coincide with a true straight line in the image but, equally, may not coincide with any discernible line, straight edge or linear feature in the image. In effect, the process of obtaining a single VP from a global 2D self-similarity can be viewed as equivalent to clustering a large collection of VP candidates, each obtained from either the meeting point of virtual lines connecting matching points as described above, or equivalently obtained by a 1D-affine similarity between a pair of parallel 1D image profiles. Following this view, it is possible to generalize a self-similarity approach for detecting multiple VPs located anywhere in an image plane, even when the VPs are not within the image area.
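A least-squares version of this clustering for a single VP might look as follows: each pair of matched patch centres contributes one virtual line, and the returned point minimizes the sum of squared perpendicular distances to all the lines. A robust implementation would add outlier rejection (e.g., RANSAC) over the pairs, which is omitted here.

```python
import numpy as np

def vp_from_patch_pairs(pairs):
    """Least-squares meeting point of the virtual lines through pairs
    of matched patch centres; `pairs` is a list of ((x, y), (x, y)).
    Each pair contributes the constraint n . v = n . a, where n is the
    unit normal of the line through the pair."""
    N, c = [], []
    for a, b in pairs:
        a, b = np.asarray(a, float), np.asarray(b, float)
        d = b - a
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal
        N.append(n)
        c.append(n @ a)
    v, *_ = np.linalg.lstsq(np.array(N), np.array(c), rcond=None)
    return v   # argmin_v sum_i (n_i . v - n_i . a_i)^2
```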
[0044] Accordingly, the inputs to the minimization process are line segments {s_k}, k = 1, ..., K, from the image, in addition to pairs of corresponding points that do not relate to vanishing points (e.g., one or both room corners). That is to say, the measurement set M includes {s_k} as well as image points {p^I_j}, j = 1, 2, ..., for which the world coordinates {p^w_j} are given.
[0045] According to an example, it is initially assumed that an assignment of the segments to the corresponding VPs is provided, such that {a_k}, k = 1, ..., K, where a_k ∈ {1, 2, 3}; i.e., it is known which of the three main vanishing points each of the segments is pointing to. Then, the following Gaussian model for Prob(M|S) associated with the new measurement vector M is defined as:
Prob(M|S) = \eta \exp\left( -\frac{1}{\sigma^2} \sum_j \left\| \mathcal{P}_S(p^w_j) - p^I_j \right\|^2 - \frac{1}{\nu^2} \sum_{k=1}^{K} C\left(s_k, V_S^{a_k}\right)^2 \right)
where \eta is a normalization constant, \sigma and \mathcal{P}_S(p^w_j) are as before, \nu is a standard deviation due to measurement inaccuracy, and V_S^d is the d-th vanishing point, as induced from the parameter vector S. The function C is a similarity function given by:
C\left(s_k, V_S^d\right) = \alpha_k^d(S)
where \alpha_k^d(S) is the angle between the segment s_k and the line that connects the d-th vanishing point (induced by S) and the middle point of the segment s_k. Thus, the projection matrix that best aligns with the input data can be computed by solving:
S^* = \arg\min_S \left[ \frac{1}{\sigma^2} \sum_j \left\| \mathcal{P}_S(p^w_j) - p^I_j \right\|^2 + \frac{1}{\nu^2} \sum_{k=1}^{K} C\left(s_k, V_S^{a_k}\right)^2 + \sum_{n=1}^{9} \mu_n \frac{(s_n - \bar{s}_n)^2}{\sigma_n^2} \right]
According to an example, a numeric optimization function can be used to minimize e(S), as described above.
[0046] Assuming that the projection matrix is given, a new assignment of the segments to their corresponding VPs can be obtained by assigning each segment to the closest VP, i.e.,
a_k = \arg\min_{d \in \{1,2,3\}} C\left(s_k, V_S^d\right)
As neither the projection matrix nor the correct assignment of segments is known, an iterative procedure that estimates both is used according to an example. Firstly, an initial assignment of segments is set. Such an assignment can be determined by using the proximity of segments to the VPs inferred from the default projection matrix. Assuming this assignment is fixed, the projection matrix is estimated using the above equation for e(S). Given the updated projection matrix the segments are then reassigned as given above. This is repeated until convergence is reached.
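A sketch of this alternation follows, with e_joint (the combined cost above) and vps_of (the VPs induced by a parameter vector) assumed to be supplied as helpers; a fixed iteration count stands in for a proper convergence test on the assignments.

```python
import numpy as np
from scipy.optimize import minimize

def segment_angle(seg, vp):
    """Angle between a segment ((x1, y1), (x2, y2)) and the line
    joining the segment's midpoint to a candidate vanishing point."""
    a, b = np.asarray(seg[0], float), np.asarray(seg[1], float)
    d1, d2 = b - a, np.asarray(vp, float) - 0.5 * (a + b)
    c = abs(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    return np.arccos(np.clip(c, 0.0, 1.0))

def estimate(segments, M, s0, e_joint, vps_of, n_iter=10):
    """Alternate segment-to-VP assignment and parameter estimation.
    e_joint(s, M, segments, assign) and vps_of(s) are assumed helpers
    implementing the combined cost and the VP induction above."""
    s = np.asarray(s0, dtype=float)
    for _ in range(n_iter):
        vps = vps_of(s)                      # three induced VPs
        assign = [min(range(3), key=lambda d: segment_angle(seg, vps[d]))
                  for seg in segments]       # nearest VP per segment
        s = minimize(e_joint, s, args=(M, segments, assign),
                     method="Nelder-Mead").x
    return s
```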
[0047] Figure 4 is a flowchart of a method for calculating a projection matrix according to an example. An image of a room in the form of image data 400 is presented to a user in block 402, such as being presented on a display of a system for example. The user is able to interact with the image using the system in order to select points on the image. According to an example, if a lower corner point of the room (or scene) is visible in block 403, a user marks a point on the image corresponding to the lower corner point of the room at block 404. If the lower corner point is not visible in block 403, the user can select an upper corner point at block 405. Alternatively, at block 406, the user can mark points on the image at the intersection of the walls and ceiling or at the walls and floor of the room. For example, a user can mark two points on the right side of the room at a wall and floor or wall and ceiling intersection, followed by two points on the left side of the room at a wall and floor or wall and ceiling intersection. Using the marks at the intersection, line intersections are calculated in block 407, and the intersection is the corner of the room (408).
[0048] At block 409, the data relating to the room corner is used to calculate a camera projection matrix for the room, including using data 410 representing the position of vanishing points which have been determined automatically using a process as described above. If the calculation fails, or the results are not reasonable, the user can be prompted to input more data in the form of any of the following (if not already obtained):
- 2 intersection points between the left wall and floor;
- 2 intersection points between the right wall and floor;
- 2 intersection points between the left wall and ceiling;
- 2 intersection points between the right wall and ceiling.
[0049] Using this information, vanishing points can be computed by determining the intersection points of lines derived from the points (that is to say, each pair of points defines a line, with the first and third lines intersecting at one vanishing point and the second and fourth lines intersecting at another vanishing point). According to an example, the order in which a user marks the points is not significant, but determining what a user marks is significant, that is, whether it is a bottom-right point or an upper-left point etc. that is marked, so that it can be determined which two points should be used to draw lines to derive a corner point. Whether a bottom or top corner is favoured depends on the application. In most cases the camera projection matrix calculated according to an example is used to virtually 'insert' or 'plant' things (like furniture) in a room. Those things are more likely to be inserted on the floor, and therefore if the bottom corner is marked or derived it gives an accurate match to the floor, as mentioned above. If the application is the virtual painting of a ceiling, for example, the upper corner could be a more sensible choice.
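Continuing the homogeneous-line sketch given earlier, and under the pairing just described (first line with third, second with fourth), the two vanishing points follow directly; marked_pairs is an assumed list of the four user-marked point pairs.

```python
# Reusing line_through() and intersect() from the earlier sketch.
# marked_pairs is assumed: [(left wall/floor pair), (right wall/floor
# pair), (left wall/ceiling pair), (right wall/ceiling pair)].
lines = [line_through(p, q) for p, q in marked_pairs]
vp_one = intersect(lines[0], lines[2])   # first and third lines
vp_two = intersect(lines[1], lines[3])   # second and fourth lines
```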
[0050] Figure 5 is a schematic block diagram of a system 500 that can implement any of the examples described herein. The system 500 includes a processing unit 501 (CPU), which can be an image processor according to an example, a system memory 503, and a system bus 505 that couples the processing unit 501 to the various other components of the system 500. The processing unit 501 typically includes one or multiple processors, each of which may be in the form of any one of various commercially available processors, for example. The system memory 503 typically includes a read only memory (ROM) that stores a basic input/output system (BIOS) that contains startup routines for the system 500, and a random access memory (RAM). The system bus 505 may be a memory bus, a peripheral bus or a local bus, and may be compatible with any of a variety of bus protocols, including PCI(e), VESA, MicroChannel, ISA, and EISA. The system 500 also includes a persistent storage memory 507 (e.g., a hard drive (HDD), a floppy disk drive, a CD-ROM drive, magnetic tape drives, flash memory devices, and digital video disks) that is connected to the system bus 505 and contains one or multiple computer-readable media disks that provide non-volatile or persistent storage for data, data structures and machine readable or computer-executable instructions. According to an example, digital image data 400 and initial values for parameters of a camera projection matrix 520 can be stored in memory 507.
[0051] A user may interact (e.g., enter commands or data) with system 500 using input devices 509 (e.g., a keyboard, a computer mouse, a microphone, joystick, and touch pad or touch sensitive display screen). Information may be presented through a user interface that is displayed to a user on the display 511 (implemented by, e.g., a display monitor which can be touch sensitive, including a capacitive, resistive or inductive touch sensitive surface for example), and which is controlled by a display controller 513 (implemented by, e.g., a video graphics card). Accordingly, a user can be presented with an image of a room representing image data 400, and can mark points in the image using an input device 509 of the system.
[0052] The system 500 can also typically include peripheral output devices, such as speakers and a printer for example. A remote computer may be connected to the system 500 through a network interface card (NIC) 515. For example, the system 500 can include provision to send data to a remote location where certain items are prepared for a user (such as printed merchandise etc). As shown in figure 5, the system memory 503 also stores processing information 517 that includes input data, processing data, and output data. In some examples, the system can interface with a graphics driver to present a user interface on the display 511 for managing and controlling the operation of the system 500, such as for marking room positions for example.
[0053] Using data for initial parameters of a projection matrix (sn), and data representing the position of a room corner and vanishing points for a room or scene, the system can calculate a projection matrix in block 520 of system memory 503. According to an example, image data 400 is used by system 500 to calculate vanishing points using segments as described above.
[0054] Figure 6 is a block diagram of a method for calculating a projection matrix according to an example. More specifically, for image data 600 representing an image of a scene for which the projection matrix is desired, a position of a vanishing point in an image plane of the image is determined in block 601. In block 603 a set of initial input data representing multiple initial measures for at least one parameter of the projection matrix and a position in the image for a corner point of the scene are received. In block 605 a function dependent on the measures is minimized using the position, and a set of final parameters for the projection matrix is calculated in block 607.
[0055] Figure 7a is a schematic block diagram of a system according to an example. An image processor 701 receives image data 700 representing an image of a scene or structure. A projection matrix for the scene or structure is calculated in block 702. More specifically, the image processor 701, coupled to memory 703, calculates values for the parameters of the projection matrix 702 by minimizing the function 704 in response to the receipt of data representing a corner point 705 of the scene or structure.
[0056] Figure 7b is a schematic block diagram of a system according to an example. An image processor 801 receives image data 800 representing an image of a scene or structure. A projection matrix for the scene or structure is calculated in block 802. More specifically, the image processor 801, coupled to memory 803, calculates values for the parameters of the projection matrix 802 by minimizing the function 804 in response to the receipt of data representing a corner point 805 of the scene or structure.
[0057] A projection matrix 520, 702, 802 can be used to augment the scene for which it has been determined, positioning items so that the perspective of the scene is obeyed. For example, in a room, furniture can be added to the room in order to determine a desired placement and/or orientation. Similarly, for an outdoor scene, items can be placed into the scene so that they obey the perspective of the scene.
[0058] Accordingly, an initial measure for a projection matrix of a room, scene or structure can be determined using a set of initial, heuristic parameter values. The initial measure can be refined using correspondence points provided in an image of the room, scene or structure together with the corresponding 3D world points. For example, a corner point can be manually marked by a user, or derived using the intersection of lines from points on the image. Vanishing points, which can further be used to refine the measure, can be derived using the techniques described above and incorporated into the calculation of the parameters.

Claims

What is claimed is:
1. An image processor to receive image data representing an image of a three-dimensional structure and to calculate a projection matrix from the image data by receiving a set of initial input data representing heuristic measures for parameters of the projection matrix, and a position in the image for a corner point of the structure, and to minimise a function dependent on the measures using the position to calculate the projection matrix.
2. An image processor as claimed in claim 1, further operable to calculate a vanishing point for the structure by detecting pairs of similar image patches within the image and identifying a concurrent set of straight virtual lines that substantially converge at a point on an image plane, each line passing through a pair of similar image patches within the image.
3. An image processor as claimed in claim 2, wherein in minimising the function, the processor is operable to use the calculated vanishing point and the position of the corner point to calculate the projection matrix for the image.
4. An image processor as claimed in claim 1, wherein in receiving data for a position in the image for a corner point, the image processor is operable to receive data input by a user representing a user marked position in the image corresponding to the position of a corner point in the structure.
5. An image processor as claimed in claim 1, wherein in calculating the projection matrix the image processor is operable to receive data representing multiple marked points on main lines of the image and to use the marked points to calculate the position of a corner point of the structure by projecting lines from the marked points.
6. An image processor as claimed in claim 1, further operable to use the projection matrix to insert an image of an object into the image of the three-dimensional structure.
7. An image processor as claimed in claim 1, further operable to use the projection matrix to render the image of the three-dimensional structure from a viewpoint different to a viewpoint which was used to capture the image of the structure.
8. A method for calculating a projection matrix for a two-dimensional image of a three-dimensional scene, including:
determining a position of a vanishing point in an image plane of the image;
receiving a set of initial input data representing multiple initial measures for at least one parameter of the projection matrix and a position in the image for a corner point of the scene; and
minimising a function dependent on the measures using the position to calculate a set of final parameters for the projection matrix.
9. A method as claimed in claim 8, wherein determining a position of a vanishing point includes detecting pairs of similar image patches within the image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
10. A method as claimed in claim 8, wherein receiving data for a position for a corner point includes receiving data representing the position of multiple points at an intersection of main lines in the image, and using the points to calculate the position of the corner point using an intersection of lines projected through the points.
11. A method as claimed in claim 8, further comprising using the projection matrix to insert an image of an object into the image of the scene.
12. A method as claimed in claim 8, further comprising using the projection matrix to render the scene from a viewpoint different to a viewpoint which was used to capture the image of the scene.
13. A computer-readable medium storing computer-readable program instructions arranged to be executed on a computer system, the instructions comprising:
to receive digital image data representing an image of a scene;
to receive a set of input parameters for a projection matrix;
to calculate an initial estimate of the projection matrix using the input parameters; and
to process the initial estimate using measures for the position of a corner point and a vanishing point of the scene to give a final projection matrix.
14. The computer-readable medium of claim 13, further comprising instructions:
to determine the position of a vanishing point in the image by receiving user input representing the position of multiple points on the image;
to receive user input representing the position of a corner point on the image.
15. The computer-readable medium of claim 13, further comprising instructions:
to insert an image of an object into the image of the scene using the final projection matrix to determine a geometry for the object.
PCT/US2010/050944 2010-09-30 2010-09-30 Projection matrix WO2012044308A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2010/050944 WO2012044308A1 (en) 2010-09-30 2010-09-30 Projection matrix

Publications (1)

Publication Number Publication Date
WO2012044308A1 true WO2012044308A1 (en) 2012-04-05

Family

ID=45893480

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/050944 WO2012044308A1 (en) 2010-09-30 2010-09-30 Projection matrix

Country Status (1)

Country Link
WO (1) WO2012044308A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6094619A (en) * 1997-07-04 2000-07-25 Institut Francais Du Petrole Method for determining large-scale representative hydraulic parameters of a fractured medium
US20040095385A1 (en) * 2002-11-18 2004-05-20 Bon-Ki Koo System and method for embodying virtual reality
KR20060007815A (en) * 2004-07-22 2006-01-26 학교법인 중앙대학교 Reconstruction method of parametrized model
US20100118360A1 (en) * 2007-03-15 2010-05-13 Seereal Technologies S.A. Method and Device for Reconstructing a Three-Dimensional Scene with Corrected Visibility

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9582855B2 (en) 2012-04-27 2017-02-28 Adobe Systems Incorporated Automatic adjustment of images using a homography
CN105453131A (en) * 2012-04-27 2016-03-30 奥多比公司 Automatic adjustment of images
WO2013163579A3 (en) * 2012-04-27 2014-01-16 Adobe Systems Incorporated Automatic adjustment of images
GB2516405A (en) * 2012-04-27 2015-01-21 Adobe Systems Inc Automatic adjustment of images
US9008460B2 (en) 2012-04-27 2015-04-14 Adobe Systems Incorporated Automatic adjustment of images using a homography
US9098885B2 (en) 2012-04-27 2015-08-04 Adobe Systems Incorporated Camera calibration and automatic adjustment of images
GB2516405B (en) * 2012-04-27 2016-06-15 Adobe Systems Inc Automatic adjustment of images
US9729787B2 (en) 2012-04-27 2017-08-08 Adobe Systems Incorporated Camera calibration and automatic adjustment of images
US9519954B2 (en) 2012-04-27 2016-12-13 Adobe Systems Incorporated Camera calibration and automatic adjustment of images
US9361688B2 (en) 2012-10-16 2016-06-07 Qualcomm Incorporated Sensor calibration and position estimation based on vanishing point determination
US9135705B2 (en) 2012-10-16 2015-09-15 Qualcomm Incorporated Sensor calibration and position estimation based on vanishing point determination
EP3012804A1 (en) * 2012-10-16 2016-04-27 Qualcomm Incorporated Sensor calibration and position estimation based on vanishing point determination
WO2014062293A1 (en) * 2012-10-16 2014-04-24 Qualcomm Incorporated Sensor calibration and position estimation based on vanishing point determination
EP2779102A1 (en) * 2013-03-12 2014-09-17 E.sigma Systems GmbH Method of generating an animated video sequence
CN106990663A (en) * 2017-06-05 2017-07-28 电子科技大学中山学院 A kind of three-dimensional house type projection arrangement of portable and collapsible
KR20200025238A (en) * 2018-08-29 2020-03-10 한국전자통신연구원 Image generating apparatus, imaging system including image generating apparatus and operating method of imaging system
KR102591672B1 (en) 2018-08-29 2023-10-20 한국전자통신연구원 Image generating apparatus, imaging system including image generating apparatus and operating method of imaging system
CN112669388A (en) * 2019-09-30 2021-04-16 上海禾赛科技股份有限公司 Calibration method and device for laser radar and camera device and readable storage medium
CN113012226A (en) * 2021-03-22 2021-06-22 浙江商汤科技开发有限公司 Camera pose estimation method and device, electronic equipment and computer storage medium

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10857996; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10857996; Country of ref document: EP; Kind code of ref document: A1)