WO2018099556A1

WO2018099556A1 - Image processing device and method for producing in real-time a digital composite image from a sequence of digital images of an interior of a hollow structure

Info

Publication number: WO2018099556A1
Application number: PCT/EP2016/079323
Authority: WO
Inventors: Tobias Bergen; Michaela Benz; Andreas Ernst; Thomas Wittenberg; Christian MÜNZENMAYER; Frederik ZILLY; Malte Avenhaus
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2016-11-30
Filing date: 2016-11-30
Publication date: 2018-06-07
Also published as: EP3549093A1

Abstract

Image processing device for producing in real-time a digital composite image from a sequence of digital images of an interior of a hollow structure recorded by an endoscopic camera device so that the composite image has a wider field of view than the images of the sequence of images, the image processing device comprising: a selecting unit, a key point detection unit, a transforming unit and a joining unit, wherein the transforming unit comprises a key point matching unit configured for determining key point pairs, wherein the transforming unit comprises a transformation determination unit configured for determining a transformation for transforming a further image into a global coordinate system, wherein the transformation for transforming the further image into the global coordinate system is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius transformation, wherein parameters of the transformation for transforming the further image into the global coordinate system are determined from at least some of the key point pairs, and wherein the transforming unit comprises a transforming execution unit configured for transforming the further image into the global coordinate system by using the transformation for transforming the further image into the global coordinate system in order to produce a transformed further image.

Description

Image processing device and method for producing in real-time a digital composite image from a sequence of digital images of an interior of a hollow structure

Description

The present invention relates to real-time digital image processing. Digital image stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high- resolution composite image.

More specific, the invention relates to producing in real-time a digital composite image from a sequence of digital images of an interior of a hollow struc- ture recorded by an endoscopic camera device.

In general, the shape of hollow structures may be approximated better by a sphere than by a plane. Thus, spherical image stitching algorithms seem to be more appropriate than planar image stitching algorithms in case that a digital composite image needs to be produced from a sequence of digital images of an interior of a hollow structure recorded by an endoscopic camera device.

Two models have been suggested for spherical image stitching. Can et al. developed a stitching algorithm for the human retina which is based on a quadratic motion model [1]. The authors assume that the retina is almost spherical so that a 3-dimensional quadric surface model may be used. Furthermore, Can et al. presume the camera to view the retina almost perpendicularly at all times, so perspective distortion can be neglected. The result- ing quadratic motion model is described by a 2 ^χ 6 matrix to relate corresponding global key points in a reference image and local key points in a further image to be stitched to the global image. The quadratic model can be considered as an adaption of the affine motion model to quadric surfaces. Another approach is proposed by Shashua and Toelg [2] as well as Shashua and Wexler [3], who consider the theoretical basis for a mapping between two perspective views of a quadric surface. They formulate a motion model with 21 unknowns and propose an algorithm to obtain the transformation from 9 key point correspondences. Later, Raskar et al. applied this model to register the images of two projectors displaying on a common curved screen. They also propose a simplified formulation for the mapping between two corresponding pixel positions. Using the simplified formulation, it is possible to re duce the number of unknowns from 21 to 17 [4, 5]. This approach implicitly creates a projective reconstruction of the 3-dimensional quadric using epipo- lar geometry. In this respect, building a spherical stitching algorithm around this method resembles a projective reconstruction method of a quadric from a set of perspective views.

Several problems arise from the methods mentioned above: First, the high number of parameters adds complexity to the problem, making it difficult to develop a robust stitching algorithm. Second, there is no self- evident way of refining the quadric scene constraint to a spherical one, since the reconstruction itself is a projective reconstruction, i.e. determined up to an unknown global projective transform [6]. Finally, in medical endoscopy, such as cystoscopy, the expected field of view is small in the sense that the spherical patch visible in one image is nearly planar. This makes the inherent estimation of the fundamental matrix unstable. As a consequence, incrementally stitching further images using such methods do not work robustly.

An object of the present invention is to provide an improved image pro- cessing device for producing in real-time a digital composite image from a sequence of digital images of an interior of a hollow structure recorded by an endoscopic camera device.

This object is achieved by an image processing device for producing in real- time a digital composite image from a sequence of digital images of an interior of a hollow structure recorded by an endoscopic camera device, in particular of an interior of a hollow organ, such as an urinary bladder, recorded by an medical endoscopic camera device, so that the composite image has a wider field of view than the images of the sequence of images, the image processing device comprising: a selecting unit configured for selecting a reference image and a further image from the sequence of images, wherein the reference image is specified in a global coordinate system of the composite image as a stereographic projection of a part of the interior of the hollow structure in a complex plane, wherein the further image is specified in a local coordinate system of the further image as a projection of a further part of the interior of the hollow structure in a projective space, and wherein the further image is overlapping the reference image; a key point detection unit configured for detecting global key points in the reference image and for detecting local key points in the further image; a transforming unit configured for transforming the further image into the global coordinate system based on the global key points and based on the local key points in order to produce a transformed further image, wherein the transforming unit comprises a key point matching unit configured for determining key point pairs, wherein each of the key point pairs comprises one global key point of the global key points and one local key point of the local key points, wherein the global key point and the local key point of each of the key point pairs correspond to a same feature of the hollow structure, wherein the transforming unit comprises a transformation determination unit configured for determining a transformation for transforming the further image into the global coordinate system, wherein the transformation is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius transformation, wherein parameters of the transformation for transforming the further image into the global coord i- nate system are determined from at least some of the key point pairs, wherein the transforming unit comprises an transforming execution unit configured for transforming the further image into the global coordinate system by using the transformation for transforming the further image into the global coordinate system in order to produce the transformed further image; and a joining unit configured for joining the reference image and the transformed further image in the global coordinate system in order to produce at least a part of the composite image.

The term "real-time" has to be understood in such way that each of the further images is added to the composite image in less than a second.

The present invention may be useful in all applications in which a composite image of an interior of a hollow structure needs to be produced. However, the main applications of the invention may be seen in the field of medical endoscopy of an interior of a hollow organ, such as a urinary bladder, recorded by a medical endoscopic camera device. The invention allows producing composite images of an interior of a hollow structure which have fewer perspective distortions than composite images produced with prior art devices using a linear or quadratic stitching method. This is beneficial in all cases in which a composite image of an interior of a hollow structure needs to be produced. However, the invention may be used especially in the field of medical endoscopy of an interior of a hollow organ, such as a urinary bladder, as the techniques involved require a high degree of orientation, coordination, and fine motor skills on the part of the medical practitioner, due to the very limited field of view provided by the endoscope and the lack of relation between the orientation of the image and the physical environment.

Compared to devices using a quadric stitching method, the device according to the invention needs to use less parameter so that the computational effort is lowered. This leads to a reduced processing time for adding a further im- age to the global image. Furthermore, the inventive device is more reliable as the needed parameters are determined by using a method which is more stable, even if the field of view is small, so that the results are more robust in that sense that the likelihood of a misalignment of the further image is reduced. The stereographic projection is a mapping function of the unit sphere S¹ onto a plane. Imagine a center of projection at the north pole (X, Y, Z)^T = (0, 0, 1)^T of the sphere and a plane Z= -1, touching the sphere at the south pole. Each point Xon the sphere is mapped onto the plane by extending the ray from the north pole through Xonto the plane. Now, let us interpret the sphere as the Riemann sphere and the projection plane as the complex plane C extended by the additional number infinity, denoted as C∞. Then, there exists a bijective and conformal mapping (i.e. a mapping which preserves angles) of the Cartesian point to the point z 6 C∞, given by

The inverse mapping is defined as

The stereographic projection s transforms the south pole (0, 0, -1)^T to the origin of the complex plane z— 0, the equator of the sphere to a circle with radius r— 2, and the north pole (0, 0, l)^T to∞. In C∞ the point∞ can be imag- ined to lie at a "very large distance" from the origin and this point turns the complex plane into a geometrical surface of the nature of a sphere. Mapping the surface of a sphere onto a plane is free of distortion at the center of the projection plane and distortion increases with the distance from the center. Angles are locally preserved [7, pp. 22; 8, pp. 162].

Now stereographic reproduction of a spherical mosaic will be explained: Two perspective views of a plane are related through a homography, i.e. x' ~ Hx. As a consequence, the image coordinates of any perspective camera can be related to the coordinate system of a virtual reference camera by a homogra- phy. Let us define this fictive reference camera as a perspective camera with a focal length of /= 2, positioned at the north pole of the unit sphere and viewing straight towards the south pole:

Then projecting a point X∈ R³ by this camera yields

Let further

define an isomorphism between the image plane and the complex plane:

It can now be observed that the action of the perspective camera φ (/¾¾ = z is identical to the stereographic projection as defined in (1 ). The important consequence is that a planar map of a sphere constructed via a homogra- phy-based stitching algorithm can be interpreted as the stereographic map of that sphere. This map can easily be transformed to a spherical panorama by stereographic re-projection as defined in (2). These considerations lead to the following motion model to relate image coordinates in the i-th frame x,- and its corresponding world point X.

While the planar map may appear heavily distorted, the re-projection yields a spherical model, almost free of distortions. The homography is able to accu- rately model spherical image motion under stereographic projection, assuming that the sphere is locally planar. The error induced by this approximation depends on the relative position of the camera. The error increases with the distance of the plane to the sphere. This is plausible since this distance cor- responds to how well the plane approximates the spherical surface patch. Another observation may be more surprising. The error also increases when the field of view of the camera inside the sphere is displaced further from the south pole. This shows that there exists a systematic error when using a perspective motion model to stitch on a planar surface which is then projected onto a sphere as described above. The distortion effects which occur with increasing size of the panorama have two reasons: First, the projection of the sphere onto a planar map causes dilation with increasing distance from the center. This ef- feet can be compensated for by choosing a spherical surface model onto which the map is re-projected. Second, the error has an increasing effect with increasing panorama size. Reducing this distortion is more challenging. The invention addresses this problem by combining a perspective transformation and a Mobius transformation to model spherical image stitching. More specif- ic, the invention addresses this problem by using a transformation for transforming the further image into the global coordinate system, wherein the transformation comprises a Mobius transformation in the complex plane, a isomorphic mapping between the complex plane and the projective space and a perspective transformation in the projective space.

The Mobius transformation is a rational function of the complex plane, defined as

with the coefficients a, b, c, de C_∞ and ad - bc≠0. Mobius transformations are bijective conformal mappings of the Riemann sphere to itself. In fact, any bi- jective conformal automorphism of the Riemann sphere is a Mobius transformation. Therefore, any rigid motion of the Riemann sphere can be expressed as a Mobius transformation. These motions include translation in any direction and rotation about any axis. This implies that any transformation according to (7) of the complex plane corresponds to some movement of the Riemann sphere [7, Chap. 2; 8, Chap. 3].

Mobius transformations have several appealing features. Mobius transformations form a group under composition, i.e. any composition of two Mobius transformations is a Mobius transformation. Just as linear transformations in the 2- dimensional projective space, Mobius transformations can be represented by matrices in the 1- dimensional complex projective space CP¹. Real valued vectors and matrices are denoted with round brackets (), complex vectors and matrices will be denoted with square brackets [] in the following. A point z = [z, 1]^T in complex homogeneous coordinates is transformed by

There exists a group homomorphism between the general linear group of complex 2 x 2 matrices and Mobius transformations. As a result, many operations can be expressed straight forwardly using matrix operations: the inverse transformation

is given by Af¾ and a concatenation of two Mobius transformations corresponds to the matrix product

MiMiz [8, Chap. 3].

As Mobius transformations are conformal mappings, they preserve angles and map circles to circles. We can see a relation to similarity transformations in the Euclidean case, which also preserve angles. Similarity transformations can only describe the action of a camera with the optical axis perpendicular to the scene plane. Analogously, a Mobius transformation is able to model optical flow that results from a camera moving along the surface of the sphere with the optical axis perpendicular to the plane that is tangent to the sphere's surface. This can be explained by the characteristics of stereo- graphic projection. In (3), the stereographic projection has been shown to be equivalent to the action of a projective camera located at the north pole. (1 ) and (3) only describe one possible way of defining a stereographic projection. In fact, instead of the north pole any point on the sphere can be chosen as projection center Ce &. The projection plane can then be any plane perpendicular to the diameter through C(\.e. the projection plane is parallel to the plane through ^tangential to the sphere) [9]. So, the projection by any projective camera positioned at Cwith viewing direction through the sphere's center and focal length f≠Q is equivalent to a stereographic projection.

As stated above there is a relationship between Mobius transformations and movements of the Riemann sphere. Consequently, correspondences exist between certain movements and their respective transformations of the complex plane: A rotation of the sphere around the Z-axis corresponds to a rotation of the complex plane around its origin. A translation of the sphere along the Z-axis leads to dilation or contraction of the complex plane, and a rotation of the sphere around any other axis than the Z-axis causes an inversion of the plane. It has been shown that any Mobius transformation can be expressed via stereographic projection and a rigid motion of the sphere and vice versa [10, 1]. It can be concluded that the images of two projective cameras resembling two different stereographic projections (related by a rigid motion of the sphere) are related by a Mobius transformation. Therefore, the transformation between two images originating from camera motion along the sphere's surface can be modeled by a Mobius transformation.

A Mobius transformation has six degrees of freedom and can therefore be determined from three point correspondences. While a Mobius transformation is defined by four complex coefficients a, b, c, d, these are only unique up to a common scale factor. For

This leaves 8 - 2 = 6 degrees of freedom [8]. If, for example, in cystoscopy the camera could be assumed to move along the bladder surface with a perpendicular view onto the bladder wall, the Mo- bius transformation would provide a very convenient way to model the resulting optical flow. Although this assumption does not hold, this scenario can be approximated according to the invention by first applying a perspective transformation to each further image.

Such perspective transformation can be represented by a 3 * 3 matrix H, mapping homogeneous pixel coordinates. H is called a projectivity or homog- raphy. The relation between two projections

of a world point X e P³ by two independent perspective cameras is given by

The general two-dimensional perspective transformation has 8 degrees of freedom. The 3 x 3 matrix His a unique representation up to an arbitrary scale factor. Uniqueness can be enforced by posing an additional constraint, e.g. /¾,3 = 1. The general homography can be used to model image motion which results from a perspective camera undergoing arbitrary motion. De- tailed derivations of this relationship from general perspective projections can be found in Hartey and Zissermann [6, pp. 325] and Szeliski [11 , pp. 56].

Under the assumption that the small surface patch visible in one image is (nearly) planar, this perspective transform "virtually" aligns the projection plane (image sensor of the camera) with the surface patch. As a result, we can constitute the following: Let Z E C∞ be the stereographic projection of Xe 9- according to (1 ). Then, the relation of an image point x viewed by a projective camera located inside the unit sphere and the point ze L can be expressed by the concatenation of a perspective transformation and a Mobius transformation. Let the perspective transformation h be defined in terms of the homography Has

Let further

denote a Mobius transformation as defined in (7).

Together with the isomorphic mapping φ : P²→ C between image and complex plane defined by (5) the concatenation of

and m is defined as g : C

Since /n, and Λ are invertible, so is g.

wherein is the transformation for transforming the further image into the global coordinate system, which is the concatenation of the perspective transformation Jr¹, the isomorphic mapping 0 and the Mobius transformation m¹, which may use the respective inverse matrices AT¹ and ff\ The composition of a general Mobius transformation and a perspective transform may be called full Mobius perspective transform.

According to a preferred embodiment of the invention the transformation de- termination unit is configured in such way that the Mobius transformation is a simplified Mobius transformation.

If a general homography with eight degrees of freedom with a general Mobius transformation with six degrees of freedom is combined, there several ambiguities exist in the way how certain components of the motion field can be modeled. Rotation around the optical axis, for example, can both be represented by the homography and the Mobius transformation. The same is true for isotropic scaling and translation of the image plane. Consequently, there is a freedom of choice to fix certain parameters without losing any modeling abilities. Therefore, the combined Mobius and perspective transform may be defined in such a way that the unconstrained homography is applied to the image coordinate and the Mobius transformation is restricted to an inversion which corresponds to a rotation of the Riemann sphere. This choice is motivated by the fact that it facilitates calculation of a Mobius perspective transform from corresponding global key points and local key points. Rotation of the Riemann sphere can be defined by a Mobius transformation in the following way: For any point

its antipode on the Riemann sphere

(through stereographic re-projection) is given by -4/a. The Mobius transformation

maps a to 0 and -4/a to∞, which corresponds to a mere rotation of the Riemann sphere [8]. The final representation of the so defined Mobius perspective transform consists of 10 parameters:

and a e C∞.

According to a preferred embodiment of the invention the transformation determination unit is configured in such way that the perspective transformation is a reduced perspective transformation. The number of parameters can be further reduced by replacing the perspective transformation with an affine transformation, which is a reduced perspective transformation. This results in 6 + 2 = 8 degrees of freedom (DOF). This transformation may be called Mobius affine transform. Table 1 summarizes the motion models for spherical stitching.

Table 1

According to a preferred embodiment of the invention the transformation de- termination unit is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs by using a direct linear transformation.

A variation of the Direct Linear Transform algorithm (DLT algorithm) can be applied to calculate a full or simplified Mobius transformation from a set of complex point correspondences

Reformulating (7) yields

A complex linear system of equations can be setup from n >3 point correspondences to determine the transformation parameters:

The equation system Qa = 0 can be solved using singular value decomposition: Q- USV, where the solution vector a is given by the complex conjugate of the right-most column of V. Since the resulting Mobius transformation Mis only unique up to scale, it is normalized by dividing through the determinant:

Mj det(M).

According to a preferred embodiment of the invention the transformation determination unit is configured in such way that the parameters of the trans- formation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs by using a least squares method.

The method of least squares is an approach in regression analysis to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation.

According to a preferred embodiment of the invention the transformation de- termination unit is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs by using a random sampling consensus method. It is unavoidable that the feature matching algorithm produces some false matches. Random sample consensus (RANSAC) has been established to identify and remove such outliers. The original RANSAC algorithm was introduced in 1981 by Fischler and Bolles [12]. It is still one of the most widely used robust estimators in the field of computer vision [13]. Although it works well in practice, many different contributions improve the original algorithm, aiming either at faster processing or higher robustness. Among the most popular enhancements are MSAC and MLESAC by Torr and Zissermann [14], locally optimized RANSAC and PROSAC by Chum et al. [13], and an extension to RANSAC using guided sampling by Tordoff and Murray [15].

RANSAC is a hypothesize-and-verify method. A model is generated based on a minimal set of point correspondences randomly chosen from all correspondences. This model is verified by the remaining point correspondences. Let, for example, the model be represented by a homography, calculated from four point correspondences. For the verification step, RANSAC calculates an error measure between the model hypothesis and each remaining point correspondence. If this error measure is below a given threshold, the point correspondence is considered an inlier correspondence, otherwise an outlier correspondence. The quality of the current model hypo paper is given by the number of inliers. This hypothesize and verification procedure is repeated iteratively until no further improvement of the model is expected. A theoretical discussion of the optimal termination criterion can be found in [6, pp. 120-121]. The final model is accepted if a minimal number of inliers ί_ηι is reached and if a minimal ratio of inliers versus outliers exceeds a given threshold t_mt. If a model has been found which satisfies both conditions, a final refinement step re-calculates the model from all inlier correspondences by least squares optimization.

According to a preferred embodiment of the invention the transformation de- termination unit is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs by using a guided sampling method. Several improvements have been suggested in literature to reduce the processing time. The guided sampling method was proposed by Tordoff and Murray [15] and adapted for PROSAC by Chum et al. [13]. It is applied here in order to speed up the search for the image transformation. Tordoff and Murray replaced the random sampling of the original RANSAC by a guided sampling. It uses information about the quality of point correspondences which is readily available during feature-based image registration. A correspondence score is often calculated during feature matching, as e.g. the distance between two feature vectors (in feature space) or the nearest-neighbor distance ratio. Assuming that a point correspondence with a higher feature score has a higher probability of being a correct feature match, these correspondences should be drawn with higher probability. As a consequence, the evenly distributed sampling of the initial RANSAC algorithm is replaced by a Monte-Carlo method, choosing the samples according to their matching score. Tordoff and Murray showed that this strategy significantly reduces the number of RANSAC iterations needed to find a consistent model. Therefore, it is of great value for real-time image stitching.

In a further aspect, the invention provides an endoscopic camera system for producing in real-time a digital composite image, the endoscopic camera sys- tern comprising: an endoscopic camera device configured for recording a sequence of digital images of an interior of a hollow structure, in particular a medical endoscopic camera device configured for recording a sequence of digital images of an interior of a hollow organ, such as an urinary bladder; and an image processing device according to the invention .

In another aspect the invention provides a method for producing in real-time a digital composite image from a sequence of digital images of an interior of a hollow structure recorded by an endoscopic camera device, in particular of an interior of a hollow organ, such as an urinary bladder, recorded by an medical endoscopic camera device, so that the composite image has a wider field of view than the images of the sequence of images, the image pro- cessing device comprising: selecting a reference image and a further image from the sequence of images by using a selecting unit, wherein the reference image is specified in a global coordinate system of the composite image as a stereographic projec- tion of a part of the interior of the hollow structure in a complex plane, wherein the further image is specified in a local coordinate system of the further image as a projection of a further part of the interior of the hollow structure in a projective space, and wherein the further image is overlapping the reference image; detecting global key points in the reference image and detecting local key points in the further image by using a key point detection unit; transforming the further image into the global coordinate system based on the global key points and based on the local key points by using a transforming unit in order to produce a transformed further image, wherein key point pairs are determined by a key point matching unit of the transforming unit, wherein each of the key point pairs comprises one global key point of the global key points and one local key point of the local key points, wherein the global key point and the local key point of each of the key point pairs correspond to a same feature of the hollow structure, wherein a transformation for transforming the further image into the global coordinate system is determined by a transformation determina- tion unit of the transforming unit, wherein the transformation is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius transformation, wherein parameters of the transformation for transforming the further image into the global coordinate system are determined from at least some of the key point pairs, wherein the further image is transformed into the global coordinate system by using a transforming execution unit of the transforming unit, which uses the transformation for transforming the further image into the global coordinate system in order to produce the transformed further image; and joining the reference image and the transformed further image in the global coordinate system by using a joining unit in order to produce at least a part of the composite image.

In a further aspect the invention provides a computer program for, when running on a processor, executing the method according to the invention.

Preferred embodiments of the invention are subsequently discussed with respect to the accompanying drawings, in which:

Fig. 1 illustrates an embodiment of an endoscopic camera system

comprising an image processing device according to the invention in a schematic view;

Fig. 2 depicts an example of a stereographic projection to a complex plane, wherein the projection center is located at the north pole of a unit sphere, and wherein the complex plane is tangent to the south pole of the unit sphere;

Fig. 3 illustrates that an action of a fixed camera being positioned at a north pole of a unit sphere is identical to the stereographic projection shown in Fig. 2; depicts an example of mapping image points of an movable camera being positioned at an arbitrary position within the unit sphere and points on the sphere being represented by their respective complex equivalent;

Fig. 5 depicts an example of a stereographic projection to a complex plane, wherein the projection center is located at an arbitrary position at a unit sphere, and wherein the complex plane is ar- bitrary, but perpendicular to a diameter starting at the respective projection center; and

Figs. 6 to 8 illustrate the transformation of a further image into the global coordinate system by using the transformation for transforming the further image into the global coordinate system.

Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals.

Fig. 1 illustrates an embodiment of an endoscopic camera system comprising an image processing device 1 according to the invention in a schematic view.

The invention provides an image processing device 1 for producing in real- time a digital composite image CI from a sequence SI of digital images of an interior of a hollow structure HS (see Figs. 2 to 8) recorded by an endoscopic camera device 2, in particular of an interior of a hollow organ HS, such as an urinary bladder HS, recorded by an medical endoscopic camera device 2, so that the composite image CI has a wider field of view than the images of the sequence SI of images, the image processing device 1 comprising: a selecting unit 3 configured for selecting a reference image Rl and a further image FI from the sequence of images SI, wherein the reference image Rl is specified in a global coordinate system of the composite image CI as a ste- reographic projection of a part of the interior of the hollow structure HS in a complex plane CP (see Figs. 2 to 8), wherein the further image FI is specified in a local coordinate system of the further image Fl as a projection of a further part of the interior of the hollow structure HS in a projective space PS (see Figs. 4 and 6), and wherein the further image Fl is overlapping the reference image Rl; a key point detection unit 4 configured for detecting global key points GKP in the reference image Rl and for detecting local key points LKP in the further image Fl; a transforming unit 5 configured for transforming the further image Fl into the global coordinate system based on the global key points GKP and based on the local key points LKP in order to produce a transformed further image TFI, wherein the transforming unit 5 comprises a key point matching unit 6 configured for determining key point pairs KPP, wherein each of the key point pairs KPP comprises one global key point GKP of the global key points GKP and one local key point LKP of the local key points LKP, wherein the global key point GKP and the local key point LKP of each of the key point pairs KPP correspond to a same feature of the hollow struc- ture HS, wherein the transforming unit 5 comprises a transformation determination unit 7 configured for determining a transformation for transforming the further image Fl into the global coordinate system, wherein the transfor- mation for transforming the further image Fl into the global coordinate system is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius transformation, wherein parameters of the transformation for transforming the further image Fl into the global coordinate system are determined from at least some of the key point pairs KPP, wherein the transforming unit 5 comprises an transforming execution unit 8 configured for transforming the further image Fl into the global coordinate system by using the transformation for transforming the further image Fl into the global coordinate system in order to produce the trans- formed further image TFI; and a joining unit 9 configured for joining the reference image Rl and the transformed further image TFI in the global coordinate system in order to produce at least a part of the composite image CI.

According to a preferred embodiment of the invention the transformation determination unit 7 is configured in such way that the Mobius transformation is a simplified Mobius transformation. According to a preferred embodiment of the invention the transformation determination unit 7 is configured in such way that the perspective transformation is a reduced perspective transformation.

According to a preferred embodiment of the invention the transformation de- termination unit 7 is configured in such way that the parameters of the transformation for transforming the further image Fl into the global coordinate system are determined from the at least some of the key point pairs KPP by using a direct linear transformation. According to a preferred embodiment of the invention the transformation determination unit 7 is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs KPP by using a least squares method.

According to a preferred embodiment of the invention the transformation determination unit 7 is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs KPP by using a random sampling consensus method.

According to a preferred embodiment of the invention the transformation determination unit 7 is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs KPP by using a guided sampling method. In a further aspect the invention provides an endoscopic camera system for producing in real-time a digital composite image CI, the endoscopic camera system comprising: an endoscopic camera device 2 configured for recording a sequence SI of digital images of an interior of a hollow structure HS, in particular a medical endoscopic camera device 2 configured for recording a sequence SI of digital images of an interior of a hollow organ HS, such as an urinary bladder; and an image processing device 1 according to the invention .

In another aspect the invention provides a method for producing in real-time a digital composite image CI from a sequence SI of digital images of an inte- rior of a hollow structure HS recorded by an endoscopic camera device 2, in particular of an interior of a hollow organ HS, such as an urinary bladder HS, recorded by an medical endoscopic camera device 2, so that the composite image CI has a wider field of view than the images of the sequence SI of images, the image processing device 1 comprising: selecting a reference image Rl and a further image Fl from the sequence of images SI by using a selecting unit 3, wherein the reference image Rl is specified in a global coordinate system of the composite image CI as a ste- reographic projection of a part of the interior of the hollow structure HS in a complex plane CP, wherein the further image Fl is specified in a local coordinate system of the further image Fl as a projection of a further part of the interior of the hollow structure HS in a projective space PS, and wherein the further image Fl is overlapping the reference image Rl; detecting global key points GKP in the reference image Rl and detecting local key points LKP in the further image Fl by using a key point detection unit 4; transforming the further image Fl into the global coordinate system based on the global key points GKP and based on the local key points LKP by using a transforming unit 5 in order to produce a transformed further image TFI, wherein key point pairs KPP are determined by a key point matching unit 6 of the transforming unit 5, wherein each of the key point pairs KPP comprises one global key point GKP of the global key points GKP and one local key point LKP of the local key points LKP, wherein the global key point GKP and the local key point LKP of each of the key point pairs KPP correspond to a same feature of the hollow structure HS, wherein a transformation for transforming the further image Fl into the global coordinate system is determined by a transformation determination unit 7 of the transforming unit 5, wherein the transformation for transforming the further image Fl into the global coordinate system for transforming the further image into the global coordinate system is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius trans- formation, wherein parameters of the transformation for transforming the further image Fl into the global coordinate system are determined from at least some of the key point pairs KPP, wherein the further image Fl is transformed into the global coordinate sys- tern by using a transforming execution unit 8 of the transforming unit 5, which uses the transformation for transforming the further image Fl into the global coordinate system in order to produce the transformed further image TFI; and joining the reference image Rl and the transformed further image TFI in the global coordinate system by using a joining unit 9 in order to produce at least a part of the composite image CI.

In a further aspect the invention provides a computer program for, when run- ning on a processor, executing the method according to the invention.

Fig. 2 depicts an example of a stereographic projection to a complex plane CP, wherein the projection center C is located at the north pole of a unit sphere, which is an approximation for the shape of a hollow structure HS, and wherein the complex plane is tangent to the south pole of the unit sphere. The stereographic projection maps points Xon the unit sphere to z in the complex plane. The projection center is the north pole N= (0, 0, 1)^T and the projection plane is Z- -1. Mathematically the stereographic projection may be described according to (1 ) and (2).

Fig. 3 illustrates that an action of an imaginary fixed camera FC being positioned at a north pole of a unit sphere is identical to the stereographic projection shown in Fig. 2. The imaginary fixed camera FC may have the properties P₀ as mathematically described by (3). Projecting a point X∈ R³ by this camera may be described according to (4).

Fig. 4 depicts an example of mapping image points x of a movable camera MC being positioned at an arbitrary position within the unit sphere and points Xon the sphere being represented by their respective complex equivalent z at the complex plane CP.

The mapping between image points x of the movable camera MC and points Xon the sphere represented by their respective complex equivalent points z can be described by a homography assuming that the sphere is planar within the field of view of the movable camera MC. Such perspective transformation can be represented by a 3 x 3 matrix H, mapping homogeneous pixel coordinates as defined in (10).

Fig. 5 depicts an example of a stereographic projection to a projection plane, which may be a complex plane CP as discussed above, wherein the projection center C is located at an arbitrary position at a unit sphere, and wherein the projective plane CP' is arbitrary, but perpendicular to a diameter starting at the respective projection center C. It has to be noted that any definition with the projection center C on the surface of the unit sphere and the projection plane CP' perpendicular to the respective diameter is a valid definition of the stereographic projection.

As stated above, the projection by any projective camera positioned at the unit sphere viewing direction through the sphere's center and focal length f≠ 0 is equivalent to a stereographic projection. So, changing the projection center C as well as the projection plane CP' is tantamount to moving a projective camera along the sphere's surface (and altering its focal length). In Fig. 2, the camera located at the north pole projects the world point Xto the image point represented by z. in Fig: 5, the camera located at projection center C projects the world point X to the image point represented by

Points z maybe transformed to points z' using a Mobius transform as defined in (8).

Figs. 6 to 8 illustrate the transformation of a further image FI into the global coordinate system by using the transformation for transforming the further image into the global coordinate system.

Fig. 6 illustrates a first step of the transformation. The further image FI is specified in a local coordinate system of the further image FI as a projection of a part of the interior of the hollow structure HS to an image plane IP in a projective space PS. A perspective projection

which is the inverse of the perspective projection

specified in (10) transforms each point xof the further image FI to a point

of a further image plane FIP in the protective space PS', which locally approximates the interior surface of the hollow structure HS. The perspective projection uses a 3 * 3 matrix

which is the in

verse of the full matrix

or of the reduced matrix H.

Fig. 7 illustrates a second step of the transformation. The isomorphic mapping φ is the inverse of φ ¹ as defined above. The isomorphic mapping φ maps each point of a further image plane FIP to a point z' in an intermediate complex plane CP'. Position and orientation of the intermediate complex plane CP' is identical to the further image plane FIP shown in Fig. 6. The position of the intermediate protection center C may be determined by a Mobius transform as defined in (8) or (14). Fig. 8 illustrates a third step of the transformation. The Mobius transformation m^-1, which may be the inverse of the full Mobius transformation m as defined in (8) or the inverse of the reduced Mobius transformation m as defined (14), maps each point z' to a point z of the complex plane CP in which the reference image Rl is specified, so that each point z is transformed into the global coordinate system of the reference image Rl.

In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present inven- tion. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.

Depending on certain implementation requirements, embodiments of the inventive device and system can be implemented in hardware and/or in soft- ware. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray Disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that one or more or all of the functionalities of the inventive device or system is performed.

In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform one or more or all of the functionalities of the devices and systems described herein. In some embod- iments, a field programmable gate array may cooperate with a microprocessor in order to perform one or more or all of the functionalities of the devices and systems described herein.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a fea- ture of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Depending on certain implementation requirements, embodiments of the inventive method can be implemented using an apparatus comprising hardware and/or software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray Disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having elec- tronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Depending on certain implementation requirements, embodiments of the in- ventive method can be implemented using an apparatus comprising hardware and/or software.

Some or all of the method steps may be executed by (or using) a hardware apparatus, like a microprocessor, a programmable computer or an electronic circuit. Some one or more of the most important method steps may be executed by such an apparatus.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, which is stored on a machine readable carrier or a non-transitory storage medium. A further embodiment comprises a processing means, for example a computer, or a programmable logic device, in particular a processor comprising hardware, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein. Generally, the methods are advantageously performed by any apparatus comprising hardware and or software.

While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.

Reference signs:

1 image processing device

2 camera device

3 selecting unit

4 key point detection unit

5 transforming unit

6 key point matching unit

7 transformation determination unit

8 transforming execution unit

9 joining unit

CI composite image

SI sequence of digital images

HS hollow structure Rl reference image

Fl further image

CP complex plane

PS projective space

GKP global key point

LKP local key point

KPP key point pair

TFT transformation for transforming the further image into the global coordinate system

C projection center

FC fixed camera

MC movable camera

IP image plane

FIP further image plane

References:

[1] Ali Can, Charles V. Stewart, Badrinath Roysam, and Howard L.

Tanenbaum. A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):347-364, 2002.

[2] Amnon Shashua and Sebastian Toelg. The Quadric Reference Surface: Theory and Ap plications. International Journal of Computer Vision, 23(2): 185-198, 1997.

[3] A. Shashua and Y. Wexler. Q-warping: Direct computation of quadratic reference sur- faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8):920-925, 2001.

[4] Ramesh Raskar, Jeroen Baar, Thomas Willwacher, and Srinivas Rao.

Quadric transfer for immersive curved screen displays. Computer Graphics Forum, 23(3):451-460, 2004.

[5] Ramesh Raskar, Jeroen van Baar, Paul Beardsley, Thomas Willacher, Srinivas Rao, and Clifton Forlines. iLamps: Geometrically Aware and Self-configuring Projectors. In international Conference on Computer Graphics and interactive Techniques (SIGGRAPH), SIGGRAPH Ό6, New York, NY, USA, 2006. ACM. [6] R. 1. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.

[7] Hans Schwerdtfeger. Geometry of Complex Numbers: Circle Geometry, Moebius Transformation, Non-euclidean Geometry. Courier Corpo- ration, 1979.

[8] Tristan Needham. Anschauliche Funktionentheorie . Oldenbourg Ver- lag, Munchen, 2. Edition, 201 1. [9] Dan Pedoe. Geometry: A Comprehensive Course. Courier Corporation, 2013.

[10] Rob Silicianoa. Constructing Mobius Transformations with Spheres.

Rose-Hulman Undergraduate Mathematics Journal, 13(2):114-124, 2012.

[1 1] Richard Szeliski. Computer Vision: Algorithms and Applications.

Springer-Veriag New York, 1 edition, 2010.

[12] Martin A. Fischler and Robert C. Bolles. Random Sample Consensus:

A Paradigm for Model Fitting with Applications to Image Analysis and

Automated Cartography. Communications of the ACM, 24(6):381~395, 1981.

[13] Ondrej Chum, Jiri Matas, and Josef Kittler. Locally Optimized RAN- SAC. In Bernd Michaelis and Gerald Krell, editors, Joint Pattern

Recognition Symposium, Lecture Notes in Computer Science, pages 236-243. Springer Berlin Heidelberg, 2003.

[14] P.H.S. Torr and A. Zisserman. MLESAC: A New Robust Estimator with Application to Estimating Image Geometry. Computer Vision and Image Understanding, 78(1 ): 138-156, 2000. Ben Tordoff and David W. Murray. Guided Sampling and Consensus for Motion Estimation. In Anders Heyden, Gunnar Sparr, Mads Nielsen, and Peter Johansen, editors, European Conference On Computer Vision (ECCV), Lecture Notes in Computer Science, pages 82-96. Springer Berlin Heidelberg, 2002.

Claims

1. Image processing device for producing in real-time a digital composite image (CI) from a sequence (SI) of digital images of an interior of a hollow structure (HS) recorded by an endoscopic camera device (2), in particular of an interior of a hollow organ (HS), such as an urinary bladder (HS), recorded by an medical endoscopic camera device (2), so that the composite image (CI) has a wider field of view than the images of the sequence (SI) of images, the image processing device (1 ) comprising: a selecting unit (3) configured for selecting a reference image (Rl) and a further image (Fl) from the sequence of images (SI), wherein the reference image (Rl) is specified in a global coordinate system of the composite image (CI) as a stereographic projection of a part of the interior of the hollow structure (HS) in a complex plane (CP), wherein the further image (Fl) is specified in a local coordinate system of the further image (Fl) as a projection of a further part of the interior of the hollow structure (HS) in a projective space (PS), and wherein the further image (Fl) is overlapping the reference image (Rl); a key point detection unit (4) configured for detecting global key points (GKP) in the reference image (Rl) and for detecting local key points (LKP) in the further image (Fl); a transforming unit (5) configured for transforming the further image (Fl) into the global coordinate system based on the global key points (GKP) and based on the local key points (LKP) in order to produce a transformed further image (TFI), wherein the transforming unit (5) comprises a key point matching unit (6) configured for determining key point pairs (KPP), wherein each of the key point pairs (KPP) comprises one global key point (GKP) of the global key points (GKP) and one local key point (LKP) of the local key points (LKP), wherein the global key point (GKP) and the local key point (LKP) of each of the key point pairs (KPP) correspond to a same feature of the hollow structure (HS), wherein the transforming unit (5) comprises a transformation determination unit (7) configured for determining a transformation for transforming the further image (Fl) into the global coordinate system, wherein the transformation for transforming the further image (Fl) into the global coordinate system is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius transformation, wherein parameters of the transformation for transforming the further image (Fl) into the global coordinate system are determined from at least some of the key point pairs (KPP), wherein the transforming unit (5) comprises an transforming execution unit (8) configured for transforming the further image (Fl) into the global coordinate system by using the transformation for transforming the further image (Fl) into the global coordinate system in order to produce the transformed further image (TFI); and a joining unit (9) configured for joining the reference image (Rl) and the transformed further image (TFI) in the global coordinate system in order to produce at least a part of the composite image (CI).

2. Image processing device according to the preceding claim, wherein the transformation determination unit (7) is configured in such way that the Mobius transformation is a simplified Mobius transformation.

3. Image processing device according to one of the preceding claims,

wherein the transformation determination unit (7) is configured in such way that the perspective transformation is a reduced perspective transformation.

4. Image processing device according to one of the preceding claims,

wherein the transformation determination unit (7) is configured in such way that the parameters of the transformation for transforming the further image (Fl) into the global coordinate system are determined from the at least some of the key point pairs (KPP) by using a direct linear transformation.

5. Image processing device according to one of the preceding claims, wherein the transformation determination unit (7) is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs (KPP) by using a least squares method.

6. Image processing device according to one of the preceding claims,

wherein the transformation determination unit (7) is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs (KPP) by using a random sampling consensus method.

7. Image processing device according to one of the preceding claims,

wherein the transformation determination unit (7) is configured in such way that the parameters of the transformation for transforming the further image into the global coordinate system are determined from the at least some of the key point pairs (KPP) by using a guided sampling method.

8. Endoscopic camera system for producing in real-time a digital composite image (CI), the endoscopic camera system comprising: an endoscopic camera device (2) configured for recording a sequence (SI) of digital images of an interior of a hollow structure (HS), in particular a medical endoscopic camera device (2) configured for recording a sequence (SI) of digital images of an interior of a hollow organ (HS), such as an urinary bladder; and an image processing device (1 ) according to one of the preceding claims.

9. Method for producing in real-time a digital composite image (CI) from a sequence (SI) of digital images of an interior of a hollow structure (HS) recorded by an endoscopic camera device (2), in particular of an interior of a hollow organ (HS), such as an urinary bladder (HS), recorded by an medical endoscopic camera device (2), so that the composite image (CI) has a wider field of view than the images of the sequence (SI) of images, the image processing device (1 ) comprising: selecting a reference image (Rl) and a further image (Fl) from the se- quence of images (SI) by using a selecting unit (3), wherein the reference image (Rl) is specified in a global coordinate system of the composite image (CI) as a stereographic projection of a part of the interior of the hollow structure (HS) in a complex plane (CP), wherein the further image (Fl) is specified in a local coordinate system of the further image (Fl) as a pro- jection of a further part of the interior of the hollow structure (HS) in a projective space (PS), and wherein the further image (Fl) is overlapping the reference image (Rl); detecting global key points (GKP) in the reference image (Rl) and detect- ing local key points (LKP) in the further image (Fl) by using a key point detection unit (4); transforming the further image (Fl) into the global coordinate system based on the global key points (GKP) and based on the local key points (LKP) by using a transforming unit (5) in order to produce a transformed further image (TFI), wherein key point pairs (KPP) are determined by a key point matching unit (6) of the transforming unit (5), wherein each of the key point pairs (KPP) comprises one global key point (GKP) of the global key points (GKP) and one local key point (LKP) of the local key points (LKP), wherein the global key point (GKP) and the local key point (LKP) of each of the key point pairs (KPP) correspond to a same feature of the hollow structure (HS), wherein a transformation for transforming the further image (Fl) into the global coordinate system is determined by a transformation determination unit (7) of the transforming unit (5), wherein the transformation for transforming the further image (Fl) into the global coordi- nate system is a concatenation of a perspective transformation, an isomorphic mapping and a Mobius transformation, wherein parame- ters of the transformation for transforming the further image (Fl) into the global coordinate system are determined from at least some of the key point pairs (KPP), wherein the further image (Fl) is transformed into the global coordinate system by using a transforming execution unit (8) of the transforming unit (5), which uses the transformation for transforming the further image (Fl) into the global coordinate system in order to produce the transformed further image (TFI); and joining the reference image (Rl) and the transformed further image (TFI) in the global coordinate system by using a joining unit (9) in order to produce at least a part of the composite image (CI).

10. Computer program for, when running on a processor, executing the

method according to the preceding claim.