INFORMATION PROCESSING FOR DISTINGUISHING AN OBJECT
Field of the Invention
The present invention relates to a method of identifying an object in a digital microscope image which reproduces biological material according to the preamble to claim 1, an arrangement according to the preamble to claim 12, a computer program according to the preamble to claim 13 and a digital storage medium according to claim 14.
Cross Reference to Related Applications
This application claims the benefit of Swedish patent application no. SE-0101712-8, filed 16 May 2001, and US provisional patent application no. US-60/316 277, filed 4 September 2001.
Background Art
Segmentation, or distinguishing, is a frequent operation in processing of digital microscope images which reproduce biological material. In segmentation, a number of picture elements (often referred to as pixels) in an image are identified and selected, which together reproduce an object. A perfect segmentation means that all picture elements that reproduce the object are selected and that no other picture elements are selected.
An example of utilisation of segmentation is in analysing a specimen of blood. Such an analysis can be carried out in such manner that all or at least several white blood cells in a blood smear are searched and reproduced by a digital microscope. The white blood cells are segmented from the images and analysed by means of image analysis. The image analysis can result in the blood cells being divided into different classes based on their respective appearances. The distribution of the blood cells in the different classes can then constitute a basis for making a diagnosis regarding the organism from which the specimen of blood originates.
A common method for segmenting is so-called thresholding. In thresholding, properties of picture elements in the image are compared with a threshold value to determine whether the picture elements reproduce part of a searched object or not. If, for instance, bright objects against a dark background are to be identified, the thresholding can be carried out by assuming that all picture elements having a light intensity that exceeds a threshold value reproduce objects of the searched type. It can be assumed that a connected cluster of such picture elements then reproduces an object.
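The thresholding described above can be sketched as follows. This is only an illustration, assuming a greyscale image stored as a list of rows and 4-connectivity between picture elements; the function name threshold_segments is hypothetical and does not come from the source.

```python
from collections import deque


def threshold_segments(image, threshold):
    """Label 4-connected clusters of pixels brighter than `threshold`.

    `image` is an assumed representation: a list of rows of intensities.
    Returns (labels, count); labels[y][x] is 0 for background pixels and
    a positive cluster number otherwise.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] <= threshold or labels[y][x]:
                continue  # background pixel, or already labelled
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # flood-fill the connected cluster
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Two bright objects that touch each other end up in the same connected cluster and therefore receive the same label, which is the undersegmentation problem discussed next.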
A problem of prior art segmenting methods arises when several objects of the searched type are positioned close to each other, as shown in Fig. 1 where two white blood cells adjoin each other. An obvious risk in this case is that the two cells are segmented as a single cell, which results in incorrect sizing of the cells and a deteriorated result of the analysis.
One method of avoiding undersegmentation, i.e. that several objects are segmented as a single object, is to utilise a sharper segmentation condition (in the case of thresholding, a higher or lower threshold value). However, this involves a risk of oversegmentation, i.e. that an object is segmented as if it were several objects. This happens not only to objects adjoining each other, but also to separate objects.
A segmentation condition which is made sharper may thus result in incorrect segmentation, which deteriorates the result of the analysis.
Summary of the Invention
An object of the present invention is to wholly or partly remedy the above-mentioned problems.
This object is achieved by a method of identifying an object in a digital microscope image which reproduces biological material according to claim 1, an arrangement according to claim 12, a computer program according to claim 13 and a digital storage medium according to claim 14.
According to a first aspect, the invention concerns more specifically a method of distinguishing an object of a predetermined type in a digital microscope image which reproduces biological material and which comprises a set of connected segments. The object consists of at least one segment in the set and the method is characterized by the steps of a) selecting a start segment, b) calculating a value of a measuring function for a first segment region which comprises at least the start segment in the set, the measuring function being dependent on at least the relationship between the area and perimeter of the segment region, and c) distinguishing the object on the basis of the value of said measuring function.
Using such a method, it is possible to put together oversegmented objects in a reliable fashion. This makes it possible to set a segmentation condition so sharp that undersegmentation is certain not to take place, which gives an overall segmentation with great accuracy.
Preferably, the calculating step is repeated for a second segment region which differs from the first segment region. If a plurality of segment regions are tested, the segment region that gives the most favourable value can be selected, which results in greater reliability in the assessment of which segments are to be put together.
The second segment region may consist of the segments in the first segment region, for which the value of the measuring function has already been calculated, and an added segment which is connected with the first segment region. Preferably, the measuring function of the second segment region is then calculated as the measuring function of the first segment region adjusted with an addition regarding the added segment. This results in quicker calculation of the measuring function of a segment region since previous calculations can be reused.
According to a preferred embodiment, measuring functions for a plurality of segment regions are calculated and the object is distinguished as the segment region whose measuring function value is closest to a value which is ideal for objects of the predetermined type. If a plurality of segment regions are tested, there will be greater possibilities of finding one that actually reproduces an object.
Preferably, measuring functions are calculated for essentially all segment regions, in which the Euclidean distance between all pairs of picture elements included in the segment region is smaller than a distance threshold value. If the distinguishing is carried out in this way, all segment regions that are not too large to constitute an object can be tested.
Alternatively, measuring functions corresponding to segment regions can be calculated until the value of a measuring function has been found, the deviation of which from a value which is ideal for objects of the predetermined type is below a measuring function threshold value, and wherein the object is identified as the corresponding segment region of this measuring function. This results in a comparatively quick process.
According to a preferred embodiment, the distinguished object is excluded from said set of connected segments, after which a process according to the method is repeated on the remaining segments. In this manner, a whole cluster of segments can be analysed. The smaller the remaining cluster, the quicker the distinguishing will be.
Preferably, in step (a) a segment which probably does not, on its own, reproduce an object of the searched type is selected as the start segment. This may result in a more reliable joining of segments.
Preferably, the method according to the invention can be repeated several times with different choices of start segment, the correspondence between the results from the different times being checked. Statistical processing of the results can also be carried out, in which case distinguishing takes place based on the result of this processing. This results in greater certainty in the distinguishing of the objects in a cluster of segments. Furthermore, a measure of how reliable the distinguishing is can be obtained.
According to a second aspect, the invention relates to an arrangement for distinguishing an object of a predetermined type in a digital microscope image which reproduces biological material and which comprises a set of connected segments, the object consisting of at least one segment in the set. The arrangement is characterized by means for selecting a start segment in the set, means for calculating a value of a measuring function for a first segment region which comprises at least the start segment in the set, the measuring function being dependent on at least the relationship between the area and perimeter of the segment region, and means for distinguishing the object based on the value of said measuring function. This arrangement implies advantages corresponding to those of the method and can be varied accordingly.
According to a third aspect, the invention relates to a computer program for distinguishing an object of a predetermined type in a digital microscope image which reproduces biological material and which comprises a set of connected segments, the object consisting of at least one segment in the set. The computer program is characterized by instructions corresponding to the steps of a) selecting a start segment, b) calculating a value of a measuring function for a first segment region which comprises at least the start segment in the set, the measuring function being dependent on at least the relationship between the area and perimeter of the segment region, and c) distinguishing the object based on the value of said measuring function.
The computer program implies advantages corresponding to those of the method and can be varied accordingly.
According to a fourth aspect, the invention concerns a digital storage medium comprising such a computer program.
Brief Description of the Drawings
Fig. 1 shows an example of a digital microscope image.
Fig. 2 shows the microscope image in Fig. 1 where segmenting has been carried out so that oversegmenting of a white blood cell has taken place.
Fig. 3a shows schematically the segments in the image in Fig. 2.
Fig. 3b shows a graph corresponding to Fig. 3a.
Fig. 4a shows a graph which constitutes a first proposal for joining of the segments in Fig. 2.
Fig. 4b indicates a segment region corresponding to the graph in Fig. 4a.
Fig. 4c shows a graph which constitutes a second proposal for joining of the segments in Fig. 2.
Fig. 4d indicates a segment region corresponding to the graph in Fig. 4c.
Fig. 5 illustrates a method according to a first embodiment of the invention.
Fig. 6 illustrates a method according to a second embodiment of the invention.
Fig. 7 illustrates, according to an embodiment of the invention, a joining of the segments in Fig. 2.
Fig. 8 illustrates an example of the appearance of the segments in a cluster of segments.
Fig. 9 shows a system in which the present invention can be applied.
Fig. 10 shows an arrangement according to an embodiment of the invention.
Description of Preferred Embodiments
When using a microscope system of such a type as will be described in more detail in connection with Fig. 9, digital microscope images are generated.
Fig. 1 shows an example of a digital microscope image. The image reproduces two neighbouring white blood cells, and a large number of surrounding red blood cells. The white blood cells have dark nuclei, which within the membrane of the cells are surrounded by cytoplasm.
An operation that can be executed on the image in Fig. 1 is segmentation of white blood cells. As mentioned above, such a segmentation is difficult if white blood cells in the image adjoin each other, which is the case in Fig. 1. With a segmentation condition which is set low, there is a risk of the two cells being segmented as a single cell.
One possibility for segmenting white blood cells is to use the fact that they normally have a low green component, i.e. they are dark in the green part of the colour spectrum. This can be utilized, for example, in segmentation by thresholding. If the segmentation condition is made sharper, i.e. in this case if the green component threshold value is set lower, there is a risk of oversegmenting.
Fig. 2 shows the microscope image in Fig. 1 where segmentation has been carried out so that oversegmenting of one of the white blood cells has occurred. The reason for the oversegmenting is that the nucleus of the upper white blood cell has indentations which are perceived by the system as boundary lines between white blood cells. As a result, the upper white blood cell has been divided into three segments.
An object of the present invention is to provide a method for quick and accurate correction of such oversegmentations.
Fig. 3a shows schematically the segments in the image in Fig. 2. The segments in the oversegmented image have been numbered. The background surrounding the cluster of segments can be referred to as the zero segment.
Fig. 3b shows a graph corresponding to Fig. 3a and describing how the segments are connected. Segment 1 is connected with segment 2 and segment 3. Segment 2 is connected with segment 1 and segment 3. Segment 3 is connected with segment 1, segment 2 and segment 4. Segment 4 is connected with segment 3 only. A person looking at Fig. 1 realizes that all the connections are "true" except for the one between segment 3 and segment 4. A connection is here said to be true if it connects two segments which reproduce the same object.
The graph shown in Fig. 3b can be said to constitute a tree. According to an embodiment of the invention, this tree is to be divided into (one or more) subtrees which only contain true connections.
To achieve this, an evaluation of proposals for subtrees is to be carried out. Fig. 4a shows a graph which constitutes a first proposal for joining the segments in Fig. 2. The connection between segment 1 and segment 3 constitutes a proposal for a subtree. The other connections are indicated by dashed lines. Fig. 4b shows a segment region drawn by full lines and corresponding to the graph in Fig. 4a. Segment 1 and segment 3 together constitute a segment region. By segment region is here generally meant a set of connected segments in an image that has been segmented. For the segment region in Fig. 4b, a corresponding measuring function can be calculated, as will be described below. The measuring function is preferably based on the relationship between the perimeter and area of the segment region. Moreover, parameters such as the greatest extent of the segment region and variations in appearance, for instance texture, between different segments in a segment region can be utilized in the measuring function.
Fig. 4c shows a graph which constitutes a second proposal for joining of the segments in Fig. 2. In relation to the tree in Fig. 4a, segment 2 has been added. Fig. 4d shows a segment region corresponding to the graph in Fig. 4c.
When calculating a measuring function for this increased segment region, calculations that have already been made for the previous segment region can be used to a great extent, as will be shown below. The calculations can thus be carried out in a cumulative manner, which makes them quick.
A process according to an embodiment of the invention comprises at least the following steps: selection of a start segment in a cluster of segments, which start segment will be the starting point of the process; calculation of a value of a measuring function for a segment region which comprises at least the start segment in the cluster; and distinguishing a searched object based on the value of the measuring function.
SELECTION OF START SEGMENT
The start segment can basically be selected randomly or arbitrarily from the segments in a cluster. Preferably, however, a start segment which probably does not itself reproduce an object is selected since this may result in a more reliable process. The reason for this is apparent from the example in Fig. 8, which shows two segments 801 and 802 that are included in a cluster of segments. The two segments together reproduce an object. The first segment 801 is relatively round and will therefore probably give a high value of a measuring function of the type that will be described below. The second segment 802 is shaped as a crescent and will therefore probably obtain a low corresponding value.
The second segment 802 here constitutes a suitable start segment. If the first segment 801 should be selected, there is in fact a risk that it will be incorrectly assessed by the system as an object since it has a high corresponding measuring function value. This risk, however, does not exist for the second segment 802. To be able to make such a choice, it is convenient to calculate, for at least part of the segments in a cluster, the measuring function or some other suitable measure for an isolated segment. The start segment can then be selected, for instance, as the segment that has the lowest measuring function value, or as the first segment that has a measuring function value which is lower than a threshold value.
CALCULATION OF MEASURING FUNCTION
A measuring function should generally be designed so as to give a value close to an ideal value for a segment region which is very similar to an object of the searched type.
White blood cells are normally round or oval. Therefore, a measuring function is preferably based on the relationship between the area and perimeter of a segment region. A common measure of this relationship is the compactness C(T) = P(T)²/A(T), where P(T) is the perimeter and A(T) is the area of a segment region T. The square of P makes the compactness dimensionless. The minimum value of C is 4π (≈12.6), which is obtained if the segment region is a circle. If the segment region is a square, C = 16.
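The two compactness values quoted above can be checked numerically. The following lines are a minimal sketch; the function name compactness and the chosen radius and side length are arbitrary illustrative choices, not part of the described method.

```python
import math


def compactness(perimeter, area):
    """Compactness C = P^2 / A of a region with the given perimeter and area."""
    return perimeter ** 2 / area


radius = 3.0  # arbitrary; the result does not depend on the radius
circle_c = compactness(2 * math.pi * radius, math.pi * radius ** 2)

side = 2.0    # arbitrary; the result does not depend on the side length
square_c = compactness(4 * side, side ** 2)
```

Because C is dimensionless, circle_c equals 4π for any radius and square_c equals 16 for any side length, so elongated or indented regions always score higher than a circle of any size.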
The compactness is merely an example of how the relationship between the perimeter and area of a segment region can be expressed. Other ways are conceivable. The area and perimeter of the segment region can preferably be calculated cumulatively. Now assume that the area and perimeter of a first segment region have been calculated. The area and perimeter of a second segment region are to be calculated. The second segment region consists of the first segment region, to which a further segment has been added. In a cumulative process, the previously effected calculation of the area and perimeter of the first segment region is used. This calculation is adjusted only in respect of the added segment (analogously if a segment has been removed).
By calculating in advance a(Ni), i.e. the area, of all segments Ni, it is easy to cumulatively update A(T) = Σ a(Ni) (where Ni are the segments that are included in the segment region T) in the course of the search. When calculating A(T) for the segment region in Fig. 4b, for instance, a(1) and a(3) are summed. When the similar tree in Fig. 4c is to be evaluated, it is sufficient to update A(T) by adding a(2).
It is also possible to update the perimeter P(T) cumulatively. It is sufficient to calculate in advance p(i,j), i.e. the length of the common boundary line or the common boundary lines, of all possible pairs of segments Ni, Nj.
Here p(i,0) is the length of the boundary line between the segment i and the zero segment, i.e. the background in Fig. 3a. The boundary lines of the zero segment are thus also calculated. It is to be noted that p(3,0) in Fig. 4b consists of two non-connected parts.
For the pairs of segments that do not adjoin each other, p(i,j) = 0, which can be used to limit the number of values p(i,j) that need to be calculated. The fact that p(i,j) = p(j,i) also saves calculations. Nor is it meaningful to calculate p(i,i).
Based on these p(i,j), P(T) = Σ p(i,j), where i is included in T and j is not included in T, can be calculated. When T is changed, it is easy to add or subtract the corresponding p's to or from P. Consequently, this quantity can also be calculated cumulatively. For the tree in Fig. 4a, for instance, the following will be obtained:
P(T) = p(1,0) + p(1,2) + p(1,4) + p(3,0) + p(3,2) + p(3,4),
where p(1,4) = 0. When the similar tree in Fig. 4c is to be evaluated, a few easily executed changes are sufficient. Region 2 is changed from not being included to being included, and therefore all p's for region 2 will have to be used for the updating. Previously, p(2,1) and p(2,3) were included in P(T), which means that they should be subtracted, while p(2,0) and p(2,4), which were previously not included, are to be added relative to the just mentioned P(T). The new P(T) will thus be
P(T) = p(1,0) + p(1,2) + p(1,4) + p(3,0) + p(3,2) + p(3,4) - p(2,1) - p(2,3) + p(2,0) + p(2,4)
= p(1,0) + p(1,4) + p(2,0) + p(2,4) + p(3,0) + p(3,4),
which agrees with the definition of P(T) above.
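The cumulative updating of A(T) and P(T) can be sketched as a small class. This is a sketch under stated assumptions: the class name SegmentRegion, the dictionary layout and the numeric areas and boundary lengths in the usage note are all hypothetical; only the update rule follows the text above.

```python
class SegmentRegion:
    """Tracks area A(T) and perimeter P(T) of a segment region T cumulatively.

    areas:   {segment number: a(i)}
    borders: {frozenset({i, j}): p(i, j)}; segment 0 is the background.
             Pairs that do not adjoin each other may simply be left out.
    """

    def __init__(self, areas, borders):
        self.areas = areas
        self.borders = borders
        self.members = set()
        self.area = 0.0
        self.perimeter = 0.0

    def p(self, i, j):
        # Symmetric lookup; non-adjoining pairs have boundary length 0.
        return self.borders.get(frozenset((i, j)), 0.0)

    def _others(self, seg):
        # All segments except `seg`, including the zero segment.
        return (set(self.areas) | {0}) - {seg}

    def add(self, seg):
        for other in self._others(seg):
            if other in self.members:
                self.perimeter -= self.p(seg, other)  # boundary becomes internal
            else:
                self.perimeter += self.p(seg, other)  # boundary becomes external
        self.members.add(seg)
        self.area += self.areas[seg]

    def remove(self, seg):
        self.members.remove(seg)
        self.area -= self.areas[seg]
        for other in self._others(seg):
            if other in self.members:
                self.perimeter += self.p(seg, other)  # boundary becomes external
            else:
                self.perimeter -= self.p(seg, other)  # boundary no longer counted
```

With made-up values such as a(1)=10, a(2)=6, a(3)=8 and boundary lengths p(1,0)=5, p(1,2)=3, p(1,3)=4, p(2,0)=2, p(2,3)=3, p(3,0)=6, p(3,4)=2, adding segments 1 and 3 gives P(T)=19, and then adding segment 2 gives P(T)=19-3-3+2+0=15, matching the two expressions for P(T) above term by term.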
In addition to compactness or another measure of the relationship between the perimeter and area of a segment region, other parameters can be included in the measuring function. Generally, the measuring function should "punish" segment regions with such parameters as deviate from expected parameters of objects of the searched type.
An example of such a parameter can be the size of the segment region. It may be expected that a white blood cell has a size within a certain range. The measuring function can reward segment regions in which the longest Euclidean distance d between two picture elements in the segment region is below a threshold value D. The distance d can quickly be calculated approximately by approximating the segments in the segment region as polygons. Alternatively, the total area of the segment region can be compared with one or more threshold values. One more alternative is to base the measuring function on the deviation of a longest distance or an area from a corresponding mode or mean value. A function of the size of the segment region can generally be a factor or a term in the measuring function.
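The size condition d < D can be illustrated with a brute-force calculation of the greatest extent. This sketch assumes picture elements given as (x, y) coordinate pairs and compares all pairs exactly, which is slower than the polygon approximation mentioned above; the function names are hypothetical.

```python
import math
from itertools import combinations


def greatest_extent(pixels):
    """Longest Euclidean distance d between any two picture elements."""
    return max((math.dist(a, b) for a, b in combinations(pixels, 2)),
               default=0.0)


def satisfies_size_condition(pixels, max_distance):
    """True if the greatest extent d is below the threshold value D."""
    return greatest_extent(pixels) < max_distance
```

In practice only the boundary picture elements (or polygon corners) of the segment region need to be passed in, since the two most distant points always lie on the boundary.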
Another parameter can be the variation in appearance between different segments in the segment region. As a rule, segments with significantly different colourings or textures probably do not belong to the same object. In general, measures of colouring or texture can then be calculated for each of the segments in a cluster. For a segment region, the variance is calculated for each of these measures. This variance can be included as a factor or term in the measuring function. A great variance indicates that the segments in a segment region do not belong to the same object. It should be noted that variances, too, can preferably be calculated cumulatively for segment regions. When a segment is added, this segment's variance is also included, and the total is adjusted using the square mean values.
DISTINGUISHING OF OBJECTS
On the basis of the values of the measuring function for different segment regions, distinguishing of the object can be carried out. In a typical process, the measuring function is first calculated for a segment region which comprises merely the start segment. Subsequently, calculation can take place for segment regions which comprise the start segment and a further segment which is connected with the start segment. Then calculation can take place for a segment region which comprises the start segment and two additional segments, etc.
Basically, this can proceed as long as topologically possible and untested segment regions (including the start segment) are still in the cluster. For small clusters of segments, such as the one shown in Fig. 2, this is also suitable. When all combinations have been tested, the object is distinguished as the segment region that obtains the best corresponding measuring function value.
For large clusters, however, this is less convenient. Large clusters can be found, for instance, in smears of bone marrow where white blood cells are often positioned very close to each other. It can be assumed that the size of the searched object will be below a certain threshold value and thus, as will be shown below, calculations will only be carried out for segment regions where the above-mentioned distance d is below the threshold value. It is also possible to make the threshold value dynamic, i.e. allow great distances if merely poor measuring function values have appeared and require smaller distances if good measuring function values have appeared, which may indicate that the object has already been found.
Another alternative is to test different segment regions until a sufficiently good measuring function value has been obtained, which will be described below.
Fig. 5 shows a flow chart of a method 500 according to an embodiment of the invention. In a first step 501, a start segment is selected as shown above. In a second step 502, the value of a measuring function is calculated for a segment region which contains the start segment. In the first calculation, the segment region may comprise merely the start segment. Subsequently it is checked in a third step 503 whether segment regions which satisfy a size condition and for which measuring functions have not been calculated still remain. The size condition can be expressed in such manner that the above distance d is below a threshold value D.
If this is the case, the segment region is changed in a fourth step 504 into conformity with one of the remaining segment regions. This can be carried out by adding and/or subtracting a segment, optionally repeatedly. Then the second step 502 is repeated, where the measuring function for the modified segment region is calculated.
If no such segment regions remain, the segment region which has obtained the best (usually the greatest) measuring function value is distinguished in a fifth step 505 as the searched object. Then this segment region can be analysed using, for example, image analysis.
In a sixth step 506, the distinguished segment region is excluded from the cluster of segments to which it belongs, so that calculations will henceforth not be carried out with segments in the distinguished segment region.
Then it is checked in a seventh step 507 whether segments remain in the cluster. If this is the case, the procedure is repeated from the first step 501. Otherwise, the process has been completed.
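The flow of method 500 can be sketched as follows. This is an illustration only, assuming that the cluster is given as an adjacency mapping, that a greater measuring function value is better, and that the size condition never becomes satisfied again once a region has grown too large; the names distinguish_all, measure, size_ok and choose_start are hypothetical, and picking the lowest-numbered segment as start segment is an arbitrary default.

```python
def distinguish_all(adjacency, measure, size_ok, choose_start=min):
    """Split a cluster of segments into objects, following steps 501-507.

    adjacency: {segment: set of neighbouring segments} for one cluster.
    measure(region): measuring function value; greater is assumed better.
    size_ok(region): True if the region satisfies the size condition.
    """
    remaining = set(adjacency)
    objects = []
    while remaining:                                  # step 507
        start = choose_start(remaining)               # step 501
        best, best_value = None, None
        seen, stack = set(), [frozenset([start])]
        while stack:                                  # loop over steps 502-504
            region = stack.pop()
            if region in seen:
                continue
            seen.add(region)
            value = measure(region)                   # step 502
            if best is None or value > best_value:
                best, best_value = region, value
            for seg in region:                        # step 504: grow by one
                for n in adjacency[seg] & remaining:
                    grown = region | {n}
                    if n not in region and size_ok(grown):   # step 503
                        stack.append(grown)
        objects.append(set(best))                     # step 505
        remaining -= best                             # step 506
    return objects
```

For the graph in Fig. 3b, a toy measuring function that rewards regions whose total area is close to an expected value would, with suitable made-up areas, return segments 1, 2 and 3 as one object and segment 4 as another.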
Fig. 6 shows a flow chart for a method 600 according to an alternative embodiment of the invention. In a first step 601, a start segment is selected, as shown above. In a second step 602, the value of a measuring function is calculated for a segment region containing the start segment. In the first calculation, the segment region can comprise merely the start segment. Then it is checked in a third step 603 whether the measuring function of the segment region exceeds (or, with a different design of the measuring function, is below) a threshold value. This indicates that the segment region is "sufficiently similar to" an object of the searched type.
If this is not the case, the segment region is changed in a fourth step 604. This can be carried out by adding and/or subtracting a segment, optionally repeatedly. Then the second step 602 is repeated, where a measuring function for the modified segment region is calculated.
If the condition in the third step is satisfied, that segment region is distinguished in a fifth step 605 as the searched object. Then the segment region can be analysed using, for instance, image analysis.
In a sixth step 606, the distinguished segment region is excluded from the cluster of segments to which it belongs, so that henceforth segments in the distinguished segment region will not be taken into consideration.
Subsequently it is checked in a seventh step 607 whether segments remain in the cluster. If this is the case, the procedure is repeated from the first step 601. Otherwise, the process has been completed.
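Steps 601 to 605 of method 600 can be sketched in the same style. The sketch assumes that regions are tried in order of increasing number of segments and that, if no region exceeds the threshold value, the best region seen is returned; both choices, and the name distinguish_first_good, are assumptions made for illustration and are not prescribed by the description.

```python
from collections import deque


def distinguish_first_good(adjacency, measure, threshold, start):
    """Return the first segment region whose measure exceeds `threshold`.

    adjacency: {segment: set of neighbouring segments} for one cluster.
    measure(region): measuring function value; greater is assumed better.
    """
    seen = {frozenset([start])}
    queue = deque(seen)
    best, best_value = None, None
    while queue:
        region = queue.popleft()
        value = measure(region)                   # step 602
        if value > threshold:                     # step 603
            return set(region)                    # step 605
        if best is None or value > best_value:
            best, best_value = region, value
        for seg in region:                        # step 604: add one segment
            for n in adjacency[seg] - region:
                grown = region | {n}
                if grown not in seen:
                    seen.add(grown)
                    queue.append(grown)
    return set(best)  # assumed fallback when no region is good enough
```

Compared with the exhaustive search of method 500, this variant stops as soon as a sufficiently good region is found, so typically far fewer measuring function values are calculated.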
In a preferred embodiment, the methods illustrated in Figs 5 and 6 can be repeated once or several times with new first choices of start segment. The correspondence between the results from the different times can be checked. Optionally, statistical processing of the results can be carried out and distinguishing can be effected based on the results of this processing.
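One conceivable form of such statistical processing is a simple majority vote over the runs. The description does not specify the processing, so the following is purely an assumed example; vote_on_object and its parameters are hypothetical names, and distinguish stands for any procedure that returns a segment region for a given start segment.

```python
from collections import Counter


def vote_on_object(distinguish, starts):
    """Run `distinguish` once per start segment and return the most common
    resulting region together with the fraction of runs that agree on it."""
    starts = list(starts)
    results = Counter(frozenset(distinguish(s)) for s in starts)
    region, votes = results.most_common(1)[0]
    return set(region), votes / len(starts)
```

The returned fraction can serve as a rough measure of how reliable the distinguishing is: a value of 1.0 means that every choice of start segment led to the same segment region.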
Fig. 7 illustrates a joining of the segments in Fig. 2 according to an embodiment of the invention. As is evident from Fig. 7, the segments in Fig. 2 have been joined in a correct manner.
Fig. 9 shows a system 900 in which the method described above can be applied. The system 900 comprises a digital microscope 901 which is connected to a computer unit 902 with an associated display 903 and keyboard 904. The digital microscope 901 generates digital microscope images which are processed and analysed by the computer unit 902. The computer unit 902 can be integrated into the digital microscope 901, but it can also be positioned at a great distance therefrom and then obtain digital microscope images, for example, via the Internet.
Fig. 10 shows an arrangement 1000 for distinguishing a searched object according to an embodiment of the invention. Input data to the arrangement is an image which has been segmented in a segmenting module 1001 in such manner that oversegmentation can have taken place. In any case, the image may comprise a number of segments which are connected in a cluster. The arrangement comprises means 1002 for selecting a start segment. This means carries out selection of the start segment in such manner as has been described above. Furthermore, the arrangement comprises means 1003 for calculating a value of a measuring function for at least a first segment region, which comprises the selected start segment in the set, as shown above. The arrangement further comprises means 1004 for distinguishing the object based on the value of the measuring function. Distinguished objects can be forwarded to an analysing module 1005 in which, for instance, image analysis can be carried out.
The means and modules illustrated in Fig. 10 can be implemented in terms of software in a computer unit 902 such as the one shown in Fig. 9. In some cases it is also possible to implement a means in terms of hardware in the form of an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
The invention is not restricted to the embodiments illustrated above and may be varied within the scope of the appended claims.