CN109460768B

CN109460768B - Text detection and removal method for histopathology microscopic image

Info

Publication number: CN109460768B
Application number: CN201811361398.6A
Authority: CN
Inventors: 李晨; 薛丹; 姚育东; 许宁
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2018-11-15
Filing date: 2018-11-15
Publication date: 2021-09-21
Anticipated expiration: 2038-11-15
Also published as: CN109460768A

Abstract

The invention relates to the technical field of medical microscopic image processing, and in particular to a text detection and removal method for histopathology microscopic images, comprising the following steps: preprocessing an input image separately in each channel; performing image segmentation to obtain sub-images; binarizing the sub-images and then performing projection analysis to obtain candidate text regions; extracting features from the candidate text regions using the Haar wavelet transform; performing first-level classification on the candidate text regions to obtain determined text regions and their features; performing second-level classification on the determined text regions to obtain single text words and their features; and removing the determined single text words, outputting the result of each channel, and merging the images output by all channels to obtain the final result. The invention applies text detection and removal to histopathology microscopic images and removes a primary obstacle to subsequent research on such images.

Description

Text detection and removal method for histopathology microscopic image

Technical Field

The invention relates to the technical field of image processing, in particular to a text detection and removal method for histopathology microscopic images.

Background

Microscopic cell image analysis systems, which apply computer technology to microscopic images, are a very active topic in current international computer applications. Because computer technology is developing rapidly and the software available for development and application is continuously upgraded, a number of scientific research institutions and universities in China have taken up applied research on microscopic cell image analysis systems in fields such as medicine and biology, and various general-purpose image processing and analysis systems have been proposed. Professional image cell analysis technology (Image Cytometry, ICM, that is, technology dedicated to quantitative analysis at the molecular level of biological tissue cells and even gene units) is, however, much more complicated than current general-purpose image processing and analysis systems in its hardware and software composition and in its system structure. The basic process of microscopic image processing is as follows: (1) preparing slices; (2) acquiring microscopic cell images; (3) image preprocessing; (4) image segmentation; (5) extracting characteristic parameters; (6) statistical analysis; (7) outputting the result.

In current medical diagnosis, the analysis of histopathological microscopic images plays an important role in the diagnosis of cancer. However, some histopathological microscopic images carry text annotations, and this text interferes with image analysis in two main ways: on the one hand, it disturbs computer segmentation of the image; on the other hand, the annotated part may occlude key parts of the image and thereby affect the judgment of the disease condition. It is therefore highly desirable to remove the text from such images to support the continued study of histopathological microscopic images. To remove text from an image, the prior art detects the text using region-based methods and texture-based methods.

For region-based text detection, one approach, by Shim et al., uses the homogeneity of intensity in the text regions of an image: pixels with similar gray levels are merged into groups, the larger regions are removed as background, the text regions are sharpened by region boundary analysis based on gray-level contrast, the candidate regions are then verified using size, area, fill factor and contrast, and neighboring text regions are examined to extract text strings. Another approach, by R. Jiang et al., introduces a new connected component (CC) method, which works as follows: first, the input image is decomposed into CCs using a color clustering algorithm; then, to segment text from the background, a two-stage classification module is employed, which first verifies all CCs with a cascade classifier and then further classifies the remaining components with a support vector machine (SVM).

For texture-based text detection, one approach is the two-step texture-based method proposed by D. Chen, which adopts a machine learning localization scheme consisting of two steps: the first step quickly locates potential text regions with a low rejection rate and reasonable accuracy, and the second step applies machine learning verification to discard false positives. Another approach is the hybrid-feature video frame text detection method proposed by Z. Ji et al., which first scans the image with small overlapping sliding windows and extracts language-independent, texture-based and edge-based features, then classifies each window as a text window or a non-text window with an SVM classifier, then labels each small block as text or non-text using a voting mechanism, and finally locates the text regions precisely through morphological filtering.

However, the above prior art requires high contrast between the text and its background, requires the edge density of the text contour region to be higher than that of other parts of the image, and requires the text in the image to have texture attributes that distinguish it from the background.

Therefore, it is highly desirable to provide a text detection and removal technique for histopathological microscopic images with a small computational burden.

Disclosure of Invention

(I) technical problem to be solved

In order to solve the above problems in the prior art, the present invention provides a text detection and removal method for histopathology microscopic images that places a smaller computational burden on image feature extraction.

(II) technical scheme

In order to achieve the purpose, the invention adopts the main technical scheme that:

the invention provides a text detection and removal method for a histopathology microscopic image, which comprises the following steps:

step S1: preprocessing an input image under different channels respectively;

step S2: carrying out image segmentation on each preprocessed channel image to obtain a sub-image of each channel;

step S3: performing binarization processing on the sub-image of each channel obtained in the step S2, and then performing projection analysis on the processed sub-image to obtain a candidate text region;

step S4: performing feature extraction on the candidate text region obtained in step S3 to obtain a candidate text region feature, where performing feature extraction on the candidate text region includes: performing primary decomposition on the candidate text region through Haar wavelet, collecting sample points according to a max-min combination rule, obtaining the characteristics of the sample points to obtain the texture characteristics of the candidate text region, and then performing dimension reduction processing on the texture characteristics of the candidate text region to obtain the characteristics of the candidate text region;

step S5: performing first-level classification on the candidate text regions according to the candidate text region characteristics obtained in the step S4 to determine the text regions and obtain the text region characteristics at the same time;

step S6: performing second-level classification according to the text region characteristics obtained in the step S5 to determine a single text word and obtain the characteristics of the single text word;

step S7: removing the single text word determined in step S6 according to the single text word features obtained in step S6, outputting the result of each channel after the single text word is removed, merging all the channels, and taking the merged output image as the final result.

According to the present invention, when performing projection analysis in step S3, horizontal projection and vertical projection are performed on the sub-image respectively to obtain horizontal projection and vertical projection of the sub-image, and the intersection of the horizontal projection and the vertical projection is marked as a candidate text region, and if no sub-image is marked as a candidate text region, the result is determined that the input image does not contain text.

According to the invention, before executing step S5, a first support vector machine is pre-trained by using a text region of a training image and a background image to obtain a first-stage classification model;

accordingly, step S5 includes: performing first-level classification by using the necessary texture features of the candidate text regions obtained in the step S4 as input vectors of a first-level classification model to obtain determined text regions and the features of the determined text regions;

before step S6, training a second support vector machine by using text words of the training images and the background images to obtain a second-stage classification model;

accordingly, step S6 includes: performing second-level classification by taking the determined text region features obtained in step S5 as the input vector of the second-level classification model, to obtain a single text word and the features of the single text word.

According to the present invention, the image segmentation in step S2 is specifically to iteratively process each channel image with a sliding window of 100 × 100 pixels in a sliding step of 100 pixels to obtain a sub-image of each channel.

According to the invention, the image input in step S1 is a histopathology microscopic image, the different channels are three RGB channels, and the preprocessing is to perform size normalization processing on the three RGB channels and apply wavelet transform to remove noise.

According to the present invention, the wavelet transform used for removing noise in step S1 can be replaced with a neighborhood averaging method or a wiener filtering method.

According to the invention, in the step S4, the dimensionality reduction process is to perform principal component analysis by using a covariance method to remove redundancy of texture features of the candidate text regions, so as to obtain candidate text region features.

According to the present invention, the principal component analysis using the covariance method in step S4 can be replaced by one of LDA, SVD, t-SNE, LASSO, wavelet analysis, Laplacian mapping, and sparse coding.

(III) advantageous effects

The invention has the beneficial effects that:

(1) the text detection and removal method for the histopathology microscopic image converts basic research into practical technology, applies an algorithm combining region-based text detection and texture-based text detection, applies character detection and removal technology to the histopathology microscopic image, provides a method for removing character annotations from the histopathology microscopic image, is beneficial to distinguishing images, and improves the medical quality of diagnosis according to the histopathology microscopic image.

(2) The feature extraction algorithm used by the method is based on two-dimensional Haar wavelet decomposition and is an extension of the co-occurrence histogram method. It improves on the feature extraction algorithm developed by P.S. Hiremath and S. Shivashankar: principal component analysis using the covariance method reduces the 384 features finally obtained to 45 features, which greatly reduces the amount of calculation, avoids feature overlap, and makes the subsequent classification of the features more accurate.

(3) The extracted features are classified by using a support vector machine, false alarms in the candidate text regions are eliminated, the determined text regions are obtained, the support vector machine trained by using the text words of the training images and the background images is used for further classification, the single text word in the text regions is obtained, and the accuracy of character recognition in the images is improved.

Drawings

FIG. 1 is a flow chart of the overall scheme of the present invention;

FIG. 2 is a schematic illustration of a feature extraction algorithm of the present invention;

FIG. 3(a) is an original image that is demonstrated by the image text removal process of the present invention;

FIG. 3(b) is a text diagram of an image text removal process demonstration of the present invention;

FIG. 3(c) is a diagram of a processing result of an image text removal process demonstration of the present invention;

FIG. 4(a) is an R-channel original image according to an embodiment of the present invention;

FIG. 4(b) is an image after the removal of R-channel text according to the embodiment of the present invention;

FIG. 5(a) is a G-channel original image according to an embodiment of the present invention;

FIG. 5(b) is a diagram of an image with G-channel text removed according to an embodiment of the present invention;

FIG. 6(a) is a B-channel original image according to an embodiment of the present invention;

FIG. 6(b) is a B-channel text-removed image according to an embodiment of the present invention.

Detailed Description

For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.

Example 1

step S1: preprocessing an input image under different channels respectively;

step S5: performing first-level classification on the candidate text regions according to the candidate text region characteristics obtained in the step S4 to obtain determined text regions and simultaneously obtain determined text region characteristics;

step S6: performing second-level classification according to the determined text region characteristics obtained in the step S5 to obtain a single text word and obtain the characteristics of the single text word;

Specifically, the image acquired in step S1 is a histopathological microscopic image, as shown in FIG. 3(a). The acquired image is preprocessed separately in the three RGB channels, which are shown in FIG. 4(a), FIG. 5(a) and FIG. 6(a), respectively. The preprocessing mainly comprises size normalization, which keeps the pixel dimensions of all channel images consistent and prevents abnormal results caused by differing image sizes, followed by noise removal using the wavelet transform.
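The preprocessing of step S1 can be sketched as follows, assuming NumPy only. The function names, the 400 × 400 target size, nearest-neighbour resampling and the soft-threshold value are illustrative assumptions, since the description fixes none of them; a one-level Haar transform with soft thresholding stands in for whichever wavelet denoising the inventors used, and even image dimensions are assumed.

```python
import numpy as np

def resize_nearest(channel, out_h, out_w):
    """Size normalization by nearest-neighbour resampling (assumed choice)."""
    h, w = channel.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return channel[rows[:, None], cols[None, :]]

def haar_denoise(channel, threshold=10.0):
    """One-level Haar transform, soft-threshold the detail bands, invert."""
    x = channel.astype(np.float64)
    # Four corners of every 2x2 block (even height and width assumed).
    p, q = x[0::2, 0::2], x[0::2, 1::2]
    r, s = x[1::2, 0::2], x[1::2, 1::2]
    a = (p + q + r + s) / 2.0  # approximation band
    h = (p + q - r - s) / 2.0  # horizontal detail band
    v = (p - q + r - s) / 2.0  # vertical detail band
    d = (p - q - r + s) / 2.0  # diagonal detail band

    def soft(c):
        return np.sign(c) * np.maximum(np.abs(c) - threshold, 0.0)

    h, v, d = soft(h), soft(v), soft(d)
    out = np.empty_like(x)
    out[0::2, 0::2] = (a + h + v + d) / 2.0
    out[0::2, 1::2] = (a + h - v - d) / 2.0
    out[1::2, 0::2] = (a - h + v - d) / 2.0
    out[1::2, 1::2] = (a - h - v + d) / 2.0
    return out

def preprocess(rgb, size=(400, 400), threshold=10.0):
    """Return the three normalized, denoised channels [R, G, B]."""
    return [haar_denoise(resize_nearest(rgb[:, :, c], *size), threshold)
            for c in range(3)]
```

With a threshold of 0 the transform is inverted exactly, so the denoiser leaves the channel unchanged; larger thresholds suppress small detail coefficients.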

Specifically, in step S2, image segmentation is performed on the RGB three channel images preprocessed in step S1, and specifically, each channel image, that is, fig. 4(a), fig. 5(a), and fig. 6(a), is iteratively processed by a sliding window with a size of 100 × 100 pixels and a sliding step of 100 pixels, so as to obtain sub-images.
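A minimal sketch of this tiling, assuming NumPy. The window and step of 100 pixels come from the description; the function name and the decision to skip partial windows at the right and bottom edges are assumptions, since the description does not say how image borders are handled.

```python
import numpy as np

def tile_channel(channel, window=100, step=100):
    """Yield (top, left, sub_image) for every full window position."""
    height, width = channel.shape
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            yield top, left, channel[top:top + window, left:left + window]
```

Because the step equals the window size, the sub-images do not overlap, so each pixel inside the tiled area belongs to exactly one sub-image.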

Specifically, in step S3, after the binarization processing is performed on each sub-image obtained in step S2, horizontal projection and vertical projection analysis are performed on the processed sub-image to obtain horizontal projection and vertical projection, an intersection of the horizontal projection and the vertical projection is marked as a candidate text region, and if no sub-image is marked as a candidate text region, it is determined that the input image does not contain text, and the following steps are not performed.
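The binarization and projection analysis can be sketched as below, assuming NumPy. The description names neither a binarization method nor a projection threshold, so the mean-value threshold, the `dark_text` switch, `min_count` and the bounding-box return value are all hypothetical choices made for illustration.

```python
import numpy as np

def binarize(sub_image, threshold=None, dark_text=True):
    """Foreground mask; defaults to a mean-value threshold (assumed)."""
    t = sub_image.mean() if threshold is None else threshold
    return (sub_image < t) if dark_text else (sub_image > t)

def candidate_region(binary, min_count=1):
    """Intersect the horizontal and vertical projections of a binary tile.

    Returns (top, bottom, left, right) or None when nothing is marked.
    """
    row_profile = binary.sum(axis=1)  # horizontal projection
    col_profile = binary.sum(axis=0)  # vertical projection
    rows = np.flatnonzero(row_profile >= min_count)
    cols = np.flatnonzero(col_profile >= min_count)
    if rows.size == 0 or cols.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1
```

If `candidate_region` returns `None` for every sub-image, the input image would be treated as containing no text and the later steps skipped, as the description states.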

Specifically, when the feature extraction is performed in step S4, the Haar wavelet transform is used to calculate texture features of the candidate text region, and the principal component analysis is performed by using the covariance method to perform the dimension reduction processing, which is to remove redundancy from the texture features of the candidate text region to reduce unnecessary texture features, so as to further obtain necessary texture features of the candidate text region, where the necessary texture features of the candidate text region are the final result obtained in this step, that is, the candidate text region features.

Specifically, before step S5 is executed, a first support vector machine is pre-trained using a text region of a training image and a background image to obtain a first-level classification model;

accordingly, step S5 includes: performing first-level classification by taking the candidate text region features obtained in step S4 as the input vector of the first support vector machine, to obtain the determined text regions and the determined text region features.

The support vector machine in step S5 distinguishes text regions from non-text regions, that is, it decides whether each region marked as a candidate is a determined text region or a non-text region. Because machine recognition may mistake a non-text region for a text region, the first-level classification eliminates these false alarms (candidate regions that are actually non-text), yielding the determined text regions together with their features.

Specifically, before step S6, a second support vector machine is trained by using text words of a training image and a background image to obtain a second-stage classification model;

accordingly, step S6 includes: performing second-level classification by taking the determined text region features obtained in step S5 as the input vector of the second support vector machine, so as to exclude the parts of the text region that are not text words, thereby obtaining single text words together with their features.
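The two-level classification of steps S5 and S6 can be illustrated with the sketch below, assuming NumPy. The description does not state the SVM kernel or training procedure, so `LinearSVM` here is a small stand-in linear SVM trained by sub-gradient descent on the hinge loss, and `cascade` is a hypothetical helper; for simplicity both stages receive the same feature vectors.

```python
import numpy as np

class LinearSVM:
    """Stand-in linear SVM: sub-gradient descent on the regularized hinge loss."""

    def __init__(self, lam=0.01, epochs=200, lr=0.1):
        self.lam, self.epochs, self.lr = lam, epochs, lr

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.where(np.asarray(y) > 0, 1.0, -1.0)  # labels in {-1, +1}
        n = len(y)
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.epochs):
            active = y * (X @ self.w + self.b) < 1  # margin violators
            grad_w = self.lam * self.w - (y[active][:, None] * X[active]).sum(axis=0) / n
            grad_b = -y[active].sum() / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=np.float64) @ self.w + self.b > 0).astype(int)

def cascade(features, region_svm, word_svm):
    """Stage 1 rejects non-text candidates; stage 2 keeps only text words."""
    features = np.asarray(features, dtype=np.float64)
    is_text = region_svm.predict(features).astype(bool)
    is_word = np.zeros(len(features), dtype=bool)
    if is_text.any():
        is_word[is_text] = word_svm.predict(features[is_text]).astype(bool)
    return is_text, is_word
```

The point of the cascade is that the second classifier only ever sees candidates that survived the first, so false alarms rejected in step S5 cannot reappear as text words in step S6.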

Specifically, step S7 includes: the single text word determined in step S6 is removed as shown in fig. 3(b), the result of each channel from which the single text word is removed is output as shown in fig. 4(b), 5(b), and 6(b), the three RGB channels are merged, and the merged output image is the final result as shown in fig. 3 (c).
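A minimal sketch of step S7, assuming NumPy. The description does not say how the pixels of a removed word are replaced, so filling them with the median of the remaining pixels is purely an illustrative assumption, as are the function names and the `(top, bottom, left, right)` box format.

```python
import numpy as np

def remove_text(channel, word_boxes):
    """Overwrite each detected word box with the median background value."""
    out = channel.astype(np.float64).copy()
    mask = np.zeros(out.shape, dtype=bool)
    for top, bottom, left, right in word_boxes:
        mask[top:bottom, left:right] = True
    if mask.any() and not mask.all():
        out[mask] = np.median(out[~mask])  # assumed fill strategy
    return out

def merge_channels(r, g, b):
    """Stack the three processed channels back into one RGB image."""
    merged = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(merged), 0, 255).astype(np.uint8)
```

Each channel is cleaned independently and the three results are merged only at the end, matching the per-channel outputs of FIG. 4(b), FIG. 5(b) and FIG. 6(b) and the merged result of FIG. 3(c).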

As shown in FIG. 1, "image input, preprocessing" belongs to step S1; "image segmentation" belongs to step S2; "projection analysis" belongs to step S3; "feature extraction" belongs to step S4; "first-level classification, determining text or non-text" belongs to step S5; "second-level classification, determining a single text word" belongs to step S6; and "text removal, output result" belongs to step S7.

Specifically, in step S4, texture features are extracted from the sub-images marked as candidate text regions in step S3. Because the Haar wavelet transform ultimately yields a large number of texture features, the features overlap and the result becomes inaccurate. The improvement made by the method is to perform principal component analysis using the covariance method, which reduces the dimensionality of the texture features computed by the Haar wavelet transform, eliminates the feature overlap caused by the large number of features, and reduces the computational load.

More specifically, when the Haar wavelet transform is used for feature extraction in step S4, the current feature extraction algorithm is improved as follows:

(1) First, an image X and its complement image are input.

A first-level decomposition of the input image X is performed using the Haar wavelet, where the image X is the candidate text region from step S3 described above. This yields the approximation coefficients (A) and the detail coefficients: horizontal (H), vertical (V) and diagonal (D). That is, the input image is divided into four sub-images: a low-frequency image (A), a vertical high-frequency image (V), a horizontal high-frequency image (H) and a diagonal high-frequency image (D). These sub-images are needed to compute the co-occurrence histograms (in other words, they uniquely determine the original image), and the co-occurrence histograms are constructed from the different wavelet coefficients of the image.
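The first-level decomposition and the complement image can be sketched as follows, assuming NumPy. The function names, the orthonormal scaling by 1/2, the 8-bit complement `255 - X` and the requirement of even image dimensions are assumptions; the description only states that a one-level Haar decomposition produces A, H, V and D.

```python
import numpy as np

def haar_level1(x):
    """One-level 2-D Haar decomposition into (A, H, V, D) sub-images."""
    x = np.asarray(x, dtype=np.float64)
    # Four corners of every 2x2 block (even height and width assumed).
    p, q = x[0::2, 0::2], x[0::2, 1::2]
    r, s = x[1::2, 0::2], x[1::2, 1::2]
    A = (p + q + r + s) / 2.0  # low-frequency approximation
    H = (p + q - r - s) / 2.0  # horizontal detail (row differences)
    V = (p - q + r - s) / 2.0  # vertical detail (column differences)
    D = (p - q - r + s) / 2.0  # diagonal detail
    return A, H, V, D

def complement(x, max_value=255):
    """Complement image, assuming intensities in [0, max_value]."""
    return max_value - np.asarray(x, dtype=np.float64)
```

Each sub-image has half the height and width of the input, so a 100 × 100 candidate region gives four 50 × 50 sub-images; with this scaling the transform preserves the sum of squared values, which is one way to check that the four sub-images together carry all the information of the original.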

The combinations considered are (A, V), (A, H), (A, D) and (A, |D−H−V|). The features of each texture block are computed separately over a 3×3 neighborhood: the midpoint of the 3×3 neighborhood is taken as the origin, and a translation vector t[a, d] addresses the positions of the neighborhood, where a is the angle and d is the distance. The distance of the translation vector is set to the unit distance (that is, d = 1), and the translation vector takes 8 angles, namely a = 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. The angle 0° corresponds to the position of the hollow circle in A or H in FIG. 2, and the remaining angles correspond to the orientations of the 3×3 neighborhood positions x_i relative to the center x in A, or of the 3×3 neighborhood positions y_i relative to the center y in H, in FIG. 2.

The histogram calculation and feature extraction method is described below taking the pair (A, H) as an example, as shown in FIG. 2.

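(2) Two histograms $F_1$ and $F_2$ are constructed for A according to the following max-min combination rule:

$$\alpha = \max\bigl(\min(x, h_i),\ \min(y, a_i)\bigr)$$

$$x \in F_1 \quad \text{if } \alpha = \min(x, h_i)$$

$$x \in F_2 \quad \text{if } \alpha = \min(y, a_i)$$

where i = 1, 2, …, 8 indexes the positions in each neighborhood; constructing histograms for A according to the above rule yields 16 histograms.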
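The max-min rule can be sketched as below, assuming NumPy. Several points are assumptions rather than statements of the description: x and y are read as the centre values in A and H, a_i and h_i as their i-th neighbours, the (row, column) offsets assigned to the eight angles, the quantization of A into 256 bins by min-max scaling, and the handling of ties (a centre value is counted in both histograms when both conditions hold).

```python
import numpy as np

# Unit-distance (d = 1) offsets for the angles 0, 45, ..., 315 degrees,
# written as (row, col) steps with rows growing downward (assumed layout).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
           (0, -1), (1, -1), (1, 0), (1, 1)]

def maxmin_histograms(A, B, bins=256):
    """Return counts of shape (8, 2, bins): F1 and F2 for each direction."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    # Quantize A into `bins` levels for histogramming (assumed scheme).
    lo, hi = A.min(), A.max()
    scale = (bins - 1) / (hi - lo) if hi > lo else 0.0
    levels = np.round((A - lo) * scale).astype(int)

    rows, cols = A.shape
    x, y = A[1:-1, 1:-1], B[1:-1, 1:-1]  # centre values
    xq = levels[1:-1, 1:-1]              # quantized centre values of A
    out = np.zeros((8, 2, bins), dtype=np.int64)
    for i, (dr, dc) in enumerate(OFFSETS):
        a_i = A[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        b_i = B[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        m1 = np.minimum(x, b_i)          # min(x, h_i)
        m2 = np.minimum(y, a_i)          # min(y, a_i)
        alpha = np.maximum(m1, m2)
        out[i, 0] = np.bincount(xq[alpha == m1], minlength=bins)  # F1
        out[i, 1] = np.bincount(xq[alpha == m2], minlength=bins)  # F2
    return out
```

Calling `maxmin_histograms(A, H)` gives 8 directions × 2 histograms, which is the count of 16 histograms stated for one pair.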

(3) Next, each of the 16 histograms F obtained in (2) above is accumulated to obtain a cumulative histogram (CH); the CH is normalized so that its values lie in the range 0 to 1, giving a normalized cumulative histogram (NCH); and the points on the NCH, $(nch_1, nch_2, \ldots, nch_{256})$, are taken as sample points, from which the following three features are calculated:

feature a, average slope between two sample points for the entire NCH:

in the formula: $S_{nch}$ is the average slope over the entire NCH; $slope_i$ denotes the slope between two sample points in the i-th combination, where i = 1, 2, 3, 4 indexes the combinations (A, V), (A, H), (A, D) and (A, |D−H−V|);

feature b, mean of NCH sample points:

$$\mu_{nch} = \frac{1}{256}\sum_{i=1}^{256} nch_i$$

in the formula: $nch_i$ is a sample point, i is the sample point index, and $\mu_{nch}$ is the mean of the NCH sample points;

characteristic c, standard deviation:

in the formula: $nch_i$ is a sample point, i is the sample point index, $\mu_{nch}$ is the mean of the NCH sample points, and $D_{nch}$ is the standard deviation of the NCH sample points.
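The three features can be sketched for a single histogram as follows, assuming NumPy. The exact formulas are not reproduced in the text, so two choices here are assumptions: the slope is taken as the mean difference between consecutive NCH sample points at unit spacing, and the standard deviation uses the population form (division by the number of samples). The function name is hypothetical.

```python
import numpy as np

def nch_features(hist):
    """Return (average slope, mean, standard deviation) of the NCH."""
    hist = np.asarray(hist, dtype=np.float64)
    ch = np.cumsum(hist)                          # cumulative histogram (CH)
    nch = ch / ch[-1] if ch[-1] > 0 else ch       # normalized to [0, 1] (NCH)
    slope = float(np.mean(np.diff(nch)))          # assumed slope definition
    return slope, float(nch.mean()), float(nch.std())
```

For a flat histogram with 256 equal bins the NCH is a straight ramp from 1/256 to 1, so the slope is 1/256 and the mean is 257/512; an empty histogram gives three zeros.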

(4) The above steps are repeated for the remaining combinations (A, V), (A, D) and (A, |D−H−V|), which gives 192 features in total (4 combinations × 3 features × 16 histograms).

(5) The above steps are then repeated for the complement image, so that the final result is 384 features (that is, 2 images × 4 combinations × 3 features × 16 histograms), which are the texture features of the candidate text regions in step S4 described above.

The prior method directly forms a feature vector from all 384 features for subsequent classification, but the feature vector is so large that the features overlap and the result becomes inaccurate.

Therefore, the invention is improved here by carrying out the following steps:

(6) Dimensionality reduction is performed on the obtained 384 features by principal component analysis using the covariance method; that is, the 384 features are linearly combined into a new, smaller feature subset. As a result, the 384 features are reduced, with redundancy removed, to 45 features, and the subsequent classification uses these 45 features instead of the 384. The 45 features are the final result of step S4, that is, the candidate text region features.
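Principal component analysis by the covariance method can be sketched as below, assuming NumPy. The numbers 384 and 45 come from the description; the function name, the return values and the use of `numpy.linalg.eigh` on the sample covariance matrix are illustrative choices.

```python
import numpy as np

def pca_covariance(X, n_components=45):
    """Project rows of X (samples x features) onto the top principal axes."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    centered = X - mean
    cov = np.cov(centered, rowvar=False)         # features x features
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # one principal axis per column
    return centered @ components, components, mean
```

In use, `components` and `mean` would be estimated once on the training feature vectors and then applied to each new 384-dimensional vector as `(x - mean) @ components`, giving the 45-dimensional input for the classifiers.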

Alternatively, instead of principal component analysis using the covariance method, linear discriminant analysis (LDA), SVD, t-SNE, LASSO, wavelet analysis, Laplacian mapping or sparse coding can be used.

Because the number of the features is reduced, the calculated amount is greatly reduced, the condition of feature overlapping can be avoided, and the result of subsequently classifying the features is more accurate.

Example 2

The present embodiment is slightly different from embodiment 1 in the following points:

Optionally, the wavelet transform used for denoising during preprocessing in step S1 can be replaced by a neighborhood averaging method or a Wiener filtering method.

Optionally, in step S5, the first classification model can be obtained by training with a Random Forest (Random Forest) algorithm or an ANN deep learning network instead of the first support vector machine.

The invention converts basic research into practical technology. It applies an algorithm combining region-based text detection and texture-based text detection, applies text detection and removal to histopathology microscopic images, and provides a method for removing text annotations from such images, which helps in distinguishing the images and improves the quality of diagnosis based on them. Compared with the prior art, the proposed method places lower requirements on the edge density of the text contour region and on the texture attributes of the text in the image, and has a small computational burden for extracting image texture features. It also provides experience for subsequent research on histopathology microscopic image processing and benefits the development of that technology.

It should be understood that the above description of specific embodiments of the present invention is only for the purpose of illustrating the technical lines and features of the present invention, and is intended to enable those skilled in the art to understand the contents of the present invention and to implement the present invention, but the present invention is not limited to the above specific embodiments. It is intended that all such changes and modifications as fall within the scope of the appended claims be embraced therein.

Claims

1. A text detection and removal method for histopathological microscopic images is characterized by comprising the following steps:

step S1: preprocessing an input image under different channels respectively;

step S4: performing feature extraction on the candidate text region obtained in step S3 to obtain a candidate text region feature, where performing feature extraction on the candidate text region includes: performing primary decomposition on the candidate text region through a Haar wavelet, collecting sample points according to a max-min combination rule, solving the characteristics of the sample points to obtain candidate text region texture characteristics, and performing dimension reduction processing on the candidate text region texture characteristics to obtain candidate text region characteristics;

step S7: and removing the single text word determined in the step S6, outputting the result of each channel after the single text word is removed, combining all the channels, and outputting the combined image as a final result.

2. The method of text detection and removal for histopathological microscopic images according to claim 1,

when performing projection analysis in step S3, performing horizontal projection and vertical projection on the sub-image respectively to obtain horizontal projection and vertical projection of the sub-image, marking the intersection of the horizontal projection and the vertical projection as a candidate text region, and if no sub-image is marked as a candidate text region, determining that the input image does not contain text as a result.

3. The method of text detection and removal for histopathological microscopic images according to claim 1,

before executing step S5, pre-training a first support vector machine using a text region of a training image and a background image to obtain a first-stage classification model;

accordingly, step S5 includes: performing first-level classification by using the candidate text region characteristics obtained in the step S4 as input vectors of a first-level classification model to obtain a determined text region and characteristics of the determined text region;

accordingly, step S6 includes: performing second-level classification by taking the determined text region features obtained in step S5 as the input vector of a second-level classification model, to obtain a single text word and the features of the single text word.

4. The method of text detection and removal for histopathological microscopic images according to claim 1,

in step S2, the image segmentation is specifically to iteratively process each channel image by using a sliding window with a size of 100 × 100 pixels and a sliding step size of 100 pixels, so as to obtain a sub-image of each channel.

5. The method of text detection and removal for histopathological microscopic images according to claim 1,

the image input in the step S1 is a histopathology microscopic image, the different channels are three RGB channels, and the preprocessing includes performing size normalization on the three RGB channels and removing noise by applying wavelet transform.

6. The method of text detection and removal for histopathological microscopic images according to claim 5,

the wavelet transform used for removing noise in step S1 can be replaced with a neighborhood averaging method or a wiener filtering method.

7. The method of text detection and removal for histopathological microscopic images according to claim 1,

in the step S4, the dimensionality reduction process is to perform principal component analysis by using a covariance method to remove redundancy of the texture features of the candidate text regions, so as to obtain the candidate text region features.

8. The method of text detection and removal for histopathological microscopic images according to claim 7,

the principal component analysis using the covariance method described in step S4 can be replaced with one of LDA, SVD, t-SNE, LASSO, wavelet analysis, Laplacian mapping, and sparse coding.