CN109460768B - Text detection and removal method for histopathology microscopic image

Publication number
CN109460768B
Authority
CN
China
Prior art keywords
text
image
text region
candidate
candidate text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811361398.6A
Other languages
Chinese (zh)
Other versions
CN109460768A (en)
Inventor
李晨
薛丹
姚育东
许宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN201811361398.6A
Publication of CN109460768A
Application granted
Publication of CN109460768B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention relates to the technical field of medical microscopic image processing, and in particular to a text detection and removal method for histopathology microscopic images. The method comprises the following steps: preprocessing an input image in each channel separately; segmenting each channel image into sub-images; binarizing the sub-images and applying projection analysis to obtain candidate text regions; extracting features from the candidate text regions using the Haar wavelet transform; performing first-level classification on the candidate text regions to obtain determined text regions and their features; performing second-level classification on the determined text regions to obtain single text words and their features; and removing the determined single text words, outputting the result of each channel, and merging the images output by all channels to obtain the final result. The invention applies text detection and removal to histopathology microscopic images and thereby removes a primary obstacle to subsequent research on such images.

Description

Text detection and removal method for histopathology microscopic image
Technical Field
The invention relates to the technical field of image processing, in particular to a text detection and removal method for histopathology microscopic images.
Background
Microscopic cell image analysis systems built on computer technology are a very active topic in computer applications internationally. Computer technology is developing rapidly and application software is continuously being upgraded, so a number of research institutions and universities in China have applied such systems to research in medicine, biology and related fields, and various general-purpose image processing and analysis systems are available. Professional image cytometry (Image Cytometry, ICM, i.e. technology dedicated to quantitative analysis of biological tissue cells at the molecular level and even at the level of gene units) is, however, far more complicated than current general-purpose image processing and analysis systems in its hardware, software and system structure. The basic process of microscopic image processing is as follows: (1) preparing slices; (2) acquiring microscopic cell images; (3) image preprocessing; (4) image segmentation; (5) extracting feature parameters; (6) statistical analysis; (7) outputting the results.
In current medical diagnosis, the analysis of histopathological microscopic images plays an important role in diagnosing cancer. However, some histopathological microscopic images carry text annotations, and this text interferes with image analysis in two main ways: on the one hand, it disturbs computer segmentation of the image; on the other hand, the annotated area may occlude key parts of the image and thereby affect the assessment of the disease. It is therefore highly desirable to remove text from microscopic images to support further study of histopathological microscopic images. For removing text from an image, the prior art uses region-based and texture-based methods for text detection.
Among region-based text detection methods, Shim et al. use the homogeneity of intensity within the text regions of an image: pixels with similar gray levels are merged into groups, larger regions are removed as background, the text regions are sharpened by region boundary analysis based on gray-level contrast, the candidate regions are then verified using size, area, fill factor and contrast, and neighboring text regions are examined to extract text strings. R. Jiang et al. introduce a connected component (CC) method that works as follows: first, the input image is decomposed into CCs using a color clustering algorithm; then, to segment text from the background, a two-stage classification module is employed, which first verifies all CCs with a cascade classifier and then further classifies the remaining components with a support vector machine (SVM).
Among texture-based text detection methods, D. Chen proposes a two-step method that adopts a machine learning localization scheme: the first step quickly locates potential text regions with a low rejection rate and reasonable accuracy, and the second step applies machine learning verification to discard false positives. Z. Ji et al. propose a mixed-feature text detection method for video frames: a small overlapping sliding window is first scanned over the image and language-independent, texture-based and edge-based features are extracted; each window is then classified as a text window or a non-text window by an SVM classifier; each small block is then judged to be text or non-text using a voting mechanism; and finally the text regions are located precisely by morphological filtering.
However, the above prior art requires that the text contrast strongly with its background, that the edge density of the text contour region be higher than that of other parts of the image, and that the text in the image have texture attributes that distinguish it from the background.
Therefore, it is highly desirable to provide a text detection and removal technique for histopathological microscopic images with a small computational burden.
Disclosure of Invention
(I) technical problem to be solved
In order to solve the above problems in the prior art, the present invention provides a text detection and removal method for histopathology microscopic images with a low computational burden for image feature extraction.
(II) technical scheme
In order to achieve the purpose, the invention adopts the main technical scheme that:
the invention provides a text detection and removal method for a histopathology microscopic image, which comprises the following steps:
step S1: preprocessing an input image under different channels respectively;
step S2: carrying out image segmentation on each preprocessed channel image to obtain a sub-image of each channel;
step S3: performing binarization processing on the sub-image of each channel obtained in the step S2, and then performing projection analysis on the processed sub-image to obtain a candidate text region;
step S4: performing feature extraction on the candidate text region obtained in step S3 to obtain a candidate text region feature, where performing feature extraction on the candidate text region includes: performing primary decomposition on the candidate text region through Haar wavelet, collecting sample points according to a max-min combination rule, obtaining the characteristics of the sample points to obtain the texture characteristics of the candidate text region, and then performing dimension reduction processing on the texture characteristics of the candidate text region to obtain the characteristics of the candidate text region;
step S5: performing first-level classification on the candidate text regions according to the candidate text region characteristics obtained in the step S4 to determine the text regions and obtain the text region characteristics at the same time;
step S6: performing second-level classification according to the text region characteristics obtained in the step S5 to determine a single text word and obtain the characteristics of the single text word;
step S7: removing the single text word determined in the step S6 according to the single text word characteristics obtained in the step S6, outputting the result of each channel after the single text word is removed, and merging the output images of all the channels to obtain the final result.
According to the present invention, when performing projection analysis in step S3, horizontal projection and vertical projection are performed on the sub-image respectively, the intersection of the horizontal projection and the vertical projection is marked as a candidate text region, and if no sub-image is marked as a candidate text region, it is determined that the input image does not contain text.
According to the invention, before executing step S5, a first support vector machine is pre-trained by using a text region of a training image and a background image to obtain a first-stage classification model;
accordingly, step S5 includes: performing first-level classification by using the necessary texture features of the candidate text regions obtained in the step S4 as input vectors of a first-level classification model to obtain determined text regions and the features of the determined text regions;
before step S6, training a second support vector machine by using text words of the training images and the background images to obtain a second-stage classification model;
accordingly, step S6 includes: performing second-level classification by taking the determined text region characteristics obtained in the step S5 as input vectors of the second-level classification model to obtain a single text word and the characteristics of the single text word.
According to the present invention, the image segmentation in step S2 is specifically to iteratively process each channel image with a sliding window of 100 × 100 pixels in a sliding step of 100 pixels to obtain a sub-image of each channel.
According to the invention, the image input in step S1 is a histopathology microscopic image, the different channels are three RGB channels, and the preprocessing is to perform size normalization processing on the three RGB channels and apply wavelet transform to remove noise.
According to the present invention, the wavelet transform used for removing noise in step S1 can be replaced with a neighborhood averaging method or a wiener filtering method.
According to the invention, in the step S4, the dimensionality reduction process is to perform principal component analysis by using a covariance method to remove redundancy of texture features of the candidate text regions, so as to obtain candidate text region features.
According to the present invention, the principal component analysis using the covariance method in step S4 can be replaced by one of the methods of LDA, SVD, t-SNE, LASSO, wavelet analysis, Laplacian mapping, and sparse coding.
(III) advantageous effects
The invention has the beneficial effects that:
(1) The text detection and removal method for histopathology microscopic images converts basic research into practical technology. It applies an algorithm that combines region-based and texture-based text detection, applies text detection and removal to histopathology microscopic images, and provides a method for removing text annotations from them, which helps in interpreting the images and improves the quality of medical diagnosis based on histopathology microscopic images.
(2) The feature extraction algorithm used by the method is based on two-dimensional Haar wavelet decomposition and is an extension of the co-occurrence histogram method. It improves on the feature extraction algorithm developed by P.S. Hiremath and S. Shivashankar by reducing the 384 features finally obtained to 45 features through principal component analysis using the covariance method, which greatly reduces the amount of computation, avoids feature overlap and makes the subsequent classification of the features more accurate.
(3) The extracted features are classified using a support vector machine, which eliminates false alarms among the candidate text regions and yields the determined text regions. A support vector machine trained with text words from the training images and with background images then performs a further classification to obtain the single text words in the text regions, which improves the accuracy of text recognition in the images.
Drawings
FIG. 1 is a flow chart of the overall scheme of the present invention;
FIG. 2 is a schematic illustration of a feature extraction algorithm of the present invention;
FIG. 3(a) is an original image that is demonstrated by the image text removal process of the present invention;
FIG. 3(b) is a text diagram of an image text removal process demonstration of the present invention;
FIG. 3(c) is a diagram of a processing result of an image text removal process demonstration of the present invention;
FIG. 4(a) is an R-channel original image according to an embodiment of the present invention;
FIG. 4(b) is an image after the removal of R-channel text according to the embodiment of the present invention;
FIG. 5(a) is a G-channel original image according to an embodiment of the present invention;
FIG. 5(b) is a diagram of an image with G-channel text removed according to an embodiment of the present invention;
FIG. 6(a) is a B-channel original image according to an embodiment of the present invention;
FIG. 6(b) is a B-channel text-removed image according to an embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
Example 1
The invention provides a text detection and removal method for a histopathology microscopic image, which comprises the following steps:
step S1: preprocessing an input image under different channels respectively;
step S2: carrying out image segmentation on each preprocessed channel image to obtain a sub-image of each channel;
step S3: performing binarization processing on the sub-image of each channel obtained in the step S2, and then performing projection analysis on the processed sub-image to obtain a candidate text region;
step S4: performing feature extraction on the candidate text region obtained in step S3 to obtain a candidate text region feature, where performing feature extraction on the candidate text region includes: performing primary decomposition on the candidate text region through Haar wavelet, collecting sample points according to a max-min combination rule, obtaining the characteristics of the sample points to obtain the texture characteristics of the candidate text region, and then performing dimension reduction processing on the texture characteristics of the candidate text region to obtain the characteristics of the candidate text region;
step S5: performing first-level classification on the candidate text regions according to the candidate text region characteristics obtained in the step S4 to obtain determined text regions and simultaneously obtain determined text region characteristics;
step S6: performing second-level classification according to the determined text region characteristics obtained in the step S5 to obtain a single text word and obtain the characteristics of the single text word;
step S7: removing the single text word determined in the step S6 according to the single text word characteristics obtained in the step S6, outputting the result of each channel after the single text word is removed, and merging the output images of all the channels to obtain the final result.
Specifically, the image acquired in step S1 is a histopathological microscopic image, as shown in fig. 3(a). This image is preprocessed separately in each of the three RGB channels, shown in fig. 4(a), fig. 5(a) and fig. 6(a) respectively. The preprocessing mainly comprises size normalization, which keeps the pixel dimensions of all channel images consistent and prevents abnormal results caused by differing image sizes, followed by noise removal using the wavelet transform.
Specifically, in step S2, image segmentation is performed on the three RGB channel images preprocessed in step S1: each channel image, that is, fig. 4(a), fig. 5(a) and fig. 6(a), is iteratively processed by a sliding window with a size of 100 × 100 pixels and a sliding step of 100 pixels to obtain sub-images.
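The sliding-window segmentation of step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_tiles` is hypothetical, the channel is represented as a plain list of rows so that the sketch has no dependencies, and skipping partial windows at the image border is an assumption, since the source does not say how they are handled.

```python
def split_into_tiles(channel, tile=100, step=100):
    """Yield (top, left, sub_image) for each full window of a 2-D channel.

    `channel` is a list of equal-length rows. Windows that would run past
    the image border are skipped (an assumption, not stated in the source).
    """
    height = len(channel)
    width = len(channel[0]) if height else 0
    for top in range(0, height - tile + 1, step):
        for left in range(0, width - tile + 1, step):
            sub = [row[left:left + tile] for row in channel[top:top + tile]]
            yield top, left, sub


# A 250 x 300 dummy channel gives 2 rows x 3 columns of full 100 x 100 tiles.
channel = [[0] * 300 for _ in range(250)]
tiles = list(split_into_tiles(channel))
```

Because the step equals the window size, the tiles do not overlap, and the stored `(top, left)` offsets let a later stage map a candidate region back to its position in the full channel image.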
Specifically, in step S3, after the binarization processing is performed on each sub-image obtained in step S2, horizontal projection and vertical projection analysis are performed on the processed sub-image to obtain horizontal projection and vertical projection, an intersection of the horizontal projection and the vertical projection is marked as a candidate text region, and if no sub-image is marked as a candidate text region, it is determined that the input image does not contain text, and the following steps are not performed.
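The binarization and projection analysis of step S3 might look like the sketch below. Several details are assumptions that the source does not specify: the fixed threshold of 128, the choice that annotation pixels are brighter than the background, and the reduction of each projection to a single span. The function names are hypothetical.

```python
def binarize(sub_image, threshold=128):
    """Map pixels to 1 (foreground) or 0; assumes bright annotation pixels."""
    return [[1 if p >= threshold else 0 for p in row] for row in sub_image]


def active_span(profile, min_count=1):
    """Return (first, last) indices whose projection reaches min_count, or None."""
    hits = [i for i, v in enumerate(profile) if v >= min_count]
    return (hits[0], hits[-1]) if hits else None


def candidate_region(sub_image, threshold=128):
    """Intersect the horizontal and vertical projections of a binarized tile."""
    binary = binarize(sub_image, threshold)
    horizontal = [sum(row) for row in binary]        # one value per row
    vertical = [sum(col) for col in zip(*binary)]    # one value per column
    rows, cols = active_span(horizontal), active_span(vertical)
    if rows is None or cols is None:
        return None  # this tile holds no candidate text
    return rows[0], rows[1], cols[0], cols[1]        # top, bottom, left, right


# Toy tile: a bright 3 x 5 block on a dark background.
tile = [[0] * 10 for _ in range(10)]
for r in range(2, 5):
    for c in range(3, 8):
        tile[r][c] = 255
region = candidate_region(tile)
empty = candidate_region([[0] * 10 for _ in range(10)])
```

If `candidate_region` returns `None` for every tile of every channel, the input image would be treated as containing no text and the remaining steps would be skipped, as the paragraph above describes.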
Specifically, when feature extraction is performed in step S4, the Haar wavelet transform is used to calculate the texture features of the candidate text region, and principal component analysis using the covariance method is then applied for dimension reduction. The dimension reduction removes redundancy from the texture features of the candidate text region, leaving only the necessary texture features; these necessary texture features are the final result of this step, that is, the candidate text region features.
Specifically, before step S5 is executed, a first support vector machine is pre-trained using a text region of a training image and a background image to obtain a first-level classification model;
accordingly, step S5 includes: performing first-level classification by taking the candidate text region characteristics obtained in the step S4 as input vectors of the first support vector machine to obtain the determined text region and the determined text region characteristics.
The support vector machine in step S5 distinguishes text regions from non-text regions, and thus decides whether each region marked as a candidate is a determined text region or a non-text region. Because machine recognition may mistake a non-text region for a text region, the first-level classification eliminates these false alarms, i.e. the candidate text regions that are actually non-text, so as to obtain the determined text regions together with their features.
Specifically, before step S6, a second support vector machine is trained by using text words of a training image and a background image to obtain a second-stage classification model;
accordingly, step S6 includes: performing second-level classification by taking the determined text region characteristics obtained in the step S5 as input vectors of the second support vector machine, so as to exclude the parts of the text region that are not text words and thereby obtain the single text words together with their characteristics.
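The control flow of the two classification levels in steps S5 and S6 can be illustrated with a dependency-free sketch. The two predicates below are hypothetical stand-ins for the trained first and second support vector machines; their toy decision rules are not taken from the source, and treating the second level as a simple filter over region features is a simplifying assumption.

```python
def cascade_classify(candidates, is_text_region, is_text_word):
    """Two-level filter: level 2 only sees what level 1 accepted.

    `candidates` is a list of (region_id, feature_vector) pairs. The two
    predicates stand in for the trained first- and second-level models.
    """
    confirmed = [(rid, f) for rid, f in candidates if is_text_region(f)]
    words = [(rid, f) for rid, f in confirmed if is_text_word(f)]
    return confirmed, words


def stage1(features):
    """Toy first-level rule (hypothetical)."""
    return sum(features) > 1.0


def stage2(features):
    """Toy second-level rule (hypothetical)."""
    return features[0] > 0.5


candidates = [("t0", [0.9, 0.8]), ("t1", [0.1, 0.2]), ("t2", [0.3, 0.9])]
confirmed, words = cascade_classify(candidates, stage1, stage2)
```

The point of the cascade is that false alarms rejected at the first level never reach the second level, so the second model only has to separate text words from the remaining non-word parts of confirmed text regions.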
Specifically, step S7 includes: the single text word determined in step S6 is removed as shown in fig. 3(b), the result of each channel from which the single text word is removed is output as shown in fig. 4(b), 5(b), and 6(b), the three RGB channels are merged, and the merged output image is the final result as shown in fig. 3 (c).
As shown in fig. 1, "image input, preprocessing" belongs to step S1; "image segmentation" belongs to step S2; "projection analysis" belongs to step S3; "feature extraction" belongs to step S4; "first-level classification, determining text or non-text" belongs to step S5; "second-level classification, determining a single text word" belongs to step S6; and "text removal, output result" belongs to step S7.
Specifically, in step S4, texture features are extracted from the sub-images marked as candidate text regions in step S3. Because the Haar wavelet transform finally yields a large number of texture features, the features overlap and the result becomes inaccurate. The method is therefore improved by performing principal component analysis using the covariance method, which reduces the dimension of the texture features computed through the Haar wavelet transform, eliminates the feature overlap caused by the large number of features, and reduces the computational load.
More specifically, when the Haar wavelet transform is used for feature extraction in step S4, the current feature extraction algorithm is improved as follows:
(1) First, the image X and its complement image [Figure GDA0003161348480000081] are taken as inputs. A first-level decomposition is performed on the input image X using the Haar wavelet, where the image X is the candidate text region from step S3 described above. This yields the approximation coefficients (A) and the detail coefficients: horizontal (H), vertical (V) and diagonal (D). In other words, the input image is divided into four sub-images: a low-frequency image (A), a vertical high-frequency image (V), a horizontal high-frequency image (H) and a diagonal high-frequency image (D). These sub-images are needed to calculate the co-occurrence histograms and together uniquely determine the original image; the co-occurrence histograms are constructed from the different wavelet coefficients of the image.
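A one-level two-dimensional Haar decomposition into A, H, V and D can be sketched as below. The orthonormal scaling (dividing each signed 2 × 2 block sum by 2), the sign convention for H and V, and the 8-bit complement `255 - p` are assumptions; the source does not state which conventions it uses. Odd trailing rows or columns are simply dropped in this sketch.

```python
def complement(image, max_value=255):
    """Complement image, assuming 8-bit gray levels (an assumption)."""
    return [[max_value - p for p in row] for row in image]


def haar_level1(image):
    """One-level 2-D Haar decomposition of a grayscale image.

    Returns (A, H, V, D), each half the size of the input. Every
    coefficient is a signed sum over one 2x2 block divided by 2.
    """
    A, H, V, D = [], [], [], []
    for r in range(0, len(image) - 1, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, len(image[0]) - 1, 2):
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2)  # approximation
            h_row.append((p + q - s - t) / 2)  # top row minus bottom row
            v_row.append((p - q + s - t) / 2)  # left column minus right column
            d_row.append((p - q - s + t) / 2)  # diagonal difference
        A.append(a_row)
        H.append(h_row)
        V.append(v_row)
        D.append(d_row)
    return A, H, V, D


A, H, V, D = haar_level1([[1, 2], [3, 4]])
```

The four coefficients determine the block exactly; for instance the top-left pixel is recovered as `(A + H + V + D) / 2`, which is the sense in which the sub-images uniquely determine the original image.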
The combinations considered are (A, V), (A, H), (A, D) and (A, |(DHV)|). The features of each texture block are computed separately over a 3x3 neighborhood. With the midpoint of the 3x3 neighborhood as the origin, the neighborhood is represented by the translation vector t[a, d], where a is the angle and d is the distance. The distance of the translation vector is set to the unit distance (i.e., d = 1), and the translation vector takes 8 angles, i.e., a = 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°. The angle 0° corresponds to the position of the hollow circle in A or H in fig. 2, and the remaining angles correspond to the positions of the neighbors xi relative to the center x of the 3x3 neighborhood in A, or of the neighbors yi relative to the center y of the 3x3 neighborhood in H, in fig. 2.
The following description takes the pair (A, H) as an example of the histogram calculation and feature extraction method, as shown in fig. 2.
(2) Two histograms $F_1$ and $F_2$ are constructed for A according to the following max-min combination rule:
$\alpha = \max(\min(x, h_i), \min(y, a_i))$
$x \in F_1$, if $\alpha = \min(x, h_i)$
$x \in F_2$, if $\alpha = \min(y, a_i)$
where i = 1, 2, …, 8 in each neighborhood. Constructing the histograms for A according to the above rule yields 16 histograms.
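The max-min combination rule for the pair (A, H) can be sketched as follows, under stated assumptions. Here `a_i` and `h_i` are read as the i-th neighbors of the centers x and y in A and H; coefficients are quantized to 256 bins by truncation and clipping; only interior pixels are visited; and a tie between the two minima is counted in both histograms. None of these details is given in the source, and the function names are hypothetical.

```python
# Unit-distance offsets (d_row, d_col) for angles 0, 45, ..., 315 degrees.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def to_bin(value, bins=256):
    """Quantize a coefficient to a histogram bin (clipping is an assumption)."""
    return min(max(int(value), 0), bins - 1)


def max_min_histograms(A, H, bins=256):
    """Build 16 histograms (F1 and F2 for each of 8 directions) for pair (A, H)."""
    f1 = [[0] * bins for _ in OFFSETS]
    f2 = [[0] * bins for _ in OFFSETS]
    for r in range(1, len(A) - 1):                # interior pixels only
        for c in range(1, len(A[0]) - 1):
            x, y = A[r][c], H[r][c]               # neighborhood centers
            for i, (dr, dc) in enumerate(OFFSETS):
                a_i, h_i = A[r + dr][c + dc], H[r + dr][c + dc]
                left, right = min(x, h_i), min(y, a_i)
                alpha = max(left, right)
                if alpha == left:
                    f1[i][to_bin(x)] += 1
                if alpha == right:                # a tie counts in both
                    f2[i][to_bin(x)] += 1
    return f1 + f2


# Toy pair: center x = 5 with neighbors 9 in A, center y = 2 with neighbors 1 in H.
A = [[9, 9, 9], [9, 5, 9], [9, 9, 9]]
H = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
histograms = max_min_histograms(A, H)
```

In the toy pair, min(x, h_i) = 1 and min(y, a_i) = 2 for every direction, so α = 2 and the center value 5 is counted in the second histogram of each direction only.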
(3) Next, each histogram F of the 16 histograms obtained in (2) above is accumulated to obtain a cumulative histogram (CH). The CH is normalized so that its values lie in the range from 0 to 1, giving a normalized cumulative histogram (NCH). The points $(nch_1, nch_2, \ldots, nch_{256})$ on the NCH are taken as sample points, and the following three features are calculated from them:
feature a, average slope between two sample points for the entire NCH:
Figure GDA0003161348480000091
in the formula: $S_{nch}$ is the average slope over the entire NCH; $slope_i$ denotes the slope between two sample points in the i-th combination, where i = 1, 2, 3, 4 indexes the combinations (A, V), (A, H), (A, D) and (A, |(DHV)|);
feature b, mean of NCH sample points:
Figure GDA0003161348480000101
in the formula: $nch_i$ is a sample point, i is the sample point index, and $\mu_{nch}$ is the mean of the NCH sample points;
feature c, standard deviation of the NCH sample points:
Figure GDA0003161348480000102
in the formula: $nch_i$ is a sample point, i is the sample point index, $\mu_{nch}$ is the mean of the NCH sample points, and $D_{nch}$ is the standard deviation of the NCH sample points.
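The three features can be computed from one histogram as sketched below. The formulas themselves appear only as images in the source, so the definitions used here are assumptions: the average slope is taken as the mean difference between consecutive sample points, and the standard deviation is the population form (dividing by the number of sample points). The function name `nch_features` is hypothetical.

```python
import math


def nch_features(histogram):
    """Return (average slope, mean, standard deviation) of the NCH sample points."""
    total = sum(histogram)
    if total == 0 or len(histogram) < 2:
        return 0.0, 0.0, 0.0
    running, nch = 0, []
    for count in histogram:                   # cumulative histogram (CH) ...
        running += count
        nch.append(running / total)           # ... normalized to [0, 1] (NCH)
    n = len(nch)
    slope = (nch[-1] - nch[0]) / (n - 1)      # mean of consecutive differences
    mean = sum(nch) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in nch) / n)
    return slope, mean, std


# Four equal bins give the NCH 0.25, 0.5, 0.75, 1.0.
slope, mean, std = nch_features([1, 1, 1, 1])
```

Applied to the 16 histograms of one coefficient pair, this gives 3 × 16 = 48 values per pair, which matches the per-combination share of the feature count given in the following steps.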
(4) The above steps are repeated for the remaining combinations (A, V), (A, D) and (A, |(DHV)|), which gives 192 features in total (4 combinations × 3 features × 16 histograms).
(5) Then, the above steps are repeated for the complement image [Figure GDA0003161348480000103], so that the final result is 384 features (i.e., 2 images × 4 combinations × 3 features × 16 histograms). These are the texture features of the candidate text regions in step S4 described above.
The prior method directly uses the 384 features to form a feature vector for subsequent classification. However, because the feature vector is large, the features overlap and the result becomes inaccurate.
Therefore, the invention is improved here by carrying out the following steps:
(6) Dimension reduction is performed on the 384 features obtained, using principal component analysis with the covariance method: the 384 features are linearly combined into a new, smaller feature set. As a result, the redundancy is removed and the 384 features are reduced to 45 features, which are used in place of the 384 features for subsequent classification. These 45 features are the final result of step S4, that is, the candidate text region features.
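Principal component analysis by the covariance method, reducing 384 features to 45, can be sketched with NumPy as below. The random matrix is a synthetic stand-in for real texture features, the sample count of 200 is arbitrary, and the function name `pca_covariance` is hypothetical; the source does not describe how its projection was fitted.

```python
import numpy as np


def pca_covariance(features, n_components=45):
    """Project row-wise feature vectors onto the top principal components."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    cov = np.cov(centered, rowvar=False)        # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]              # shape (n_features, n_components)
    return centered @ components, components, mean


rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 384))           # synthetic 384-D feature vectors
reduced, components, mean = pca_covariance(samples)
```

A new 384-dimensional feature vector `v` would be reduced with the same fitted quantities as `(v - mean) @ components`, so that training and test regions share one projection.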
Alternatively, principal component analysis using the covariance method can be replaced by linear discriminant analysis (LDA), SVD, t-SNE, LASSO, wavelet analysis, Laplacian mapping or sparse coding.
Because the number of the features is reduced, the calculated amount is greatly reduced, the condition of feature overlapping can be avoided, and the result of subsequently classifying the features is more accurate.
Example 2
The present embodiment is slightly different from embodiment 1 in the following points:
optionally, the wavelet transform used for denoising in the preprocessing of step S1 can be replaced by a neighborhood averaging method or a Wiener filtering method.
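As an illustration of the neighborhood averaging alternative, the sketch below replaces each pixel by the mean of its surrounding window. The 3 × 3 window (radius 1) and the truncation of windows at the image border are assumptions; the source gives neither, and the function name is hypothetical.

```python
def neighborhood_average(image, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)^2 neighborhood.

    Neighborhoods are truncated at the image border (an assumption; the
    source does not specify border handling).
    """
    height, width = len(image), len(image[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# A single bright pixel is spread over its neighbors.
smoothed = neighborhood_average([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```

Averaging suppresses isolated noise pixels at the cost of blurring edges, which is a known trade-off of this filter compared with wavelet-based denoising.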
Optionally, in step S5, the first classification model can be obtained by training with a Random Forest (Random Forest) algorithm or an ANN deep learning network instead of the first support vector machine.
The invention converts basic research into practical technology. It applies an algorithm that combines region-based and texture-based text detection, applies text detection and removal to histopathology microscopic images, and provides a method for removing text annotations from such images, which helps in interpreting the images and improves the quality of diagnosis based on them. Compared with the prior art, the method places lower requirements on the edge density of the text contour region and on the texture attributes of the text in the image, and has a small computational burden for extracting image texture features. It also provides experience for subsequent research on histopathology microscopic image processing and benefits the development of that technology.
It should be understood that the above description of specific embodiments is only intended to illustrate the technical approach and features of the present invention, so that those skilled in the art can understand and implement it; the present invention is not limited to the above specific embodiments. All changes and modifications that fall within the scope of the appended claims are intended to be covered by the invention.

Claims (8)

1. A text detection and removal method for histopathological microscopic images, characterized by comprising the following steps:
step S1: preprocessing an input image separately in each of several different channels;
step S2: performing image segmentation on each preprocessed channel image to obtain sub-images of each channel;
step S3: binarizing the sub-images of each channel obtained in step S2, and then performing projection analysis on the binarized sub-images to obtain candidate text regions;
step S4: performing feature extraction on the candidate text regions obtained in step S3 to obtain candidate text region features, wherein the feature extraction comprises: performing a one-level decomposition of the candidate text region with a Haar wavelet, collecting sample points according to a max-min combination rule, computing the features of the sample points to obtain candidate text region texture features, and performing dimensionality reduction on the candidate text region texture features to obtain the candidate text region features;
step S5: performing first-level classification on the candidate text regions according to the candidate text region features obtained in step S4 to obtain determined text regions together with the determined text region features;
step S6: performing second-level classification according to the determined text region features obtained in step S5 to obtain a single text word together with the features of the single text word;
step S7: removing the single text word determined in step S6, outputting the result for each channel after the single text word is removed, merging all channels, and outputting the merged image as the final result.
2. The method of text detection and removal for histopathological microscopic images according to claim 1,
when projection analysis is performed in step S3, the sub-image is projected horizontally and vertically to obtain its horizontal projection and vertical projection, the intersection of the horizontal projection and the vertical projection is marked as a candidate text region, and, if no sub-image yields a candidate text region, the result is that the input image does not contain text.
3. The method of text detection and removal for histopathological microscopic images according to claim 1,
before step S5 is executed, a first support vector machine is pre-trained using text regions of training images and background images to obtain a first-level classification model;
accordingly, step S5 includes: performing first-level classification by using the candidate text region features obtained in step S4 as input vectors of the first-level classification model to obtain determined text regions and the determined text region features;
before step S6 is executed, a second support vector machine is trained using text words of the training images and background images to obtain a second-level classification model;
accordingly, step S6 includes: performing second-level classification by using the determined text region features obtained in step S5 as input vectors of the second-level classification model to obtain a single text word and the features of the single text word.
4. The method of text detection and removal for histopathological microscopic images according to claim 1,
in step S2, the image segmentation is specifically to iteratively process each channel image by using a sliding window with a size of 100 × 100 pixels and a sliding step size of 100 pixels, so as to obtain a sub-image of each channel.
5. The method of text detection and removal for histopathological microscopic images according to claim 1,
the image input in the step S1 is a histopathology microscopic image, the different channels are three RGB channels, and the preprocessing includes performing size normalization on the three RGB channels and removing noise by applying wavelet transform.
6. The method of text detection and removal for histopathological microscopic images according to claim 5,
the wavelet transform used for removing noise in step S1 can be replaced with a neighborhood averaging method or a Wiener filtering method.
7. The method of text detection and removal for histopathological microscopic images according to claim 1,
in the step S4, the dimensionality reduction process is to perform principal component analysis by using a covariance method to remove redundancy of the texture features of the candidate text regions, so as to obtain the candidate text region features.
8. The method of text detection and removal for histopathological microscopic images according to claim 7,
the principal component analysis using the covariance method described in step S4 can be replaced with one of the following methods: LAD, SVD, t-SNE, LASSO, wavelet analysis, Laplacian eigenmaps, and sparse coding.
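The segmentation and projection steps (S2 and S3, with the window size of claim 4 and the intersection rule of claim 2) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, a channel is a plain list of pixel rows, binarization uses a fixed threshold of 128 with dark text on a light background, and intersections that contain no foreground pixel are discarded. None of these choices is specified in the claims.

```python
def tile_image(channel, size=100, step=100):
    """Yield (top, left, tile) for each sliding-window position (claim 4 sizes)."""
    height, width = len(channel), len(channel[0])
    for top in range(0, height, step):
        for left in range(0, width, step):
            # Tiles at the right and bottom edges may be smaller than size x size.
            yield top, left, [row[left:left + size] for row in channel[top:top + size]]


def binarize(tile, threshold=128):
    """Assumed rule: pixels darker than the threshold are foreground (1)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in tile]


def runs(profile):
    """Return [start, end) index ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def candidate_regions(binary):
    """Intersect horizontal and vertical projections of one binarized sub-image."""
    horizontal = [sum(row) for row in binary]        # foreground count per row
    vertical = [sum(col) for col in zip(*binary)]    # foreground count per column
    boxes = []
    for r0, r1 in runs(horizontal):
        for c0, c1 in runs(vertical):
            # Assumed filter: drop intersections that contain no foreground pixel.
            if any(binary[r][c] for r in range(r0, r1) for c in range(c0, c1)):
                boxes.append((r0, r1, c0, c1))
    return boxes


def detect_candidates(channel, size=100, step=100, threshold=128):
    """Return candidate boxes (row0, row1, col0, col1) in channel coordinates."""
    found = []
    for top, left, tile in tile_image(channel, size, step):
        for r0, r1, c0, c1 in candidate_regions(binarize(tile, threshold)):
            found.append((top + r0, top + r1, left + c0, left + c1))
    return found


# Synthetic 200 x 200 channel: white background with one dark block.
image = [[255] * 200 for _ in range(200)]
for r in range(10, 20):
    for c in range(30, 60):
        image[r][c] = 0
boxes = detect_candidates(image)
```

An empty list from `detect_candidates` corresponds to the case in claim 2 where no sub-image is marked and the image is reported as containing no text. Note that a block crossing a window boundary is split into one box per window in this sketch.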

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811361398.6A CN109460768B (en) 2018-11-15 2018-11-15 Text detection and removal method for histopathology microscopic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811361398.6A CN109460768B (en) 2018-11-15 2018-11-15 Text detection and removal method for histopathology microscopic image

Publications (2)

Publication Number Publication Date
CN109460768A (en) 2019-03-12
CN109460768B (en) 2021-09-21

Family

ID=65610574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811361398.6A Active CN109460768B (en) 2018-11-15 2018-11-15 Text detection and removal method for histopathology microscopic image

Country Status (1)

Country Link
CN (1) CN109460768B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110797097B (en) * 2019-10-11 2020-10-16 武汉兰丁智能医学股份有限公司 Artificial intelligence cloud diagnosis platform
CN111310758A (en) * 2020-02-13 2020-06-19 上海眼控科技股份有限公司 Text detection method and device, computer equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101593277A (en) * 2008-05-30 2009-12-02 电子科技大学 Method and device for automatically locating text regions in complex color images
CN101833664A (en) * 2010-04-21 2010-09-15 中国科学院自动化研究所 Video image text detection method based on sparse representation
CN102081731A (en) * 2009-11-26 2011-06-01 中国移动通信集团广东有限公司 Method and device for extracting text from an image
CN107038409A (en) * 2016-02-03 2017-08-11 斯特拉德视觉公司 Method, device, and computer-readable recording medium for detecting text contained in an image

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
SG10201510667SA (en) * 2012-06-27 2016-01-28 Agency Science Tech & Res Text detection devices and text detection methods

Non-Patent Citations (6)

Title
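Robust approach to detect and localize text from natural scene images; Khushbu C. Saner; International Journal of Electrical, Electronics and Computer Engineering; 2014-12-31; Vol. 3, No. 2; full text *
Text extraction from gray scale document images using edge information; Q. Yuan et al.; IEEE Xplore; 2001-12-31; full text *
Texture characteristic extraction of medical images based on pyramid structure wavelet transform; Shurong Liu et al.; 2010 International Conference on Computer Design and Applications; 2010-12-31; Vol. 1; full text *
Scene text detection based on Gabor filtering and edge features; 邓勇 et al.; Computer Applications and Software (计算机应用与软件); 2012-12-31; Vol. 29, No. 12; full text *
Raman spectroscopic imaging technology and its applications in biomedicine; 姚育东 et al.; Chinese Journal of Lasers (中国激光); 2018-03-31; Vol. 45, No. 03; full text *
Research on algorithms for extracting text regions from video images; 颜子夜; Wanfang Data Knowledge Service Platform (万方数据知识服务平台); 2013-04-27; full text *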

Also Published As

Publication number Publication date
CN109460768A (en) 2019-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant