CN111553361A - Pathological section label identification method - Google Patents

Pathological section label identification method

Info

Publication number
CN111553361A
CN111553361A (application CN202010199537.0A)
Authority
CN
China
Prior art keywords
characters
pathological section
identification method
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010199537.0A
Other languages
Chinese (zh)
Other versions
CN111553361B (en)
Inventor
王杰 (Wang Jie)
郑众喜 (Zheng Zhongxi)
向旭辉 (Xiang Xuhui)
陈杰 (Chen Jie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West China Hospital of Sichuan University
Original Assignee
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West China Hospital of Sichuan University filed Critical West China Hospital of Sichuan University
Priority to CN202010199537.0A priority Critical patent/CN111553361B/en
Publication of CN111553361A publication Critical patent/CN111553361A/en
Application granted granted Critical
Publication of CN111553361B publication Critical patent/CN111553361B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention discloses a pathological section label identification method that identifies pathological section label images with a deep learning method. The base network of the model adopted for the deep learning is a RetinaNet network based on ResNet-50, together with a module that helps the base network recognize direction-sensitive characters. The module comprises a vertical self-attention branch, a horizontal self-attention branch and a middle branch, and the branches are fused as

O = C_v·β + C_h·(1 − β)   (1)

where, in formula (1), O denotes the output, C_v the vertical self-attention branch, C_h the horizontal self-attention branch, and β the output result of the middle branch.

Description

Pathological section label identification method
Technical Field
The invention relates to the field of medical detection, in particular to a pathological section label identification method.
Background
One of the current methods for pathological section label recognition is Optical Character Recognition (OCR). The mainstream OCR algorithms all comprise the following two steps:
1. detecting the characters in the scene;
2. recognizing the detected text.
The output of the first step is usually the position of a word or a line of characters, and the techniques currently used are mostly based on general-purpose object detection algorithms. In the second step, the corresponding text is cropped from the image according to the detection result of the first step, scaled to a fixed-height image, and then recognized with a CTC-based or attention-based method; these methods generally assume that the text is upright and reads left to right. Most current research focuses on the first step, with the main attention on how to handle irregular text.
Applying mainstream OCR algorithms directly to pathological section label recognition runs into the following problems:
1. Current mainstream OCR techniques need large amounts of training data: the first step typically requires 10k-50k annotated samples and the second step more than 1000k training samples. Collecting pathological section data of that order is practically impossible; the number of annotated samples used in this patent is fewer than 2000, far below the data volume used by mainstream OCR;
2. Mainstream OCR research focuses mostly on detecting irregular characters, as shown in FIG. 1, whereas pathological section labels are scanned by a digital slide scanner, as shown in FIG. 2, and exhibit almost no deformation;
3. The characters in a pathological section label can face any direction (different directions may even coexist in the same label), an aspect in which mainstream OCR takes little interest: most OCR methods simply assume that the characters are upright and read left to right;
4. Mainstream OCR mostly targets natural language, where the recognition unit is a word and semantic correlation exists between words; the characters in a pathology label are highly random, with little correlation between them;
5. The few techniques that can directly handle characters in arbitrary directions are limited to specific scenes, for example text generated at a fixed position according to a rule, text requiring an auxiliary locator, or text in a fixed font.
As described above, current mainstream OCR technology and label recognition differ greatly in both data volume and focus, so directly applying OCR technology to label recognition cannot achieve good results.
Disclosure of Invention
The invention aims to provide a pathological section label identification method which can correctly process characters in different directions.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention discloses a pathological section label identification method, which adopts a deep learning method to identify pathological section label images, wherein the basic network of a model adopted by the deep learning is a RetinaNet network based on ResNet-50 and a module used for helping the basic network to identify direction-sensitive characters, the module comprises a vertical self-attention mechanism branch, a horizontal self-attention mechanism branch and a middle branch, and the fusion method of the modules is as follows:
O=Cvβ+Ch(1-β) (1)
in formula (1): o represents an output, CvIndicating a vertical self-attentive mechanism branch, ChIndicating a horizontal self-attention mechanism branch, β is the output result of the middle branch.
Preferably, the Anchor box ratios of the topmost layer of the base network are 1:1, 1:7 and 7:1; those of the middle layer are 1:1, 1:5 and 5:1; and those of the bottommost layer are 1:1, 1:2 and 2:1.
Preferably, the topmost output network and the middle output network of the model share weights, and the bottommost network uses separate weights.
Preferably, the loss function of the training network is as follows:
L = L_cls(p, u) + λ[u ≥ 1]·L_loc(t^u, v) + γ·L_dre(p, w)   (2)
in formula (2): L_cls(p, u) = −log p_u, where u is the class of the target box in the output result and the background class is numbered 0; L_loc is the regression loss of the target box; L_dre(p, w) = −log p_w, where w is the direction of the target box in the output result; and λ, γ are the weights of the corresponding losses.
Preferably, λ is 10 and γ is 1.
Preferably, the training-stage processing steps of the deep learning are as follows:
Step 1, preprocessing the input image;
Step 2, performing data enhancement on the preprocessed image by random cropping, left-right flipping, up-down flipping, rotation at an arbitrary angle, color perturbation, random brightness transformation and random noise addition;
Step 3, scaling the image processed in Step 2 to a fixed size;
Step 4, forming a batch from the scaled images;
Step 5, propagating forward with the model;
Step 6, computing the loss with the loss function, backpropagating, and updating the training parameters;
Step 7, training iteratively until the model converges.
Preferably, the prediction-stage processing steps of the deep learning are as follows:
a. preprocessing the input image;
b. scaling the preprocessed image to a fixed size;
c. propagating forward with the model;
d. dividing the results output in step c into two groups: words and characters;
e. aggregating characters into words according to whether a word and a character overlap;
f. counting the directions of all characters in the same word and determining the direction of the current word by voting;
g. arranging the characters in each word in order along the word's direction;
h. determining from the distances between the characters in a word whether spaces exist between them, and adding spaces if so;
i. outputting the result.
Preferably, the preprocessing method is as follows:
img′ = (img − μ) / σ   (3)
in formula (3), img is the input image, μ is the mean of the image, and σ is the variance of the image.
Preferably, the fixed size is 512 × 512, and the number of images per batch is 16.
The invention has the following beneficial effects:
1. The invention requires only a very small number of training samples. Compared with classical OCR, the network architecture of the invention is easier to train; in addition, training techniques such as transfer learning and the addition of simulated data greatly reduce the algorithm's demand for samples. The fewer than 1400 training samples currently used are far below the million-scale requirement of classical OCR.
2. The invention can correctly process characters in different directions. The algorithm uses a custom LineAttention module and adds direction prediction to the output; unlike mainstream OCR algorithms, which generally assume that characters are upright and read left to right, it can correctly process characters in different directions.
Drawings
FIG. 1 is a schematic view of a picture with irregular text;
FIG. 2 is an example of pathological section label data;
FIG. 3 is a model architecture diagram of the present invention;
FIG. 4 is a schematic diagram of a LineAttention module;
FIG. 5 is an exemplary graph of synthetic data samples;
FIG. 6 is a diagram illustrating the detection results.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
The invention discloses an algorithm for pathological section label character recognition (hereinafter, label recognition). The algorithm is based on RetinaNet, but RetinaNet is designed for general object detection and cannot correctly identify characters in different directions. To identify characters in different directions, a direction prediction branch is added to the network output; and to correctly handle direction-sensitive characters such as '6' and '9' appearing in different orientations, a dedicated LineAttention module is designed to process them effectively. A further improvement over RetinaNet is a special Anchor box parameter setting that handles the large aspect ratios common in text detection, and the basic architecture of the model is also adjusted. After the individual characters are detected, a post-processing algorithm combines them into lines for output. The specific details are as follows:
model architecture
The basic structure of the model is shown in FIG. 3. The invention uses RetinaNet [2] based on ResNet-50 [3] as its basic network structure. However, RetinaNet is designed for general object detection and does not achieve optimal results when used directly for label character recognition. The invention therefore improves RetinaNet as follows:
The invention designs a module called 'LineAttention' (the orange boxes in the architecture diagram) to help the model correctly recognize direction-sensitive characters. FIG. 4 shows the specific structure of LineAttention; the fusion method in FIG. 4 is:
O = C_v·β + C_h·(1 − β)   (1)
where O denotes the output, C_v the vertical self-attention branch (the third branch in the block diagram), C_h the horizontal self-attention branch (the first branch in the block diagram), and β the output result of the middle sigmoid branch [4].
LineAttention can automatically detect the direction of the current character and, by analyzing adjacent characters lying along the same direction, increase the recognition accuracy of the current character; the improvement is especially pronounced for direction-sensitive characters such as '6', '9', '-' and '_'.
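To make the fusion rule concrete, here is a minimal PyTorch sketch of formula (1). The patent fixes only the three-branch layout and the fusion equation; the axis-wise convolutions standing in for the vertical and horizontal self-attention operations, the kernel sizes, and the 1×1 sigmoid gate are illustrative assumptions, not the patented design:

    import torch
    import torch.nn as nn

    class LineAttentionFusion(nn.Module):
        # Sketch of Eq. (1): O = C_v * beta + C_h * (1 - beta).
        def __init__(self, channels: int):
            super().__init__()
            # Stand-ins for the vertical/horizontal self-attention branches:
            # k x 1 and 1 x k convolutions that mix features along one axis only.
            self.vertical = nn.Conv2d(channels, channels, (7, 1), padding=(3, 0))
            self.horizontal = nn.Conv2d(channels, channels, (1, 7), padding=(0, 3))
            # Middle branch: a per-pixel blending weight beta in [0, 1].
            self.gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            c_v = self.vertical(x)     # vertical branch C_v
            c_h = self.horizontal(x)   # horizontal branch C_h
            beta = self.gate(x)        # middle branch; broadcast over channels
            return c_v * beta + c_h * (1.0 - beta)  # Eq. (1)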
The RetinaNet model outputs only the position and size of the target box and the category of the target; the invention adds the direction of the target to the output. Only with this direction information can label data in different orientations be processed accurately.
The invention optimizes the Anchor box parameters of the different output layers: the Anchor box ratios of the topmost layer are 1:1, 1:7 and 7:1; those of the middle layer are 1:1, 1:5 and 5:1; and those of the bottommost layer are 1:1, 1:2 and 2:1. The topmost and middle layers are dedicated to words with large aspect ratios, while the bottommost layer handles words with small aspect ratios and individual characters.
Another difference from RetinaNet is that the topmost output network and the middle output network share weights while the bottommost network uses separate weights. This design rests on the assumption that the topmost and middle output networks mainly detect words whereas the bottommost output network mainly detects characters; since the tasks differ, different weight-sharing rules are designed. RetinaNet has no such requirement, so all of its output layers share weights.
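As an illustration of these two design choices, the sketch below pins the stated Anchor box ratios to three named levels and builds one prediction head shared by the topmost and middle levels and a separate head for the bottommost level. The subnet depth, channel width, anchor count and class count are assumptions in the spirit of RetinaNet, not values fixed by the patent:

    import torch.nn as nn

    # Aspect ratios (width:height) per output level, as stated above; how the
    # three named levels map onto concrete FPN pyramid levels is left open.
    ANCHOR_RATIOS = {
        "top":    [1.0, 1.0 / 7.0, 7.0],  # words with large aspect ratios
        "middle": [1.0, 1.0 / 5.0, 5.0],  # words
        "bottom": [1.0, 1.0 / 2.0, 2.0],  # short words and single characters
    }

    NUM_ANCHORS = 9    # assumed: 3 ratios x 3 scales, as in standard RetinaNet
    NUM_CLASSES = 63   # assumed size of the character/word class set

    def make_head(channels: int = 256) -> nn.Sequential:
        # RetinaNet-style classification subnet; depth and width are assumptions.
        layers = []
        for _ in range(4):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(channels, NUM_ANCHORS * NUM_CLASSES, 3, padding=1))
        return nn.Sequential(*layers)

    word_head = make_head()  # shared by the topmost and middle levels
    char_head = make_head()  # separate weights for the bottommost level
    heads = {"top": word_head, "middle": word_head, "bottom": char_head}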
Loss function
The loss function used by the training network is defined as follows:
L = L_cls(p, u) + λ[u ≥ 1]·L_loc(t^u, v) + γ·L_dre(p, w)   (2)
where L_cls(p, u) = −log p_u and u is the class of the target box in the output result (the background class is numbered 0); L_loc is the regression loss of the target box (defined as in Fast R-CNN [5]); L_dre(p, w) = −log p_w, where w is the direction of the target box in the output result; and λ, γ are the weights of the corresponding losses. In the experiments, λ = 10 and γ = 1.
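A minimal sketch of formula (2) follows. Plain cross-entropy stands in for L_cls and L_dre (the focal loss of the underlying RetinaNet would slot in for L_cls), smooth L1 stands in for the Fast R-CNN box regression, and a four-way direction set is assumed:

    import torch
    import torch.nn.functional as F

    def total_loss(cls_logits, box_pred, dre_logits, box_target, u, w,
                   lam: float = 10.0, gamma: float = 1.0) -> torch.Tensor:
        # Eq. (2): L = L_cls(p,u) + lam * [u >= 1] * L_loc(t^u,v) + gamma * L_dre(p,w)
        l_cls = F.cross_entropy(cls_logits, u)    # L_cls(p,u) = -log p_u
        fg = u >= 1                               # indicator [u >= 1]: foreground only
        if fg.any():
            l_loc = F.smooth_l1_loss(box_pred[fg], box_target[fg])
        else:
            l_loc = box_pred.sum() * 0.0          # differentiable zero
        l_dre = F.cross_entropy(dre_logits, w)    # L_dre(p,w) = -log p_w
        return l_cls + lam * l_loc + gamma * l_dre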
Detailed processing steps
The invention is an algorithm based on deep learning and comprises a training (learning) stage and a prediction (inference) stage; the corresponding processing steps are described in turn below:
Step 1, preprocessing the input image with the following method:
img′ = (img − μ) / σ   (3)
in formula (3), μ is the mean of the image, σ is the variance of the image, and img is the input image;
Step 2, performing data enhancement on the preprocessed image by random cropping, left-right flipping, up-down flipping, rotation at an arbitrary angle, color perturbation, random brightness transformation and random noise addition;
Step 3, scaling the image processed in Step 2 to a fixed size (512 × 512);
Step 4, forming a batch from several (16) scaled images;
Step 5, propagating forward with the model;
Step 6, computing the loss with the loss function, backpropagating, and updating the training parameters;
Step 7, training iteratively until the model converges. A minimal sketch of this training loop follows.
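The sketch below strings steps 1-7 together. The normalization implements formula (3); the data loader is assumed to deliver the augmented 512 × 512 batches of steps 2-4 together with box/class/direction targets, and loss_fn is assumed to implement formula (2) (see the earlier sketch). A fixed epoch count stands in for 'until convergence':

    import torch

    def preprocess(img: torch.Tensor) -> torch.Tensor:
        # Step 1 / Eq. (3): normalize by the image's own statistics.
        return (img - img.mean()) / (img.std() + 1e-8)

    def train_until_converged(model, loader, optimizer, loss_fn, epochs: int = 100):
        model.train()
        for _ in range(epochs):                      # step 7: iterate
            for images, targets in loader:           # steps 2-4 done by the loader
                images = torch.stack([preprocess(im) for im in images])
                outputs = model(images)              # step 5: forward propagation
                loss = loss_fn(outputs, targets)     # step 6: Eq. (2)
                optimizer.zero_grad()
                loss.backward()                      # step 6: backpropagation
                optimizer.step()                     # step 6: parameter update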
The prediction-stage processing steps of the deep learning are as follows (a sketch of the post-processing in steps d-i follows the list):
a. preprocessing the input image with the following method:
img′ = (img − μ) / σ   (3)
in formula (3), μ is the mean of the image, σ is the variance of the image, and img is the input image;
b. scaling the preprocessed image to a fixed size (512 × 512);
c. propagating forward with the model;
d. dividing the results output in step c into two groups: words and characters;
e. aggregating characters into words according to whether a word and a character overlap;
f. counting the directions of all characters in the same word and determining the direction of the current word by voting;
g. arranging the characters in each word in order along the word's direction;
h. determining from the distances between the characters in a word whether spaces exist between them, and adding spaces if so;
i. outputting the result.
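The following is a sketch of the post-processing in steps d-i. Boxes are (x1, y1, x2, y2) tuples; each character detection carries its glyph and one of four assumed directions. The 0.5 gap threshold used for inserting spaces is an assumption, not a value given in the text:

    from collections import Counter

    def overlaps(a, b) -> bool:
        # Axis-aligned intersection test between two (x1, y1, x2, y2) boxes.
        return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

    def decode_words(word_boxes, char_dets, space_factor: float = 0.5):
        # word_boxes: boxes from the "word" group (step d).
        # char_dets:  (box, glyph, direction) tuples from the "character" group,
        #             with direction in {"up", "right", "down", "left"}.
        texts = []
        for wbox in word_boxes:
            chars = [c for c in char_dets if overlaps(wbox, c[0])]     # step e
            if not chars:
                continue
            # Step f: the word direction is the majority character direction.
            direction = Counter(c[2] for c in chars).most_common(1)[0][0]
            # Step g: sort along the reading axis implied by that direction.
            axis, reverse = {"up": (0, False), "down": (0, True),
                             "right": (1, False), "left": (1, True)}[direction]

            def centre(c):
                return (c[0][axis] + c[0][axis + 2]) / 2.0

            chars.sort(key=centre, reverse=reverse)
            # Step h: insert a space when the centre-to-centre gap between
            # neighbours clearly exceeds the mean character size.
            sizes = [c[0][axis + 2] - c[0][axis] for c in chars]
            mean_s = sum(sizes) / len(sizes)
            text = chars[0][1]
            for prev, cur in zip(chars, chars[1:]):
                gap = abs(centre(cur) - centre(prev)) - mean_s
                text += (" " if gap > space_factor * mean_s else "") + cur[1]
            texts.append(text)                                         # step i
        return texts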
Results of the experiment
In the experiments we used more than 1900 pathological section samples from more than ten hospitals, 1400 as training data and 500 as test data. For deep learning, 1400 samples is very few, and we alleviated the data-shortage problem as follows:
1. the model is pre-trained on COCO [6] and then transferred to the label character recognition problem;
2. as shown in FIG. 5, we automatically generated about 50000 samples with a program, but during training each automatically generated sample was given 1/30 the weight of a real sample (one way to apply this weighting is sketched after this list);
3. data enhancement methods such as random up-down flipping, random left-right flipping, random rotation, random color perturbation and random brightness perturbation are used.
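One way to realize the 1/30 weighting of the synthetic samples is through the sampling probability, as in the hypothetical sketch below; load_real and load_synthetic are placeholder loaders, not functions from the patent, and down-weighting the per-sample loss instead would serve the same purpose:

    from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

    real_ds = load_real()        # ~1400 annotated label images (hypothetical loader)
    synth_ds = load_synthetic()  # ~50000 program-generated samples (hypothetical loader)

    # Each synthetic sample is drawn with 1/30 the probability of a real sample.
    weights = [1.0] * len(real_ds) + [1.0 / 30.0] * len(synth_ds)
    dataset = ConcatDataset([real_ds, synth_ds])
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)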
The final performance of our model is shown in Table 1.
TABLE 1. Model test results

Number of test samples | Accuracy | Recall | Direction accuracy | mAP@0.5
500 | 96.5% | 95.7% | 95.9% | 93.1%
With our post-processing algorithm, the label samples can also be classified, the classes being Her-2, Ki-67, ER, PR and the like. Automatic classification of the labels provides a necessary precondition for the subsequent automatic processing of digital pathological sections. The classification results of the model are shown in Table 2:
TABLE 2. Model classification results

Number of test samples | Accuracy | Recall
925 | 100.0% | 97.5%
FIG. 6 shows an example of the detection results. The colors of the target boxes in FIG. 6 represent different directions, e.g. yellow for right, blue for up and green for left, and the text in a label may face any direction. Simple character-level detection with a general object detector such as RetinaNet cannot correctly distinguish direction-sensitive characters such as '6', '9', '-' and '_'; with the help of the LineAttention module, we can distinguish such characters correctly.
The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.
The prior art documents to which the present invention relates are as follows:
[1] Yuliang L, Lianwen J, Shuaitao Z, et al. Detecting Curve Text in the Wild: New Dataset and New Solution [J]. 2017.
[2] Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99): 2999-3007.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Neural Information Processing Systems (NIPS), 2017.
[5] R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision (ICCV), 2015.
[6] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740-755. Springer, 2014.

Claims (9)

1. A pathological section label identification method, characterized in that: a pathological section label image is identified with a deep learning method; the base network of the model adopted for the deep learning is a RetinaNet network based on ResNet-50, together with a module for helping the base network identify direction-sensitive characters; the module comprises a vertical self-attention branch, a horizontal self-attention branch and a middle branch, and the branches are fused as:
O = C_v·β + C_h·(1 − β)   (1)
in formula (1): O denotes the output, C_v the vertical self-attention branch, C_h the horizontal self-attention branch, and β the output result of the middle branch.
2. The pathological section label identification method according to claim 1, wherein: the Anchor box ratios of the topmost layer of the model are 1:1, 1:7 and 7:1; those of the middle layer are 1:1, 1:5 and 5:1; and those of the bottommost layer are 1:1, 1:2 and 2:1.
3. The pathological section label identification method according to claim 1, wherein: the topmost output network and the middle output network of the base network share weights, and the bottommost network uses separate weights.
4. The pathological section label identification method according to any one of claims 1 to 3, wherein the loss function of the training network is as follows:
L = L_cls(p, u) + λ[u ≥ 1]·L_loc(t^u, v) + γ·L_dre(p, w)   (2)
in formula (2): L_cls(p, u) = −log p_u, where u is the class of the target box in the output result and the background class is numbered 0; L_loc is the regression loss of the target box; L_dre(p, w) = −log p_w, where w is the direction of the target box in the output result; and λ, γ are the weights of the corresponding losses.
5. The pathological section label identification method according to claim 4, wherein: λ is 10 and γ is 1.
6. The pathological section label identification method according to claim 4, wherein the training-stage processing steps of the deep learning are as follows:
step 1, preprocessing the input image;
step 2, performing data enhancement on the preprocessed image by random cropping, left-right flipping, up-down flipping, rotation at an arbitrary angle, color perturbation, random brightness transformation and random noise addition;
step 3, scaling the image processed in step 2 to a fixed size;
step 4, forming a batch from the scaled images;
step 5, propagating forward with the model;
step 6, computing the loss with the loss function, backpropagating, and updating the training parameters;
step 7, training iteratively until the model converges.
7. The pathological section label identification method according to claim 6, wherein the prediction-stage processing steps of the deep learning are as follows:
a. preprocessing the input image;
b. scaling the preprocessed image to a fixed size;
c. propagating forward with the model;
d. dividing the results output in step c into two groups: words and characters;
e. aggregating characters into words according to whether a word and a character overlap;
f. counting the directions of all characters in the same word and determining the direction of the current word by voting;
g. arranging the characters in each word in order along the word's direction;
h. determining from the distances between the characters in a word whether spaces exist between them, and adding spaces if so;
i. outputting the result.
8. The pathological section label identification method according to claim 6 or 7, wherein the preprocessing method is as follows:
img′ = (img − μ) / σ   (3)
in formula (3), μ is the mean of the image and σ is the variance of the image.
9. The pathological section label identification method according to claim 6 or 7, wherein the fixed size is 512 × 512 and the number of images per batch is 16.
CN202010199537.0A 2020-03-19 2020-03-19 Pathological section label identification method Active CN111553361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199537.0A CN111553361B (en) 2020-03-19 2020-03-19 Pathological section label identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199537.0A CN111553361B (en) 2020-03-19 2020-03-19 Pathological section label identification method

Publications (2)

Publication Number Publication Date
CN111553361A (en) 2020-08-18
CN111553361B (en) 2022-11-01

Family

ID=72001858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199537.0A Active CN111553361B (en) 2020-03-19 2020-03-19 Pathological section label identification method

Country Status (1)

Country Link
CN (1) CN111553361B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634279A (en) * 2020-12-02 2021-04-09 四川大学华西医院 Medical image semantic segmentation method based on attention Unet model
CN114648680A (en) * 2022-05-17 2022-06-21 腾讯科技(深圳)有限公司 Training method, device, equipment, medium and program product of image recognition model

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180246873A1 (en) * 2017-02-28 2018-08-30 Cisco Technology, Inc. Deep Learning Bias Detection in Text
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109697414A (en) * 2018-12-13 2019-04-30 北京金山数字娱乐科技有限公司 A kind of text positioning method and device
CN109753954A (en) * 2018-11-14 2019-05-14 安徽艾睿思智能科技有限公司 The real-time positioning identifying method of text based on deep learning attention mechanism
CN109977861A (en) * 2019-03-25 2019-07-05 中国科学技术大学 Offline handwritten form method for identifying mathematical formula
CN110245657A (en) * 2019-05-17 2019-09-17 清华大学 Pathological image similarity detection method and detection device
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110781305A (en) * 2019-10-30 2020-02-11 北京小米智能科技有限公司 Text classification method and device based on classification model and model training method
CN110837835A (en) * 2019-10-29 2020-02-25 华中科技大学 End-to-end scene text identification method based on boundary point detection

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180246873A1 (en) * 2017-02-28 2018-08-30 Cisco Technology, Inc. Deep Learning Bias Detection in Text
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109753954A (en) * 2018-11-14 2019-05-14 安徽艾睿思智能科技有限公司 The real-time positioning identifying method of text based on deep learning attention mechanism
CN110569832A (en) * 2018-11-14 2019-12-13 安徽艾睿思智能科技有限公司 text real-time positioning and identifying method based on deep learning attention mechanism
CN109697414A (en) * 2018-12-13 2019-04-30 北京金山数字娱乐科技有限公司 A kind of text positioning method and device
CN109977861A (en) * 2019-03-25 2019-07-05 中国科学技术大学 Offline handwritten form method for identifying mathematical formula
CN110245657A (en) * 2019-05-17 2019-09-17 清华大学 Pathological image similarity detection method and detection device
CN110837835A (en) * 2019-10-29 2020-02-25 华中科技大学 End-to-end scene text identification method based on boundary point detection
CN110781305A (en) * 2019-10-30 2020-02-11 北京小米智能科技有限公司 Text classification method and device based on classification model and model training method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HONGTAO XIE et al.: "Convolutional attention networks for scene text recognition", ACM Transactions on Multimedia Computing, Communications, and Applications *
XU QINGQUAN: "Research on Chinese Recognition Algorithms Based on the Attention Mechanism", China Masters' Theses Full-text Database (Information Science and Technology) *
NIU ZUODONG et al.: "Research on Natural Scene Text Detection Algorithms Incorporating the Attention Mechanism", Computer Applications and Software *
ZHENG ZHONGXI: "Embracing the Era of Digital Pathology", Practical Journal of Clinical Medicine *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634279A (en) * 2020-12-02 2021-04-09 四川大学华西医院 Medical image semantic segmentation method based on attention Unet model
CN114648680A (en) * 2022-05-17 2022-06-21 腾讯科技(深圳)有限公司 Training method, device, equipment, medium and program product of image recognition model

Also Published As

Publication number Publication date
CN111553361B (en) 2022-11-01

Similar Documents

Publication Publication Date Title
US7570816B2 (en) Systems and methods for detecting text
US8494273B2 (en) Adaptive optical character recognition on a document with distorted characters
Meier et al. Fully convolutional neural networks for newspaper article segmentation
Jain et al. Unconstrained scene text and video text recognition for arabic script
CN102385592B (en) Image concept detection method and device
CN113343989B (en) Target detection method and system based on self-adaption of foreground selection domain
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN111553361B (en) Pathological section label identification method
CN109213886B (en) Image retrieval method and system based on image segmentation and fuzzy pattern recognition
CN114663904A (en) PDF document layout detection method, device, equipment and medium
CN113361432A (en) Video character end-to-end detection and identification method based on deep learning
Feng et al. Robust shared feature learning for script and handwritten/machine-printed identification
Nguyen TableSegNet: a fully convolutional network for table detection and segmentation in document images
Li et al. Image pattern recognition in identification of financial bills risk management
Xue Optical character recognition
US20230154217A1 (en) Method for Recognizing Text, Apparatus and Terminal Device
Zhang et al. Text extraction for historical Tibetan document images based on connected component analysis and corner point detection
CN113205049A (en) Document identification method and identification system
Rabby et al. A novel deep learning character-level solution to detect language and printing style from a bilingual scanned document
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
Ding et al. Improving GAN-based feature extraction for hyperspectral images classification
Bumbu On classification of 17th century fonts using neural networks
Hotwani et al. Hybrid models for offline handwritten character recognition system without using any prior database images
US11972626B2 (en) Extracting multiple documents from single image
Zhou et al. Region selection model with saliency constraint for fine-grained recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant