CN111553361A - Pathological section label identification method - Google Patents
- Publication number
- CN111553361A CN111553361A CN202010199537.0A CN202010199537A CN111553361A CN 111553361 A CN111553361 A CN 111553361A CN 202010199537 A CN202010199537 A CN 202010199537A CN 111553361 A CN111553361 A CN 111553361A
- Authority
- CN
- China
- Prior art keywords
- characters
- pathological section
- identification method
- network
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a pathological section label identification method that identifies pathological section label images with a deep learning method. The base network of the model adopted by the deep learning is a RetinaNet network based on ResNet-50, together with a module for helping the base network identify direction-sensitive characters. The module comprises a vertical self-attention branch, a horizontal self-attention branch and a middle branch, fused as:
O = C_v·β + C_h·(1 − β) (1)
where in formula (1), O represents the output, C_v the vertical self-attention branch, C_h the horizontal self-attention branch, and β the output of the middle branch.
Description
Technical Field
The invention relates to the field of medical detection, in particular to a pathological section label identification method.
Background
One of the current methods for pathological section label recognition is Optical Character Recognition (OCR). Mainstream OCR algorithms comprise the following two steps:
1. detect text in the scene;
2. recognize the detected text.
The output of the first step is usually the position of a word or a line of text, and current techniques are mostly based on general-purpose object detection algorithms. In the second step, the corresponding text is cropped from the image according to the detection result of the first step, scaled to a fixed-height image, and then recognized with a CTC- or attention-based method; these methods generally assume at recognition time that the text is upright and reads from left to right. Most current research focuses on the first step, chiefly on how to detect irregular text.
Applying mainstream OCR algorithms directly to pathological section label recognition runs into the following problems:
1. Mainstream OCR technology needs a large amount of training data, typically 10k-50k annotated images for the first step and more than 1000k for the second. Collecting pathological section data at that scale is practically impossible; the number of annotated images used in this patent is fewer than 2000, far below the data volume used by mainstream OCR technology.
2. Mainstream OCR technology mostly focuses on how to detect irregular text, as shown in fig. 1, whereas the labels of pathological sections are scanned by a digital slide scanner, as shown in fig. 2, and show almost no deformation.
3. The characters on a pathological section label can run in any direction (even several directions within the same label), an aspect mainstream OCR technology largely ignores: most OCR methods simply assume upright, left-to-right text.
4. Mainstream OCR mostly detects natural language, where the recognition target is the word and words are semantically correlated; the characters on a pathology label are highly random and only weakly correlated with one another.
5. The few techniques that can directly handle text in arbitrary directions are limited to specific scenarios, such as text generated at a fixed position according to a rule, text requiring an auxiliary locator, fixed fonts, and so on.
As described above, because mainstream OCR technology and label recognition differ greatly in data volume and focus, directly applying OCR technology to label recognition cannot achieve a good effect.
Disclosure of Invention
The invention aims to provide a pathological section label identification method which can correctly process characters in different directions.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention discloses a pathological section label identification method, which adopts a deep learning method to identify pathological section label images, wherein the basic network of a model adopted by the deep learning is a RetinaNet network based on ResNet-50 and a module used for helping the basic network to identify direction-sensitive characters, the module comprises a vertical self-attention mechanism branch, a horizontal self-attention mechanism branch and a middle branch, and the fusion method of the modules is as follows:
O = C_v·β + C_h·(1 − β) (1)
In formula (1): O represents the output, C_v the vertical self-attention branch, C_h the horizontal self-attention branch, and β the output result of the middle branch.
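The fusion in formula (1) can be sketched as follows. This is an illustrative numpy sketch, not code from the patent; the feature maps and gate values are assumed toy data, and β is taken to lie in [0, 1] since it comes from a sigmoid branch.

```python
import numpy as np

def line_attention_fuse(c_v: np.ndarray, c_h: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Fuse the vertical and horizontal self-attention branches.

    Implements O = C_v * beta + C_h * (1 - beta), where beta (the output
    of the middle branch) acts as a per-position gate between the two
    branches.
    """
    return c_v * beta + c_h * (1.0 - beta)

# Toy feature maps: beta = 1 selects the vertical branch, beta = 0 the
# horizontal branch, beta = 0.5 averages them.
c_v = np.array([[1.0, 2.0], [3.0, 4.0]])
c_h = np.array([[10.0, 20.0], [30.0, 40.0]])
beta = np.array([[1.0, 0.0], [0.5, 0.5]])
out = line_attention_fuse(c_v, c_h, beta)
```

With these toy inputs the first element keeps the vertical value, the second keeps the horizontal value, and the second row is the average of both branches.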
Preferably, the Anchor box ratios of the topmost layer of the base network are 1:1, 1:7 and 7:1; the Anchor box ratios of the middle layer are 1:1, 1:5 and 5:1; and the Anchor box ratios of the bottommost layer are 1:1, 1:2 and 2:1.
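The per-level aspect ratios above can be turned into anchor shapes of equal area, as is usual in RetinaNet-style detectors. A minimal sketch follows; the base size of 32 pixels is an assumption for illustration, not a value stated in the patent.

```python
import numpy as np

# Aspect ratios (w:h) per pyramid level, as listed above.
LEVEL_RATIOS = {
    "top":    [(1, 1), (1, 7), (7, 1)],
    "middle": [(1, 1), (1, 5), (5, 1)],
    "bottom": [(1, 1), (1, 2), (2, 1)],
}

def anchor_shapes(base_size: float, ratios):
    """Return (w, h) pairs that all have area base_size**2 while matching
    the requested w:h aspect ratios."""
    shapes = []
    for rw, rh in ratios:
        ratio = rw / rh
        w = base_size * np.sqrt(ratio)
        h = base_size / np.sqrt(ratio)
        shapes.append((w, h))
    return shapes

shapes_top = anchor_shapes(32.0, LEVEL_RATIOS["top"])
```

Keeping the area constant means the 1:7 and 7:1 anchors become long thin boxes, matching the high-aspect-ratio words the top layers are dedicated to.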
Preferably, the topmost output network and the middle output network of the model share weights, while the bottommost network uses separate weights.
Preferably, the loss function of the training network is as follows:
L = L_cls(p, u) + λ[u ≥ 1]·L_loc(t_u, v) + γ·L_dre(p, w) (2)
In formula (2): L_cls(p, u) = −log p_u, where u is the class of the target box in the output result (the background class is numbered 0); L_loc is the regression loss of the target box; L_dre(p, w) = −log p_w, where w is the direction of the target box in the output result; and λ, γ are the weights of the corresponding losses.
Preferably, λ is 10 and γ is 1.
Preferably, the deep learning training phase processing steps are as follows:
step 1, preprocessing an input image;
step 2, performing data enhancement on the preprocessed image by random cropping, left-right flipping, up-down flipping, rotation by an arbitrary angle, color disturbance, random brightness transformation and random noise addition;
step 3, scaling the enhanced image to a fixed size;
step 4, forming the scaled images into a batch;
step 5, forward-propagating with the model;
step 6, computing the loss with the loss function, back-propagating, and updating the training parameters;
step 7, performing iterative training until the model converges.
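Steps 5-7 (forward pass, loss plus backpropagation, iterate to convergence) can be sketched on a toy model. The real model is the modified RetinaNet described in this patent; the linear model, data, and learning rate below are assumptions chosen purely to make the loop self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
for step in range(200):                 # step 7: iterate until convergence
    pred = X @ w                        # step 5: forward propagation
    loss = np.mean((pred - y) ** 2)     # step 6: compute the loss,
    grad = 2 * X.T @ (pred - y) / len(X)
    w -= lr * grad                      # back-propagate, update parameters

converged = np.allclose(w, true_w, atol=1e-3)
```

The loop structure is the same whatever the model: only the forward pass, the loss of formula (2), and the gradient computation change.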
Preferably, the prediction stage processing steps of the deep learning are as follows:
a. preprocessing the input image;
b. scaling the preprocessed image to a fixed size;
c. forward-propagating with the model;
d. splitting the output of step c into two groups: words and characters;
e. aggregating characters into words according to whether the word and character boxes overlap;
f. counting the directions of all characters in the same word and determining the word's direction by voting;
g. arranging the characters within each word in order along the word's direction;
h. determining from the spacing between characters within a word whether spaces exist, and inserting them if so;
i. outputting the result.
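Steps d-h can be sketched as follows. This is a hedged illustration, not the patent's implementation: the detection record format (center x/y, width, predicted character and direction), the direction names, and the gap threshold are all assumptions.

```python
from collections import Counter

def assemble_word(word_box, chars, gap_factor=1.5):
    """Group character detections into one word (steps e-h).

    word_box: (x0, y0, x1, y1) of a detected word.
    chars: list of dicts with keys x, y, w, char, dir.
    """
    x0, y0, x1, y1 = word_box
    inside = [c for c in chars
              if x0 <= c["x"] <= x1 and y0 <= c["y"] <= y1]    # step e: overlap
    # Step f: majority vote over the character directions.
    direction = Counter(c["dir"] for c in inside).most_common(1)[0][0]
    # Step g: order characters along the word's direction.
    key = (lambda c: c["x"]) if direction in ("right", "left") else (lambda c: c["y"])
    inside.sort(key=key, reverse=direction in ("left", "up"))
    # Step h: a wide gap between neighbours becomes a space.
    out, prev = [], None
    for c in inside:
        if prev is not None and abs(key(c) - key(prev)) > gap_factor * c["w"]:
            out.append(" ")
        out.append(c["char"])
        prev = c
    return "".join(out), direction

chars = [
    {"x": 10, "y": 5, "w": 8, "char": "E", "dir": "right"},
    {"x": 20, "y": 5, "w": 8, "char": "R", "dir": "right"},
    {"x": 50, "y": 5, "w": 8, "char": "2", "dir": "right"},
]
text, direction = assemble_word((0, 0, 60, 10), chars)
```

Here the gap between "R" and "2" exceeds 1.5 character widths, so a space is inserted and the word is read as "ER 2".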
Preferably, the preprocessing method is:
img = (img − μ)/σ (3)
In formula (3), μ is the mean value of the image, σ is the variance of the image, and img is the image.
Preferably, the fixed size is 512 × 512, and the batch size is 16.
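The preprocessing of formula (3) can be sketched directly. One assumption to flag: the translation calls σ the "variance" of the image, but dividing by the standard deviation is what yields unit variance, so that is what the sketch does.

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Normalize an image as in formula (3): subtract the mean mu and
    divide by sigma (taken here as the standard deviation), so the
    result has zero mean and unit variance."""
    mu = img.mean()
    sigma = img.std()
    return (img - mu) / sigma

img = np.array([[0.0, 2.0], [4.0, 6.0]])
norm = preprocess(img)
```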
The invention has the following beneficial effects:
1. The present invention requires only a very small number of training samples. Compared with classical OCR, the network architecture of the invention is easier to train; in addition, training methods such as transfer learning and the addition of simulated data greatly reduce the algorithm's demand for samples. Fewer than 1400 training samples are currently used, far below the million-scale requirement of classical OCR.
2. The invention can correctly process characters in different directions. The algorithm uses a custom LineAttention module and adds direction prediction to the output; unlike mainstream OCR algorithms (which generally assume upright, left-to-right text), it can correctly process characters in different directions.
Drawings
FIG. 1 is a schematic view of a picture with irregular text;
FIG. 2 is an example of pathological section label data;
FIG. 3 is a model architecture diagram of the present invention;
FIG. 4 is a schematic diagram of a LineAttention module;
FIG. 5 is an exemplary graph of synthetic data samples;
FIG. 6 is a diagram illustrating the detection results.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
The invention discloses an algorithm for pathological section label character recognition (hereinafter, label recognition). The algorithm is based on RetinaNet, but RetinaNet is designed for general-purpose object detection and cannot correctly identify characters in different directions. To identify characters in different directions, a direction-prediction branch is added to the network output. At the same time, to correctly process characters that are direction-sensitive, such as '6' and '9', in different orientations, a dedicated LineAttention module is designed to handle them effectively. Another improvement over RetinaNet is a special Anchor box parameter setting for effectively handling the large aspect ratios common in text detection, and the base architecture of the model is also adjusted. After the individual characters are detected, a corresponding post-processing algorithm combines them into lines for output. The details are as follows:
model architecture
The basic structure of the model is shown in FIG. 3. The invention uses RetinaNet [2] based on ResNet-50 [3] as its base network structure. However, RetinaNet is designed for general-purpose object detection and does not achieve the optimal effect when used directly for label character recognition. The invention therefore improves RetinaNet as follows:
the invention designs a module called 'LineAttention' (orange boxes in an architecture diagram) to help the model correctly recognize the direction-sensitive characters. FIG. 4 shows a specific structure of LineAttention, and the fusion (fusion) method in FIG. 4 is:
O=Cvβ+Ch(1-β) (1)
wherein O represents an output, CvShows the vertical attention mechanism branch (third branch in the block diagram), ChShowing the branch of the horizontal self-attention mechanism (the first branch in the block diagram), β being the output result of the intermediate sigmod branch]。
LineAttention automatically detects the direction of the current character and, through correlation, analyses adjacent characters in the same direction to increase the recognition accuracy of the current character; the gain is especially clear for direction-sensitive characters such as '6', '9', '-' and '_'.
The RetinaNet model outputs only the position and size of the target box and the class of the target; the invention adds the target's direction to the output. Only with this direction information can label data in different directions be processed accurately.
The invention optimizes the Anchor box parameters of the different output layers: the topmost layer uses ratios 1:1, 1:7 and 7:1; the middle layer uses 1:1, 1:5 and 5:1; the bottommost layer uses 1:1, 1:2 and 2:1. The topmost and middle layers are dedicated to words with large aspect ratios, the bottommost layer to words with small aspect ratios and to characters.
Another difference from RetinaNet is that the topmost output network and the middle output network share weights, while the bottommost network uses separate weights. This design rests on the assumption that the topmost and middle output networks mainly detect words while the bottommost mainly detects characters; because the tasks differ, different weight-sharing rules are designed. RetinaNet has no such requirement, so all of its output layers share weights.
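The weight-sharing rule can be illustrated minimally: one parameter set reused by the top and middle heads (word detection), a separate one for the bottom head (character detection). The plain numpy "heads" below are an assumption for illustration; the real heads are convolutional subnetworks.

```python
import numpy as np

rng = np.random.default_rng(0)
shared_head = rng.normal(size=(8, 4))   # used by BOTH the top and middle levels
bottom_head = rng.normal(size=(8, 4))   # separate weights for the bottom level

def apply_head(features, head):
    """Apply an output head (here, a single linear map) to level features."""
    return features @ head

feats = rng.normal(size=(2, 8))
top_out = apply_head(feats, shared_head)
mid_out = apply_head(feats, shared_head)   # identical parameters as the top head
bot_out = apply_head(feats, bottom_head)   # independent parameters
```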
Loss function
The loss function used to train the network is defined as:
L = L_cls(p, u) + λ[u ≥ 1]·L_loc(t_u, v) + γ·L_dre(p, w) (2)
where L_cls(p, u) = −log p_u, u being the class of the target box in the output result (the background class is numbered 0); L_loc is the regression loss of the target box (defined as in Fast R-CNN [5]); L_dre(p, w) = −log p_w, w being the direction of the target box in the output result; and λ, γ are the weights of the corresponding losses. In the experiments, λ = 10 and γ = 1.
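Formula (2) can be computed per detection as sketched below. The smooth-L1 regression loss follows the Fast R-CNN definition the text cites; the probability vectors and box parameters are assumed toy values.

```python
import numpy as np

def smooth_l1(t, v):
    """Smooth-L1 regression loss, as defined in Fast R-CNN."""
    d = np.abs(t - v)
    return float(np.sum(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def detection_loss(p_cls, u, t_u, v, p_dir, w, lam=10.0, gamma=1.0):
    """L = L_cls(p, u) + lam * [u >= 1] * L_loc(t_u, v) + gamma * L_dre(p, w).

    p_cls / p_dir: predicted class and direction probability vectors;
    u: ground-truth class (0 = background, which disables the box loss);
    t_u / v: predicted and target box parameters; w: ground-truth direction.
    """
    l_cls = -np.log(p_cls[u])
    l_loc = smooth_l1(t_u, v) if u >= 1 else 0.0
    l_dre = -np.log(p_dir[w])
    return float(l_cls + lam * l_loc + gamma * l_dre)

p_cls = np.array([0.1, 0.8, 0.1])
p_dir = np.array([0.7, 0.1, 0.1, 0.1])
loss = detection_loss(p_cls, u=1, t_u=np.array([0.1, 0.1]),
                      v=np.array([0.0, 0.0]), p_dir=p_dir, w=0)
```

The indicator [u ≥ 1] means background boxes contribute no regression term, exactly as in the formula.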
Detailed processing steps
The invention is an algorithm based on deep learning and is divided into a training (learning) phase and a prediction (use) phase; the corresponding processing steps are explained as follows.
The training phase proceeds as:
step 1, preprocessing the input image by:
img = (img − μ)/σ (3)
in formula (3), μ is the mean value of the image, σ is the variance of the image, and img is the image;
step 2, performing data enhancement on the preprocessed image by random cropping, left-right flipping, up-down flipping, rotation by an arbitrary angle, color disturbance, random brightness transformation and random noise addition;
step 3, scaling the enhanced image to a fixed size;
step 4, forming the scaled images into a batch;
step 5, forward-propagating with the model;
step 6, computing the loss with the loss function, back-propagating, and updating the training parameters;
step 7, performing iterative training until the model converges.
The prediction (use) phase proceeds as:
a. preprocessing the input image by:
img = (img − μ)/σ (3)
in formula (3), μ is the mean value of the image, σ is the variance of the image, and img is the image;
b. scaling the preprocessed image to a fixed size (512 × 512);
c. forward-propagating with the model;
d. splitting the output of step c into two groups: words and characters;
e. aggregating characters into words according to whether the word and character boxes overlap;
f. counting the directions of all characters in the same word and determining the word's direction by voting;
g. arranging the characters within each word in order along the word's direction;
h. determining from the spacing between characters within a word whether spaces exist, and inserting them if so;
i. outputting the result.
Results of the experiment
In the experiments we used more than 1900 medical slide images from more than ten hospitals as samples: 1400 as training data and 500 as test data. For deep learning, 1400 samples is very few, so we alleviate the data shortage as follows:
1. the model is pre-trained on COCO [6] and then transferred to the label character recognition problem;
2. as shown in fig. 5, we automatically generated about 50000 samples with a program, but during training each automatically generated sample is weighted at 1/30 of a real sample;
3. data enhancement methods are used: random up-down flipping, random left-right flipping, random rotation, random color disturbance, random brightness disturbance, and so on.
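The 1/30 weighting of synthetic samples can be sketched as a weighted average of per-sample losses, so that roughly 50000 generated images cannot drown out the roughly 1400 real ones. The exact weighting mechanism is not spelled out in the patent; treating the weight as a loss-averaging factor is an assumption.

```python
import numpy as np

def weighted_mean_loss(losses, is_synthetic, synth_weight=1.0 / 30.0):
    """Average per-sample losses, down-weighting synthetic samples."""
    weights = np.where(is_synthetic, synth_weight, 1.0)
    return float(np.sum(weights * losses) / np.sum(weights))

# Two real samples and one synthetic sample: the synthetic sample's large
# loss barely moves the average because of its 1/30 weight.
losses = np.array([1.0, 1.0, 4.0])
is_synthetic = np.array([False, False, True])
mean = weighted_mean_loss(losses, is_synthetic)
```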
The final performance of our model is shown in Table 1.
Table 1. Model test results
Test samples | Precision | Recall | Direction accuracy | mAP@0.5 |
---|---|---|---|---|
500 | 96.5% | 95.7% | 95.9% | 93.1% |
With our post-processing algorithm, the label samples can also simply be classified into categories such as Her-2, Ki-67, ER and PR. Automatic classification of the labels provides a necessary prerequisite for the subsequent automatic processing of digital pathological sections. The classification results of the model are shown in Table 2:
Table 2. Model classification results
Test samples | Precision | Recall |
---|---|---|
925 | 100.0% | 97.5% |
FIG. 6 shows an example of the detection results. The colors of the target boxes in FIG. 6 represent different directions, such as yellow for right, blue for up, and green for left; the text in a label may run in any direction. Simple character-level detection with a general-purpose target detector such as RetinaNet cannot correctly distinguish direction-sensitive characters such as "6", "9", "-" and "_"; with the help of the LineAttention module, we can distinguish them correctly.
The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.
The prior art documents to which the present invention relates are as follows:
[1]. Yuliang L, Lianwen J, Shuaitao Z, et al. Detecting Curve Text in the Wild: New Dataset and New Solution [J]. 2017.
[2]. Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99): 2999-3007.
[3]. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[4]. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Neural Information Processing Systems (NIPS), 2017.
[5]. R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision (ICCV), 2015.
[6]. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740-755. Springer, 2014.
Claims (9)
1. A pathological section label identification method, characterized in that: the pathological section label image is identified by a deep learning method; the base network of the model adopted by the deep learning is a RetinaNet network based on ResNet-50, together with a module for helping the base network identify direction-sensitive characters; the module comprises a vertical self-attention branch, a horizontal self-attention branch and a middle branch, fused as:
O = C_v·β + C_h·(1 − β) (1)
In formula (1): O represents the output, C_v the vertical self-attention branch, C_h the horizontal self-attention branch, and β the output result of the middle branch.
2. The pathological section tag identification method according to claim 1, wherein: the Anchor box ratios of the topmost layer of the model are 1:1, 1:7 and 7:1; the Anchor box ratios of the middle layer are 1:1, 1:5 and 5:1; and the Anchor box ratios of the bottommost layer are 1:1, 1:2 and 2:1.
3. The pathological section tag identification method according to claim 1, wherein: the topmost output network and the middle output network of the base network share weights, and the bottommost network uses a single weight.
4. The pathological section tag identification method according to any one of claims 1 to 3, wherein: the loss function of the training network is as follows:
L = L_cls(p, u) + λ[u ≥ 1]·L_loc(t_u, v) + γ·L_dre(p, w) (2)
In formula (2): L_cls(p, u) = −log p_u, where u is the class of the target box in the output result (the background class is numbered 0); L_loc is the regression loss of the target box; L_dre(p, w) = −log p_w, where w is the direction of the target box in the output result; and λ, γ are the weights of the corresponding losses.
5. The pathological section tag identification method according to claim 4, wherein: λ is 10 and γ is 1.
6. The pathological section tag identification method according to claim 4, wherein: the deep learning training phase comprises the following processing steps:
step 1, preprocessing an input image;
step 2, performing data enhancement on the preprocessed image by random cropping, left-right flipping, up-down flipping, rotation by an arbitrary angle, color disturbance, random brightness transformation and random noise addition;
step 3, scaling the image processed in step 2 to a fixed size;
step 4, forming the scaled images into a batch;
step 5, forward-propagating with the model;
step 6, computing the loss with the loss function, back-propagating, and updating the training parameters;
and step 7, performing iterative training until the model converges.
7. The pathological section tag identification method according to claim 6, wherein: the prediction stage processing steps of the deep learning are as follows:
a. preprocessing the input image;
b. scaling the preprocessed image to a fixed size;
c. forward-propagating with the model;
d. splitting the output of step c into two groups: words and characters;
e. aggregating characters into words according to whether the word and character boxes overlap;
f. counting the directions of all characters in the same word and determining the word's direction by voting;
g. arranging the characters within each word in order along the word's direction;
h. determining from the spacing between characters within a word whether spaces exist, and inserting them if so;
i. outputting the result.
9. The pathological section tag identification method according to claim 6 or 7, wherein: the fixed size is 512 x 512, and the number of sheets is 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010199537.0A CN111553361B (en) | 2020-03-19 | 2020-03-19 | Pathological section label identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111553361A true CN111553361A (en) | 2020-08-18 |
CN111553361B CN111553361B (en) | 2022-11-01 |
Family
ID=72001858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010199537.0A Active CN111553361B (en) | 2020-03-19 | 2020-03-19 | Pathological section label identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111553361B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112634279A (en) * | 2020-12-02 | 2021-04-09 | 四川大学华西医院 | Medical image semantic segmentation method based on attention Unet model |
CN114648680A (en) * | 2022-05-17 | 2022-06-21 | 腾讯科技(深圳)有限公司 | Training method, device, equipment, medium and program product of image recognition model |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180246873A1 (en) * | 2017-02-28 | 2018-08-30 | Cisco Technology, Inc. | Deep Learning Bias Detection in Text |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN109447078A (en) * | 2018-10-23 | 2019-03-08 | 四川大学 | A kind of detection recognition method of natural scene image sensitivity text |
CN109753954A (en) * | 2018-11-14 | 2019-05-14 | 安徽艾睿思智能科技有限公司 | The real-time positioning identifying method of text based on deep learning attention mechanism |
CN110569832A (en) * | 2018-11-14 | 2019-12-13 | 安徽艾睿思智能科技有限公司 | text real-time positioning and identifying method based on deep learning attention mechanism |
CN109697414A (en) * | 2018-12-13 | 2019-04-30 | 北京金山数字娱乐科技有限公司 | A kind of text positioning method and device |
CN109977861A (en) * | 2019-03-25 | 2019-07-05 | 中国科学技术大学 | Offline handwritten form method for identifying mathematical formula |
CN110245657A (en) * | 2019-05-17 | 2019-09-17 | 清华大学 | Pathological image similarity detection method and detection device |
CN110837835A (en) * | 2019-10-29 | 2020-02-25 | 华中科技大学 | End-to-end scene text identification method based on boundary point detection |
CN110781305A (en) * | 2019-10-30 | 2020-02-11 | 北京小米智能科技有限公司 | Text classification method and device based on classification model and model training method |
Non-Patent Citations (4)
Title |
---|
HONGTAO XIE等: "Convolutional attention networks for scene text recognition", 《ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS, AND APPLICATIONS》 * |
徐清泉: "基于注意力机制的中文识别算法研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
牛作东等: "引入注意力机制的自然场景文本检测算法研究", 《计算机应用与软件》 * |
郑众喜: "拥抱数字病理时代", 《实用医院临床杂志》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||