CN111881768A - Document layout analysis method - Google Patents
Document layout analysis method
- Publication number
- CN111881768A (application CN202010637093.4A)
- Authority
- CN
- China
- Prior art keywords
- features
- resolution
- image
- layer
- layout
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention discloses a document layout analysis method comprising the steps of: scaling an input layout image into images at 3 scales; extracting and fusing features from the images at each scale; feeding the fused image features into a segmentation network backbone to extract semantic features; up-sampling the high-level, low-resolution features rich in semantic information and fusing them with the low-level, high-resolution features rich in spatial detail; and setting up corresponding segmentation network branches for segmentation and recognition according to the attributes of the different layout elements, while restoring the output feature map to a pre-specified resolution to complete the document layout analysis. With this technical scheme, multi-scale input images can be fused, which increases the adaptability of the segmentation network to input images of different scales and reduces the influence of input-image scaling on the model; in addition, separate segmentation network branches for different layout-element attributes reduce the mutual interference between different layout elements.
Description
Technical Field
The invention relates to the technical field of optical character recognition, and in particular to a document layout analysis method.
Background
Layout analysis is one of the basic steps of an Optical Character Recognition (OCR) system: it is the process of analyzing, recognizing and understanding the image, text and table regions of a document layout and their positional relationships. The quality of the layout analysis result directly affects the performance of downstream OCR modules, and with the development of deep learning, document layout analysis systems based on deep learning have gradually become the mainstream approach.
Image semantic segmentation offers pixel-level recognition and localization, which makes it well suited to the document layout analysis task. Text, however, is a sparse, non-rigid structure with large scale variation, complex shapes, a wide variety of categories and extremely rich semantic content. Compared with images of ordinary objects, document layouts are therefore far more sensitive to scaling operations: improper scaling severely deforms and blurs the characters and can even destroy the semantic information they carry. As a result, layout analysis methods based on semantic segmentation require relatively high resolution in both the input image and the output feature map to maintain accuracy. High-resolution document layout analysis, however, increases not only the complexity of the deep neural network model but also its computational load and GPU-memory requirements.
On the other hand, document layouts are structurally very complex, and in most documents different layout elements are nested and cross-overlapped: a complex image may serve as the page background behind text, tables may contain images, handwritten and printed fonts may be mixed, and pages may carry faint watermarks, seals and character icons. Yet text data is mostly labeled in the style of generic object detection, with large rectangular block annotations. Although this labeling is simple and cheap, it is ill-suited to data for image semantic segmentation and degrades the accuracy of model training. The usual alternative of annotating segmentation data with polygons greatly increases labeling cost, and since each pixel can still carry only one label, it does not solve the cross-overlap problem either. All of these factors cause layout elements to interfere with one another, yielding low accuracy and cluttered, fragmented, irregular segmentations.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present invention provide a document layout analysis method that fuses multi-scale input images, increasing the adaptability of the segmentation network to input images of different scales and reducing the influence of input-image scaling on the model, and that adds separate segmentation network branches for different layout-element attributes, reducing the mutual interference between different layout elements.
The embodiment of the invention provides a document layout analysis method, which comprises the following steps:
scaling the input layout image into images at 3 scales;
extracting and fusing features from the images at each scale;
feeding the fused image features into a segmentation network backbone to extract semantic features;
up-sampling the high-level, low-resolution features rich in semantic information and fusing them with the low-level, high-resolution features rich in spatial detail; and
setting up corresponding segmentation network branches for segmentation and recognition according to the attributes of the different layout elements, while restoring the output feature map to a pre-specified resolution to complete the document layout analysis.
Further, scaling the input layout image into images at 3 scales comprises:
applying 2× and 0.5× scaling operations to the input layout image to obtain images at 3 scales.
Further, extracting and fusing the features of the multi-scale layout images comprises the following steps:
down-sampling the 2×-scale layout image with a 3×3 convolutional layer with 16 output feature channels and stride 2;
concatenating the result with the 3×3 convolutional features of the original-scale layout image, with 32 output feature channels and stride 1;
performing a first feature fusion with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling with one 3×3 convolutional layer with 64 output feature channels and stride 2;
concatenating the result with the 3×3 convolutional features of the 0.5×-scale layout image, with 16 output feature channels and stride 1;
performing a second feature fusion with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling with one 3×3 convolutional layer with 64 output feature channels and stride 2.
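The stride values in the steps above fully determine each stage's resolution. As a sanity check (not part of the patent text; a padding of 1 is assumed, so that stride-1 3×3 convolutions preserve size), the standard convolution output-size formula reproduces the halving behaviour:

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# With padding 1, a stride-2 3x3 convolution halves each spatial dimension,
# while a stride-1 3x3 convolution preserves it.
assert conv2d_out(1536, stride=2) == 768   # 2x-scale image brought down to original scale
assert conv2d_out(768, stride=1) == 768    # stride-1 fusion convolutions keep resolution
assert conv2d_out(768, stride=2) == 384    # down to the 0.5x scale
```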
Further, when the fused image features are fed into the segmentation network backbone, their resolution is 1/4 of the original-scale layout image resolution and the number of output feature channels is 64.
Further, the segmentation network backbone is a residual network; a dense atrous spatial pyramid pooling (DenseASPP) module at the top of the residual network extracts the convolutional features of the multi-scale layout image, yielding 256 output feature channels at 1/32 of the original-scale layout image resolution.
Up-sampling the high-level, low-resolution features rich in semantic information and fusing them with the low-level, high-resolution features rich in spatial detail further comprises the following steps:
up-sampling the high-level, low-resolution features by 8× bilinear interpolation, while smoothing the low-level, high-resolution features and reducing their channel dimension with a 1×1 convolutional layer with 32 output feature channels and stride 1;
fusing the up-sampled high-level features with the low-level features by feature-vector concatenation followed by one 3×3 convolutional layer, the fused output having 320 feature channels at 1/4 of the original-scale layout image resolution;
then using 3 convolutional layers, each with 64 output feature channels and stride 1, respectively as the heads of 3 different segmentation network branches to extract features belonging to different element attributes;
then up-sampling the feature resolution to the pre-specified resolution by bilinear interpolation;
and finally using one 3×3 convolutional layer with 64 output feature channels and stride 1 together with one 1×1 convolutional layer with stride 1, whose channel count equals the number of segmentation classes of the branch, as the top recognition structure of the segmentation network.
Furthermore, every convolutional layer is followed by a batch-normalization (BN) layer and a ReLU activation layer.
Further, after up-sampling, the high-level features have the same resolution as the low-level features.
Further, each segmentation network branch uses 1 convolutional layer for feature extraction and channel dimension reduction, up-samples to the pre-specified resolution by bilinear interpolation, and uses one 3×3 convolutional layer and one 1×1 convolutional layer as the top recognition structure of the segmentation network.
Further, the number of segmentation classes of each of the three segmentation network branches is 2.
The technical scheme provided by the embodiments of the invention has the following beneficial effects: because input images at several scales are fused, the adaptability of the segmentation network to inputs of different scales is increased and the influence of input-image scaling on the model is reduced; in addition, separate segmentation network branches for different layout-element attributes reduce the mutual interference of different elements, make it easier to segment cross-overlapping elements, and give the network the ability to recognize elements with multiple class labels; the approach also simplifies post-processing of the segmentation results.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart of document layout analysis in an embodiment of the present invention.
FIG. 2 is a flowchart of feature extraction and fusion for an image according to an embodiment of the present invention.
FIG. 3 is a flow chart of the fusion of high-level features with low-level features in an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as recited in the appended claims.
The technical scheme of the invention addresses the following problems: complex document layouts are very sensitive to image scaling; input images and output features of relatively high resolution are needed to retain detail; and the labeling conventions of layout data cause severe mutual interference between different layout elements and cluttered, fragmented segmentations. To this end, the invention proposes MLSNet, a multi-task layout segmentation network for multi-scale input images.
As shown in FIG. 1, the document layout analysis process includes the following steps:
Step 10, scaling the input layout image into images at 3 scales.
Specifically, this step specifies the sizes of the input layout image and of the output feature map, and then applies 2× and 0.5× scaling operations to the input layout image. For example, the input RGB image is provided at 3 scales such as 1536 × 2048, 768 × 1024 and 384 × 512, and the output feature map size is 1024 × 1536.
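The three input scales follow directly from the 2× and 0.5× scaling of the 768 × 1024 original image; a trivial sketch (the function name is illustrative, not from the patent):

```python
def pyramid_scales(width: int, height: int) -> list[tuple[int, int]]:
    """Return the (width, height) of the 2x, original and 0.5x input scales."""
    return [(width * 2, height * 2), (width, height), (width // 2, height // 2)]

# Reproduces the embodiment's example sizes for a 768 x 1024 original image.
assert pyramid_scales(768, 1024) == [(1536, 2048), (768, 1024), (384, 512)]
```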
Step 11, extracting and fusing the features of the images at each scale.
As shown in fig. 2, this step further comprises the following steps (steps 111 and 114 are reconstructed here from claim 3):
Step 111, down-sampling the 1536 × 2048 layout image with a 3×3 convolutional layer with 16 output feature channels and stride 2 (stride = 2).
Step 112, concatenating the result with the 3×3 convolutional features (32 output feature channels, stride = 1) of the input 768 × 1024 layout image.
Step 113, performing a first feature fusion with one 3×3 convolutional layer with 64 output feature channels and stride = 1.
Step 114, down-sampling with one 3×3 convolutional layer with 64 output feature channels and stride = 2.
Step 115, concatenating the result with the 3×3 convolutional features (16 output feature channels, stride = 1) of the input 384 × 512 layout image.
Step 116, performing the second feature fusion with one 3×3 convolutional layer with 64 output feature channels and stride = 1.
Step 117, down-sampling with one 3×3 convolutional layer with 64 output feature channels and stride = 2.
After feature extraction and fusion, the image features fed into the segmentation network backbone have a resolution equal to 1/4 of the original-scale layout image resolution (768 × 1024), with 64 output feature channels; this resolution is relatively high.
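Assuming 'same' padding (the patent does not state padding), the channel and resolution bookkeeping of the fusion steps above can be traced explicitly; the names below are illustrative:

```python
def fusion_shapes(h: int, w: int) -> tuple[int, int, int]:
    """Trace (channels, height, width) through the multi-scale fusion module,
    where h x w is the original-scale image size. Concatenation adds channel
    counts; each stride-2 3x3 convolution halves the resolution."""
    ch, fh, fw = 16, h, w      # 2x image (2h x 2w) downsampled to h x w, 16 channels
    ch += 32                   # concat with 32-channel features of the original image
    ch = 64                    # first fusion conv outputs 64 channels
    fh, fw = fh // 2, fw // 2  # stride-2 downsample to h/2 x w/2
    ch += 16                   # concat with 16-channel features of the 0.5x image
    ch = 64                    # second fusion conv outputs 64 channels
    fh, fw = fh // 2, fw // 2  # final stride-2 downsample to h/4 x w/4
    return ch, fh, fw

# 64 channels at 1/4 resolution enter the backbone, as the description states.
assert fusion_shapes(768, 1024) == (64, 192, 256)
```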
Step 12, feeding the fused image features into the segmentation network backbone to extract semantic features.
In this embodiment, the segmentation network backbone is a residual network (ResNet-50); a dense atrous spatial pyramid pooling module (DenseASPP) at the top of the residual network extracts the convolutional features of the multi-scale layout image, producing 256 output feature channels at 1/32 of the original-scale layout image resolution (768 × 1024).
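Since the features enter the backbone at 1/4 resolution and leave DenseASPP at 1/32, the ResNet-50 stages must contribute three further halvings (an inference from the stated resolutions, not an explicit statement in the patent):

```python
from functools import reduce

def downsample(res: int, stages: int = 3) -> int:
    """Apply `stages` successive stride-2 halvings to a resolution."""
    return reduce(lambda r, _: r // 2, range(stages), res)

# 1/4 resolution of a 768 x 1024 image is 192 x 256; three more halvings
# give the 1/32 resolution of 24 x 32 at which DenseASPP operates.
assert (downsample(192), downsample(256)) == (24, 32)
```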
Step 13, up-sampling the high-level features rich in semantic information so that they have the same resolution as the low-level features, then fusing the up-sampled high-level features with the low-level features rich in spatial detail by feature-vector concatenation followed by one 3×3 convolutional layer.
As shown in fig. 3, the fusion process includes the following steps:
Step 131, up-sampling the high-level, low-resolution features by 8× bilinear interpolation, while smoothing the low-level, high-resolution features and reducing their channel dimension with a 1×1 convolutional layer with 32 output feature channels and stride = 1 (this step is reconstructed here from claim 6).
Step 132, fusing the up-sampled high-level features with the low-level features rich in spatial detail by feature-vector concatenation followed by one 3×3 convolutional layer; the fused output has 320 feature channels at 1/4 of the original-scale layout image resolution (768 × 1024).
Step 133, extracting features belonging to different element attributes with 3 convolutional layers, each with 64 output feature channels and stride = 1, used respectively as the heads of the 3 different segmentation network branches.
Each branch then up-samples its features to the pre-specified resolution by bilinear interpolation, and one 3×3 convolutional layer (64 output feature channels, stride = 1) together with one 1×1 convolutional layer (stride = 1), whose channel count equals the branch's number of segmentation classes, serves as the top recognition structure of the segmentation network.
Every convolutional layer is followed by a batch-normalization (BN) layer and a ReLU activation layer.
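The 8× factor of the bilinear up-sampling is exactly what is needed to align the 1/32-resolution backbone output with the 1/4-resolution skip features. Note that concatenating the 256 up-sampled channels with the 32 channel-reduced low-level channels gives 288 channels entering the fusion convolution, which presumably then widens the features to the 320 channels stated in the description; the figures below only check this bookkeeping:

```python
# Resolution alignment: 1/32 of 768 x 1024 is 24 x 32; an 8x bilinear
# up-sampling brings it to the 1/4-resolution skip features (192 x 256).
assert (24 * 8, 32 * 8) == (192, 256)

# Channel bookkeeping of the concatenation before the 3x3 fusion convolution
# (channel counts taken from the fusion steps of the description).
upsampled_high_ch = 256   # DenseASPP output channels
reduced_low_ch = 32       # low-level channels after the 1x1 reduction conv
concat_ch = upsampled_high_ch + reduced_low_ch
assert concat_ch == 288   # the fusion conv then outputs the stated 320 channels
```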
Step 14, finally, setting up corresponding segmentation network branches for segmentation and recognition according to the attributes of the different layout elements, restoring the output feature map to the pre-specified resolution (1024 × 1536) in the process, and thereby completing the document layout analysis.
To reduce GPU-memory consumption, each segmentation network branch uses 1 convolutional layer for feature extraction and channel dimension reduction, then up-samples to the pre-specified resolution (1024 × 1536) by bilinear interpolation, and uses one 3×3 convolutional layer and one 1×1 convolutional layer as its top structure. Owing to the limitations of the labeled data categories, the number of segmentation classes of each of the three branches is 2 (C1 = C2 = C3 = 2).
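A minimal pure-Python sketch of the bilinear interpolation each branch head uses to reach the pre-specified resolution (the align_corners=True convention is assumed; the patent does not specify which variant is used):

```python
def bilinear_upsample(grid: list[list[float]], out_h: int, out_w: int) -> list[list[float]]:
    """Bilinear interpolation of a 2-D grid (at least 2 x 2) to out_h x out_w
    (both > 1), using the align_corners=True convention."""
    in_h, in_w = len(grid), len(grid[0])
    sy = (in_h - 1) / (out_h - 1)
    sx = (in_w - 1) / (out_w - 1)
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = min(int(y), in_h - 2)   # top neighbour row, clamped at the edge
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = min(int(x), in_w - 2)   # left neighbour column, clamped
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
            bot = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

# Up-sampling a 2 x 2 grid to 3 x 3 interpolates the midpoints exactly.
assert bilinear_upsample([[0.0, 2.0], [4.0, 6.0]], 3, 3) == [
    [0.0, 1.0, 2.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
]
```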
By adopting the embodiments of the invention, input images at several scales are fused, increasing the adaptability of the segmentation network to inputs of different scales and reducing the influence of input-image scaling on the model; in addition, separate segmentation network branches for different layout-element attributes reduce the mutual interference of different elements, make it easier to segment cross-overlapping elements, and give the network the ability to recognize elements with multiple class labels; the approach also simplifies post-processing of the segmentation results.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A document layout analysis method, characterized by comprising the following steps:
scaling an input layout image into images at 3 scales;
extracting and fusing features from the images at each scale;
feeding the fused image features into a segmentation network backbone to extract semantic features;
up-sampling the high-level, low-resolution features rich in semantic information and fusing them with the low-level, high-resolution features rich in spatial detail; and
setting up corresponding segmentation network branches for segmentation and recognition according to the attributes of different layout elements, while restoring the output feature map to a pre-specified resolution to complete the document layout analysis.
2. The document layout analysis method of claim 1, wherein scaling the input layout image into images at 3 scales further comprises:
applying 2× and 0.5× scaling operations to the input layout image to obtain images at 3 scales.
3. The document layout analysis method of claim 2, wherein extracting and fusing the features of the multi-scale layout images further comprises the following steps:
down-sampling the 2×-scale layout image with a 3×3 convolutional layer with 16 output feature channels and stride 2;
concatenating the result with the 3×3 convolutional features of the original-scale layout image, with 32 output feature channels and stride 1;
performing a first feature fusion with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling with one 3×3 convolutional layer with 64 output feature channels and stride 2;
concatenating the result with the 3×3 convolutional features of the 0.5×-scale layout image, with 16 output feature channels and stride 1;
performing a second feature fusion with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling with one 3×3 convolutional layer with 64 output feature channels and stride 2.
4. The document layout analysis method of claim 3, wherein the fused image features are fed into the segmentation network backbone at a resolution of 1/4 of the original-scale layout image resolution, with 64 output feature channels.
5. The document layout analysis method of any one of claims 1 to 4, wherein the segmentation network backbone is a residual network; a dense atrous spatial pyramid pooling module at the top of the residual network extracts the convolutional features of the multi-scale layout image, yielding 256 output feature channels at 1/32 of the original-scale layout image resolution.
6. The document layout analysis method of claim 1, wherein up-sampling the high-level, low-resolution features rich in semantic information and fusing them with the low-level, high-resolution features rich in spatial detail further comprises the following steps:
up-sampling the high-level, low-resolution features by 8× bilinear interpolation, while smoothing the low-level, high-resolution features and reducing their channel dimension with a 1×1 convolutional layer with 32 output feature channels and stride 1;
fusing the up-sampled high-level features with the low-level features by feature-vector concatenation followed by one 3×3 convolutional layer, the fused output having 320 feature channels at 1/4 of the original-scale layout image resolution;
then using 3 convolutional layers, each with 64 output feature channels and stride 1, respectively as the heads of 3 different segmentation network branches to extract features belonging to different element attributes;
then up-sampling the feature resolution to the pre-specified resolution by bilinear interpolation; and
finally using one 3×3 convolutional layer with 64 output feature channels and stride 1 together with one 1×1 convolutional layer with stride 1, whose channel count equals the number of segmentation classes of the branch, as the top recognition structure of the segmentation network.
7. The document layout analysis method of claim 6, wherein every convolutional layer is followed by a batch-normalization (BN) layer and a ReLU activation layer.
8. The document layout analysis method of claim 6, wherein after up-sampling the high-level features have the same resolution as the low-level features.
9. The document layout analysis method of claim 1, wherein each segmentation network branch uses 1 convolutional layer for feature extraction and channel dimension reduction, up-samples to the pre-specified resolution by bilinear interpolation, and uses one 3×3 convolutional layer and one 1×1 convolutional layer as the top recognition structure of the segmentation network.
10. The document layout analysis method of claim 1, wherein the number of segmentation classes of each of the three segmentation network branches is 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010637093.4A CN111881768A (en) | 2020-07-03 | 2020-07-03 | Document layout analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111881768A true CN111881768A (en) | 2020-11-03 |
Family
ID=73151736
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010637093.4A Pending CN111881768A (en) | 2020-07-03 | 2020-07-03 | Document layout analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111881768A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100183225A1 (en) * | 2009-01-09 | 2010-07-22 | Rochester Institute Of Technology | Methods for adaptive and progressive gradient-based multi-resolution color image segmentation and systems thereof |
CN108268870A (en) * | 2018-01-29 | 2018-07-10 | 重庆理工大学 | Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study |
CN110032998A (en) * | 2019-03-18 | 2019-07-19 | 华南师范大学 | Character detecting method, system, device and the storage medium of natural scene picture |
CN110837811A (en) * | 2019-11-12 | 2020-02-25 | 腾讯科技(深圳)有限公司 | Method, device and equipment for generating semantic segmentation network structure and storage medium |
CN110895695A (en) * | 2019-07-31 | 2020-03-20 | 上海海事大学 | Deep learning network for character segmentation of text picture and segmentation method |
Non-Patent Citations (1)
Title |
---|
Zhou Wen; Shi Tianyun; Li Ping; Ma Xiaoning: "Deep learning-based foreign object detection in EMU operation safety images", Journal of Transport Information and Safety, no. 06, 28 December 2019 (2019-12-28) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966691A (en) * | 2021-04-14 | 2021-06-15 | 重庆邮电大学 | Multi-scale text detection method and device based on semantic segmentation and electronic equipment |
CN113361247A (en) * | 2021-06-23 | 2021-09-07 | 北京百度网讯科技有限公司 | Document layout analysis method, model training method, device and equipment |
CN113420669A (en) * | 2021-06-24 | 2021-09-21 | 武汉工程大学 | Document layout analysis method and system based on multi-scale training and cascade detection |
CN113420669B (en) * | 2021-06-24 | 2022-05-10 | 武汉工程大学 | Document layout analysis method and system based on multi-scale training and cascade detection |
CN115294412A (en) * | 2022-10-10 | 2022-11-04 | 临沂大学 | Real-time coal rock segmentation network generation method based on deep learning |
CN116129456A (en) * | 2023-02-09 | 2023-05-16 | 广西壮族自治区自然资源遥感院 | Method and system for identifying and inputting property rights and interests information |
CN116129456B (en) * | 2023-02-09 | 2023-07-25 | 广西壮族自治区自然资源遥感院 | Method and system for identifying and inputting property rights and interests information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111881768A (en) | Document layout analysis method | |
WO2019201035A1 (en) | Method and device for identifying object node in image, terminal and computer readable storage medium | |
CN110782420A (en) | Small target feature representation enhancement method based on deep learning | |
Huang et al. | RD-GAN: Few/zero-shot Chinese character style transfer via radical decomposition and rendering | |
US20110052062A1 (en) | System and method for identifying pictures in documents | |
CN113569865B (en) | Single sample image segmentation method based on class prototype learning | |
WO2022257578A1 (en) | Method for recognizing text, and apparatus | |
CN110555433A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN111080660A (en) | Image segmentation method and device, terminal equipment and storage medium | |
CN114283430A (en) | Cross-modal image-text matching training method and device, storage medium and electronic equipment | |
CN115131797B (en) | Scene text detection method based on feature enhancement pyramid network | |
CN110781850A (en) | Semantic segmentation system and method for road recognition, and computer storage medium | |
CN113554032A (en) | Remote sensing image segmentation method based on a high-perception multi-path parallel network | |
CN113674146A (en) | Image super-resolution | |
CN113903022A (en) | Text detection method and system based on feature pyramid and attention fusion | |
CN111353544A (en) | Improved Mixed Pooling-YOLOv3-based target detection method | |
CN112766409A (en) | Feature fusion method for remote sensing image target detection | |
CN115311454A (en) | Image segmentation method based on residual error feature optimization and attention mechanism | |
CN111898608B (en) | Natural scene multi-language character detection method based on boundary prediction | |
CN112364709A (en) | Cabinet intelligent asset checking method based on code identification | |
CN115909378A (en) | Document text detection model training method and document text detection method | |
CN115810152A (en) | Remote sensing image change detection method and device based on graph convolution and computer equipment | |
Baloun et al. | ChronSeg: Novel Dataset for Segmentation of Handwritten Historical Chronicles. | |
CN112257708A (en) | Character-level text detection method and device, computer equipment and storage medium | |
CN113610032A (en) | Building identification method and device based on remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||