CN111881768A - Document layout analysis method - Google Patents

Document layout analysis method

Info

Publication number
CN111881768A
CN111881768A (application CN202010637093.4A)
Authority
CN
China
Prior art keywords
features
resolution
image
layer
layout
Prior art date
Legal status
Pending
Application number
CN202010637093.4A
Other languages
Chinese (zh)
Inventor
王波
张百灵
周炬
朱华柏
Current Assignee
Auntec Co ltd
Original Assignee
Auntec Co ltd
Priority date
Filing date
Publication date
Application filed by Auntec Co ltd filed Critical Auntec Co ltd
Priority to CN202010637093.4A priority Critical patent/CN111881768A/en
Publication of CN111881768A publication Critical patent/CN111881768A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a document layout analysis method comprising: scaling an input layout image into images of 3 scales; extracting and fusing features from the images of all scales; sending the fused image features into a segmentation network backbone to extract semantic information features; up-sampling the high-level, low-resolution features rich in semantic information and fusing them with the low-level, high-resolution features rich in spatial detail information; and setting corresponding segmentation network branches for segmentation and recognition according to the attributes of different layout elements, while restoring the output feature map to a pre-specified resolution to complete the document layout analysis. With this technical scheme, multi-scale input images can be fused, which increases the adaptability of the segmentation network to input images of different scales and reduces the influence of input-image scaling on the model; adding different segmentation network branches for different attributes of layout elements reduces the mutual interference of different layout elements.

Description

Document layout analysis method
Technical Field
The invention relates to the technical field of optical character recognition, in particular to a document layout analysis method.
Background
Layout analysis is one of the basic steps of an optical character recognition (OCR) system: it is the process of analyzing, recognizing, and understanding the image, text, and table features in a document's layout and their positional relationships. The quality of the layout analysis result directly affects the performance of downstream OCR modules, and with the development of deep learning, deep-learning-based document layout analysis systems have gradually become the mainstream approach.
Image semantic segmentation offers pixel-level recognition and localization, which makes it well suited to the document layout analysis task. Characters, however, are sparse, non-rigid structures with large scale variation, complex shapes, a wide variety of types, and extremely rich semantic information. Compared with image processing for general objects, a document layout is therefore more sensitive to image scaling: improper scaling seriously deforms and blurs characters and can even destroy the semantic information they carry. For these reasons, semantic-segmentation-based document layout analysis requires a relatively high resolution for both the input image and the output feature map to guarantee accuracy. High-resolution document-image layout analysis, however, increases not only the complexity of the deep neural network model but also its computational load and GPU memory requirements.
On the other hand, document layout structure is very complex, and in most documents different layout elements are nested and cross-overlapped: a complex image serves as the page background for text, a table contains images, handwritten and printed fonts are mixed, and pages contain faint watermarks, seals, and character icons. Yet text data is mostly labeled following the conventions of general object detection, with large rectangular block annotations. Although this labeling is simple and cheap, it is ill-suited to data for image semantic segmentation and lowers the accuracy of model training. The usual alternative of annotating semantic-segmentation data with polygons greatly increases labeling cost, and a pixel can still carry only one label, so the problem of cross-overlapping layout elements remains unsolved. All of these issues ultimately cause layout elements to interfere with one another, yielding low accuracy and cluttered, fragmented, irregular segmentation of the layout.
Disclosure of Invention
To solve the problems in the related art, embodiments of the present invention provide a document layout analysis method that can fuse multi-scale input images, increase the adaptability of a segmentation network to input images of different scales, reduce the influence of input-image scaling on the model, add different segmentation network branches for different attributes of layout elements, and reduce the mutual interference of different layout elements.
The embodiment of the invention provides a document layout analysis method, which comprises the following steps:
scaling the input layout image into images of 3 scales;
extracting and fusing the features of the images of all scales;
sending the fused image features to a segmentation network backbone for extracting semantic information features;
up-sampling the high-level, low-resolution features rich in semantic information, and fusing the up-sampled features with the low-level, high-resolution features rich in spatial detail information;
and setting corresponding segmentation network branches for segmentation and recognition according to the attributes of different layout elements, while restoring the output feature map to a pre-specified resolution to complete the document layout analysis.
The scaling of the input layout image into images of 3 scales further comprises the following step:
the input layout image is subjected to scaling operations of 2 times and 0.5 times, yielding images of 3 scales.
The extracting and fusing of the features of the multi-scale layout images further comprises the following steps:
the 2×-scale layout image is down-sampled by a 3×3 convolutional layer with 16 output feature channels and stride 2;
feature-vector splicing is performed with the 3×3 convolutional features (32 output feature channels, stride 1) of the original-scale layout image;
a first feature fusion is performed with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling is performed with one 3×3 convolutional layer with 64 output feature channels and stride 2;
feature-vector splicing is performed with the 3×3 convolutional features (16 output feature channels, stride 1) of the 0.5×-scale layout image;
a second feature fusion is performed with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling is performed with one 3×3 convolutional layer with 64 output feature channels and stride 2.
Further, when the fused image features are sent to the segmentation network backbone, the resolution is 1/4 of the original-scale layout image resolution and the number of output feature channels is 64.
Further, the backbone of the segmentation network is a residual network; a dense atrous spatial pyramid pooling (DenseASPP) module is used at the top of the residual network to extract the convolutional features of the multi-scale layout image, and after extraction the number of output feature channels is 256 and the resolution is 1/32 of the original-scale layout image resolution.
The up-sampling of the high-level, low-resolution features rich in semantic information, and their fusion with the low-level, high-resolution features rich in spatial detail information, further comprises the following steps:
8× bilinear-interpolation up-sampling is applied to the high-level, low-resolution features rich in semantic information, while the low-level, high-resolution features are smoothed and channel-reduced by a 1×1 convolutional layer with 32 output feature channels and stride 1;
the up-sampled high-level features and the low-level features rich in spatial detail information are fused by feature-vector splicing followed by one 3×3 convolutional layer; after fusion the number of output feature channels is 320 and the resolution is 1/4 of the original-scale layout image resolution;
3 convolutional layers with 64 output feature channels and stride 1 are then used as the heads of 3 different segmentation network branches to extract features belonging to different object attributes;
bilinear interpolation then up-samples the features to the pre-specified resolution;
finally, one convolutional layer with 64 output feature channels and stride 1, and one convolutional layer with stride 1 whose channel number equals the number of segmentation recognition classes of the branch, are used as the top recognition structure of the segmentation network.
Further, every convolutional layer is followed by a batch-normalization (BN) layer and a ReLU activation layer.
Further, the high-level features have the same resolution as the low-level features after being upsampled.
Further, each segmentation network branch uses 1 convolutional layer for feature extraction and channel dimension reduction, up-samples to a pre-specified resolution using bilinear interpolation, and uses one 3×3 convolutional layer and one 1×1 convolutional layer as the top recognition structure of the segmentation network.
Further, the number of segmentation recognition classes of each of the three segmentation network branches is 2.
The technical scheme provided by the embodiments of the invention has the following beneficial effects: fusing input images of multiple scales increases the adaptability of the segmentation network to input images of different scales and reduces the influence of input-image scaling on the model; adding different segmentation network branches for different attributes of layout elements reduces the mutual interference of different elements, makes cross-overlapping elements easier to segment, and gives the network the ability to recognize elements with multiple class labels; it also facilitates post-processing of the segmentation results.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart of document layout analysis in an embodiment of the present invention.
FIG. 2 is a flowchart of feature extraction and fusion for an image according to an embodiment of the present invention.
FIG. 3 is a flow chart of the fusion of high-level features with low-level features in an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatuses and methods consistent with certain aspects of the invention, as recited in the appended claims.
The technical scheme of the invention addresses three problems: complex document layouts are very sensitive to image scaling; higher-resolution input images and output features are needed to retain more detail information; and the labeling conventions of layout data cause serious mutual interference between different layout elements as well as cluttered, fragmented segmentation of the layout. To this end, it proposes MLSNet, a multi-task layout segmentation network for multi-scale input images.
FIG. 1 is a flowchart of document layout analysis in an embodiment of the present invention. As shown in fig. 1, the document layout analysis process includes the following steps:
step 10, firstly, the same input layout image is zoomed into images with 3 scales.
Specifically, the step is to specify the sizes of an input layout image and an output feature image, and then perform scaling operations of 2 times and 0.5 times on the input layout image respectively. For example, the input RGB image has 3 dimensions such as 1536 × 2048, 768 × 1024, 384 × 512, and the output feature image has a size of 1024 × 1536.
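As a quick sanity check (an illustrative sketch, not part of the patent; the helper name is hypothetical), the three scales in the example follow directly from applying the 2× and 0.5× scaling operations to a 768 × 1024 original:

```python
# Hypothetical helper reproducing the example's 3 input scales: the original
# scale is 768x1024; scaling by 2 and 0.5 yields 1536x2048 and 384x512.

def multiscale_sizes(width, height, factors=(2.0, 1.0, 0.5)):
    """Return the (width, height) of each scaled copy of the layout image."""
    return [(int(width * f), int(height * f)) for f in factors]

print(multiscale_sizes(768, 1024))
# [(1536, 2048), (768, 1024), (384, 512)]
```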
And 11, extracting and fusing the features of the images of all scales.
As shown in fig. 2, the present step further comprises the steps of:
step 111, the input 1536 × 2048-scale layout image is first down-sampled by a 3 × 3 convolution layer with an output characteristic channel number of 16 and a step size of 2(stride 2).
And step 112, performing feature vector splicing with the input 3 × 3 convolution features of the layout image with 768 × 1024 scales, wherein the output feature channel number is 32 and the step length is 1(stride is 1).
Step 113 is followed by performing a first feature fusion using 13 × 3 convolutional layer with an output feature channel number of 64 and a step size of 1(stride 1).
Step 114, the downsampling is performed again using 13 × 3 convolutional layer with the output eigen channel number of 64 and the step size of 2(stride 2).
And step 115, performing feature vector splicing with the input 3 × 3 convolution features of the layout image with 384 × 512 scales, wherein the number of output feature channels of the layout image is 16 and the step size is 1(stride is 1).
And step 116, finally, performing second-time feature fusion by using 13 × 3 convolutional layer with the output feature channel number of 64 and the step size of 1(stride 1).
And step 117, performing downsampling by using 13 × 3 convolutional layer with the output characteristic channel number of 64 and the step size of 2(stride is 2).
After this feature extraction and fusion, when the image features are sent into the segmentation network backbone, their resolution is 1/4 of the original-scale layout image resolution (768 × 1024) and the number of output feature channels is 64, so a relatively high resolution is retained.
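The stated geometry of steps 111-117 can be checked with simple shape arithmetic (an illustrative sketch; the function and variable names are not from the patent). Each stride-2 3×3 convolution with padding 1 halves the spatial size, and each concatenation sums channel counts before a fusion convolution resets them to 64:

```python
# Shape bookkeeping for the multi-scale fusion stem (steps 111-117).

def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution layer (floor-division formula)."""
    return (size + 2 * padding - kernel) // stride + 1

w, h = 1536, 2048                                      # 2x-scale input
w, h = conv_out(w, stride=2), conv_out(h, stride=2)    # step 111 -> 768x1024
ch = 16 + 32        # step 112: concat 16-ch features with 32-ch features
ch = 64             # step 113: first fusion conv outputs 64 channels
w, h = conv_out(w, stride=2), conv_out(h, stride=2)    # step 114 -> 384x512
ch = 64 + 16        # step 115: concat with 0.5x-scale features (16 ch)
ch = 64             # step 116: second fusion conv outputs 64 channels
w, h = conv_out(w, stride=2), conv_out(h, stride=2)    # step 117 -> 192x256
print(w, h, ch)     # 192 256 64: 1/4 of the original 768x1024, as stated
```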
Step 12, the fused image features are sent into the segmentation network backbone to extract semantic information features.
In this embodiment, the backbone of the segmentation network is a residual network (ResNet-50); a dense atrous spatial pyramid pooling module (DenseASPP) is used at the top of the residual network to extract the convolutional features of the multi-scale layout image, and after extraction the number of output feature channels is 256 and the resolution is 1/32 of the original-scale layout image resolution (768 × 1024).
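Continuing the shape bookkeeping (again an illustrative sketch, not the patent's implementation), the backbone output geometry follows from a further 8× downsampling of the 1/4-resolution stem output, which is how ResNet-style backbones typically behave:

```python
# From 1/4 resolution (stem output) to 1/32 resolution (backbone output):
# an additional 8x spatial reduction inside the ResNet-50 backbone.

stem_w, stem_h = 768 // 4, 1024 // 4           # (192, 256), 64 channels in
backbone_w, backbone_h = stem_w // 8, stem_h // 8
backbone_channels = 256                         # stated DenseASPP output width
print(backbone_w, backbone_h)  # 24 32, i.e. 768/32 x 1024/32
```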
Step 13, the high-level features rich in semantic information are up-sampled so that, after up-sampling, they have the same resolution as the low-level features; the up-sampled high-level features are then fused with the low-level features rich in spatial detail information by feature-vector splicing followed by one 3×3 convolutional layer.
As shown in fig. 3, the fusion process includes the following steps:
step 131, 8 times of bilinear interpolation upsampling is performed on the high-layer low-resolution feature of the high semantic information, and meanwhile, feature smoothing and channel dimensionality reduction are performed on the low-layer high-resolution feature through a 1 × 1 convolutional layer with an output feature channel number of 32 and a step length of 1(stride ═ 1).
And step 132, in the process of fusing with the low-level high-resolution features with rich space detail information, fusing the up-sampled high-level features and the low-level features by using a feature vector splicing mode and 13 × 3 convolutional layer, wherein the number of output feature channels after fusion is 320, and the resolution is 1/4 of the resolution (768 × 1024) of the original-scale layout image.
Step 133 then extracts features belonging to different object attributes using 3 × 3 or 5 × 5 convolutional layers with an output feature channel number of 64 and a step size of 1(stride 1), respectively, as headers of 3 different split network branches.
Step 134, next, sample bilinear interpolation upsamples the resolution of the feature to a pre-specified resolution (1024 x 1536).
Step 135, finally, using 1 convolution layer with output characteristic channel number of 64 and step size of 1(stride 1) and 1 channel number as the division identification category number of the division network branch and step size of 1
The 1 × 1 convolution layer of (stride 1) is used as the top identification structure of the split network.
Every convolutional layer is followed by a batch-normalization (BN) layer and a ReLU activation layer.
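Bilinear interpolation, used throughout the decoder for up-sampling, can be sketched minimally as follows (a NumPy illustration assuming the common align_corners=False convention; this is not the patent's implementation):

```python
import numpy as np

# Minimal 2D bilinear up-sampling by an integer factor, standing in for the
# 8x interpolation that brings 1/32-resolution features up to 1/4 resolution.

def bilinear_upsample(x, factor):
    """Upsample a 2D array by an integer factor with bilinear interpolation."""
    h, w = x.shape
    ys = (np.arange(h * factor) + 0.5) / factor - 0.5   # source row coords
    xs = (np.arange(w * factor) + 0.5) / factor - 0.5   # source col coords
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None]                # row blend weights
    wx = np.clip(xs - x0, 0, 1)[None, :]                # column blend weights
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

feat = np.arange(12.0).reshape(3, 4)   # toy 3x4 "feature map"
up = bilinear_upsample(feat, 8)
print(up.shape)  # (24, 32)
```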
Step 14, finally, corresponding segmentation network branches are set up for segmentation and recognition according to the attributes of different layout elements, and in the process the output feature map is restored to the pre-specified resolution (1024 × 1536), completing the document layout analysis.
To reduce GPU memory consumption, each segmentation network branch uses 1 convolutional layer for feature extraction and channel dimension reduction, then up-samples to the pre-specified resolution (1024 × 1536) by bilinear interpolation, and uses one 3×3 convolutional layer and one 1×1 convolutional layer as the top structure of the segmentation network. Owing to the limits of the labeled data categories, the number of segmentation recognition classes of each of the three branches is 2 (C1 = C2 = C3 = 2).
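The benefit of per-branch binary (2-class) outputs over a single multi-class softmax can be illustrated with a toy example: separate branches let one pixel carry several layout labels at once, which is exactly the cross-overlap case described in the background. The branch names below are illustrative, not from the patent:

```python
import numpy as np

# Each branch predicts its own binary foreground mask, so one pixel can be
# "on" in several branches at once (e.g. text printed over an image), which
# a single multi-class softmax (one label per pixel) cannot express.

h, w = 4, 4
masks = {
    "text":  np.zeros((h, w), dtype=bool),
    "image": np.zeros((h, w), dtype=bool),
    "table": np.zeros((h, w), dtype=bool),
}
masks["image"][0:3, 0:3] = True   # image region
masks["text"][1:4, 1:4] = True    # text region overlapping the image

overlap = masks["text"] & masks["image"]
print(int(overlap.sum()))  # 4 pixels legitimately carry both labels
```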
With this embodiment of the invention, fusing input images of multiple scales increases the adaptability of the segmentation network to input images of different scales and reduces the influence of input-image scaling on the model. In addition, different segmentation network branches are added for different attributes of layout elements, which reduces the mutual interference of different elements, makes cross-overlapping elements easier to segment, and gives the network the ability to recognize elements with multiple class labels; it also facilitates post-processing of the segmentation results.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A document layout analysis method is characterized by comprising the following steps:
scaling the input layout image into images of 3 scales;
extracting and fusing the features of the images of all scales;
sending the fused image features to a segmentation network backbone for extracting semantic information features;
up-sampling the high-level, low-resolution features rich in semantic information, and fusing the up-sampled features with the low-level, high-resolution features rich in spatial detail information;
and setting corresponding segmentation network branches for segmentation and recognition according to the attributes of different layout elements, while restoring the output feature map to a pre-specified resolution to complete the document layout analysis.
2. The document layout analysis method of claim 1, wherein the scaling of the input layout image into images of 3 scales further comprises the following step:
the input layout image is subjected to scaling operations of 2 times and 0.5 times, yielding images of 3 scales.
3. The document layout analysis method of claim 2, wherein the extracting and fusing of the features of the multi-scale layout images further comprises the following steps:
the 2×-scale layout image is down-sampled by a 3×3 convolutional layer with 16 output feature channels and stride 2;
feature-vector splicing is performed with the 3×3 convolutional features (32 output feature channels, stride 1) of the original-scale layout image;
a first feature fusion is performed with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling is performed with one 3×3 convolutional layer with 64 output feature channels and stride 2;
feature-vector splicing is performed with the 3×3 convolutional features (16 output feature channels, stride 1) of the 0.5×-scale layout image;
a second feature fusion is performed with one 3×3 convolutional layer with 64 output feature channels and stride 1;
down-sampling is performed with one 3×3 convolutional layer with 64 output feature channels and stride 2.
4. The document layout analysis method of claim 3, wherein when the fused image features are sent to the segmentation network backbone, the resolution is 1/4 of the original-scale layout image resolution and the number of output feature channels is 64.
5. The document layout analysis method according to any one of claims 1 to 4, wherein the backbone of the segmentation network is a residual network; a dense atrous spatial pyramid pooling (DenseASPP) module is used at the top of the residual network to extract the convolutional features of the multi-scale layout image; and after extraction the number of output feature channels is 256 and the resolution is 1/32 of the original-scale layout image resolution.
6. The document layout analysis method of claim 1, wherein the up-sampling of the high-level, low-resolution features rich in semantic information, and their fusion with the low-level, high-resolution features rich in spatial detail information, further comprises the following steps:
8× bilinear-interpolation up-sampling is applied to the high-level, low-resolution features rich in semantic information, while the low-level, high-resolution features are smoothed and channel-reduced by a 1×1 convolutional layer with 32 output feature channels and stride 1;
the up-sampled high-level features and the low-level features rich in spatial detail information are fused by feature-vector splicing followed by one 3×3 convolutional layer; after fusion the number of output feature channels is 320 and the resolution is 1/4 of the original-scale layout image resolution;
3 convolutional layers with 64 output feature channels and stride 1 are then used as the heads of 3 different segmentation network branches to extract features belonging to different object attributes;
bilinear interpolation then up-samples the features to the pre-specified resolution;
finally, one convolutional layer with 64 output feature channels and stride 1, and one convolutional layer with stride 1 whose channel number equals the number of segmentation recognition classes of the branch, are used as the top recognition structure of the segmentation network.
7. The document layout analysis method of claim 6, wherein every convolutional layer is followed by a batch-normalization (BN) layer and a ReLU activation layer.
8. The document layout analysis method of claim 6 wherein the high-level features are upsampled to the same resolution as the low-level features.
9. The document layout analysis method of claim 1, wherein each segmentation network branch uses 1 convolutional layer for feature extraction and channel dimension reduction, up-samples to a pre-specified resolution using bilinear interpolation, and uses one 3×3 convolutional layer and one 1×1 convolutional layer as the top recognition structure of the segmentation network.
10. The document layout analysis method of claim 1, wherein the number of segmentation recognition classes of each of the three segmentation network branches is 2.
CN202010637093.4A 2020-07-03 2020-07-03 Document layout analysis method Pending CN111881768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010637093.4A CN111881768A (en) 2020-07-03 2020-07-03 Document layout analysis method


Publications (1)

Publication Number Publication Date
CN111881768A (en) 2020-11-03

Family

ID=73151736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010637093.4A Pending CN111881768A (en) 2020-07-03 2020-07-03 Document layout analysis method

Country Status (1)

Country Link
CN (1) CN111881768A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100183225A1 (en) * 2009-01-09 2010-07-22 Rochester Institute Of Technology Methods for adaptive and progressive gradient-based multi-resolution color image segmentation and systems thereof
CN108268870A (en) * 2018-01-29 2018-07-10 重庆理工大学 Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study
CN110032998A (en) * 2019-03-18 2019-07-19 华南师范大学 Character detecting method, system, device and the storage medium of natural scene picture
CN110837811A (en) * 2019-11-12 2020-02-25 腾讯科技(深圳)有限公司 Method, device and equipment for generating semantic segmentation network structure and storage medium
CN110895695A (en) * 2019-07-31 2020-03-20 上海海事大学 Deep learning network for character segmentation of text picture and segmentation method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Wen; Shi Tianyun; Li Ping; Ma Xiaoning: "Foreign object detection in EMU operation safety images based on deep learning", Journal of Transport Information and Safety, no. 06, 28 December 2019 (2019-12-28) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966691A (en) * 2021-04-14 2021-06-15 重庆邮电大学 Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN113361247A (en) * 2021-06-23 2021-09-07 北京百度网讯科技有限公司 Document layout analysis method, model training method, device and equipment
CN113420669A (en) * 2021-06-24 2021-09-21 武汉工程大学 Document layout analysis method and system based on multi-scale training and cascade detection
CN113420669B (en) * 2021-06-24 2022-05-10 武汉工程大学 Document layout analysis method and system based on multi-scale training and cascade detection
CN115294412A (en) * 2022-10-10 2022-11-04 临沂大学 Real-time coal rock segmentation network generation method based on deep learning
CN116129456A (en) * 2023-02-09 2023-05-16 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information
CN116129456B (en) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information

Similar Documents

Publication Publication Date Title
CN111881768A (en) Document layout analysis method
WO2019201035A1 (en) Method and device for identifying object node in image, terminal and computer readable storage medium
CN110782420A (en) Small target feature representation enhancement method based on deep learning
Huang et al. Rd-gan: Few/zero-shot chinese character style transfer via radical decomposition and rendering
US20110052062A1 (en) System and method for identifying pictures in documents
CN113569865B (en) Single sample image segmentation method based on class prototype learning
WO2022257578A1 (en) Method for recognizing text, and apparatus
CN110555433A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111080660A (en) Image segmentation method and device, terminal equipment and storage medium
CN114283430A (en) Cross-modal image-text matching training method and device, storage medium and electronic equipment
CN115131797B (en) Scene text detection method based on feature enhancement pyramid network
CN110781850A (en) Semantic segmentation system and method for road recognition, and computer storage medium
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN113674146A (en) Image super-resolution
CN113903022A (en) Text detection method and system based on feature pyramid and attention fusion
CN111353544A (en) Target detection method based on improved Mixed Pooling-YOLOv3
CN112766409A (en) Feature fusion method for remote sensing image target detection
CN115311454A (en) Image segmentation method based on residual error feature optimization and attention mechanism
CN111898608B (en) Natural scene multi-language character detection method based on boundary prediction
CN112364709A (en) Cabinet intelligent asset checking method based on code identification
CN115909378A (en) Document text detection model training method and document text detection method
CN115810152A (en) Remote sensing image change detection method and device based on graph convolution and computer equipment
Baloun et al. ChronSeg: Novel Dataset for Segmentation of Handwritten Historical Chronicles.
CN112257708A (en) Character-level text detection method and device, computer equipment and storage medium
CN113610032A (en) Building identification method and device based on remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination