CN111340037B - Text layout analysis method and device, computer equipment and storage medium
- Publication number: CN111340037B
- Application number: CN202010219551.2A
- Authority: CN (China)
- Prior art keywords
- target picture
- character
- area
- text
- segmentation
- Prior art date: 2020-03-25
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Input (AREA)
- Image Analysis (AREA)
Abstract
A text layout analysis method and device, a computer device, and a storage medium are provided. The text layout analysis method comprises the following steps: acquiring a target picture; performing layout area segmentation on the target picture to obtain a plurality of segmentation areas; identifying character areas of the target picture according to the texture features of the target picture; matching the character areas in the target picture with the plurality of segmentation areas to obtain the character areas contained in each segmentation area; performing content recognition on the character areas contained in each segmentation area to obtain the character content of that segmentation area; and outputting the character content of each segmentation area. The accuracy of character recognition in pictures can thereby be effectively improved.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a text layout analysis method, a text layout analysis device, computer equipment and a storage medium.
Background
With the growing volume of pictures on the internet and the increasing number of scanned and printed office documents, scanned and printed pictures often need to be converted into text information. How to achieve efficient text recognition and detection, and quickly convert the text in scanned and printed pictures, has therefore become a problem that urgently needs to be solved.
Common recognition schemes generally comprise two parts, character positioning and recognition, where the accuracy of character line positioning is directly proportional to the accuracy of recognition. Although it is easy to obtain text content from a document, the positions of the text content are random, and the accuracy of content recognition is low, particularly when recognizing a skewed printed picture.
Disclosure of Invention
The technical problem solved by the invention is to provide a text layout analysis method capable of accurately recognizing the text content in a picture.
In order to solve the above technical problem, an embodiment of the present invention provides a text layout analysis method, where the method includes: acquiring a target picture; performing layout area segmentation on the target picture to obtain a plurality of segmentation areas; identifying a character area of the target picture according to the texture features of the target picture; matching the character area in the target picture with the plurality of segmentation areas to obtain character areas contained in each segmentation area; performing content identification on the character area contained in each divided area to obtain the character content of the divided area; and outputting the text content of each divided area.
Optionally, when content recognition is performed on the character areas included in each of the divided areas, the character areas belonging to the same divided area are transmitted to the text recognition model together for recognition.
Optionally, the content identification of the text area included in each divided area includes: respectively identifying the content of each character area contained in each division area to obtain the character content of the character area; and splicing the text contents of the text areas to obtain the text contents of the divided areas.
Optionally, the identifying the text region of the target picture according to the texture feature of the target picture includes: performing convolution operation on the target picture through a plurality of convolution kernels to extract a plurality of texture feature layers corresponding to characters from the target picture; respectively distributing a plurality of anchor point regions with different receptive fields for part or all of the texture feature layers; and regressing the distributed anchor point region to obtain a character region of the target picture.
Optionally, the identifying the text region of the target picture according to the texture feature of the target picture includes: inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers with different feature dimensions of the target picture, wherein the texture extraction model is obtained by analyzing texture features in historical pictures and is used for extracting the texture feature layers in the input picture; screening a basic texture characteristic layer from the texture characteristic layers; performing feature superposition on the basic texture feature layer to obtain a character feature layer of the target picture; and acquiring a character area of the target picture according to the character feature layer.
Optionally, the text area is an area corresponding to each line of text included in the target picture.
Optionally, the output text content of each divided area is a character string.
The embodiment of the invention also provides a text layout analysis device, which comprises: the image acquisition module is used for acquiring a target image; the layout segmentation module is used for performing layout region segmentation on the target picture to obtain a plurality of segmentation regions; the region identification module is used for identifying the character region of the target picture according to the textural features of the target picture; the layout analysis module is used for matching the character areas in the target picture with the plurality of segmentation areas to obtain character areas contained in each segmentation area; the content identification module is used for identifying the content of the character area contained in each divided area to obtain the character content of the divided area; and the output module is used for outputting the text content of each division area.
The embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores computer instructions capable of being executed on the processor, and the processor executes the steps of the text layout analysis method when executing the computer instructions.
The embodiment of the invention also provides a storage medium, wherein a computer instruction is stored on the storage medium, and the steps of the text layout analysis method are executed when the computer instruction runs.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
An embodiment of the invention provides a text layout analysis method, which comprises the following steps: acquiring a target picture; performing layout area segmentation on the target picture to obtain a plurality of segmentation areas; identifying character areas of the target picture according to the texture features of the target picture; matching the character areas in the target picture with the plurality of segmentation areas to obtain the character areas contained in each segmentation area; performing content recognition on the character areas contained in each segmentation area to obtain the character content of that area; and outputting the character content of each segmentation area. Compared with the prior art, the method segments the target picture according to its layout and locates the positions of the characters in the target picture to obtain character areas, thereby obtaining the character areas within each of the picture's layouts. By recognizing the content of the character areas layout by layout, the positions of characters in the picture can be accurately located, the characters in each layout can be recognized in a targeted manner in combination with the background and font format of that layout, and the accuracy of character recognition in the picture can be effectively improved.
Further, the character areas contained in each segmentation area are transmitted to a text recognition model, which recognizes the character content in those areas to obtain the character content contained in each segmentation area. A text recognition model obtained by big-data training, such as a neural network, accurately recognizes the content of the character areas in each segmentation area.
Further, a convolution operation is performed on the target picture with a plurality of different convolution kernels to obtain texture feature layers of different feature dimensions of the target picture, and anchor point regions with different receptive fields are allocated to each texture feature layer according to a set allocation scheme, so as to adapt to the feature distribution within the texture feature layers and improve the accuracy of the recognition result for the character regions in the target picture.
Furthermore, a texture extraction model can be adopted to output a plurality of texture feature layers of the target picture according to different feature dimensions, reducing the number of training samples needed in model training and the data processing load; the texture feature layer of the characters to be recognized is finally obtained by screening and feature superposition of the texture feature layers, thereby accurately locating the characters in the target picture.
Drawings
Fig. 1 is a schematic flowchart of a text layout analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating step S103 of FIG. 1 according to an embodiment;
FIG. 3 is a schematic flow chart of step S103 in FIG. 1 according to another embodiment;
fig. 4 is an application diagram of a text layout analysis method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a text layout analysis apparatus according to an embodiment of the present invention.
Detailed Description
As noted in the background section, the prior art performs poorly when recognizing text content in pictures.
Existing text positioning methods fall into two categories: anchor-based positioning and segmentation-based positioning. The drawback of anchor-based methods is that the receptive field limits the length of the content that can be detected, while segmentation-based methods generally require time-consuming post-processing. Existing text recognition methods likewise fall into two main streams: those based on Connectionist Temporal Classification (CTC) and those based on Attention. These detection and recognition methods can achieve the expected recognition effect for text without any layout requirements. However, for text with a fixed layout format, such as identity cards, business licenses, and value-added tax invoices, layout analysis and layout restoration are required; otherwise the output is a stack of unordered, invalid characters. Currently, common layout analysis is based on keyword matching, which places high demands on the accuracy of the detection and recognition models. Another kind of layout analysis is based on conventional picture processing, such as erosion and dilation or connected-component analysis, which is less robust to background variations.
In order to solve the above problem, embodiments of the present invention provide a text layout analysis method, apparatus, computer device, and storage medium. The text layout analysis method comprises the following steps: acquiring a target picture; performing layout area segmentation on the target picture to obtain a plurality of segmentation areas; identifying a character area of the target picture according to the texture features of the target picture; matching the character areas in the target picture with the plurality of segmentation areas to obtain character areas contained in each segmentation area; performing content identification on the character area contained in each divided area to obtain the character content of the divided area; and outputting the text content of each division area.
By the above method, the accuracy of character recognition in pictures can be effectively improved, particularly for pictures in which the text content is randomly positioned or skewed in printing.
Referring to fig. 1, fig. 1 is a schematic flow chart of a text layout analysis method according to an embodiment of the present invention; the text layout analysis method may specifically include the following steps S101 to S106.
Step S101, a target picture is obtained.
The target picture is a picture containing characters to be recognized; it may be a scanned picture of characters, a picture of printed characters, or the like, and it may be a true-color picture (i.e., an RGB picture). The characters to be recognized are the character portions of the target picture. Since the method and device analyze the layout of the text in the picture, the terminal first obtains the target picture to be analyzed and then continues with the following steps.
And step S102, performing layout area segmentation on the target picture to obtain a plurality of segmentation areas.
The layout area of the target picture is segmented according to characteristics such as the typesetting and/or character format in the target picture and/or the background color of the picture, dividing the target picture into a plurality of areas, namely the segmentation areas. At least one segmentation area is obtained from the layout area segmentation.
Optionally, if font directions differ within the target picture, for example because the printed characters are skewed, such characters can be recognized as a separate layout area during layout area segmentation, so as to improve recognition accuracy.
Optionally, when segmenting the layout regions of the target picture, a layout region segmentation model may be built with big data: the model takes a large number of pictures as training samples, automatically identifies the segmentation regions contained in an input picture, and segments the picture accordingly. Thanks to its abundant training samples, a layout region segmentation model trained on big data can accurately identify the segmentation regions contained in each input picture, improving the accuracy of layout region segmentation.
And step S103, recognizing the character area of the target picture according to the texture features of the target picture.
The texture features of the target picture correspond to the characters contained in it and are used to identify the regions where those characters are located, namely the character areas. A convolution operation can be performed on the target picture to identify its texture features and thereby obtain its character areas; for a target picture containing character content, one or more character areas are obtained. In addition, when no character area can be identified in the target picture, it can be determined that the target picture contains no character content, and a recognition-error message is generated so that a technician can re-run recognition on the target picture or troubleshoot the error.
Optionally, the recognition result for the character areas of the target picture is the set of rectangular areas in which the lines of text contained in the target picture are located.
And step S104, matching the character area in the target picture with the plurality of segmentation areas to obtain the character area contained in each segmentation area.
The plurality of segmentation areas obtained in step S102 is matched with the character areas obtained in step S103, yielding the character areas in which the characters contained in the segmentation area of each layout are located. When the recognition result for the character areas of the target picture is the rectangular areas in which the lines of text are located, the segmentation area corresponding to each layout comprises the rectangular areas of several lines of text. Each segmentation area may include one or more character areas.
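As an illustration only (not language from the patent itself), this matching can be sketched as assigning each detected text box to the layout region that covers it most. A minimal sketch, assuming boxes are (x1, y1, x2, y2) tuples; all helper names are hypothetical:

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned (x1, y1, x2, y2) rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def match_text_to_regions(text_boxes, layout_regions):
    """Map each segmentation-area index to the text boxes it contains."""
    matches = {i: [] for i in range(len(layout_regions))}
    for box in text_boxes:
        # Assign the box to the region it overlaps the most, if any.
        best = max(range(len(layout_regions)),
                   key=lambda i: overlap_area(box, layout_regions[i]))
        if overlap_area(box, layout_regions[best]) > 0:
            matches[best].append(box)
    return matches
```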
In step S105, content recognition is performed on the text area included in each divided area to obtain the text content of the divided area.
The characters in each divided area of the target picture are recognized with a method such as OCR, obtaining the character content contained in the character areas of each divided area.
In step S106, the text content of each divided area is output.
The character content recognized in each divided area is output as the character recognition result of the layout corresponding to that area. With the text layout analysis method shown in fig. 1, the target picture is segmented according to its layout and the positions of the characters in it are located to obtain character areas, so that the character areas within the picture's several layouts are obtained. By recognizing the content of the character areas layout by layout, the positions of the characters in the picture can be accurately located, the characters in each layout can be recognized in a targeted manner in combination with that layout's background, font format, and the like, and the accuracy of character recognition in the picture can be effectively improved.
In this embodiment, the layout analysis and restoration based on picture character recognition is particularly effective for forms, business licenses, and identity cards, and especially for restoring skewed text layouts.
In one embodiment, when content recognition is performed on the character areas contained in each segmentation area, the character areas belonging to the same segmentation area are transmitted to the text recognition model together for recognition.
The text recognition model is a model for recognizing characters contained in the picture, and can be a model obtained through big data training, such as a neural network model; the text recognition model may employ various appropriate models in Natural Language Processing (NLP) technology.
The character areas contained in each segmentation area are transmitted to the text recognition model, which recognizes their character content to obtain the character content contained in that segmentation area. A text recognition model obtained by big-data training, such as a neural network, can thus accurately recognize the content of the character areas in each segmentation area.
Optionally, the content identification of the text area included in each of the divided areas includes: respectively identifying the content of each character area contained in each divided area to obtain the character content of the character area; and splicing the text contents of the text areas to obtain the text contents of the divided areas.
After the content of each character area contained in a divided area has been recognized, the character contents recognized from the individual character areas can be spliced to obtain the complete character content of the layout corresponding to that divided area.
Optionally, the text contents identified by the text regions are spliced according to the relative positions of the text regions in the segmentation regions.
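A minimal sketch of this splicing rule, assuming each recognized line arrives as a ((x1, y1, x2, y2), text) pair; ordering top-to-bottom and then left-to-right is one plausible reading of "relative positions", not a rule stated by the patent:

```python
def splice_region_text(lines):
    """lines: list of ((x1, y1, x2, y2), text) pairs from one divided area."""
    # Order by the top edge of each box, then by its left edge.
    ordered = sorted(lines, key=lambda item: (item[0][1], item[0][0]))
    return "".join(text for _box, text in ordered)
```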
Optionally, the performing layout region segmentation on the target picture in step S102 in fig. 1 may include: and performing layout area segmentation on the target picture according to a semantic segmentation technology.
Semantic segmentation of a picture classifies the pixels in the picture to obtain clustered pixel regions. Step S102 may be performed by a semantic segmentation model trained on a large number of picture samples. As a non-limiting example, the semantic segmentation model can be obtained by training within the instance segmentation framework Mask R-CNN.
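For illustration, a layout segmentation step in this spirit could be sketched with torchvision's off-the-shelf Mask R-CNN; the pretrained COCO weights merely stand in for the layout-trained model the patent describes, and the score threshold is an assumed parameter:

```python
# Sketch of layout region segmentation on a Mask R-CNN backbone.
# Assumes torchvision >= 0.13; a real system would be fine-tuned on
# layout-annotated pages rather than rely on COCO weights.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def segment_layout(image_tensor, score_threshold=0.5):
    """image_tensor: float tensor (3, H, W) in [0, 1]; returns region boxes."""
    with torch.no_grad():
        output = model([image_tensor])[0]
    keep = output["scores"] > score_threshold
    return output["boxes"][keep].tolist()  # each box: (x1, y1, x2, y2)
```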
In an embodiment, referring to fig. 2, the step S103 in fig. 1 of identifying the text region of the target picture according to the texture feature of the target picture may include the following steps:
step S201, performing convolution operation on the target picture through a plurality of convolution kernels to extract a plurality of texture feature layers corresponding to the characters from the target picture.
The texture features correspond to the character distribution contained in the target picture, and a plurality of texture feature layers corresponding to the target picture can be obtained by performing convolution operation on the pixels in the target picture through a plurality of different convolution kernels.
Step S202, respectively allocating a plurality of anchor point areas with different receptive fields for part or all of the texture feature layers.
Anchor point regions with different receptive fields are allocated to the texture feature layers obtained in step S201, so that the receptive fields can match the content to be detected.
Optionally, anchor point regions may be respectively allocated to all the obtained texture feature layers; and several layers with better recognition effect can be selected from the layers as recognition objects, so that the calculation amount of equipment is reduced, and the recognition efficiency is improved.
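For illustration, an allocation in this spirit might generate wide anchors on a grid over each selected layer; the strides, base sizes, and aspect ratios below are assumptions chosen to suit text lines, not values taken from the patent:

```python
def build_anchors(feature_shapes, base_sizes=(16, 32, 64, 128)):
    """feature_shapes: [(stride, rows, cols), ...] for the selected layers;
    one base anchor size per layer approximates its receptive field."""
    anchors = []
    for (stride, rows, cols), base in zip(feature_shapes, base_sizes):
        for r in range(rows):
            for c in range(cols):
                cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
                for aspect in (1, 3, 5):  # wide anchors suit text lines
                    w, h = base * aspect, base
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```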
And step S203, regressing the distributed anchor point region to obtain a character region of the target picture.
Text region detection is performed according to the anchor point regions allocated in step S202, and a regression operation may be performed on the detection results corresponding to the anchor point regions to obtain the regions where the characters to be recognized are located in the target picture, namely the character regions. The detection results of the anchor point regions can be regressed with a Non-Maximum Suppression (NMS) algorithm.
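The merging step can be illustrated with a plain NMS routine; the (box, score) input format and the 0.5 threshold are assumptions made for the sketch:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, inter_w) * max(0, inter_h)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (box, score); returns the boxes kept after NMS."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, _score in detections:
        # Keep a box only if it does not heavily overlap a higher-scored one.
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```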
In this embodiment, a convolution operation is performed on the target picture with a plurality of different convolution kernels to obtain texture feature layers of different feature dimensions, and anchor point regions with different receptive fields are allocated to each texture feature layer according to a set allocation scheme, so as to adapt to the feature distribution within the texture feature layers and improve the accuracy of the recognition result for the character regions in the target picture.
In an embodiment, referring to fig. 3, the step S103 in fig. 1 of identifying the text region of the target picture according to the texture feature of the target picture may further include the following steps:
step S301, inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers with different feature dimensions of the target picture, wherein the texture extraction model is obtained by analyzing texture features in historical pictures and is used for extracting the texture feature layers in the input picture.
The texture features correspond to the distribution of characters in the picture, and the feature dimension is the dimension for identifying the region where the characters in the picture are located.
The texture extraction model is a model which is used for acquiring a texture feature layer of an input picture based on different feature dimensions and is trained by taking a historical picture as a training sample according to a character part and a non-character part in the sample. When the characteristic dimension corresponds to the pixel value of the picture, the texture extraction model may use an existing convolutional neural network model (such as MobileNetv2, SqueezeNet, ShuffleNet, and the like), and perform convolution processing on the pixel of the target picture through a plurality of different convolution kernels to obtain a plurality of texture characteristic layers corresponding to the target picture. After the identification terminal obtains the target picture, the target picture passes through a texture extraction model to obtain a plurality of texture feature layers of the target picture. For example, if the texture extraction model is MobileNetv2, 19 texture feature layers may be obtained.
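As a sketch of this step under stated assumptions: torchvision's MobileNetV2 exposes a `features` module of 19 sequential blocks, matching the 19 texture feature layers mentioned above, so collecting every intermediate output yields the layer stack. This is an illustrative stand-in, not the patent's trained model:

```python
import torch
import torchvision

backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.eval()

def texture_feature_layers(image_tensor):
    """image_tensor: (1, 3, H, W) float tensor; returns 19 feature maps."""
    layers = []
    x = image_tensor
    with torch.no_grad():
        for block in backbone:  # 19 sequential blocks in MobileNetV2
            x = block(x)
            layers.append(x)
    return layers
```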
Step S302, a basic texture feature layer is screened from the texture feature layers.
The basic texture feature layers are the one or more layers among the texture feature layers that give the best text positioning effect. After the plurality of texture feature layers is obtained, not all of them are passed to the next operation; they are screened according to the recognition requirements, and only the basic texture feature layers with the best text positioning effect are retained. Optionally, the texture feature layers obtained from a number of recognized historical pictures can be screened to select the layers with the better text positioning effect as the basic texture feature layers.
For example, when the texture extraction model is MobileNetv2, the 3 rd, 7 th, 14 th and 19 th layers can be selected from the 19 texture feature layers identified by MobileNetv2 as the base texture feature layers.
And step S303, performing feature superposition on the basic texture feature layer to obtain a character feature layer of the target picture.
After the basic texture feature layers are obtained, if there is more than one of them, the features in the basic texture feature layers need to be superimposed to obtain a texture feature layer that represents the positions of the characters to be recognized in the target picture. When the basic texture feature layers correspond to convolution layers obtained from the target picture through a plurality of different convolution kernels, pixel interpolation can be applied to the basic texture feature layers to obtain high-resolution pixel maps of them, thereby optimizing the basic texture feature layers.
In addition, when the number of the basic texture feature layers is one, the obtained basic texture feature layers are directly used as character feature layers of the target picture.
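A sketch of the superposition under stated assumptions: the screened layers differ in channel count and resolution, so each is first mapped to a common width by a 1x1 convolution (a detail added here, not taken from the patent) and bilinearly interpolated to the finest layer's size before summation:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureSuperposition(nn.Module):
    """Fuses the basic texture feature layers into one character feature layer."""

    def __init__(self, in_channels_list, out_channels=64):
        super().__init__()
        # One 1x1 projection per base layer to equalize channel counts.
        self.projections = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels_list)

    def forward(self, base_layers):
        target_size = base_layers[0].shape[-2:]  # finest resolution
        fused = 0
        for proj, layer in zip(self.projections, base_layers):
            x = F.interpolate(proj(layer), size=target_size,
                              mode="bilinear", align_corners=False)
            fused = fused + x
        return fused
```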
Step S304, acquiring a character area of the target picture according to the character feature layer.
From the texture feature layer of the characters to be recognized, the positions of those characters in the target picture can be obtained from the distribution of features within the layer, for example the distribution of feature pixels corresponding to the characters to be recognized.
In the embodiment, a texture extraction model is adopted, and a plurality of texture feature layers of a target picture are output according to different feature dimensions, so that the number of training samples in model training is reduced, and the pressure of data processing is reduced; and finally obtaining the texture feature layer of the character to be recognized according to the screening and feature superposition of the texture feature layer, thereby realizing the accurate positioning of the character in the target picture.
Optionally, the text area is an area corresponding to each line of text included in the target picture.
The text region included in the target picture output by the texture extraction model is the position of each line of text in the picture, and can be a rectangular region in which each line of text is located.
Optionally, the output text content of each divided area is a character string.
Namely, the character recognition result of each layout in the target picture is output in a character string format.
In one embodiment, referring to fig. 4, which is a schematic diagram illustrating an application of the text layout analysis method, the terminal's analysis of the text layout may specifically include the following steps (a high-level code sketch follows the steps):
step 1, inputting RGB pictures;
step 2, inputting the picture in the step 1 into a semantic segmentation model 401 to perform layout region segmentation;
step 3, inputting the picture in the step 1 into a texture extraction model 402 for text line positioning;
step 4, comparing the areas of the results of the step 2 and the step 3, and clustering the results of the step 3 to obtain a character area contained in each layout area;
step 5, putting the text lines clustered in the step 4 into a text recognition model 403 for recognition;
and 6, splicing the recognition results into a character string and returning.
Referring to fig. 5, fig. 5 provides a text layout analyzing apparatus according to an embodiment of the present invention, where the apparatus includes:
a picture acquiring module 501, configured to acquire a target picture.
The layout segmentation module 502 is configured to perform layout region segmentation on the target picture to obtain a plurality of segmented regions.
The region identification module 503 is configured to identify a text region of the target picture according to the texture feature of the target picture.
The layout analysis module 504 is configured to match the text regions in the target picture with the plurality of segmentation regions, so as to obtain text regions included in each segmentation region.
The content identification module 505 is configured to perform content identification on the text area included in each divided area to obtain text content of the divided area.
And an output module 506, configured to output text content of each divided area.
In one embodiment, when the content recognition module 505 performs content recognition on the character regions included in each divided region, the character regions belonging to the same divided region are collectively transmitted to the text recognition model for recognition.
In one embodiment, the content identification module 505 comprises:
and the area content identification unit is used for respectively identifying the content of each character area contained in each divided area to obtain the character content of the character area.
And the splicing unit is used for splicing the text contents of the text areas to obtain the text contents of the divided areas.
In one embodiment, the region identification module 503 may include:
and the texture feature layer extraction unit is used for performing convolution operation on the target picture through a plurality of convolution kernels so as to extract a plurality of texture feature layers corresponding to the characters from the target picture.
And the anchor point region distribution unit is used for respectively distributing a plurality of anchor point regions with different receptive fields for part or all of the texture feature layers.
And the first character region acquisition unit is used for regressing the distributed anchor point region to obtain the character region of the target picture.
In one embodiment, the region identification module 503 may include:
and the model analysis unit is used for inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers with different feature dimensions of the target picture, and the texture extraction model is obtained by analyzing texture features in historical pictures and is used for extracting the texture feature layers in the input picture.
And the basic texture characteristic layer screening unit is used for screening a basic texture characteristic layer from the texture characteristic layers.
And the characteristic superposition unit is used for carrying out characteristic superposition on the basic texture characteristic layer to obtain a character characteristic layer of the target picture.
And the second character area acquisition unit is used for acquiring the character area of the target picture according to the character feature layer.
Optionally, the text areas obtained by the first text area obtaining unit and the second text area obtaining unit are areas corresponding to each line of text included in the target picture.
Optionally, the text content of each divided area output by the output module 506 is a character string.
For more contents of the working principle and the working mode of the text layout analysis apparatus, reference may be made to the description of the text layout analysis method in fig. 1 to fig. 4, which is not repeated herein.
Further, the embodiment of the present invention further discloses a computer device, which includes a memory and a processor, where the memory stores a computer instruction capable of being executed on the processor, and the processor executes the technical solution of the text layout analysis method in the embodiments shown in fig. 1 to 4 when executing the computer instruction.
Further, the embodiment of the present invention further discloses a storage medium, on which computer instructions are stored, and when the computer instructions are executed, the technical solution of the text layout analysis method in the embodiments shown in fig. 1 to fig. 4 is executed. Preferably, the storage medium may include a computer-readable storage medium such as a non-volatile (non-volatile) memory or a non-transitory (non-transient) memory. The storage medium may include ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A method of text layout analysis, the method comprising:
acquiring a target picture;
performing layout area segmentation on the target picture to obtain a plurality of segmentation areas;
identifying a character area of the target picture according to the texture features of the target picture;
matching the character areas in the target picture with the plurality of segmentation areas to obtain character areas contained in each segmentation area;
performing content identification on the character area contained in each divided area to obtain the character content of the divided area;
and outputting the text content of each divided area.
2. The method of claim 1, wherein, when content recognition is performed on the text regions included in each of the divided regions, the text regions belonging to the same divided region are transmitted together to a text recognition model for recognition.
3. The method according to claim 1, wherein the content identification of the text area included in each partition area comprises:
respectively identifying the content of each character area contained in each divided area to obtain the character content of the character area;
and splicing the text contents of the text areas to obtain the text contents of the divided areas.
4. The method according to claim 1, wherein the identifying the text region of the target picture according to the texture feature of the target picture comprises:
performing convolution operation on the target picture through a plurality of convolution kernels to extract a plurality of texture feature layers corresponding to the characters from the target picture;
respectively distributing a plurality of anchor point regions with different receptive fields for part or all of the texture feature layers;
and regressing the distributed anchor point area to obtain the character area of the target picture.
5. The method according to claim 1, wherein the identifying the text region of the target picture according to the texture feature of the target picture comprises:
inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers with different feature dimensions of the target picture, wherein the texture extraction model is obtained by analyzing texture features in historical pictures and is used for extracting the texture feature layers in the input picture;
screening a basic texture characteristic layer from the texture characteristic layers;
performing feature superposition on the basic texture feature layer to obtain a character feature layer of the target picture;
and acquiring a character area of the target picture according to the character feature layer.
6. The method of claim 1, wherein the text area is an area corresponding to each line of text included in the target picture.
7. The method of claim 1, wherein the output text content of each partitioned area is a character string.
8. A text layout analysis apparatus, comprising:
the image acquisition module is used for acquiring a target image;
the layout segmentation module is used for performing layout region segmentation on the target picture to obtain a plurality of segmentation regions;
the region identification module is used for identifying the character region of the target picture according to the textural features of the target picture;
the layout analysis module is used for matching the character areas in the target picture with the plurality of segmentation areas to obtain the character areas contained in each segmentation area;
the content identification module is used for identifying the content of the character area contained in each divided area to obtain the character content of the divided area;
and the output module is used for outputting the text content of each division area.
9. A computer device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor executes the computer instructions to perform the steps of the method of any one of claims 1 to 7.
10. A storage medium having stored thereon computer instructions, wherein said computer instructions when executed perform the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---
CN202010219551.2A | 2020-03-25 | 2020-03-25 | Text layout analysis method and device, computer equipment and storage medium
Publications (2)
Publication Number | Publication Date |
---|---
CN111340037A (en) | 2020-06-26
CN111340037B (en) | 2022-08-19
Family
Family ID: 71184450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---
CN202010219551.2A | Text layout analysis method and device, computer equipment and storage medium | 2020-03-25 | 2020-03-25
Country Status (1)
Country | Link |
---|---
CN | CN111340037B (en)
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832476A (en) * | 2020-07-13 | 2020-10-27 | 上海肇观电子科技有限公司 | Layout analysis method, reading aid, circuit and medium |
US11367296B2 (en) | 2020-07-13 | 2022-06-21 | NextVPU (Shanghai) Co., Ltd. | Layout analysis |
CN111881902B (en) * | 2020-07-28 | 2023-06-27 | 平安科技(深圳)有限公司 | Training sample making method, training sample making device, computer equipment and readable storage medium |
CN112016438B (en) * | 2020-08-26 | 2021-08-10 | 北京嘀嘀无限科技发展有限公司 | Method and system for identifying certificate based on graph neural network |
CN112381096A (en) * | 2020-11-12 | 2021-02-19 | 阳光保险集团股份有限公司 | Picture identification method and device |
US11501550B2 (en) | 2020-11-24 | 2022-11-15 | International Business Machines Corporation | Optical character recognition segmentation |
CN112560767A (en) * | 2020-12-24 | 2021-03-26 | 南方电网深圳数字电网研究院有限公司 | Document signature identification method and device and computer readable storage medium |
CN113094509B (en) * | 2021-06-08 | 2021-12-21 | 明品云(北京)数据科技有限公司 | Text information extraction method, system, device and medium |
CN114004807A (en) * | 2021-10-29 | 2022-02-01 | 推想医疗科技股份有限公司 | Method and device for identifying positioning patch |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150055866A1 (en) * | 2012-05-25 | 2015-02-26 | Mark Joseph Cummins | Optical character recognition by iterative re-segmentation of text images using high-level cues |
CN103593642A (en) * | 2012-08-16 | 2014-02-19 | 阿里巴巴集团控股有限公司 | Card-information acquisition method and system |
CN104809677A (en) * | 2015-05-13 | 2015-07-29 | 江苏黄金屋教育咨询有限公司 | Automatic examination paper scoring method based on statistics and analysis of knowledge point mastering condition |
CN105701490B (en) * | 2016-02-24 | 2018-11-30 | 上海海事大学 | A kind of container number adaptive location method based on image entropy |
CN109389061A (en) * | 2018-09-26 | 2019-02-26 | 苏州友教习亦教育科技有限公司 | Paper recognition methods and system |
CN109409377B (en) * | 2018-12-03 | 2020-06-02 | 龙马智芯(珠海横琴)科技有限公司 | Method and device for detecting characters in image |
CN109919146A (en) * | 2019-02-02 | 2019-06-21 | 上海兑观信息科技技术有限公司 | Picture character recognition methods, device and platform |
CN110004664B (en) * | 2019-04-28 | 2021-07-16 | 深圳数联天下智能科技有限公司 | Clothes stain recognition method and device, washing machine and storage medium |
CN110880000B (en) * | 2019-11-27 | 2022-09-02 | 上海智臻智能网络科技股份有限公司 | Picture character positioning method and device, computer equipment and storage medium |
- 2020-03-25: Application CN202010219551.2A filed in China; granted as CN111340037B (status: Active)
Also Published As
Publication number | Publication date |
---|---
CN111340037A (en) | 2020-06-26 |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant