CN114266901A - Document contour extraction model construction method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN114266901A
CN114266901A (application CN202111600500.5A)
Authority
CN
China
Prior art keywords
document
picture
information
extraction model
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111600500.5A
Other languages
Chinese (zh)
Inventor
陈明理
周彭滔
张新访
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Tianyu Information Industry Co Ltd
Original Assignee
Wuhan Tianyu Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tianyu Information Industry Co Ltd filed Critical Wuhan Tianyu Information Industry Co Ltd
Priority to CN202111600500.5A
Publication of CN114266901A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application relates to a method, an apparatus, a device, and a readable storage medium for constructing a document contour extraction model, in the technical field of document detection. The method includes: acquiring a picture training set, where the picture training set includes a plurality of sample pictures; labeling each sample picture to obtain a corresponding mask label, and determining the vertex information of the corresponding sample picture according to the mask label to form a new picture training set; and training a semantic segmentation model based on the new picture training set to generate a document contour extraction model, where the semantic segmentation model includes a learning downsampling module, a global feature extraction module, a feature fusion module, and a standard classifier, and the standard classifier includes a vertex regression submodule. The document contour extraction model generated by this method predicts vertices while segmenting the document region in a picture, so the document contour can be extracted by directly acquiring the vertex information of the document region, which effectively improves the accuracy of extracting document contour information.

Description

Document contour extraction model construction method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of document detection technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for constructing a document contour extraction model.
Background
With the improvement of mobile phone performance and the rapid development of artificial intelligence, technologies such as photographing recognition of documents, subsequent positioning of handwriting in document materials, wrong-question erasing, Optical Character Recognition (OCR), and sentiment analysis and word segmentation in Natural Language Processing (NLP) are gradually being applied to students' daily learning. However, if photographed document material is directly subjected to handwriting positioning, wrong-question erasing, OCR, sentiment analysis, word segmentation, and other processing, the result is unsatisfactory because it is easily affected by factors such as the photographing background and environment. It therefore becomes important to perform region detection on photographed document material, obtain its contour information, and correct it. The main purpose of extracting the document contour is thus to acquire the contour information of the document through an algorithm, remove the background information in the picture according to the contour information, and finally correct the document region in the picture using the contour information, thereby improving the accuracy of subsequent processing of the photographed document material.
In the related art, methods for extracting the document contour fall into two categories: 1. segmenting the document region by semantic segmentation, analyzing the semantic segmentation result with a traditional image processing method, and calculating the contour information; 2. first obtaining text region information in the picture through text region detection, then calculating an offset from the center point and the text region to obtain the contour information.
However, since the result of semantic segmentation may be an irregular shape, analyzing it with a traditional image processing method may not yield accurate contour information; and since photographed text material is itself irregular in various ways, calculating the contour from an offset between the center point and the detected text region also yields poor accuracy. Existing methods for extracting document contour information therefore suffer from poor accuracy.
Disclosure of Invention
The present application provides a method, an apparatus, a device, and a readable storage medium for constructing a document contour extraction model, aiming to solve the problem of poor document contour information extraction accuracy in the related art.
In a first aspect, a method for constructing a document contour extraction model is provided, which includes the following steps:
acquiring a picture training set, wherein the picture training set comprises a plurality of sample pictures;
labeling each sample picture to obtain a corresponding mask label, and determining each vertex information of the corresponding sample picture according to the mask label to form a new picture training set;
training a semantic segmentation model based on the new picture training set to generate a document contour extraction model, wherein the semantic segmentation model comprises a learning down-sampling module, a global feature extraction module, a feature fusion module and a standard classifier, and the standard classifier comprises a vertex regression submodule.
In some embodiments, the calculation formula of the loss function L in the standard classifier is as follows:
L = ω1 · L_point + ω2 · L_mask
where ω1 denotes the vertex weight, L_point denotes the loss between the region enclosed by the predicted vertices and the region enclosed by the real vertices, ω2 denotes the mask weight, and L_mask denotes the mask loss.
In some embodiments, after the step of determining each vertex information of the corresponding sample picture according to the mask label to form a new picture training set, the method further includes:
judging whether the sample picture in the new picture training set contains an integral document area or not;
if so, performing first enhancement processing on the sample picture in the new picture training set to obtain a first sample picture;
and if not, performing second enhancement processing on the sample pictures in the new picture training set to obtain second sample pictures.
In some embodiments, the semantic segmentation model further includes a splicing module, where the splicing module is configured to splice the high-level semantic information output by the feature fusion module and the low-level semantic information output by the learning downsampling module to obtain spliced information, and the spliced information is used as an input of the standard classifier.
In some embodiments, the labeling each sample picture to obtain a corresponding mask label includes:
performing vertex marking on a document area of each sample picture, and creating a mask background picture for each sample picture according to original picture information;
and filling a region surrounded by a plurality of vertexes on the corresponding sample picture based on the mask background picture to obtain a mask label.
In some embodiments, after the step of training a semantic segmentation model based on the new training set of pictures to generate a document contour extraction model, the method further includes:
acquiring a picture to be predicted, and normalizing the picture to be predicted;
and inputting the normalized picture to be predicted into the document contour extraction model to obtain document contour information of the picture to be predicted and a plurality of vertex information on the document contour.
In some embodiments, after the step of inputting the normalized to-be-predicted picture into the document contour extraction model to obtain document contour information of the to-be-predicted picture and information of a plurality of vertices on the document contour, the method further includes:
calculating the minimum circumscribed rectangle of a region surrounded by a plurality of vertexes on the document outline according to the information of the plurality of vertexes on the document outline;
calculating a perspective transformation matrix according to the vertex information of the minimum circumscribed rectangle and the plurality of vertex information on the document outline;
and correcting the picture to be predicted according to the perspective transformation matrix, and outputting a document area of the picture to be predicted.
In a second aspect, there is provided a document contour extraction model construction apparatus, including:
an acquisition unit, configured to acquire a picture training set, where the picture training set includes a plurality of sample pictures;
a processing unit, configured to label each sample picture to obtain a corresponding mask label, and to determine the vertex information of the corresponding sample picture according to the mask label to form a new picture training set; and
a training unit, configured to train a semantic segmentation model based on the new picture training set to generate a document contour extraction model, where the semantic segmentation model includes a learning downsampling module, a global feature extraction module, a feature fusion module, and a standard classifier, and the standard classifier includes a vertex regression submodule.
In a third aspect, a document contour extraction model construction device is provided, including: a memory and a processor, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the aforementioned document contour extraction model construction method.
In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the aforementioned document contour extraction model construction method.
The beneficial effects brought by the technical solution provided by the present application include: the accuracy of extracting document contour information can be effectively improved.
The present application provides a method, an apparatus, a device, and a readable storage medium for constructing a document contour extraction model. The method includes acquiring a picture training set, where the picture training set includes a plurality of sample pictures; labeling each sample picture to obtain a corresponding mask label, and determining the vertex information of the corresponding sample picture according to the mask label to form a new picture training set; and training a semantic segmentation model based on the new picture training set to generate a document contour extraction model, where the semantic segmentation model includes a learning downsampling module, a global feature extraction module, a feature fusion module, and a standard classifier, and the standard classifier includes a vertex regression submodule. By integrating the vertex regression submodule into the standard classifier, the generated document contour extraction model predicts vertices while segmenting the document region in the picture, so the document contour can be extracted by directly acquiring the vertex information of the document region. This avoids the influence of irregular segmentation shapes or irregular text material and effectively improves the accuracy of extracting document contour information.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic flowchart of a document contour extraction model construction method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a document contour extraction model provided in an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a process of document contour extraction and image correction according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a document contour extraction model construction apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a document contour extraction model building device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a method, a device and equipment for constructing a document contour extraction model and a readable storage medium, which can solve the problem of poor document contour information extraction accuracy in the related technology.
Fig. 1 is a document contour extraction model construction method provided in an embodiment of the present application, including the following steps:
step S10: acquiring a picture training set, wherein the picture training set comprises a plurality of sample pictures;
step S20: labeling each sample picture to obtain a corresponding mask label, and determining each vertex information of the corresponding sample picture according to the mask label to form a new picture training set;
further, in this embodiment of the application, the labeling each sample picture to obtain a corresponding mask label includes:
performing vertex marking on a document area of each sample picture, and creating a mask background picture for each sample picture according to original picture information;
and filling a region surrounded by a plurality of vertexes on the corresponding sample picture based on the mask background picture to obtain a mask label.
Exemplarily, in this embodiment, each sample picture in the picture training set needs to be preprocessed to generate the training samples required for model training. Specifically, a data labeling tool (such as Labelme, an offline image annotation tool) can be used to label multiple vertices of the document region in each sample picture; a Mask background image is created according to the original picture information; the region enclosed by the vertices is filled in on the Mask background image to generate a Mask label; all vertices are traversed to find the four vertices enclosing the largest area; the label class is extracted; and finally the samples and labels required for model training are generated.
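As a concrete illustration of this preprocessing, the following Python sketch builds a mask label from the labeled polygon and picks the four vertices enclosing the largest area. It is a minimal sketch: the function names and the use of cv2.contourArea to score vertex quadruples are assumptions of this illustration, not details given in the patent.

import itertools

import cv2
import numpy as np

def build_mask_label(image, polygon):
    """Create a mask background picture of the original size and fill
    the region enclosed by the labeled vertices to form the mask label."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.asarray(polygon, dtype=np.int32)], color=1)
    return mask

def four_max_area_vertices(polygon):
    """Traverse all labeled vertices and return the four that enclose
    the largest area (assumes each candidate quadruple is ordered so
    that it is non-self-intersecting)."""
    best, best_area = None, -1.0
    for quad in itertools.combinations(polygon, 4):
        area = cv2.contourArea(np.asarray(quad, dtype=np.float32))
        if area > best_area:
            best, best_area = quad, area
    return np.asarray(best, dtype=np.float32)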
Further, in the embodiment of the present application, after step S20, the method further includes the following steps:
judging whether the sample picture in the new picture training set contains an integral document area or not;
if so, performing first enhancement processing on the sample picture in the new picture training set to obtain a first sample picture;
and if not, performing second enhancement processing on the sample pictures in the new picture training set to obtain second sample pictures.
Exemplarily, in the related art, a document contour extraction method based on conventional image processing first binarizes the picture; next it performs edge detection on the binarization result, mainly searching for areas where pixel values change sharply; it then searches for closed edge regions in the picture according to the edge information and treats them as candidate contour regions; and finally it outputs the vertex coordinates of the contour to complete the extraction. The inventors found that when a conventional image processing method is used to compute contour information, the obtained document materials differ greatly because the photographing hardware, photographing time, and scene differ. Therefore, if only a conventional image processing method is used to obtain document contour information, the method cannot be applied to all types of document material.
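For comparison, that conventional pipeline maps onto standard OpenCV calls roughly as follows. This is a sketch only; the Otsu binarization, the Canny thresholds, and the 0.02 polygon-approximation factor are illustrative assumptions.

import cv2
import numpy as np

def classic_contour_vertices(bgr_image):
    # Conventional pipeline: binarize, detect edges (areas with sharp
    # pixel-value changes), find the largest closed contour, and output
    # its approximate vertex coordinates.
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    edges = cv2.Canny(binary, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    approx = cv2.approxPolyDP(largest, 0.02 * cv2.arcLength(largest, True), True)
    return approx.reshape(-1, 2)  # vertex coordinates of the contour

As the text notes, this pipeline breaks down when lighting, hardware, or scene vary, which motivates the learned model described below.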
Therefore, after analyzing various scenes, this embodiment distinguishes the training sample data by application scene, i.e., it distinguishes pictures containing the complete document contour from local document pictures with incomplete contours, so that the method is suitable for contour extraction of various photographed document materials. Specifically, before model training, data enhancement is applied separately to the two kinds of picture samples: samples containing the whole document area undergo a first enhancement (rotation, flipping, color transformation, normalization, and the like), while samples not containing the whole document area undergo a second enhancement (translation, rotation, flipping, color transformation, normalization, and the like). Picture data that does not include complete boundary information is thus processed separately, and through the different enhancement modes the model can handle various picture data without reducing its accuracy on picture data that does include complete boundary information.
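The two enhancement pipelines might be sketched with torchvision as below. The parameter ranges are assumptions; note also that in real segmentation training the same geometric transforms must be applied jointly to the mask and vertex labels, which this sketch omits for brevity.

from torchvision import transforms

# Shared enhancements: rotation, flipping, color transformation, normalization.
_common = [
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
]

# First enhancement: samples containing the whole document area.
first_enhancement = transforms.Compose(_common)

# Second enhancement: samples without the whole document area,
# with an additional random translation.
second_enhancement = transforms.Compose(
    [transforms.RandomAffine(degrees=0, translate=(0.1, 0.1))] + _common
)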
Step S30: training a semantic segmentation model based on the new picture training set to generate a document contour extraction model, wherein the semantic segmentation model comprises a learning down-sampling module, a global feature extraction module, a feature fusion module and a standard classifier, and the standard classifier comprises a vertex regression submodule.
Exemplarily, Fast SCNN (Fast Segmentation Convolutional Neural Network) is a deep convolutional neural network model for the semantic segmentation task. Its input is a normalized three-channel picture tensor, and its output is a binary mask with the same height and width as the input, where the value at each position indicates whether the corresponding pixel belongs to the document region. The Fast SCNN model is fast, computationally cheap, and accurate; it also learns from features at different stages, lower-level features than those used in the existing Fast SCNN structure can be added as needed, and the distinct characteristics of contours and content in document materials can thus be handled well. The semantic segmentation model designed in this embodiment needs to output both the document region and vertex information, where the region extraction part segments the whole document region in the input picture. This embodiment therefore improves on the Fast SCNN model to form a new semantic segmentation model and trains it on the training samples, thereby generating the document contour extraction model.
Specifically, the network structure of the conventional Fast SCNN includes only a learning downsampling module, a global feature extraction module, a feature fusion module, and a standard Classifier. As shown in Fig. 2, this embodiment not only retains these modules but also adds a vertex-regression output branch to the Classifier (i.e., the standard classifier) in order to better segment the document region and obtain the vertex information of the document: the vertex regression submodule predicts the vertex information of the document region in the input picture. By integrating the vertex regression submodule into the standard classifier, the generated document contour extraction model predicts vertices while segmenting the document region in the picture, so the document contour can be extracted by directly acquiring the vertex information of the document region, avoiding the influence of irregular segmentation shapes or irregular text material and effectively improving the accuracy of extracting document contour information.
Furthermore, in this embodiment of the application, the semantic segmentation model further includes a splicing module, where the splicing module is configured to splice the high-level semantic information output by the feature fusion module and the low-level semantic information output by the learning downsampling module to obtain splicing information, and the splicing information is used as an input of the standard classifier.
Exemplarily, in order to obtain the boundary information of the document more accurately and extract the contour region, analysis of the sample data features shows that, to extract contour information well, the model must learn not only the high-level features (i.e., high-level semantic information) of the document region but also the low-level features (i.e., low-level semantic information) of the document boundary: if only high-level features of the document region are learned, contour extraction becomes inaccurate when the color of the document region is similar to the background color. Therefore, in this embodiment, after the Feature Fusion module of Fast SCNN outputs the high-level features, they are concatenated with the low-level features output by the learning downsampling module via the splicing module (i.e., concat in Fig. 2), and the concatenated features are then fed into the Classifier for processing, improving the extraction accuracy of the document region contour.
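A minimal PyTorch sketch of such a classifier head follows: it concatenates the low-level features with the (upsampled) fused high-level features and emits both a segmentation mask and regressed vertex coordinates. The channel sizes, the pooling-based regression head, and the sigmoid that keeps vertex coordinates normalized to [0, 1] are assumptions of this sketch; Fig. 2 defines the actual layer layout.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierWithVertexBranch(nn.Module):
    """Standard classifier extended with a vertex regression submodule."""

    def __init__(self, low_ch=64, high_ch=128, num_classes=2, num_vertices=4):
        super().__init__()
        in_ch = low_ch + high_ch  # channels after concatenation (concat)
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise
            nn.Conv2d(in_ch, 128, 1),                             # pointwise
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        self.mask_head = nn.Conv2d(128, num_classes, 1)  # mask output branch
        self.point_head = nn.Sequential(                 # vertex regression branch
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, num_vertices * 2),            # (x, y) per vertex
        )

    def forward(self, low_feat, high_feat):
        high_feat = F.interpolate(high_feat, size=low_feat.shape[2:],
                                  mode="bilinear", align_corners=False)
        x = self.shared(torch.cat([low_feat, high_feat], dim=1))
        return self.mask_head(x), torch.sigmoid(self.point_head(x))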
Further, in the embodiment of the present application, the calculation formula of the loss function L in the standard classifier is as follows:
L = ω1 · L_point + ω2 · L_mask
where ω1 denotes the vertex weight, L_point denotes the loss between the region enclosed by the predicted vertices and the region enclosed by the real vertices, ω2 denotes the mask weight, and L_mask denotes the mask loss.
Exemplarily, this embodiment further modifies the calculation formula of the loss function L in the standard classifier to:
L = ω1 · L_point + ω2 · L_mask
where ω1 denotes the vertex weight; L_point can be L_GIoU, the GIoU loss (GIoU, Generalized Intersection over Union) between the region enclosed by the predicted vertices and the region enclosed by the real vertices; ω2 denotes the mask weight; and L_mask can be L_CrossEntropy, the cross-entropy loss over the different mask classes.
The specific calculation formula of L_CrossEntropy is as follows:
loss(x, class) = -log(exp(x[class]) / Σ_j exp(x[j])) = -x[class] + log(Σ_j exp(x[j]))
where loss denotes the cross-entropy loss function, x denotes the document region prediction, and class denotes the Mask class.
The specific calculation formula of L_GIoU is as follows:
L_GIoU = 1 - GIoU
GIoU = IoU - (A_c - u) / A_c
where IoU denotes the Intersection over Union, which reflects how well the predicted detection box matches the real detection box; A_c denotes the area of the smallest box enclosing both the real-vertex and predicted-vertex regions; and u denotes the area of the union of the real-vertex and predicted-vertex regions.
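Putting the two terms together, the combined loss could be sketched as follows. This is a hedged illustration: it computes GIoU on the axis-aligned boxes enclosing the predicted and real vertex sets, which is one plausible reading of the formula, and the weights w1 and w2 stand in for ω1 and ω2, whose values the text does not specify.

import torch
import torch.nn.functional as F

def giou_loss(pred_pts, true_pts, eps=1e-7):
    """L_GIoU = 1 - GIoU, with GIoU = IoU - (A_c - u) / A_c, computed on the
    axis-aligned boxes enclosing predicted and real vertex sets of shape (B, 4, 2)."""
    px1, py1 = pred_pts[..., 0].min(-1).values, pred_pts[..., 1].min(-1).values
    px2, py2 = pred_pts[..., 0].max(-1).values, pred_pts[..., 1].max(-1).values
    tx1, ty1 = true_pts[..., 0].min(-1).values, true_pts[..., 1].min(-1).values
    tx2, ty2 = true_pts[..., 0].max(-1).values, true_pts[..., 1].max(-1).values
    inter = ((torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
             * (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0))
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter  # u
    enclose = ((torch.max(px2, tx2) - torch.min(px1, tx1))
               * (torch.max(py2, ty2) - torch.min(py1, ty1)))              # A_c
    giou = inter / (union + eps) - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()

def total_loss(pred_pts, true_pts, mask_logits, mask_target, w1=1.0, w2=1.0):
    """L = w1 * L_point + w2 * L_mask."""
    return (w1 * giou_loss(pred_pts, true_pts)
            + w2 * F.cross_entropy(mask_logits, mask_target))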
After training, an optimal document contour extraction model with an accuracy of 98.59% can finally be obtained, and the model can accurately output the document region contour and vertex information in the picture. During model training, L_point and L_mask in the loss function promote each other, improving the prediction effect of the model; that is, the two different outputs (the contour of the document region and the vertex information) reinforce each other.
Further, in the embodiment of the present application, after the step S30, the method further includes the following steps:
acquiring a picture to be predicted, and normalizing the picture to be predicted;
and inputting the normalized picture to be predicted into the document contour extraction model to obtain document contour information of the picture to be predicted and a plurality of vertex information on the document contour.
Exemplarily, this embodiment first normalizes the picture to be predicted, converting it into the picture format expected by the document contour extraction model; the normalized picture is then input into the document contour extraction model, which outputs the prediction result, namely the document contour information of the picture and the information of a plurality of vertices on the document contour.
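A sketch of this prediction step; the ImageNet normalization constants, and the assumption that the trained model returns a (mask, vertices) pair as in the classifier sketch above, are illustrative rather than from the patent.

import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def predict(model, image_path):
    """Normalize the picture to be predicted and run the contour extraction
    model, returning the predicted mask and vertex information."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)  # (1, 3, H, W)
    model.eval()
    with torch.no_grad():
        mask_logits, vertices = model(x)
    return mask_logits.argmax(1).squeeze(0), vertices.squeeze(0)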
Furthermore, in this embodiment of the present application, after the step of inputting the normalized to-be-predicted picture into the document contour extraction model to obtain the document contour information of the to-be-predicted picture and the information of the plurality of vertices on the document contour, the method further includes:
calculating the minimum circumscribed rectangle of a region surrounded by a plurality of vertexes on the document outline according to the information of the plurality of vertexes on the document outline;
calculating a perspective transformation matrix according to the vertex information of the minimum circumscribed rectangle and the plurality of vertex information on the document outline;
and correcting the picture to be predicted according to the perspective transformation matrix, and outputting a document area of the picture to be predicted.
Exemplarily, the picture to be predicted is corrected according to the acquired vertex information on the document contour so as to extract the document contour region. Specifically, each vertex coordinate on the document contour is verified; if the vertex coordinates are valid (i.e., all coordinates lie within the corresponding document region), the size of the minimum circumscribed rectangle of the region enclosed by the vertex coordinates is calculated from them, a perspective transformation matrix is computed from the vertex coordinates of the minimum circumscribed rectangle and the vertex coordinates on the document contour, and the picture to be predicted is corrected with the perspective transformation matrix, so that the document region of the picture can be output.
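The correction step corresponds to standard OpenCV calls; the following sketch assumes the four vertices arrive ordered top-left, top-right, bottom-right, bottom-left, and that the minimum-rectangle width and height need no angle-dependent swap.

import cv2
import numpy as np

def rectify_document(image, vertices):
    """Correct the picture: compute the minimum circumscribed rectangle of
    the vertex region, derive the perspective transformation matrix, and
    warp the document region upright."""
    src = np.asarray(vertices, dtype=np.float32)   # 4 vertices on the contour
    (w, h) = cv2.minAreaRect(src)[1]               # size of min circumscribed rect
    w, h = int(round(w)), int(round(h))
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                   dtype=np.float32)               # target corners (TL, TR, BR, BL)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (w, h))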
The data preprocessing, network structure adjustment and model training, document contour and vertex prediction, and contour extraction and picture correction parts of the document contour extraction model according to the embodiment of the present application are described below with reference to Fig. 3.
Step N1: input a plurality of sample pictures, photographed in various scenes, that contain document areas.
Step N2: label each obtained sample picture with the labeling tool to obtain the Mask label corresponding to the sample picture, and generate, from the labeling result, the four vertices that can contain the Mask label area. The Mask label is the region with complete boundary information contained in the sample picture; if there is no complete boundary information, the Mask label is the whole document area in the sample picture. The four vertices containing the Mask label area are the four vertices that enclose the largest area, found by traversing all vertices on the sample picture.
Step N3: judge whether the sample picture contains the whole document area; apply enhancement such as rotation, flipping, color transformation, and normalization to sample pictures containing the whole document area, and enhancement such as translation, rotation, flipping, color transformation, and normalization to sample pictures not containing it. Whether a sample picture contains the whole document area is determined by a uniform naming convention applied during sample labeling, i.e., it can be judged from the sample picture's name alone.
Step N4: input all the training samples obtained in step N3 into the semantic segmentation network structure, to which the low-level features and the vertex-regression output branch have been added and in which the loss function calculation has been modified, and perform model training, finally obtaining an optimal document contour extraction model with an accuracy of 98.59%.
Step N5: acquire the picture to be predicted and process it accordingly. The processing of the picture to be predicted is the same normalization applied to the validation-set input pictures during model training; its purpose is to convert the picture into the picture format expected for model prediction.
Step N6: use the document contour extraction model obtained in step N4 to predict the picture to be predicted input in step N5, and output the prediction result, namely the document contour information and the information of a plurality of vertices on the document contour.
Step N7: correct the picture to be predicted using the vertex information acquired in step N6, so as to extract its document contour region.
Therefore, the embodiment of the present application provides a method that directly outputs contour information by predicting the four vertices of the document contour. Compared with conventional image processing, it is applicable to document materials from various shooting scenes, including materials that do not contain the whole document area and that conventional image processing cannot handle. Compared with methods that segment the document region by semantic segmentation and then analyze the result with conventional image processing, it avoids the problem that an irregular segmentation result cannot be analyzed accurately. And compared with methods that first detect text regions and then compute the contour from an offset between the center point and the text region, which suffer low accuracy, this embodiment directly segments the document region in the picture and adds a vertex prediction function on top of it, so the adverse effects of various irregularities in the text material can be ignored.
Referring to fig. 4, an embodiment of the present application further provides a document contour extraction model building apparatus, including:
an acquisition unit, configured to acquire a picture training set, where the picture training set includes a plurality of sample pictures;
a processing unit, configured to label each sample picture to obtain a corresponding mask label, and to determine the vertex information of the corresponding sample picture according to the mask label to form a new picture training set; and
a training unit, configured to train a semantic segmentation model based on the new picture training set to generate a document contour extraction model, where the semantic segmentation model includes a learning downsampling module, a global feature extraction module, a feature fusion module, and a standard classifier, and the standard classifier includes a vertex regression submodule.
According to the method and the device, the vertex regression submodule is integrated in the standard classifier, so that the generated document contour extraction model realizes a vertex prediction function while segmenting the document region in the picture, and further can extract the document contour by directly acquiring the vertex information of the document region, the influence of whether the graph is regular or not or whether the text material is regular or not is avoided, and the accuracy of extracting the document contour information is effectively improved.
Further, in the embodiment of the present application, the calculation formula of the loss function L in the standard classifier is as follows:
L = ω1 · L_point + ω2 · L_mask
where ω1 denotes the vertex weight, L_point denotes the loss between the region enclosed by the predicted vertices and the region enclosed by the real vertices, ω2 denotes the mask weight, and L_mask denotes the mask loss.
Further, in an embodiment of the present application, the processing unit is further configured to:
judging whether the sample picture in the new picture training set contains an integral document area or not;
if so, performing first enhancement processing on the sample picture in the new picture training set to obtain a first sample picture;
and if not, performing second enhancement processing on the sample pictures in the new picture training set to obtain second sample pictures.
Furthermore, in this embodiment of the application, the semantic segmentation model further includes a splicing module, the splicing module is used for splicing the high-level semantic information output by the feature fusion module and the low-level semantic information output by the learning down-sampling module to obtain splicing information, and the splicing information is used as the input of the standard classifier.
Further, in an embodiment of the present application, the processing unit is specifically configured to:
performing vertex marking on a document area of each sample picture, and creating a mask background picture for each sample picture according to original picture information;
and filling a region surrounded by a plurality of vertexes on the corresponding sample picture based on the mask background picture to obtain a mask label.
Further, in an embodiment of the present application, the processing unit is further configured to:
acquiring a picture to be predicted, and normalizing the picture to be predicted;
and inputting the normalized picture to be predicted into the document contour extraction model to obtain document contour information of the picture to be predicted and a plurality of vertex information on the document contour.
Further, in an embodiment of the present application, the apparatus further includes a correction unit configured to:
calculating the minimum circumscribed rectangle of a region surrounded by a plurality of vertexes on the document outline according to the information of the plurality of vertexes on the document outline;
calculating a perspective transformation matrix according to the vertex information of the minimum circumscribed rectangle and the plurality of vertex information on the document outline;
and correcting the picture to be predicted according to the perspective transformation matrix, and outputting a document area of the picture to be predicted.
Illustratively, referring to Fig. 2, the document contour extraction model formed in the embodiment of the present application includes a learning downsampling module (Learning to Down-sample), a global feature extraction module (Global Feature Extractor), a feature fusion module (Feature Fusion), a splicing module, and a standard classifier (Classifier). The different shapes in Fig. 2 represent different objects: for example, a triangle represents a Conv2D convolutional layer, a cuboid represents a DSConv depthwise-separable convolutional layer, and so on. Specifically, the learning downsampling module includes an input layer (Input), 1 ordinary Conv2D convolutional layer, and 2 DSConv depthwise-separable convolutional layers; the global feature extraction module includes 8 Bottleneck convolutional layers and 1 Pyramid Pooling layer, where the Pyramid Pooling layer extracts context features at different scales; the feature fusion module includes 1 Upsample layer, 1 DWConv convolutional layer, and 2 Conv2D convolutional layers; the splicing module concatenates the high-level semantic information output by the feature fusion module with the low-level semantic information output by the learning downsampling module, and the concatenated information serves as the input of the standard classifier; and the standard classifier includes 2 DSConv depthwise-separable convolutional layers, 1 Conv2D convolutional layer, 1 Upsample layer, 1 vertex regression submodule (containing 1 DSConv depthwise-separable convolutional layer that regresses the vertices and outputs point information to predict the vertex information of the document in the picture to be predicted), and an output layer (Softmax, a normalized exponential function).
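As a sketch of how the first of these modules might be written in PyTorch; the channel widths 32/48/64 are the values commonly used in Fast SCNN implementations, assumed here because the text above does not state them.

import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise-separable convolution: a depthwise conv followed by a
    pointwise conv, as represented by the cuboids in Fig. 2."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch),
            nn.Conv2d(in_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class LearningToDownsample(nn.Module):
    """One ordinary Conv2D layer followed by two DSConv layers."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        self.ds1 = DSConv(32, 48)
        self.ds2 = DSConv(48, 64)

    def forward(self, x):
        return self.ds2(self.ds1(self.conv(x)))  # low-level semantic features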
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus and the units described above may refer to the corresponding processes in the foregoing embodiment of the document contour extraction model construction method, and are not described herein again.
The apparatus provided by the above embodiment can be implemented in the form of a computer program, which can run on the document contour extraction model construction device shown in Fig. 5.
The embodiment of the present application further provides a document contour extraction model construction device, including: a memory, a processor, and a network interface connected through a system bus, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement all or part of the steps of the aforementioned document contour extraction model construction method.
The network interface is used for performing network communication, such as sending distributed tasks. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
The processor may be a CPU, another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the computer device and connects the various parts of the whole computer device through various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor implements the various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required by at least one function (such as a video playing function, an image playing function, etc.), and the data storage area may store data created according to the use of the device (such as video data, image data, etc.). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, all or part of the steps of the aforementioned document contour extraction model construction method are implemented.
All or part of the processes in the embodiments of the present application may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of the aforementioned methods. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, server, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for constructing a document contour extraction model is characterized by comprising the following steps:
acquiring a picture training set, wherein the picture training set comprises a plurality of sample pictures;
labeling each sample picture to obtain a corresponding mask label, and determining each vertex information of the corresponding sample picture according to the mask label to form a new picture training set;
training a semantic segmentation model based on the new picture training set to generate a document contour extraction model, wherein the semantic segmentation model comprises a learning down-sampling module, a global feature extraction module, a feature fusion module and a standard classifier, and the standard classifier comprises a vertex regression submodule.
2. The method for constructing the document contour extraction model according to claim 1, wherein the loss function L in the standard classifier is calculated as follows:
L = ω1 · L_point + ω2 · L_mask
where ω1 denotes the vertex weight, L_point denotes the loss between the region enclosed by the predicted vertices and the region enclosed by the real vertices, ω2 denotes the mask weight, and L_mask denotes the mask loss.
3. The method for constructing a document contour extraction model according to claim 1, wherein after the step of determining each vertex information of the corresponding sample picture according to the mask label to form a new picture training set, the method further comprises:
judging whether the sample picture in the new picture training set contains an integral document area or not;
if so, performing first enhancement processing on the sample picture in the new picture training set to obtain a first sample picture;
and if not, performing second enhancement processing on the sample pictures in the new picture training set to obtain second sample pictures.
4. The document contour extraction model construction method of claim 1, characterized in that: the semantic segmentation model further comprises a splicing module, wherein the splicing module is used for splicing the high-level semantic information output by the feature fusion module and the low-level semantic information output by the learning down-sampling module to obtain splicing information, and the splicing information is used as the input of the standard classifier.
5. The method for constructing the document contour extraction model according to claim 1, wherein the labeling each sample picture to obtain a corresponding mask label comprises:
performing vertex marking on a document area of each sample picture, and creating a mask background picture for each sample picture according to original picture information;
and filling a region surrounded by a plurality of vertexes on the corresponding sample picture based on the mask background picture to obtain a mask label.
6. The method for constructing a document contour extraction model according to claim 1, wherein after the step of training a semantic segmentation model based on the new picture training set to generate a document contour extraction model, the method further comprises:
acquiring a picture to be predicted, and normalizing the picture to be predicted;
and inputting the normalized picture to be predicted into the document contour extraction model to obtain document contour information of the picture to be predicted and a plurality of vertex information on the document contour.
7. The method for constructing the document contour extraction model according to claim 6, wherein after the step of inputting the normalized image to be predicted into the document contour extraction model to obtain the document contour information of the image to be predicted and the information of a plurality of vertices on the document contour, the method further comprises:
calculating the minimum circumscribed rectangle of a region surrounded by a plurality of vertexes on the document outline according to the information of the plurality of vertexes on the document outline;
calculating a perspective transformation matrix according to the vertex information of the minimum circumscribed rectangle and the plurality of vertex information on the document outline;
and correcting the picture to be predicted according to the perspective transformation matrix, and outputting a document area of the picture to be predicted.
8. A document outline extraction model construction device is characterized by comprising:
an acquisition unit, configured to acquire a picture training set, where the picture training set includes a plurality of sample pictures;
a processing unit, configured to label each sample picture to obtain a corresponding mask label, and to determine the vertex information of the corresponding sample picture according to the mask label to form a new picture training set; and
a training unit, configured to train a semantic segmentation model based on the new picture training set to generate a document contour extraction model, where the semantic segmentation model includes a learning downsampling module, a global feature extraction module, a feature fusion module, and a standard classifier, and the standard classifier includes a vertex regression submodule.
9. A document outline extraction model construction device, comprising: a memory and a processor, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the document outline extraction model construction method of any one of claims 1 to 7.
10. A computer-readable storage medium characterized by: the computer storage medium stores a computer program which, when executed by a processor, implements the document contour extraction model construction method of any one of claims 1 to 7.
CN202111600500.5A (priority date 2021-12-24, filing date 2021-12-24) · Document contour extraction model construction method, device, equipment and readable storage medium · Status: Pending · Publication: CN114266901A

Priority Applications (1)

Application Number: CN202111600500.5A · Priority Date: 2021-12-24 · Filing Date: 2021-12-24 · Title: Document contour extraction model construction method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number: CN202111600500.5A · Priority Date: 2021-12-24 · Filing Date: 2021-12-24 · Title: Document contour extraction model construction method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114266901A · 2022-04-01

Family

ID=80829898

Family Applications (1)

Application Number: CN202111600500.5A (Pending, published as CN114266901A) · Priority Date: 2021-12-24 · Filing Date: 2021-12-24 · Title: Document contour extraction model construction method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114266901A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099855A (en) * 2022-06-23 2022-09-23 广州华多网络科技有限公司 Method for preparing advertising pattern creation model and device, equipment, medium and product thereof
CN117456076A (en) * 2023-10-30 2024-01-26 神力视界(深圳)文化科技有限公司 Material map generation method and related equipment


Similar Documents

Publication Publication Date Title
CN108304835B (en) character detection method and device
CN108229303B (en) Detection recognition and training method, device, equipment and medium for detection recognition network
CN111461212A (en) Compression method for point cloud target detection model
CN114266901A (en) Document contour extraction model construction method, device, equipment and readable storage medium
CN112418278A (en) Multi-class object detection method, terminal device and storage medium
CN114419570B (en) Point cloud data identification method and device, electronic equipment and storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN110969641A (en) Image processing method and device
CN116994021A (en) Image detection method, device, computer readable medium and electronic equipment
CN115797731A (en) Target detection model training method, target detection model detection method, terminal device and storage medium
CN115270184A (en) Video desensitization method, vehicle video desensitization method and vehicle-mounted processing system
CN114842478A (en) Text area identification method, device, equipment and storage medium
CN112966676B (en) Document key information extraction method based on zero sample learning
CN111898544B (en) Text image matching method, device and equipment and computer storage medium
CN113744280A (en) Image processing method, apparatus, device and medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN117078970A (en) Picture identification method and device, electronic equipment and storage medium
CN116843901A (en) Medical image segmentation model training method and medical image segmentation method
KR102026280B1 (en) Method and system for scene text detection using deep learning
CN110610177A (en) Training method of character recognition model, character recognition method and device
CN114676705A (en) Dialogue relation processing method, computer and readable storage medium
CN113139617A (en) Power transmission line autonomous positioning method and device and terminal equipment
CN112149523A (en) Method and device for OCR recognition and picture extraction based on deep learning and co-searching algorithm, electronic equipment and storage medium
CN112419249A (en) Special clothing picture conversion method, terminal device and storage medium
CN112241994A (en) Model training method, rendering device, electronic equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination