CN112560857A - Character area boundary detection method, equipment, storage medium and device

Info

Publication number
CN112560857A
CN112560857A (application CN202110190870.XA; granted publication CN112560857B)
Authority
CN
China
Prior art keywords
image
text
boundary detection
region
initial
Prior art date
Legal status
Granted
Application number
CN202110190870.XA
Other languages
Chinese (zh)
Other versions
CN112560857B (en)
Inventor
操晓春
代朋纹
张华
Current Assignee
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date
Filing date
Publication date
Application filed by Peng Cheng Laboratory
Priority to CN202110190870.XA
Publication of CN112560857A
Application granted
Publication of CN112560857B
Current legal status: Active


Classifications

    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners, strokes; connectivity analysis
    • G06V 30/153: Segmentation of character regions using recognition of characters or words


Abstract

The invention discloses a character region boundary detection method, equipment, storage medium and device. Compared with existing scene character detection methods, which only explore the representation of characters of arbitrary shape or enhance feature expression, the invention extracts features from an image to be processed through a preset backbone network to obtain image features; determines an initial candidate region according to the image features and a preset region suggestion network; performs pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature; analyzes the first fixed feature through a preset character region adjustment network to obtain a feature analysis result; and determines a target character region boundary detection result according to a preset character mask segmentation network, the second fixed feature and the feature analysis result. This overcomes the defect that the prior art cannot accurately identify the region boundary of characters of arbitrary shape, so the character region boundary detection process can be optimized and the accuracy of character region boundary detection improved.

Description

Character area boundary detection method, equipment, storage medium and device
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a method, an apparatus, a storage medium, and a device for detecting a text region boundary.
Background
In the prior art, detecting scene text of arbitrary shape usually relies on exploring the representation of arbitrarily shaped text, for example by learning the attributes of pixels or text segments and the relationships between them in order to distinguish text regions, or on enhancing feature expression, for example by combining features of different granularities or learning contextual features.
However, the prior art cannot accurately identify the region boundaries of characters of arbitrary shape, so the detection accuracy for scene text of arbitrary shape is low and the reliability is poor.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, equipment, a storage medium and a device for detecting the boundary of a text area, and aims to solve the technical problem of how to optimize the text area boundary detection process.
In order to achieve the above object, the present invention provides a text region boundary detection method, which includes the following steps:
acquiring an image to be processed, and extracting the features of the image to be processed through a preset backbone network to obtain image features;
determining an initial candidate region according to the image features and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature;
analyzing the first fixed characteristic through a preset character area adjusting network to obtain a characteristic analysis result;
determining an initial character area boundary detection result according to a preset character mask segmentation network and the second fixed characteristic;
and adjusting the initial character area boundary detection result according to the characteristic analysis result to obtain a target character area boundary detection result.
Preferably, the step of adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result specifically includes:
obtaining confidence and position offset from the feature analysis result, and determining an initial character mask according to the initial character region boundary and the position offset;
determining a word mask overlapping rate according to the initial word mask, and determining a target word mask according to the word mask overlapping rate and the confidence;
and carrying out boundary detection on the target character mask to obtain a detection result, and determining a target character area boundary detection result according to the detection result.
Preferably, the step of determining a word mask overlap ratio according to the initial word mask and determining a target word mask according to the word mask overlap ratio and the confidence degree specifically includes:
determining the word mask overlapping rate according to the initial word mask, and judging whether the word mask overlapping rate is greater than a preset threshold value;
when the word mask overlapping rate is larger than a preset threshold, sequencing the initial word mask according to the confidence coefficient to obtain a sequencing result;
and screening the initial character mask according to the sorting result to obtain a target character mask.
Preferably, the step of determining an initial candidate region according to the image feature and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature specifically includes:
determining an initial candidate area according to the image characteristics and a preset area suggestion network;
analyzing the initial candidate region through a preset deformation interest region pooling model to obtain deformation offset;
and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
Preferably, before the step of obtaining the image to be processed and performing feature extraction on the image to be processed through a preset backbone network to obtain image features, the text region boundary detection method further includes:
obtaining an initial sample image, and carrying out scale adjustment on the initial sample image to obtain a sample image to be processed;
performing image extraction on the sample image to be processed through a preset sliding window to obtain a sample subimage to be processed;
performing feature analysis on the sub-image to be processed to obtain a positive sample sub-image and a negative sample sub-image;
and training an initial region proposal network according to the positive example sample sub-image and the negative example sample sub-image to obtain the preset region proposal network.
Preferably, the step of performing feature analysis on the sub-image to be processed to obtain a positive example sample sub-image and a negative example sample sub-image specifically includes:
acquiring the image size of the sub-image to be processed, and searching a threshold range corresponding to the image size;
analyzing the sub-image to be processed to obtain a character enclosing frame;
and acquiring the boundary length of the text bounding box, and determining a positive sample sub-image and a negative sample sub-image according to the boundary length and the threshold range.
Preferably, before the step of determining the initial character region boundary detection result according to the preset character mask segmentation network and the second fixed feature, the character region boundary detection method further includes:
determining a shape structure constraint function according to the image to be processed;
and training the initial character mask segmentation network according to the shape structure constraint function to obtain a preset character mask segmentation network.
In addition, in order to achieve the above object, the present invention further provides a text region boundary detection apparatus, which includes a memory, a processor, and a text region boundary detection program stored in the memory and executable on the processor, wherein the text region boundary detection program is configured to implement the steps of the text region boundary detection method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium, on which a text region boundary detection program is stored, wherein the text region boundary detection program, when executed by a processor, implements the steps of the text region boundary detection method as described above.
In order to achieve the above object, the present invention further provides a character region boundary detection apparatus, which comprises: an acquisition module, a processing module, an analysis module, a detection module and an adjustment module;
the acquisition module is used for acquiring an image to be processed and extracting the characteristics of the image to be processed through a preset backbone network to obtain image characteristics;
the processing module is used for determining an initial candidate region according to the image features and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature;
the analysis module is used for analyzing the first fixed characteristic through a preset character area adjustment network to obtain a characteristic analysis result;
the detection module is used for determining an initial character area boundary detection result according to a preset character mask segmentation network and the second fixed characteristic;
and the adjusting module is used for adjusting the initial character region boundary detection result according to the characteristic analysis result to obtain a target character region boundary detection result.
Compared with existing scene character detection methods, which only explore the representation of characters of arbitrary shape or enhance feature expression, the invention acquires an image to be processed and extracts features from it through a preset backbone network to obtain image features; determines an initial candidate region according to the image features and a preset region suggestion network; performs pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature; analyzes the first fixed feature through a preset character region adjustment network to obtain a feature analysis result; determines an initial character region boundary detection result according to a preset character mask segmentation network and the second fixed feature; and adjusts the initial character region boundary detection result according to the feature analysis result to obtain a target character region boundary detection result. This overcomes the defect that the prior art cannot accurately identify the region boundaries of characters of arbitrary shape, optimizes the character region boundary detection process, and improves the accuracy and reliability of character region boundary detection, thereby meeting the needs of scene character detection.
Drawings
Fig. 1 is a schematic structural diagram of a text region boundary detection device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a text region boundary detection method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a text region boundary detection method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a text region boundary detection method according to a third embodiment of the present invention;
fig. 5 is a block diagram of a first embodiment of a text region boundary detection apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a text region boundary detection device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the text region boundary detection device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface, and in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the text region boundary detection apparatus and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a text region boundary detection program.
In the text area boundary detection device shown in fig. 1, the network interface 1004 is mainly used for connecting a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the text area boundary detection device calls a text area boundary detection program stored in the memory 1005 through the processor 1001, and executes the text area boundary detection method provided by the embodiment of the present invention.
Based on the hardware structure, the embodiment of the character area boundary detection method is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a text region boundary detection method according to a first embodiment of the present invention.
Step S10: acquiring an image to be processed, and extracting the features of the image to be processed through a preset backbone network to obtain image features.
It should be understood that the execution subject of this embodiment is the text region boundary detection device, where the text region boundary detection device may be an electronic device such as a computer or a server, or another device that can achieve the same or similar functions.
It should be noted that the image to be processed may be a scene image input by a user through a user interaction interface of the text region boundary detection device, or may also be a scene image input by a user through a terminal device that establishes a communication connection with the text region boundary detection device in advance, which is not limited in this embodiment.
The preset backbone network is an image feature extraction network preset by a user; in this embodiment, a ResNet-101 network with embedded deformable convolutions is taken as an example for description, which is not a limitation of this embodiment.
In a specific implementation, for example, the image to be processed is acquired, and the ResNet-101 network with embedded deformable convolutions is used as the backbone network to extract image features.
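As a purely illustrative sketch (not the patent's reference implementation), a deformable convolution block of the kind embedded in such a backbone can be built with torchvision's DeformConv2d; the block structure, layer names and channel sizes here are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConvBlock(nn.Module):
    """3x3 deformable convolution: a plain conv predicts per-location
    sampling offsets, which DeformConv2d uses to deform its grid."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position: 2 * 3 * 3 = 18 channels
        self.offset_pred = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_pred(x)
        return self.deform_conv(x, offsets)

# e.g. such a block could replace a 3x3 conv inside a ResNet-101 stage
block = DeformableConvBlock(256, 256)
feat = block(torch.randn(1, 256, 64, 64))  # -> [1, 256, 64, 64]
```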
Step S20: and determining an initial candidate region according to the image features and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature.
It can be understood that, determining an initial candidate region according to the image feature and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature may be determining an initial candidate region according to the image feature and the preset region suggestion network, analyzing the initial candidate region through a preset deformation interest region pooling model to obtain a deformation offset, and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
In a specific implementation, for example, the image features are processed through a Region Proposal Network (RPN) to generate candidate regions, and in the process of mapping a candidate region to a fixed size, a deformation offset is learned using position-sensitive Deformable Region-of-Interest (DROI) pooling to produce more accurately aligned features. Two DROI pooling layers that do not share learning parameters are used here to generate the two different fixed-size features.
Step S30: and analyzing the first fixed characteristic through a preset character area adjusting network to obtain a characteristic analysis result.
It should be noted that the preset text region adjustment network may be a region refinement network preset by a user; in this embodiment, a Text Region Refinement Network (TRRN) is taken as an example for description.
In a specific implementation, for example, the first fixed-size feature is input to the Text Region Refinement Network (TRRN). The structure of the TRRN is the same as that of the corresponding head in Mask R-CNN, except that the TRRN targets the 2-class case (text versus background). The TRRN is configured to obtain the confidence and the position offset of the refined text region.
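For illustration only, a Mask R-CNN-style two-class refinement head of this kind might look as follows; the layer sizes, names and RoI size are assumptions rather than the patent's specification:

```python
import torch
import torch.nn as nn

class TextRegionRefinementHead(nn.Module):
    """Two-class (text / background) box head: outputs a confidence
    score and a 4-d position offset (dx, dy, dw, dh) per RoI."""
    def __init__(self, in_ch: int = 256, roi_size: int = 7):
        super().__init__()
        flat = in_ch * roi_size * roi_size
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
        )
        self.cls_score = nn.Linear(1024, 2)   # text vs. background confidence
        self.bbox_delta = nn.Linear(1024, 4)  # position offsets

    def forward(self, roi_feat: torch.Tensor):
        h = self.fc(roi_feat)                 # roi_feat: [N, C, 7, 7]
        return self.cls_score(h), self.bbox_delta(h)
```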
Step S40: and determining an initial character area boundary detection result according to a preset character mask segmentation network and the second fixed characteristic.
It should be noted that the preset text mask segmentation network may be a segmentation network preset by a user; in this embodiment, a Text Mask Segmentation Network (TMSN) is taken as an example for description.
In a specific implementation, for example, the second fixed-size feature is input to the Text Mask Segmentation Network (TMSN). The structure of the text mask segmentation network is consistent with the mask branch of Mask R-CNN, and the TMSN is used to obtain segmentation results for scene text of arbitrary shape.
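Again purely as a hedged sketch, a Mask R-CNN-style mask branch producing 28 x 28 two-class maps (the 28 x 28 size appears later in this description) could look like this; the channel widths and depth are assumptions:

```python
import torch
import torch.nn as nn

class TextMaskHead(nn.Module):
    """Mask R-CNN-style mask branch: conv stack plus upsampling,
    predicting a 28x28 map for each of the 2 classes."""
    def __init__(self, in_ch: int = 256, num_classes: int = 2):
        super().__init__()
        self.convs = nn.Sequential(
            *[layer for _ in range(4)
              for layer in (nn.Conv2d(in_ch, in_ch, 3, padding=1),
                            nn.ReLU(inplace=True))]
        )
        self.upsample = nn.ConvTranspose2d(in_ch, in_ch, 2, stride=2)
        self.predict = nn.Conv2d(in_ch, num_classes, 1)

    def forward(self, roi_feat: torch.Tensor) -> torch.Tensor:
        h = self.convs(roi_feat)              # [N, C, 14, 14]
        h = torch.relu(self.upsample(h))      # [N, C, 28, 28]
        return self.predict(h)                # [N, 2, 28, 28]
```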
Step S50: and adjusting the initial character area boundary detection result according to the characteristic analysis result to obtain a target character area boundary detection result.
It should be understood that, adjusting the initial text region boundary detection result according to the feature analysis result to obtain the target text region boundary detection result may be obtaining a confidence degree and a position offset from the feature analysis result, determining an initial text mask according to the initial text region boundary and the position offset, determining a text mask overlap rate according to the initial text mask, determining a target text mask according to the text mask overlap rate and the confidence degree, performing boundary detection on the target text mask to obtain a detection result, and determining the target text region boundary detection result according to the detection result.
Further, in order to improve the accuracy and reliability of the target word mask, the determining the word mask overlap ratio according to the initial word mask and determining the target word mask according to the word mask overlap ratio and the confidence includes:
determining a word mask overlapping rate according to the initial word mask, judging whether the word mask overlapping rate is greater than a preset threshold, when the word mask overlapping rate is greater than the preset threshold, sorting the initial word mask according to the confidence to obtain a sorting result, and screening the initial word mask according to the sorting result to obtain a target word mask.
In a specific implementation, for example, the absolute position of an arbitrarily shaped character region in the input image is obtained from the segmented binary map and the corresponding position of the character region. Non-Maximum Suppression (NMS) over maximized intersection regions computes the overlap rate of two character masks as max(O/A, O/B), where O is the overlap area of the two masks and A and B are the areas of the two masks, respectively. When the overlap rate is greater than 0.8, the mask with the lower confidence of the two is removed.
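A minimal sketch of this mask-level NMS, assuming the masks are given as boolean arrays with associated confidence scores (the function name and array layout are assumptions, not the patent's notation):

```python
import numpy as np

def mask_nms(masks: np.ndarray, scores: np.ndarray, thresh: float = 0.8):
    """masks: [N, H, W] boolean, scores: [N]. Keeps the higher-confidence
    mask whenever max(O/A, O/B) exceeds thresh."""
    order = np.argsort(-scores)          # process by descending confidence
    keep = []
    areas = masks.reshape(len(masks), -1).sum(axis=1)
    for i in order:
        suppressed = False
        for j in keep:
            overlap = np.logical_and(masks[i], masks[j]).sum()
            rate = max(overlap / (areas[i] + 1e-6),
                       overlap / (areas[j] + 1e-6))
            if rate > thresh:
                suppressed = True        # a kept, higher-scoring mask covers it
                break
        if not suppressed:
            keep.append(i)
    return keep
```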
Compared with existing scene character detection methods, which only explore the representation of characters of arbitrary shape or enhance feature expression, in this embodiment an image to be processed is acquired and features are extracted from it through a preset backbone network to obtain image features; an initial candidate region is determined according to the image features and a preset region suggestion network; the initial candidate region is pooled to obtain a first fixed feature and a second fixed feature; the first fixed feature is analyzed through a preset character region adjustment network to obtain a feature analysis result; an initial character region boundary detection result is determined according to a preset character mask segmentation network and the second fixed feature; and the initial character region boundary detection result is adjusted according to the feature analysis result to obtain a target character region boundary detection result. This overcomes the defect that the prior art cannot accurately identify the region boundaries of characters of arbitrary shape, optimizes the character region boundary detection process, and improves the accuracy and reliability of character region boundary detection, thereby meeting the needs of scene character detection.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the text region boundary detection method according to the present invention, and the second embodiment of the text region boundary detection method according to the present invention is proposed based on the first embodiment illustrated in fig. 2.
In the second embodiment, the step S20 includes:
step S201: and determining an initial candidate region according to the image characteristics and a preset region suggestion network.
It should be noted that the preset region proposal network may be a processing network preset by a user; in this embodiment, a Region Proposal Network (RPN) is taken as an example for description, which is not a limitation of this embodiment.
In a specific implementation, for example, the image features are processed through the Region Proposal Network (RPN) to generate candidate regions.
Step S202: and analyzing the initial candidate region through a preset deformation interest region pooling model to obtain deformation offset.
It should be noted that the preset deformable region-of-interest pooling model may be a pooling processing model preset by a user; in this embodiment, a position-sensitive Deformable Region-of-Interest (DROI) pooling model is taken as an example for description, which is not a limitation of this embodiment.
Step S203: and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
In a specific implementation, for example, in the process of mapping a text region of arbitrary size to a fixed size, a deformation offset is learned using position-sensitive Deformable Region-of-Interest (DROI) pooling, so that more accurately aligned features are generated. Two DROI pooling layers that do not share learning parameters are used to generate the two different fixed-size features.
In the second embodiment, an initial candidate region is determined according to the image features and a preset region suggestion network, the initial candidate region is analyzed through a preset deformable region-of-interest pooling model to obtain a deformation offset, and the initial candidate region is pooled according to the deformation offset to obtain a first fixed feature and a second fixed feature, so that a candidate region of any scale can be mapped to a fixed-size feature.
In the second embodiment, the step S50 includes:
step S501: and obtaining confidence coefficient and position offset from the feature analysis result, and determining an initial word mask according to the initial word region boundary and the position offset.
It is understood that the obtaining of the confidence and the position offset from the feature analysis result may be performing feature extraction on the feature analysis result, obtaining a text feature, and determining the confidence and the position offset according to the text feature.
Step S502: and determining a word mask overlapping rate according to the initial word mask, and determining a target word mask according to the word mask overlapping rate and the confidence.
It can be understood that, determining a word mask overlap ratio according to the initial word mask, and determining a target word mask according to the word mask overlap ratio and the confidence level may be determining a word mask overlap ratio according to the initial word mask, and determining whether the word mask overlap ratio is greater than a preset threshold, when the word mask overlap ratio is greater than the preset threshold, sorting the initial word mask according to the confidence level to obtain a sorting result, and screening the initial word mask according to the sorting result to obtain the target word mask.
Further, in order to improve the accuracy and reliability of the target word mask, the step S502 includes:
determining the word mask overlapping rate according to the initial word mask, and judging whether the word mask overlapping rate is greater than a preset threshold value;
when the word mask overlapping rate is larger than a preset threshold, sequencing the initial word mask according to the confidence coefficient to obtain a sequencing result;
and screening the initial character mask according to the sorting result to obtain a target character mask.
In a specific implementation, for example, the absolute position of an arbitrarily shaped character region in the input image is obtained from the segmented binary map and the corresponding position of the character region. Non-Maximum Suppression (NMS) over maximized intersection regions computes the overlap rate of two character masks as max(O/A, O/B), where O is the overlap area of the two masks and A and B are the areas of the two masks, respectively. When the overlap rate is greater than 0.8, the mask with the lower confidence of the two is removed.
Step S503: and carrying out boundary detection on the target character mask to obtain a detection result, and determining a target character area boundary detection result according to the detection result.
It can be understood that, the boundary detection of the target word mask is performed to obtain a detection result, and the determination of the boundary detection result of the target word area according to the detection result may be that the boundary detection of the target word mask is performed, the boundary of the target word mask is determined according to the detection result, and the boundary of the target word mask is used as the boundary detection result of the target word area.
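One plausible realization of this boundary detection step, assuming OpenCV and a binary mask already placed in absolute image coordinates (purely illustrative):

```python
import cv2
import numpy as np

def mask_to_boundary(mask: np.ndarray):
    """mask: [H, W] binary text mask in absolute image coordinates.
    Returns the polygon(s) tracing the text region boundary."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # each contour is a [K, 1, 2] array of (x, y) boundary points
    return [c.reshape(-1, 2) for c in contours]
```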
In a second embodiment, the confidence and the position offset are obtained from the feature analysis result, an initial word mask is determined according to the initial word region boundary and the position offset, a word mask overlapping rate is determined according to the initial word mask, a target word mask is determined according to the word mask overlapping rate and the confidence, boundary detection is performed on the target word mask, a detection result is obtained, and a target word region boundary detection result is determined according to the detection result, so that the accuracy of the target word region boundary detection result can be improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating a text region boundary detection method according to a third embodiment of the present invention, and the third embodiment of the text region boundary detection method is proposed based on the first embodiment shown in fig. 2.
In the third embodiment, before the step S20, the method further includes:
step S110: obtaining an initial sample image, and carrying out scale adjustment on the initial sample image to obtain a sample image to be processed.
It should be noted that the initial sample image may be a sample image input by a user through a text region boundary detection device, which is not limited in this embodiment.
In a specific implementation, for example, the initial sample image is rescaled to three scales to obtain the sample images to be processed (the specific scale values appear only as an equation image in the original).
Step S120: and carrying out image extraction on the sample image to be processed through a preset sliding window to obtain a sample subimage to be processed.
It should be noted that the preset sliding window may be a sliding window preset by a user, and in this embodiment, a sliding window of 512 × 512 is taken as an example for description.
In a specific implementation, for example, for each scale of the image, a 512 x 512 window is slid over the image to generate sub-images, as sketched below.
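A hedged sketch of this multi-scale sliding-window extraction; the stride and the scale values are assumptions, since the patent only fixes the 512 x 512 window:

```python
import cv2
import numpy as np

def sliding_window_crops(image: np.ndarray, scales=(0.5, 1.0, 2.0),
                         win: int = 512, stride: int = 256):
    """Rescales the image to each scale, then slides a win x win window
    with the given stride, yielding (scale, x, y, crop) tuples."""
    for s in scales:
        scaled = cv2.resize(image, None, fx=s, fy=s)
        h, w = scaled.shape[:2]
        for y in range(0, max(h - win, 0) + 1, stride):
            for x in range(0, max(w - win, 0) + 1, stride):
                yield s, x, y, scaled[y:y + win, x:x + win]
```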
Step S130: and performing characteristic analysis on the sub-image to be processed to obtain a positive sample sub-image and a negative sample sub-image.
It can be understood that, the step of performing feature analysis on the to-be-processed sub-image to obtain the positive example sample sub-image and the negative example sample sub-image may be to obtain an image size of the to-be-processed sub-image, find a threshold range corresponding to the image size, analyze the to-be-processed sub-image to obtain a text enclosure frame, obtain each boundary length of the text enclosure frame, and determine the positive example sample sub-image and the negative example sample sub-image according to the boundary length and the threshold range.
Further, the step S130 includes:
acquiring the image size of the sub-image to be processed, and searching a threshold range corresponding to the image size;
analyzing the sub-image to be processed to obtain a character enclosing frame;
and acquiring the boundary length of the text bounding box, and determining a positive sample sub-image and a negative sample sub-image according to the boundary length and the threshold range.
It is understood that the threshold range corresponding to the image size may be searched in a preset mapping table. The preset mapping table includes a corresponding relationship between the image size and the threshold range, which is not limited in this embodiment.
In a specific implementation, for example, a range is designed for each scale (the range values appear only as an equation image in the original). When the shortest side of a text bounding box falls within the range for a scale, that text participates in the training process at that scale and is recorded as a valid text. The sub-images covering the largest number of valid texts are selected as the positive example sub-images. To select the negative example sub-images, a negative example mining technique is employed. Specifically, the generated positive example images are first used to train a Region Proposal Network (RPN), which generates candidate boxes. The candidate boxes already covered by the positive example sub-images are then removed, and the sub-image areas covering the remaining in-range candidate boxes are taken as the negative example images at that scale. In the learning process, the size of each sub-image region is 512 x 512 and the size of each minibatch is 10, where the ratio of positive to negative example images is 4:1. Since many positive example sub-images contain only a small number of valid texts, the number of positive samples in the RPN is limited; text segments are therefore used to increase the number of positive samples. Specifically, when the overlap rate between a prior box and a valid text region is greater than a threshold of 0.7, and the horizontal extent of the overlap region is not less than 1/3 of the horizontal extent of the entire text region, the prior box is taken as a positive sample. Furthermore, for invalid text in a sub-image, prior boxes are obtained in the same way and removed from the negative samples, to reduce the ambiguity of the negative samples.
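An illustrative sketch of the prior-box labelling rule described above; the overlap rate is interpreted here as intersection-over-union, which is an assumption, and the box layout and helper name are likewise assumptions:

```python
import numpy as np

def label_prior_box(prior: np.ndarray, text: np.ndarray,
                    iou_thresh: float = 0.7) -> bool:
    """prior, text: [x1, y1, x2, y2]. Returns True when the prior box
    counts as a positive sample: overlap rate > 0.7 and the horizontal
    span of the overlap >= 1/3 of the text region's horizontal span."""
    ix1, iy1 = max(prior[0], text[0]), max(prior[1], text[1])
    ix2, iy2 = min(prior[2], text[2]), min(prior[3], text[3])
    iw, ih = max(ix2 - ix1, 0.0), max(iy2 - iy1, 0.0)
    inter = iw * ih
    area_p = (prior[2] - prior[0]) * (prior[3] - prior[1])
    area_t = (text[2] - text[0]) * (text[3] - text[1])
    iou = inter / (area_p + area_t - inter + 1e-6)
    horiz_ok = iw >= (text[2] - text[0]) / 3.0
    return iou > iou_thresh and horiz_ok
```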
Step S140: training an initial region proposal network according to the positive example sample sub-image and the negative example sample sub-image to obtain the preset region proposal network.
It should be noted that the initial region proposal network may be a region proposal network to be trained, preset by a user, and this embodiment is not limited thereto.
In the third embodiment, an initial sample image is obtained and rescaled to obtain the sample images to be processed; image extraction is performed on the sample images through a preset sliding window to obtain sample sub-images; feature analysis is performed on the sub-images to obtain positive example sample sub-images and negative example sample sub-images; and an initial region proposal network is trained according to the positive and negative example sub-images to obtain the preset region proposal network. In this way, the generalization ability of the region proposal network can be improved under the condition of limited data.
In the third embodiment, before the step S40, the method further includes:
step S310: and determining a shape structure constraint function according to the image to be processed.
In a specific implementation, for example, the shape structure constraint function determined from the image to be processed is a Shape Structure Constraint (SSC) that encourages similarity between the text regions generated by the network and the text region ground truth, and between the generated background regions and the background ground truth. Compared with the popular pixel-level cross-entropy loss, the shape structure constraint, used as an auxiliary function, favors the network's global perception of the text region. The loss function is calculated as follows:
$$\mathcal{L}_{ssc} = 1 - \frac{1}{C}\sum_{c=1}^{C}\operatorname{avg}\!\left(\frac{\left(2\,\mu_c^{p}\odot\mu_c^{g}+C_1\right)\odot\left(2\,\sigma_c^{pg}+C_2\right)}{\left((\mu_c^{p})^2+(\mu_c^{g})^2+C_1\right)\odot\left((\sigma_c^{p})^2+(\sigma_c^{g})^2+C_2\right)}\right)$$

where C represents the number of categories, set to C = 2 (text and background); avg(·) represents the matrix averaging operation; ⊙ represents multiplication of corresponding matrix elements; C_1 and C_2 are the factors that stabilize the division, whose experimental values appear only as equation images in the original; μ_c^p and μ_c^g are the mean maps of the class-c prediction map and truth map, respectively; (σ_c^p)^2 and (σ_c^g)^2 are the variance maps of the class-c prediction map and truth map, respectively; and σ_c^pg is the covariance map between the class-c prediction map and the truth map. These maps are calculated as follows (reconstructed here from the surrounding definitions; the original renders them as images):

$$\mu_c^{p}=w * P_c,\qquad (\sigma_c^{p})^2=w * (P_c\odot P_c)-\mu_c^{p}\odot\mu_c^{p},\qquad \sigma_c^{pg}=w * (P_c\odot G_c)-\mu_c^{p}\odot\mu_c^{g}$$

where P_c and G_c are the class-c prediction map and truth map, respectively, each of size 28 x 28; w represents a Gaussian weight filter of size 3 x 3; * denotes the correlation operation; and μ_c^g and (σ_c^g)^2 are obtained from G_c in the same way.
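A hedged PyTorch sketch of this SSIM-style constraint; the 3 x 3 Gaussian kernel values and the stability constants (borrowed from common SSIM practice) are assumptions, since the original gives them only as images:

```python
import torch
import torch.nn.functional as F

def ssc_loss(pred: torch.Tensor, truth: torch.Tensor,
             c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """pred, truth: [N, C, 28, 28] per-class maps with C = 2 (text, background).
    Returns 1 minus the mean SSIM computed with a 3x3 Gaussian window."""
    C = pred.shape[1]
    g = torch.tensor([1.0, 2.0, 1.0], device=pred.device)
    w = (g[:, None] * g[None, :] / 16.0).view(1, 1, 3, 3).repeat(C, 1, 1, 1)
    blur = lambda x: F.conv2d(x, w, padding=1, groups=C)  # per-class filtering

    mu_p, mu_g = blur(pred), blur(truth)
    var_p = blur(pred * pred) - mu_p * mu_p
    var_g = blur(truth * truth) - mu_g * mu_g
    cov = blur(pred * truth) - mu_p * mu_g

    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) \
         / ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    return 1.0 - ssim.mean()
```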
The shape structure constraint is used as an auxiliary loss function and added into the original network to form end-to-end learning, and the total loss function equation is expressed as follows:
$$\mathcal{L} = \mathcal{L}_{rpn} + \lambda_1\,\mathcal{L}_{trrn} + \lambda_2\,\mathcal{L}_{mask} + \lambda_3\,\mathcal{L}_{ssc}$$

where the loss function balance factors λ are all set to 1 (the exact arrangement of the factors appears only as an equation image in the original); L_rpn represents the loss function of the RPN; L_trrn represents the loss function of the TRRN; and L_mask represents the cross-entropy loss function in the text mask segmentation network. These three functions are identical to the loss functions in Mask R-CNN, except that the above loss functions are for the case of C = 2.
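Under the same assumptions, the end-to-end objective could be assembled as in this illustrative fragment; the individual loss terms are assumed to come from the RPN, TRRN and TMSN heads sketched earlier:

```python
# all balance factors set to 1, as stated above (illustrative only)
loss_total = loss_rpn + loss_trrn + loss_mask + ssc_loss(pred_maps, truth_maps)
loss_total.backward()
```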
Step S320: and training the initial character mask segmentation network according to the shape structure constraint function to obtain a preset character mask segmentation network.
It should be noted that the initial word mask segmentation network may be a word mask segmentation network to be trained, which is preset by a user, and this is not limited in this embodiment.
In the third embodiment, the shape structure constraint function is determined according to the image to be processed, and the initial character mask segmentation network is trained according to the shape structure constraint function to obtain the preset character mask segmentation network, so that the reliability of the preset character mask segmentation network can be improved.
In addition, an embodiment of the present invention further provides a storage medium, where a text region boundary detection program is stored on the storage medium, and when executed by a processor, the text region boundary detection program implements the steps of the text region boundary detection method described above.
In addition, referring to fig. 5, an embodiment of the present invention further provides a text region boundary detection apparatus, where the text region boundary detection apparatus includes: the system comprises an acquisition module 10, a processing module 20, an analysis module 30, a detection module 40 and an adjustment module 50;
the acquiring module 10 is configured to acquire an image to be processed, and perform feature extraction on the image to be processed through a preset backbone network to obtain image features.
It should be noted that the image to be processed may be a scene image input by a user through a user interaction interface of the text region boundary detection device, or may also be a scene image input by a user through a terminal device that establishes a communication connection with the text region boundary detection device in advance, which is not limited in this embodiment.
The preset backbone network is an image feature extraction network preset by a user, and in this embodiment, a ResNet-101 network embedded with a deformed convolution is taken as an example for description, which is not limited in this embodiment.
In a specific implementation, for example, an image to be processed is acquired, and the ResNet-101 network embedded with deformation convolution is used as a backbone network to extract image features.
The processing module 20 is configured to determine an initial candidate region according to the image feature and a preset region suggestion network, and perform pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature.
It can be understood that, determining an initial candidate region according to the image feature and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature may be determining an initial candidate region according to the image feature and the preset region suggestion network, analyzing the initial candidate region through a preset deformation interest region pooling model to obtain a deformation offset, and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
In a specific implementation, for example, the image features are processed through a Region Proposal Network (RPN) to generate candidate regions, and in the process of mapping a candidate region to a fixed size, a deformation offset is learned using position-sensitive Deformable Region-of-Interest (DROI) pooling to produce more accurately aligned features. Two DROI pooling layers that do not share learning parameters are used here to generate the two different fixed-size features.
The analysis module 30 is configured to analyze the first fixed feature through a preset text area adjustment network to obtain a feature analysis result.
It should be noted that the preset text region adjustment network may be a region refinement network preset by a user; in this embodiment, a Text Region Refinement Network (TRRN) is taken as an example for description.
In a specific implementation, for example, the first fixed-size feature is input to the Text Region Refinement Network (TRRN). The structure of the TRRN is the same as that of the corresponding head in Mask R-CNN, except that the TRRN targets the 2-class case (text versus background). The TRRN is configured to obtain the confidence and the position offset of the refined text region.
The detection module 40 is configured to determine an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature.
It should be noted that the preset text mask segmentation network may be a segmentation network preset by a user; in this embodiment, a Text Mask Segmentation Network (TMSN) is taken as an example for description.
In a specific implementation, for example, the second fixed-size feature is input to the Text Mask Segmentation Network (TMSN). The structure of the text mask segmentation network is consistent with the mask branch of Mask R-CNN, and the TMSN is used to obtain segmentation results for scene text of arbitrary shape.
The adjusting module 50 is configured to adjust the initial text region boundary detection result according to the feature analysis result, so as to obtain a target text region boundary detection result.
It should be understood that, adjusting the initial text region boundary detection result according to the feature analysis result to obtain the target text region boundary detection result may be obtaining a confidence degree and a position offset from the feature analysis result, determining an initial text mask according to the initial text region boundary and the position offset, determining a text mask overlap rate according to the initial text mask, determining a target text mask according to the text mask overlap rate and the confidence degree, performing boundary detection on the target text mask to obtain a detection result, and determining the target text region boundary detection result according to the detection result.
Further, in order to improve the accuracy and reliability of the target word mask, the adjusting module 50 is further configured to determine a word mask overlap ratio according to the initial word mask, determine whether the word mask overlap ratio is greater than a preset threshold, rank the initial word mask according to the confidence when the word mask overlap ratio is greater than the preset threshold, obtain a ranking result, and screen the initial word mask according to the ranking result, so as to obtain the target word mask.
In a specific implementation, for example, the absolute position of an arbitrarily shaped character region in the input image is obtained from the segmented binary map and the corresponding position of the character region. Non-Maximum Suppression (NMS) over maximized intersection regions computes the overlap rate of two character masks as max(O/A, O/B), where O is the overlap area of the two masks and A and B are the areas of the two masks, respectively. When the overlap rate is greater than 0.8, the mask with the lower confidence of the two is removed.
Compared with existing scene character detection methods, which only explore the representation of characters of arbitrary shape or enhance feature expression, in this embodiment an image to be processed is acquired and features are extracted from it through a preset backbone network to obtain image features; an initial candidate region is determined according to the image features and a preset region suggestion network; the initial candidate region is pooled to obtain a first fixed feature and a second fixed feature; the first fixed feature is analyzed through a preset character region adjustment network to obtain a feature analysis result; an initial character region boundary detection result is determined according to a preset character mask segmentation network and the second fixed feature; and the initial character region boundary detection result is adjusted according to the feature analysis result to obtain a target character region boundary detection result. This overcomes the defect that the prior art cannot accurately identify the region boundaries of characters of arbitrary shape, optimizes the character region boundary detection process, and improves the accuracy and reliability of character region boundary detection, thereby meeting the needs of scene character detection.
In an embodiment, the adjusting module 50 is further configured to obtain a confidence degree and a position offset from the feature analysis result, determine an initial word mask according to the initial word region boundary and the position offset, determine a word mask overlap rate according to the initial word mask, determine a target word mask according to the word mask overlap rate and the confidence degree, perform boundary detection on the target word mask, obtain a detection result, and determine a target word region boundary detection result according to the detection result;
in an embodiment, the adjusting module 50 is further configured to determine a word mask overlap ratio according to the initial word mask, determine whether the word mask overlap ratio is greater than a preset threshold, rank the initial word mask according to the confidence when the word mask overlap ratio is greater than the preset threshold, obtain a ranking result, and filter the initial word mask according to the ranking result to obtain a target word mask;
in an embodiment, the processing module 20 is further configured to determine an initial candidate region according to the image feature and a preset region suggestion network, analyze the initial candidate region through a preset deformation interest region pooling model to obtain a deformation offset, perform pooling processing on the initial candidate region according to the deformation offset, and obtain a first fixed feature and a second fixed feature;
in an embodiment, the text region boundary detecting apparatus further includes: a training module;
the training module is used for acquiring an initial sample image, rescaling the initial sample image to obtain sample images to be processed, performing image extraction on the sample images to be processed through a preset sliding window to obtain sample sub-images to be processed, performing feature analysis on the sub-images to be processed to obtain positive example sample sub-images and negative example sample sub-images, and training an initial region proposal network according to the positive example sample sub-images and the negative example sample sub-images to obtain the preset region proposal network;
in an embodiment, the training module is further configured to obtain an image size of the sub-image to be processed, search a threshold range corresponding to the image size, analyze the sub-image to be processed, obtain a text bounding box, obtain each boundary length of the text bounding box, and determine a positive sample sub-image and a negative sample sub-image according to the boundary length and the threshold range;
in an embodiment, the training module is further configured to determine a shape structure constraint function according to the image to be processed, and train the initial word mask segmentation network according to the shape structure constraint function to obtain a preset word mask segmentation network.
Other embodiments or specific implementation manners of the text region boundary detection device according to the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order, but rather the words first, second, third, etc. are to be interpreted as names.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A text region boundary detection method, characterized in that the text region boundary detection method comprises the following steps:
acquiring an image to be processed, and performing feature extraction on the image to be processed through a preset backbone network to obtain image features;
determining an initial candidate region according to the image features and a preset region proposal network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature;
analyzing the first fixed feature through a preset text region adjustment network to obtain a feature analysis result;
determining an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature; and
adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result.

2. The text region boundary detection method according to claim 1, wherein the step of adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result specifically comprises:
obtaining a confidence and a position offset from the feature analysis result, and determining initial text masks according to the initial text region boundary and the position offset;
determining a text mask overlap ratio according to the initial text masks, and determining a target text mask according to the text mask overlap ratio and the confidence; and
performing boundary detection on the target text mask to obtain a detection result, and determining the target text region boundary detection result according to the detection result.

3. The text region boundary detection method according to claim 2, wherein the step of determining a text mask overlap ratio according to the initial text masks and determining a target text mask according to the text mask overlap ratio and the confidence specifically comprises:
determining the text mask overlap ratio according to the initial text masks, and judging whether the text mask overlap ratio is greater than a preset threshold;
when the text mask overlap ratio is greater than the preset threshold, sorting the initial text masks according to the confidence to obtain a sorting result; and
screening the initial text masks according to the sorting result to obtain the target text mask.
4. The text region boundary detection method according to claim 1, wherein the step of determining an initial candidate region according to the image features and a preset region proposal network and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature specifically comprises:
determining the initial candidate region according to the image features and the preset region proposal network;
analyzing the initial candidate region through a preset deformable region-of-interest pooling model to obtain a deformation offset; and
performing pooling processing on the initial candidate region according to the deformation offset to obtain the first fixed feature and the second fixed feature.

5. The text region boundary detection method according to claim 1, wherein before the step of acquiring an image to be processed and performing feature extraction on the image to be processed through a preset backbone network to obtain image features, the text region boundary detection method further comprises:
acquiring an initial sample image, and performing scale adjustment on the initial sample image to obtain a sample image to be processed;
performing image extraction on the sample image to be processed through a preset sliding window to obtain sample sub-images to be processed;
performing feature analysis on the sub-images to be processed to obtain positive sample sub-images and negative sample sub-images; and
training an initial region proposal network according to the positive sample sub-images and the negative sample sub-images to obtain the preset region proposal network.

6. The text region boundary detection method according to claim 5, wherein the step of performing feature analysis on the sub-images to be processed to obtain positive sample sub-images and negative sample sub-images specifically comprises:
acquiring the image size of each sub-image to be processed, and looking up the threshold range corresponding to the image size;
analyzing the sub-image to be processed to obtain a text bounding box; and
acquiring each boundary length of the text bounding box, and determining the positive sample sub-images and the negative sample sub-images according to the boundary lengths and the threshold range.
7. The text region boundary detection method according to any one of claims 1 to 6, wherein before the step of determining an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature, the text region boundary detection method further comprises:
determining a shape structure constraint function according to the image to be processed; and
training an initial text mask segmentation network according to the shape structure constraint function to obtain the preset text mask segmentation network.

8. A text region boundary detection device, characterized in that the text region boundary detection device comprises: a memory, a processor, and a text region boundary detection program stored on the memory and executable on the processor, wherein the text region boundary detection program, when executed by the processor, implements the steps of the text region boundary detection method according to any one of claims 1 to 7.

9. A storage medium, characterized in that a text region boundary detection program is stored on the storage medium, and the text region boundary detection program, when executed by a processor, implements the steps of the text region boundary detection method according to any one of claims 1 to 7.

10. A text region boundary detection apparatus, characterized in that the text region boundary detection apparatus comprises: an acquisition module, a processing module, an analysis module, a detection module, and an adjustment module;
the acquisition module is configured to acquire an image to be processed, and perform feature extraction on the image to be processed through a preset backbone network to obtain image features;
the processing module is configured to determine an initial candidate region according to the image features and a preset region proposal network, and perform pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature;
the analysis module is configured to analyze the first fixed feature through a preset text region adjustment network to obtain a feature analysis result;
the detection module is configured to determine an initial text region boundary detection result according to the preset text mask segmentation network and the second fixed feature; and
the adjustment module is configured to adjust the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result.
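For illustration only, the following minimal Python sketch shows a conventional confidence-sorted, overlap-based mask screening routine consistent with the procedure recited in claims 2 and 3; the greedy loop, the intersection-over-union overlap measure, and the default threshold value are assumptions for this example rather than the claimed algorithm itself.

```python
# Minimal sketch of confidence-sorted mask screening in the spirit of
# claims 2-3: when initial text masks overlap beyond a preset threshold,
# keep the higher-confidence one. The greedy NMS-style loop and the IoU
# overlap measure are assumptions made for illustration.
from typing import List

import numpy as np

def mask_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Overlap ratio of two boolean masks (intersection over union)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0

def filter_text_masks(masks: List[np.ndarray], confidences: List[float],
                      overlap_threshold: float = 0.5) -> List[np.ndarray]:
    """Sort the initial text masks by confidence and greedily keep a
    mask only if its overlap with every already-kept mask stays at or
    below the preset threshold; the survivors are the target masks."""
    order = np.argsort(confidences)[::-1]  # highest confidence first
    kept: List[int] = []
    for i in order:
        if all(mask_overlap(masks[i], masks[j]) <= overlap_threshold
               for j in kept):
            kept.append(int(i))
    return [masks[i] for i in kept]
```

A routine of this kind keeps, among any group of mutually overlapping initial text masks, only the most confident one, which matches the claimed behavior of sorting by confidence and screening by the text mask overlap ratio.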
CN202110190870.XA 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device Active CN112560857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110190870.XA CN112560857B (en) 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110190870.XA CN112560857B (en) 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device

Publications (2)

Publication Number Publication Date
CN112560857A 2021-03-26
CN112560857B CN112560857B (en) 2021-06-08

Family

ID=75034372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110190870.XA Active CN112560857B (en) 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device

Country Status (1)

Country Link
CN (1) CN112560857B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794246A (en) * 2015-05-15 2015-07-22 百度在线网络技术(北京)有限公司 Information search method and information search device
CN108470172A (en) * 2017-02-23 2018-08-31 阿里巴巴集团控股有限公司 A kind of text information identification method and device
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN108062531A (en) * 2017-12-25 2018-05-22 南京信息工程大学 A kind of video object detection method that convolutional neural networks are returned based on cascade
WO2020005731A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Text entity detection and recognition from images
KR102030628B1 (en) * 2019-04-04 2019-10-10 (주)아이엠시티 Recognizing method and system of vehicle license plate based convolutional neural network
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 Detection and Recognition Method of Curved Characters in Natural Scene Images
CN111553347A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 A scene text detection method for any angle
CN111553349A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text positioning and identifying method based on full convolution network
CN111724401A (en) * 2020-05-08 2020-09-29 华中科技大学 An Image Segmentation Method and System Based on Boundary Constrained Cascade U-Net
CN112149620A (en) * 2020-10-14 2020-12-29 南昌慧亦臣科技有限公司 Method for constructing natural scene character region detection model based on no anchor point

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863431A (en) * 2022-04-14 2022-08-05 中国银行股份有限公司 Text detection method, device and equipment
CN115035377A (en) * 2022-06-15 2022-09-09 天津大学 Saliency detection network system based on dual-stream encoding and interactive decoding
CN115035377B (en) * 2022-06-15 2024-08-23 天津大学 Saliency detection network system based on dual-stream encoding and interactive decoding
CN119942131A (en) * 2025-04-08 2025-05-06 四川科莫生医疗科技有限公司 An instance segmentation method based on edge attention enhancement

Also Published As

Publication number Publication date
CN112560857B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
WO2021147563A1 (en) Object detection method and apparatus, electronic device, and computer readable storage medium
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN112560857B (en) Character area boundary detection method, equipment, storage medium and device
CN110147774B (en) Table format picture layout analysis method and computer storage medium
CN105574513B (en) Character detection method and device
CN113255915B (en) Knowledge distillation method, device, equipment and medium based on structured instance graph
US8019164B2 (en) Apparatus, method and program product for matching with a template
EP3101594A1 (en) Saliency information acquisition device and saliency information acquisition method
JP5298831B2 (en) Image processing apparatus and program
CN107784282A (en) The recognition methods of object properties, apparatus and system
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN115375917B (en) Target edge feature extraction method, device, terminal and storage medium
CN110442719B (en) Text processing method, device, equipment and storage medium
CN112597918A (en) Text detection method and device, electronic equipment and storage medium
CN114494775A (en) Video segmentation method, device, device and storage medium
CN106548114B (en) Image processing method, device and computer-readable medium
CN113128604A (en) Page element identification method and device, electronic equipment and storage medium
US20180061078A1 (en) Image processing device, image processing method, and non-transitory computer-readable recording medium
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN116228644A (en) Image detection method, electronic device and storage medium
CN113516609A (en) Split screen video detection method and device, computer equipment and storage medium
JP6202938B2 (en) Image recognition apparatus and image recognition method
CN111325194B (en) Character recognition method, device and equipment and storage medium
JP2018180646A (en) Object candidate area estimation device, object candidate area estimation method, and object candidate area estimation program
CN115775386A (en) User interface component identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant