CN112560857B - Character area boundary detection method, equipment, storage medium and device - Google Patents


Info

Publication number
CN112560857B
CN112560857B (application CN202110190870.XA)
Authority
CN
China
Prior art keywords: image, boundary detection, initial, preset, mask
Legal status: Active
Application number: CN202110190870.XA
Other languages: Chinese (zh)
Other versions: CN112560857A (en)
Inventors: 操晓春, 代朋纹, 张华
Current Assignee
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Application filed by Peng Cheng Laboratory
Priority to CN202110190870.XA
Publication of CN112560857A
Application granted
Publication of CN112560857B


Classifications

    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a character region boundary detection method, apparatus, storage medium and device. Existing scene text detection methods only explore the expression forms of arbitrarily shaped text or enhance feature expression. In contrast, the invention extracts features from an image to be processed through a preset backbone network to obtain image features; determines an initial candidate region according to the image features and a preset region suggestion network; pools the initial candidate region to obtain a first fixed feature and a second fixed feature; analyzes the first fixed feature through a preset text region adjustment network to obtain a feature analysis result; and determines a target text region boundary detection result according to a preset text mask segmentation network, the second fixed feature and the feature analysis result. This overcomes the prior-art defect that the region boundaries of arbitrarily shaped text cannot be accurately identified, so the text region boundary detection process can be optimized and the accuracy of text region boundary detection improved.

Description

Character area boundary detection method, equipment, storage medium and device
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a method, an apparatus, a storage medium, and a device for detecting a text region boundary.
Background
In the prior art, in order to detect text in arbitrary-shape scenes, existing methods usually either explore the expression forms of arbitrarily shaped text, for example by learning the attributes of pixels or text segments and the relationships between them to distinguish text regions, or enhance feature expression, for example by combining features of different granularities or learning context features.
However, prior-art methods cannot accurately identify the region boundaries of arbitrarily shaped text, so detection accuracy in arbitrary-shape text scenes is low and reliability is poor.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, equipment, a storage medium and a device for detecting the boundary of a text area, and aims to solve the technical problem of how to optimize the text area boundary detection process.
In order to achieve the above object, the present invention provides a text region boundary detection method, which includes the following steps:
acquiring an image to be processed, and extracting the features of the image to be processed through a preset backbone network to obtain image features;
determining an initial candidate region according to the image features and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature;
analyzing the first fixed characteristic through a preset character area adjusting network to obtain a characteristic analysis result;
determining an initial character area boundary detection result according to a preset character mask segmentation network and the second fixed characteristic;
and adjusting the initial character area boundary detection result according to the characteristic analysis result to obtain a target character area boundary detection result.
Preferably, the step of adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result specifically includes:
obtaining confidence and position offset from the feature analysis result, and determining an initial character mask according to the initial character region boundary and the position offset;
determining a word mask overlapping rate according to the initial word mask, and determining a target word mask according to the word mask overlapping rate and the confidence;
and carrying out boundary detection on the target character mask to obtain a detection result, and determining a target character area boundary detection result according to the detection result.
Preferably, the step of determining a word mask overlap ratio according to the initial word mask and determining a target word mask according to the word mask overlap ratio and the confidence degree specifically includes:
determining the word mask overlapping rate according to the initial word mask, and judging whether the word mask overlapping rate is greater than a preset threshold value;
when the word mask overlapping rate is larger than a preset threshold, sequencing the initial word mask according to the confidence coefficient to obtain a sequencing result;
and screening the initial character mask according to the sorting result to obtain a target character mask.
Preferably, the step of determining an initial candidate region according to the image feature and a preset region suggested network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature specifically includes:
determining an initial candidate area according to the image characteristics and a preset area suggestion network;
analyzing the initial candidate region through a preset deformation interest region pooling model to obtain deformation offset;
and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
Preferably, before the step of obtaining the image to be processed and performing feature extraction on the image to be processed through a preset backbone network to obtain image features, the text region boundary detection method further includes:
obtaining an initial sample image, and carrying out scale adjustment on the initial sample image to obtain a sample image to be processed;
performing image extraction on the sample image to be processed through a preset sliding window to obtain a sample subimage to be processed;
performing feature analysis on the sample sub-image to be processed to obtain a positive example sample sub-image and a negative example sample sub-image;
and training an initial region suggestion network according to the positive example sample sub-image and the negative example sample sub-image to obtain the preset region suggestion network.
Preferably, the step of performing feature analysis on the sub-image to be processed to obtain a positive example sample sub-image and a negative example sample sub-image specifically includes:
acquiring the image size of the sub-image to be processed, and searching a threshold range corresponding to the image size;
analyzing the sub-image to be processed to obtain a character enclosing frame;
and acquiring the boundary length of the text bounding box, and determining a positive sample sub-image and a negative sample sub-image according to the boundary length and the threshold range.
Preferably, before the step of dividing the network according to the preset word mask and determining the initial word region boundary detection result according to the second fixed feature, the word region boundary detection method further includes:
determining a shape structure constraint function according to the image to be processed;
and training the initial character mask segmentation network according to the shape structure constraint function to obtain a preset character mask segmentation network.
In addition, in order to achieve the above object, the present invention further provides a text region boundary detection apparatus, which includes a memory, a processor, and a text region boundary detection program stored in the memory and executable on the processor, wherein the text region boundary detection program is configured to implement the steps of the text region boundary detection method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium, on which a text region boundary detection program is stored, wherein the text region boundary detection program, when executed by a processor, implements the steps of the text region boundary detection method as described above.
In order to achieve the above object, the present invention further provides a character region boundary detection device, including: an acquisition module, a processing module, an analysis module, a detection module and an adjustment module;
the acquisition module is used for acquiring an image to be processed and extracting the characteristics of the image to be processed through a preset backbone network to obtain image characteristics;
the processing module is used for determining an initial candidate region according to the image features and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature;
the analysis module is used for analyzing the first fixed characteristic through a preset character area adjustment network to obtain a characteristic analysis result;
the detection module is used for determining an initial character area boundary detection result according to a preset character mask segmentation network and the second fixed characteristic;
and the adjusting module is used for adjusting the initial character region boundary detection result according to the characteristic analysis result to obtain a target character region boundary detection result.
Compared with existing scene text detection methods, which only explore the expression forms of arbitrarily shaped text or enhance feature expression, the present invention acquires an image to be processed and extracts its features through a preset backbone network to obtain image features; determines an initial candidate region according to the image features and a preset region suggestion network; pools the initial candidate region to obtain a first fixed feature and a second fixed feature; analyzes the first fixed feature through a preset text region adjustment network to obtain a feature analysis result; determines an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature; and adjusts the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result. This overcomes the prior-art defect that the region boundaries of arbitrarily shaped text cannot be accurately identified. The text region boundary detection process can therefore be optimized, and the accuracy and reliability of text region boundary detection improved, meeting the requirements of scene text detection.
Drawings
Fig. 1 is a schematic structural diagram of a text region boundary detection device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a text region boundary detection method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a text region boundary detection method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a text region boundary detection method according to a third embodiment of the present invention;
fig. 5 is a block diagram of a first embodiment of a text region boundary detection apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a text region boundary detection device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the text region boundary detection device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display); the optional user interface 1003 may further include a standard wired interface and a wireless interface, and in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a random access memory (RAM) or a non-volatile memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the text region boundary detection apparatus and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a text region boundary detection program.
In the text area boundary detection device shown in fig. 1, the network interface 1004 is mainly used for connecting a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the text area boundary detection device calls a text area boundary detection program stored in the memory 1005 through the processor 1001, and executes the text area boundary detection method provided by the embodiment of the present invention.
Based on the hardware structure, the embodiment of the character area boundary detection method is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a text region boundary detection method according to a first embodiment of the present invention.
Step S10: acquiring an image to be processed, and extracting the features of the image to be processed through a preset backbone network to obtain image features.
It should be understood that the main execution body of this embodiment is the text region boundary detection device, which may be an electronic device such as a computer or a server, or another device that can achieve the same or similar functions, which is not limited in this embodiment.
It should be noted that the image to be processed may be a scene image input by a user through a user interaction interface of the text region boundary detection device, or may also be a scene image input by a user through a terminal device that establishes a communication connection with the text region boundary detection device in advance, which is not limited in this embodiment.
The preset backbone network is an image feature extraction network preset by a user; in this embodiment, a ResNet-101 network embedded with deformable convolution is taken as an example for description, which is not limited in this embodiment.
In a specific implementation, for example, an image to be processed is acquired, and the ResNet-101 network embedded with deformable convolution is used as the backbone network to extract image features.
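As an illustration, a minimal sketch of such a backbone follows, assuming PyTorch and torchvision; the choice to deform only the 3x3 convolutions of the last ResNet-101 stage, and all names in the sketch, are assumptions rather than details stated in this disclosure.

import torch
import torch.nn as nn
import torchvision
from torchvision.ops import DeformConv2d

class DeformConvBlock(nn.Module):
    """Wraps a 3x3 convolution as a deformable convolution with learned offsets."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        # 2 offset channels per kernel position (x and y displacement).
        self.offset = nn.Conv2d(conv.in_channels, 2 * 3 * 3, kernel_size=3,
                                stride=conv.stride, padding=conv.padding)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)
        self.deform = DeformConv2d(conv.in_channels, conv.out_channels,
                                   kernel_size=3, stride=conv.stride,
                                   padding=conv.padding, bias=False)
        self.deform.weight.data.copy_(conv.weight.data)  # start from the plain conv

    def forward(self, x):
        return self.deform(x, self.offset(x))

backbone = torchvision.models.resnet101(weights=None)
for block in backbone.layer4:  # hypothetical choice: deform only the last stage
    block.conv2 = DeformConvBlock(block.conv2)

Initializing the offset branch to zero makes each deformable convolution start out identical to the plain convolution it replaces, which is a common way to keep early training stable.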
Step S20: and determining an initial candidate region according to the image features and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature.
It can be understood that, determining an initial candidate region according to the image feature and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature may be determining an initial candidate region according to the image feature and the preset region suggestion network, analyzing the initial candidate region through a preset deformation interest region pooling model to obtain a deformation offset, and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
In a specific implementation, for example, the image features are processed through a Region Proposal Network (RPN) to generate candidate regions, and in the process of mapping each candidate region to a fixed size, a deformation offset is learned using a position-sensitive deformable Region-of-Interest (DROI) pooling technique to generate more accurately aligned features. Here, two different fixed-size features F1 and F2 are generated using two DROI pooling layers that do not share learning parameters.
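The following sketch illustrates this two-branch pooling stage under stated assumptions: torchvision provides no position-sensitive deformable RoI pooling, so plain roi_align stands in for the DROI layers, and the feature stride, output size, and the names F1 and F2 are illustrative.

import torch
from torchvision.ops import roi_align

def pool_two_branches(feature_map, proposals, output_size=7, stride=16):
    """Map variable-size candidate regions to two fixed-size features F1 and F2.
    In the disclosure the two pooling layers are DROI layers that do not share
    learned offset parameters; here both are approximated by roi_align."""
    boxes = [proposals]  # one (K, 4) box tensor per image, (x1, y1, x2, y2)
    f1 = roi_align(feature_map, boxes, output_size, spatial_scale=1.0 / stride)
    f2 = roi_align(feature_map, boxes, output_size, spatial_scale=1.0 / stride)
    return f1, f2  # F1 feeds the TRRN, F2 feeds the TMSN

feats = torch.randn(1, 256, 64, 64)            # backbone feature map
props = torch.tensor([[10., 10., 200., 80.]])  # one candidate region
F1, F2 = pool_two_branches(feats, props)       # each: (1, 256, 7, 7)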
Step S30: and analyzing the first fixed characteristic through a preset character area adjusting network to obtain a characteristic analysis result.
It should be noted that the preset text region adjustment network may be a region refinement network preset by a user; in this embodiment, a Text Region Refinement Network (TRRN) is taken as an example for description.
In a specific implementation, for example, the fixed-size feature F1 is input to the Text Region Refinement Network (TRRN). The structure of the text region refinement network is the same as that of Mask R-CNN, except that it targets the two-category (text and background) case. The TRRN is configured to obtain the confidence and the position offset of the adjusted text region.
Step S40: and determining an initial character area boundary detection result according to a preset character mask segmentation network and the second fixed characteristic.
It should be noted that the preset Text Mask Segmentation Network may be a Segmentation Network preset by a user, and in this embodiment, a Text Mask Segmentation Network (TMSN) is taken as an example for description.
In a specific implementation, for example, the fixed-size feature F2 is input to the Text Mask Segmentation Network (TMSN). The structure of the text mask segmentation network is consistent with that of Mask R-CNN, and the TMSN is used to obtain segmentation results for scene text of arbitrary shape.
Step S50: and adjusting the initial character area boundary detection result according to the characteristic analysis result to obtain a target character area boundary detection result.
It should be understood that, adjusting the initial text region boundary detection result according to the feature analysis result to obtain the target text region boundary detection result may be obtaining a confidence degree and a position offset from the feature analysis result, determining an initial text mask according to the initial text region boundary and the position offset, determining a text mask overlap rate according to the initial text mask, determining a target text mask according to the text mask overlap rate and the confidence degree, performing boundary detection on the target text mask to obtain a detection result, and determining the target text region boundary detection result according to the detection result.
Further, in order to improve the accuracy and reliability of the target word mask, the determining the word mask overlap ratio according to the initial word mask and determining the target word mask according to the word mask overlap ratio and the confidence includes:
determining a word mask overlapping rate according to the initial word mask, judging whether the word mask overlapping rate is greater than a preset threshold, when the word mask overlapping rate is greater than the preset threshold, sorting the initial word mask according to the confidence to obtain a sorting result, and screening the initial word mask according to the sorting result to obtain a target word mask.
In a specific implementation, for example, the absolute position of the arbitrarily shaped text region in the input image is obtained from the segmented binary image and the corresponding position of the text region. A non-maximum suppression (NMS) that maximizes the intersection region computes the overlap rate of two text masks as max(O/A, O/B), where O is the overlap area of the two masks and A and B are their respective areas. When the overlap rate is greater than 0.8, the mask with the lower confidence of the two is removed.
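A minimal sketch of this mask-level NMS follows, assuming NumPy arrays of binary masks; the max(O/A, O/B) overlap and the 0.8 threshold come from the passage above, while the data layout is an assumption.

import numpy as np

def mask_nms(masks, scores, thresh=0.8):
    """masks: (N, H, W) boolean array; scores: (N,) confidences.
    Returns indices of the masks kept after suppression."""
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    for i in order:
        suppressed = False
        for j in keep:
            o = np.logical_and(masks[i], masks[j]).sum()
            a, b = masks[i].sum(), masks[j].sum()
            # Overlap rate of the maximized intersection region: max(O/A, O/B).
            if a > 0 and b > 0 and max(o / a, o / b) > thresh:
                suppressed = True  # covered by a kept, higher-confidence mask
                break
        if not suppressed:
            keep.append(i)
    return keep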
Compared with existing scene text detection methods, which only explore the expression forms of arbitrarily shaped text or enhance feature expression, this embodiment acquires an image to be processed and extracts its features through a preset backbone network to obtain image features; determines an initial candidate region according to the image features and a preset region suggestion network; pools the initial candidate region to obtain a first fixed feature and a second fixed feature; analyzes the first fixed feature through a preset text region adjustment network to obtain a feature analysis result; determines an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature; and adjusts the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result. This overcomes the prior-art defect that the region boundaries of arbitrarily shaped text cannot be accurately identified. The text region boundary detection process can therefore be optimized, and the accuracy and reliability of text region boundary detection improved, meeting the requirements of scene text detection.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the text region boundary detection method according to the present invention, and the second embodiment of the text region boundary detection method according to the present invention is proposed based on the first embodiment illustrated in fig. 2.
In the second embodiment, the step S20 includes:
step S201: and determining an initial candidate region according to the image characteristics and a preset region suggestion network.
It should be noted that the preset region suggestion network may be a processing network preset by a user; in this embodiment, a Region Proposal Network (RPN) is taken as an example for description, which is not limited in this embodiment.
In a specific implementation, for example, the image features are processed through the Region Proposal Network (RPN) to generate candidate regions.
Step S202: and analyzing the initial candidate region through a preset deformation interest region pooling model to obtain deformation offset.
It should be noted that the preset deformable interest region pooling model may be a pooling processing model preset by a user; in this embodiment, a position-sensitive deformable Region-of-Interest (DROI) pooling model is taken as an example for description, which is not limited in this embodiment.
Step S203: and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
In a specific implementation, for example, in the process of mapping a candidate region of arbitrary size to a fixed size, a deformation offset is learned using a position-sensitive deformable Region-of-Interest (DROI) pooling technique to generate more accurately aligned features. Here, two different fixed-size features F1 and F2 are generated using two DROI pooling layers that do not share learning parameters.
In a second embodiment, an initial candidate region is determined according to the image features and a preset region suggestion network, the initial candidate region is analyzed through a preset deformation interest region pooling model to obtain a deformation offset, the initial candidate region is pooled according to the deformation offset to obtain a first fixed feature and a second fixed feature, and therefore the candidate region with any scale can be mapped into the feature with a fixed size.
In the second embodiment, the step S50 includes:
step S501: and obtaining confidence coefficient and position offset from the feature analysis result, and determining an initial word mask according to the initial word region boundary and the position offset.
It is understood that the obtaining of the confidence and the position offset from the feature analysis result may be performing feature extraction on the feature analysis result, obtaining a text feature, and determining the confidence and the position offset according to the text feature.
Step S502: and determining a word mask overlapping rate according to the initial word mask, and determining a target word mask according to the word mask overlapping rate and the confidence.
It can be understood that, determining a word mask overlap ratio according to the initial word mask, and determining a target word mask according to the word mask overlap ratio and the confidence level may be determining a word mask overlap ratio according to the initial word mask, and determining whether the word mask overlap ratio is greater than a preset threshold, when the word mask overlap ratio is greater than the preset threshold, sorting the initial word mask according to the confidence level to obtain a sorting result, and screening the initial word mask according to the sorting result to obtain the target word mask.
Further, in order to improve the accuracy and reliability of the target word mask, the step S502 includes:
determining the word mask overlapping rate according to the initial word mask, and judging whether the word mask overlapping rate is greater than a preset threshold value;
when the word mask overlapping rate is larger than a preset threshold, sequencing the initial word mask according to the confidence coefficient to obtain a sequencing result;
and screening the initial character mask according to the sorting result to obtain a target character mask.
In a specific implementation, for example, the absolute position of the arbitrarily shaped text region in the input image is obtained from the segmented binary image and the corresponding position of the text region. A non-maximum suppression (NMS) that maximizes the intersection region computes the overlap rate of two text masks as max(O/A, O/B), where O is the overlap area of the two masks and A and B are their respective areas. When the overlap rate is greater than 0.8, the mask with the lower confidence of the two is removed.
Step S503: and carrying out boundary detection on the target character mask to obtain a detection result, and determining a target character area boundary detection result according to the detection result.
It can be understood that, the boundary detection of the target word mask is performed to obtain a detection result, and the determination of the boundary detection result of the target word area according to the detection result may be that the boundary detection of the target word mask is performed, the boundary of the target word mask is determined according to the detection result, and the boundary of the target word mask is used as the boundary detection result of the target word area.
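One way to realize this final boundary step is to trace the contour of the kept binary mask; the sketch below uses OpenCV's findContours, which is an assumed implementation choice rather than one stated in this disclosure.

import cv2
import numpy as np

def mask_to_boundary(mask):
    """mask: (H, W) binary array -> list of (K, 2) boundary polygons."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [c.reshape(-1, 2) for c in contours]  # each row is an (x, y) vertex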
In the second embodiment, the confidence and the position offset are obtained from the feature analysis result, and an initial text mask is determined according to the initial text region boundary and the position offset. A text mask overlap rate is determined according to the initial text mask, and a target text mask is determined according to the text mask overlap rate and the confidence. Boundary detection is then performed on the target text mask to obtain a detection result, and the target text region boundary detection result is determined according to the detection result, so the accuracy of the target text region boundary detection result can be improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating a text region boundary detection method according to a third embodiment of the present invention, and the third embodiment of the text region boundary detection method is proposed based on the first embodiment shown in fig. 2.
In the third embodiment, before the step S20, the method further includes:
step S110: obtaining an initial sample image, and carrying out scale adjustment on the initial sample image to obtain a sample image to be processed.
It should be noted that the initial sample image may be a sample image input by a user through a text region boundary detection device, which is not limited in this embodiment.
In a specific implementation, for example, the scale adjustment may be rescaling the initial sample image to three preset scales to obtain the sample images to be processed.
Step S120: and carrying out image extraction on the sample image to be processed through a preset sliding window to obtain a sample subimage to be processed.
It should be noted that the preset sliding window may be a sliding window preset by a user, and in this embodiment, a sliding window of 512 × 512 is taken as an example for description.
In a specific implementation, for example, a 512x512 window is slid over the image at each scale to generate the sub-images.
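A minimal sketch of this sampling step follows; the 512x512 window comes from the passage above, while the stride and the three scale factors are assumptions, since the disclosure's scale values are not reproduced here.

import cv2

def sliding_subimages(image, scales=(0.5, 1.0, 2.0), win=512, stride=256):
    """Rescale the image to several scales and crop 512x512 sub-images."""
    subimages = []
    for s in scales:
        scaled = cv2.resize(image, None, fx=s, fy=s)
        h, w = scaled.shape[:2]
        for y in range(0, max(h - win, 0) + 1, stride):
            for x in range(0, max(w - win, 0) + 1, stride):
                subimages.append(scaled[y:y + win, x:x + win])
    return subimages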
Step S130: and performing characteristic analysis on the sub-image to be processed to obtain a positive sample sub-image and a negative sample sub-image.
It can be understood that, the step of performing feature analysis on the to-be-processed sub-image to obtain the positive example sample sub-image and the negative example sample sub-image may be to obtain an image size of the to-be-processed sub-image, find a threshold range corresponding to the image size, analyze the to-be-processed sub-image to obtain a text enclosure frame, obtain each boundary length of the text enclosure frame, and determine the positive example sample sub-image and the negative example sample sub-image according to the boundary length and the threshold range.
Further, the step S130 includes:
acquiring the image size of the sub-image to be processed, and searching a threshold range corresponding to the image size;
analyzing the sub-image to be processed to obtain a character enclosing frame;
and acquiring the boundary length of the text bounding box, and determining a positive sample sub-image and a negative sample sub-image according to the boundary length and the threshold range.
It is understood that the threshold range corresponding to the image size may be searched in a preset mapping table. The preset mapping table includes a corresponding relationship between the image size and the threshold range, which is not limited in this embodiment.
In a specific implementation, for example, a valid range [l_s, h_s] is designed for each scale. When the shortest side of a text bounding box falls within this range, the text participates in the training process; such texts are written as T_v. The sub-images covering the largest number of T_v are then selected as the positive example sub-images I_p. To select the negative example sub-images, a negative example mining technique is employed. Specifically, the generated positive example sub-images I_p are first used to train a Region Proposal Network (RPN), which generates candidate boxes. The candidate boxes covered by I_p are then removed, and the sub-images whose regions still contain candidate boxes falling within the range [l_s, h_s] are taken as the negative example sub-images at that scale, written as I_n.
In the learning process, each sub-image region has size 512x512 and each mini-batch has size 10, with a 4:1 ratio of positive to negative example images. Since many positive example sub-images contain only a small number of valid texts, the number of positive samples in the RPN is limited. The number of positive samples in the RPN is therefore increased by using text segments. Specifically, when the overlap ratio of a prior box with a valid text region is greater than a threshold of 0.7, and the horizontal extent of the overlap region is not less than 1/3 of the horizontal extent of the entire text region, the prior box can be used as a positive sample. Furthermore, for invalid texts in a sub-image, prior boxes are obtained with the same method and removed from the negative samples, to reduce the ambiguity of the negative samples.
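The text-segment rule above can be sketched as follows; the exact overlap measure is not fully specified here, so intersection over the prior-box area is an assumption, while the 0.7 threshold and the 1/3 horizontal-extent condition come from the passage.

def is_positive_prior(prior, text_box, overlap_thresh=0.7, frac=1.0 / 3.0):
    """Boxes are (x1, y1, x2, y2). Returns True if the prior box qualifies
    as a positive sample under the text-segment rule."""
    ix1, iy1 = max(prior[0], text_box[0]), max(prior[1], text_box[1])
    ix2, iy2 = min(prior[2], text_box[2]), min(prior[3], text_box[3])
    iw, ih = max(ix2 - ix1, 0.0), max(iy2 - iy1, 0.0)
    prior_area = (prior[2] - prior[0]) * (prior[3] - prior[1])
    if prior_area <= 0 or (iw * ih) / prior_area <= overlap_thresh:
        return False  # overlap ratio with the valid text region is too small
    text_width = text_box[2] - text_box[0]
    # Horizontal extent of the overlap must reach 1/3 of the text region's.
    return text_width > 0 and iw >= frac * text_width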
Step S140: and training an initial region suggestion network according to the positive example sample sub-image and the negative example sample sub-image to obtain the preset region suggestion network.
It should be noted that the initial region suggestion network may be a region suggestion network to be trained, preset by a user, and this embodiment is not limited thereto.
In the third embodiment, an initial sample image is obtained and rescaled to obtain sample images to be processed; image extraction is performed on the sample images to be processed through a preset sliding window to obtain sample sub-images to be processed; feature analysis is performed on the sub-images to obtain positive example and negative example sample sub-images; and an initial region suggestion network is trained according to the positive and negative example sub-images to obtain the preset region suggestion network. In this way, the generalization ability of the region suggestion network can be improved under the condition of limited data.
In the third embodiment, before the step S40, the method further includes:
step S310: and determining a shape structure constraint function according to the image to be processed.
In a specific implementation, for example, the shape structure constraint (SSC) determined from the image to be processed encourages similarity between the network-generated text regions and the text region ground truth, and between the network-generated background regions and the background region ground truth. As an auxiliary function, the shape structure constraint promotes a global perception of the text region, in contrast to the popular pixel-level cross-entropy loss function. The loss function is calculated as follows:
L_{ssc} = 1 - \frac{1}{C}\sum_{c=1}^{C}\mathrm{mean}\left(\frac{\left(2\mu_c^{p}\odot\mu_c^{g}+k_1\right)\odot\left(2\sigma_c^{pg}+k_2\right)}{\left((\mu_c^{p})^2+(\mu_c^{g})^2+k_1\right)\odot\left((\sigma_c^{p})^2+(\sigma_c^{g})^2+k_2\right)}\right)
where C represents the number of categories and is set to C = 2, i.e. text and background; mean(·) represents the matrix averaging operation; ⊙ represents multiplication of corresponding elements of matrices; and k_1 and k_2 are factors that stabilize the division, each set to a fixed constant in the experiments. \mu_c^{p} and \mu_c^{g} represent the mean maps of the class-c prediction map and truth map, respectively; (\sigma_c^{p})^2 and (\sigma_c^{g})^2 represent the variance maps of the class-c prediction map and truth map, respectively; and \sigma_c^{pg} represents the covariance map between the class-c prediction map and the truth map. The calculation method is as follows:
\mu_c^{p} = w * P_c,\qquad (\sigma_c^{p})^2 = w * (P_c \odot P_c) - \mu_c^{p}\odot\mu_c^{p},\qquad \sigma_c^{pg} = w * (P_c \odot G_c) - \mu_c^{p}\odot\mu_c^{g}
where P_c and G_c are the class-c prediction map and truth map, respectively, each of size 28x28; w represents a Gaussian weight filter of size 3x3; and * represents the correlation operation. The mean and variance maps of the truth map G_c are computed in the same way.
The shape structure constraint is used as an auxiliary loss function and added to the original network to form end-to-end learning. The total loss function is expressed as follows:
L = L_{rpn} + \lambda_1 L_{trrn} + \lambda_2 L_{tmsn} + \lambda_3 L_{ssc}
where the loss function balance factors \lambda_1, \lambda_2 and \lambda_3 are all set to 1; L_{rpn} represents the loss function of the RPN; L_{trrn} represents the loss function of the TRRN; and L_{tmsn} represents the cross-entropy loss function in the text mask segmentation network. These three functions are identical to the loss functions in Mask R-CNN, except that the above loss functions are for the case of C = 2.
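A minimal sketch of this shape structure constraint follows, assuming PyTorch; the Gaussian filter weights, the stabilization constants k1 and k2, and the final averaging are assumptions, since their experimental values are not reproduced above.

import torch
import torch.nn.functional as F

def ssc_loss(pred, gt, k1=1e-4, k2=9e-4):
    """pred, gt: (N, C, 28, 28) per-class prediction and truth maps, C = 2."""
    C = pred.shape[1]
    g = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
    w = (g / g.sum()).view(1, 1, 3, 3).to(pred).expand(C, 1, 3, 3)

    def local_mean(x):  # 3x3 Gaussian-weighted local statistics, per class
        return F.conv2d(x, w, padding=1, groups=C)

    mu_p, mu_g = local_mean(pred), local_mean(gt)
    var_p = local_mean(pred * pred) - mu_p * mu_p   # variance maps
    var_g = local_mean(gt * gt) - mu_g * mu_g
    cov = local_mean(pred * gt) - mu_p * mu_g       # covariance map
    ssim = ((2 * mu_p * mu_g + k1) * (2 * cov + k2)) / \
           ((mu_p * mu_p + mu_g * mu_g + k1) * (var_p + var_g + k2))
    return 1.0 - ssim.mean()  # averaged over classes and spatial positions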
Step S320: and training the initial character mask segmentation network according to the shape structure constraint function to obtain a preset character mask segmentation network.
It should be noted that the initial word mask segmentation network may be a word mask segmentation network to be trained, which is preset by a user, and this is not limited in this embodiment.
In the third embodiment, the shape structure constraint function is determined according to the image to be processed, and the initial character mask segmentation network is trained according to the shape structure constraint function to obtain the preset character mask segmentation network, so that the reliability of the preset character mask segmentation network can be improved.
In addition, an embodiment of the present invention further provides a storage medium, where a text region boundary detection program is stored on the storage medium, and when executed by a processor, the text region boundary detection program implements the steps of the text region boundary detection method described above.
In addition, referring to fig. 5, an embodiment of the present invention further provides a text region boundary detection apparatus, where the text region boundary detection apparatus includes: the system comprises an acquisition module 10, a processing module 20, an analysis module 30, a detection module 40 and an adjustment module 50;
the acquiring module 10 is configured to acquire an image to be processed, and perform feature extraction on the image to be processed through a preset backbone network to obtain image features.
It should be noted that the image to be processed may be a scene image input by a user through a user interaction interface of the text region boundary detection device, or may also be a scene image input by a user through a terminal device that establishes a communication connection with the text region boundary detection device in advance, which is not limited in this embodiment.
The preset backbone network is an image feature extraction network preset by a user; in this embodiment, a ResNet-101 network embedded with deformable convolution is taken as an example for description, which is not limited in this embodiment.
In a specific implementation, for example, an image to be processed is acquired, and the ResNet-101 network embedded with deformable convolution is used as the backbone network to extract image features.
The processing module 20 is configured to determine an initial candidate region according to the image feature and a preset region suggestion network, and perform pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature.
It can be understood that, determining an initial candidate region according to the image feature and a preset region suggestion network, and performing pooling processing on the initial candidate region to obtain a first fixed feature and a second fixed feature may be determining an initial candidate region according to the image feature and the preset region suggestion network, analyzing the initial candidate region through a preset deformation interest region pooling model to obtain a deformation offset, and performing pooling processing on the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature.
In a specific implementation, for example, the image features are processed through a Region Proposal Network (RPN) to generate candidate regions, and in the process of mapping each candidate region to a fixed size, a deformation offset is learned using a position-sensitive deformable Region-of-Interest (DROI) pooling technique to generate more accurately aligned features. Here, two different fixed-size features F1 and F2 are generated using two DROI pooling layers that do not share learning parameters.
The analysis module 30 is configured to analyze the first fixed feature through a preset text area adjustment network to obtain a feature analysis result.
It should be noted that the preset text region adjustment network may be a region refinement network preset by a user; in this embodiment, a Text Region Refinement Network (TRRN) is taken as an example for description.
In a specific implementation, for example, the fixed-size feature F1 is input to the Text Region Refinement Network (TRRN). The structure of the text region refinement network is the same as that of Mask R-CNN, except that it targets the two-category (text and background) case. The TRRN is used to obtain the confidence and the position offset of the adjusted text region.
The detection module 40 is configured to determine an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature.
It should be noted that the preset Text Mask Segmentation Network may be a Segmentation Network preset by a user, and in this embodiment, a Text Mask Segmentation Network (TMSN) is taken as an example for description.
In a specific implementation, for example, the fixed-size feature F2 is input to the Text Mask Segmentation Network (TMSN). The structure of the text mask segmentation network is consistent with that of Mask R-CNN, and the TMSN is used to obtain segmentation results for scene text of arbitrary shape.
The adjusting module 50 is configured to adjust the initial text region boundary detection result according to the feature analysis result, so as to obtain a target text region boundary detection result.
It should be understood that, adjusting the initial text region boundary detection result according to the feature analysis result to obtain the target text region boundary detection result may be obtaining a confidence degree and a position offset from the feature analysis result, determining an initial text mask according to the initial text region boundary and the position offset, determining a text mask overlap rate according to the initial text mask, determining a target text mask according to the text mask overlap rate and the confidence degree, performing boundary detection on the target text mask to obtain a detection result, and determining the target text region boundary detection result according to the detection result.
Further, in order to improve the accuracy and reliability of the target word mask, the adjusting module 50 is further configured to determine a word mask overlap ratio according to the initial word mask, determine whether the word mask overlap ratio is greater than a preset threshold, rank the initial word mask according to the confidence when the word mask overlap ratio is greater than the preset threshold, obtain a ranking result, and screen the initial word mask according to the ranking result, so as to obtain the target word mask.
In a specific implementation, for example, the absolute position of the arbitrarily shaped text region in the input image is obtained from the segmented binary image and the corresponding position of the text region. A non-maximum suppression (NMS) that maximizes the intersection region computes the overlap rate of two text masks as max(O/A, O/B), where O is the overlap area of the two masks and A and B are their respective areas. When the overlap rate is greater than 0.8, the mask with the lower confidence of the two is removed.
Compared with existing scene text detection methods, which only explore the expression forms of arbitrarily shaped text or enhance feature expression, this embodiment acquires an image to be processed and extracts its features through a preset backbone network to obtain image features; determines an initial candidate region according to the image features and a preset region suggestion network; pools the initial candidate region to obtain a first fixed feature and a second fixed feature; analyzes the first fixed feature through a preset text region adjustment network to obtain a feature analysis result; determines an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature; and adjusts the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result. This overcomes the prior-art defect that the region boundaries of arbitrarily shaped text cannot be accurately identified. The text region boundary detection process can therefore be optimized, and the accuracy and reliability of text region boundary detection improved, meeting the requirements of scene text detection.
In an embodiment, the adjusting module 50 is further configured to obtain a confidence degree and a position offset from the feature analysis result, determine an initial word mask according to the initial word region boundary and the position offset, determine a word mask overlap rate according to the initial word mask, determine a target word mask according to the word mask overlap rate and the confidence degree, perform boundary detection on the target word mask, obtain a detection result, and determine a target word region boundary detection result according to the detection result;
in an embodiment, the adjusting module 50 is further configured to determine a word mask overlap ratio according to the initial word mask, determine whether the word mask overlap ratio is greater than a preset threshold, rank the initial word mask according to the confidence when the word mask overlap ratio is greater than the preset threshold, obtain a ranking result, and filter the initial word mask according to the ranking result to obtain a target word mask;
in an embodiment, the processing module 20 is further configured to determine an initial candidate region according to the image feature and a preset region suggestion network, analyze the initial candidate region through a preset deformation interest region pooling model to obtain a deformation offset, perform pooling processing on the initial candidate region according to the deformation offset, and obtain a first fixed feature and a second fixed feature;
in an embodiment, the text region boundary detecting apparatus further includes: a training module;
the training module is used for acquiring an initial sample image, carrying out scale adjustment on the initial sample image to obtain a sample image to be processed, carrying out image extraction on the sample image to be processed through a preset sliding window to obtain a sample subimage to be processed, carrying out feature analysis on the subimage to be processed to obtain a positive sample subimage and a negative sample subimage, training an initial area suggestion network according to the positive sample subimage and the negative sample subimage, and acquiring a preset area establishment network;
in an embodiment, the training module is further configured to obtain an image size of the sub-image to be processed, search a threshold range corresponding to the image size, analyze the sub-image to be processed, obtain a text bounding box, obtain each boundary length of the text bounding box, and determine a positive sample sub-image and a negative sample sub-image according to the boundary length and the threshold range;
in an embodiment, the training module is further configured to determine a shape structure constraint function according to the image to be processed, and train the initial word mask segmentation network according to the shape structure constraint function to obtain a preset word mask segmentation network.
Other embodiments or specific implementation manners of the text region boundary detection device according to the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order, but rather the words first, second, third, etc. are to be interpreted as names.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A text region boundary detection method, characterized by comprising the following steps:
acquiring an image to be processed, and extracting the features of the image to be processed through a preset backbone network to obtain image features;
determining an initial candidate region according to the image features and a preset region proposal network;
analyzing the initial candidate region through a preset deformable region-of-interest pooling model to obtain a deformation offset;
pooling the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature;
analyzing the first fixed feature through a preset text region adjustment network to obtain a feature analysis result;
determining an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature;
and adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result.
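By way of illustration, the claimed pipeline can be sketched end to end in PyTorch as below. The resnet18 trunk, the single hand-written candidate box, and the one-layer heads stand in for the preset networks, and the deformable region-of-interest pooling is approximated by predicting a per-region offset and shifting the candidate box before a second roi_align pass; all of these are assumptions for illustration, not the patent's stated implementation.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18
    from torchvision.ops import roi_align

    # Stand-in for the preset backbone network: a resnet18 trunk with stride 32.
    backbone = nn.Sequential(*list(resnet18(weights=None).children())[:-2])
    image = torch.rand(1, 3, 512, 512)                     # image to be processed
    features = backbone(image)                             # image features (1, 512, 16, 16)

    # Stand-in for the preset region proposal network: one candidate region,
    # given as (batch_index, x1, y1, x2, y2) in image coordinates.
    rois = torch.tensor([[0.0, 100.0, 100.0, 300.0, 260.0]])

    # Approximate deformable region-of-interest pooling: analyze the candidate
    # region to predict a deformation offset, then pool the shifted region.
    pooled = roi_align(features, rois, output_size=7, spatial_scale=1 / 32)
    offset_head = nn.Linear(512 * 7 * 7, 4)                # deformation-offset predictor
    offset = offset_head(pooled.flatten(1))
    shifted = rois.clone()
    shifted[:, 1:] = rois[:, 1:] + offset                  # deformed candidate region

    first_fixed = roi_align(features, shifted, output_size=7, spatial_scale=1 / 32)
    second_fixed = roi_align(features, shifted, output_size=14, spatial_scale=1 / 32)

    # Text region adjustment branch: one confidence plus four position offsets.
    adjust_head = nn.Linear(512 * 7 * 7, 5)
    analysis = adjust_head(first_fixed.flatten(1))         # feature analysis result

    # Text mask segmentation branch: per-pixel logits over the pooled region.
    mask_head = nn.Conv2d(512, 1, 1)
    initial_mask = torch.sigmoid(mask_head(second_fixed))  # initial boundary detection result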
2. The text region boundary detection method according to claim 1, wherein the step of adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result specifically comprises:
obtaining a confidence and a position offset from the feature analysis result, and determining an initial text mask according to the initial text region boundary and the position offset;
determining a text mask overlap rate according to the initial text mask, and determining a target text mask according to the text mask overlap rate and the confidence;
and performing boundary detection on the target text mask to obtain a detection result, and determining the target text region boundary detection result according to the detection result.
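A small sketch of the first refinement step in this claim: splitting the feature analysis result into a confidence and a position offset, then shifting the initial region boundary by that offset to place the initial text mask. The 5-value layout (one score followed by four offsets) is an assumed encoding, not one stated in the claims.

    import torch

    analysis = torch.tensor([[2.3, -1.5, 0.8, 2.0, -0.6]])  # adjustment-branch output
    confidence = torch.sigmoid(analysis[:, 0])               # detection confidence
    position_offset = analysis[:, 1:]                        # boundary correction
    initial_boundary = torch.tensor([[100.0, 100.0, 300.0, 260.0]])
    initial_mask_box = initial_boundary + position_offset    # placed initial text mask
    print(confidence.item(), initial_mask_box.tolist())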
3. The text region boundary detection method according to claim 2, wherein the step of determining a text mask overlap rate according to the initial text mask and determining a target text mask according to the text mask overlap rate and the confidence specifically comprises:
determining the text mask overlap rate according to the initial text mask, and judging whether the text mask overlap rate is greater than a preset threshold;
when the text mask overlap rate is greater than the preset threshold, sorting the initial text masks according to the confidence to obtain a sorting result;
and screening the initial text masks according to the sorting result to obtain a target text mask.
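The screening in this claim behaves like non-maximum suppression over masks. The runnable sketch below assumes the overlap rate is an intersection-over-union on binary masks and uses an illustrative 0.5 threshold.

    import torch

    def mask_nms(masks, scores, overlap_threshold=0.5):
        order = scores.argsort(descending=True)           # rank by confidence
        keep = []
        for i in order.tolist():
            suppressed = False
            for j in keep:
                inter = (masks[i] & masks[j]).sum().item()
                union = (masks[i] | masks[j]).sum().item()
                if union and inter / union > overlap_threshold:
                    suppressed = True                     # overlaps a kept, higher-scoring mask
                    break
            if not suppressed:
                keep.append(i)
        return [masks[i] for i in keep]                   # target text masks

    masks = torch.zeros(3, 32, 32, dtype=torch.bool)
    masks[0, 4:20, 4:20] = True
    masks[1, 6:22, 6:22] = True                           # heavy overlap with mask 0
    masks[2, 24:30, 24:30] = True
    kept = mask_nms(masks, torch.tensor([0.9, 0.8, 0.7]))
    print(len(kept))                                      # 2: the overlapping mask is suppressed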
4. The text region boundary detection method according to claim 1, wherein before the step of acquiring an image to be processed and extracting the features of the image to be processed through a preset backbone network to obtain image features, the text region boundary detection method further comprises:
acquiring an initial sample image, and performing scale adjustment on the initial sample image to obtain a sample image to be processed;
performing image extraction on the sample image to be processed through a preset sliding window to obtain sample sub-images to be processed;
performing feature analysis on the sample sub-images to be processed to obtain positive sample sub-images and negative sample sub-images;
and training an initial region proposal network according to the positive sample sub-images and the negative sample sub-images to obtain a preset region proposal network.
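A runnable sketch of the sample preparation in this claim: rescale the initial sample image, then slide a preset window across it to cut sub-images. The 512-pixel scale, 128-pixel window, and 64-pixel stride are illustrative choices, not values from the specification.

    import torch
    import torch.nn.functional as F

    def extract_subimages(image, scale=512, window=128, stride=64):
        # Scale adjustment of the initial sample image.
        image = F.interpolate(image.unsqueeze(0), size=(scale, scale),
                              mode="bilinear", align_corners=False)[0]
        crops = []
        for top in range(0, scale - window + 1, stride):      # preset sliding window
            for left in range(0, scale - window + 1, stride):
                crops.append(image[:, top:top + window, left:left + window])
        return crops                                          # sample sub-images to be processed

    sample = torch.rand(3, 700, 900)                          # initial sample image
    subimages = extract_subimages(sample)
    print(len(subimages), subimages[0].shape)                 # 49 crops of 3 x 128 x 128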
5. The text region boundary detection method according to claim 4, wherein the step of performing feature analysis on the sample sub-images to be processed to obtain positive sample sub-images and negative sample sub-images specifically comprises:
acquiring the image size of the sub-image to be processed, and looking up the threshold range corresponding to the image size;
analyzing the sub-image to be processed to obtain a text bounding box;
and acquiring each boundary length of the text bounding box, and determining the positive sample sub-images and the negative sample sub-images according to the boundary lengths and the threshold range.
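A sketch of the labeling rule in this claim: look up a threshold range keyed by the sub-image size, measure the side lengths of the text bounding box found in the crop, and treat the crop as a positive sample only when every side falls inside the range. The lookup-table values are assumptions for illustration.

    def label_subimage(subimage_size, text_box):
        # Hypothetical size -> (min_length, max_length) threshold-range table.
        threshold_ranges = {128: (8.0, 96.0), 256: (16.0, 192.0)}
        low, high = threshold_ranges[subimage_size]
        x1, y1, x2, y2 = text_box                      # bounding box from analyzing the crop
        width, height = x2 - x1, y2 - y1               # boundary lengths
        is_positive = low <= width <= high and low <= height <= high
        return "positive" if is_positive else "negative"

    print(label_subimage(128, (10.0, 12.0, 70.0, 40.0)))   # positive
    print(label_subimage(128, (0.0, 0.0, 4.0, 4.0)))       # negative: sides below the range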
6. The text region boundary detection method according to any one of claims 1-5, wherein before the step of determining an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature, the text region boundary detection method further comprises:
determining a shape structure constraint function according to the image to be processed;
and training an initial text mask segmentation network according to the shape structure constraint function to obtain the preset text mask segmentation network.
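The claims name a shape structure constraint function without defining it in this excerpt. The sketch below assumes one plausible form, a Dice region term plus a boundary term comparing contours extracted with a max-pool morphological gradient, and should be read as an assumption rather than the patent's formula.

    import torch
    import torch.nn.functional as F

    def contour(mask, k=3):
        # Soft morphological gradient: dilation minus erosion of the mask.
        dilated = F.max_pool2d(mask, k, stride=1, padding=k // 2)
        eroded = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)
        return dilated - eroded

    def shape_structure_loss(pred, target, eps=1e-6):
        inter = (pred * target).sum()
        dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)   # region term
        boundary = F.l1_loss(contour(pred), contour(target))               # shape term
        return dice + boundary

    pred = torch.rand(1, 1, 64, 64, requires_grad=True)    # mask-network output (stand-in)
    target = torch.zeros(1, 1, 64, 64)
    target[..., 16:48, 8:56] = 1.0                         # ground-truth text mask
    loss = shape_structure_loss(torch.sigmoid(pred), target)
    loss.backward()                                        # drives mask-network training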
7. Text region boundary detection equipment, characterized by comprising: a memory, a processor, and a text region boundary detection program stored on the memory and executable on the processor, wherein the text region boundary detection program, when executed by the processor, implements the steps of the text region boundary detection method according to any one of claims 1-6.
8. A storage medium having a text region boundary detection program stored thereon, wherein the text region boundary detection program, when executed by a processor, implements the steps of the text region boundary detection method according to any one of claims 1-6.
9. A text region boundary detection device, characterized by comprising: an acquisition module, a processing module, an analysis module, a detection module, and an adjustment module;
the acquisition module is used for acquiring an image to be processed and extracting the features of the image to be processed through a preset backbone network to obtain image features;
the processing module is used for determining an initial candidate region according to the image features and a preset region proposal network, analyzing the initial candidate region through a preset deformable region-of-interest pooling model to obtain a deformation offset, and pooling the initial candidate region according to the deformation offset to obtain a first fixed feature and a second fixed feature;
the analysis module is used for analyzing the first fixed feature through a preset text region adjustment network to obtain a feature analysis result;
the detection module is used for determining an initial text region boundary detection result according to a preset text mask segmentation network and the second fixed feature;
and the adjustment module is used for adjusting the initial text region boundary detection result according to the feature analysis result to obtain a target text region boundary detection result.
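Structurally, the five modules of this claim compose as a pipeline. The sketch below wires them as interchangeable callables in the order the claim recites; the lambda bodies are placeholders for the networks described above, not the device's actual components.

    class TextRegionBoundaryDetector:
        def __init__(self, acquire, process, analyze, detect, adjust):
            self.acquire, self.process = acquire, process
            self.analyze, self.detect, self.adjust = analyze, detect, adjust

        def __call__(self, raw_image):
            features = self.acquire(raw_image)                  # backbone features
            first_fixed, second_fixed = self.process(features)  # deformable pooling
            analysis = self.analyze(first_fixed)                # adjustment branch
            initial = self.detect(second_fixed)                 # mask branch
            return self.adjust(initial, analysis)               # target boundary result

    detector = TextRegionBoundaryDetector(
        acquire=lambda img: img,
        process=lambda f: (f, f),
        analyze=lambda f: {"confidence": 0.9, "offset": (0, 0, 0, 0)},
        detect=lambda f: [(10, 10, 50, 30)],
        adjust=lambda masks, a: masks if a["confidence"] > 0.5 else [],
    )
    print(detector("image to be processed"))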
CN202110190870.XA 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device Active CN112560857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110190870.XA CN112560857B (en) 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110190870.XA CN112560857B (en) 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device

Publications (2)

Publication Number Publication Date
CN112560857A (en) 2021-03-26
CN112560857B (en) 2021-06-08

Family

ID=75034372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110190870.XA Active CN112560857B (en) 2021-02-20 2021-02-20 Character area boundary detection method, equipment, storage medium and device

Country Status (1)

Country Link
CN (1) CN112560857B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863431A (en) * 2022-04-14 2022-08-05 中国银行股份有限公司 Text detection method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794246A (en) * 2015-05-15 2015-07-22 百度在线网络技术(北京)有限公司 Information search method and information search device
CN108062531A (en) * 2017-12-25 2018-05-22 南京信息工程大学 A kind of video object detection method that convolutional neural networks are returned based on cascade
CN108470172A (en) * 2017-02-23 2018-08-31 阿里巴巴集团控股有限公司 A kind of text information identification method and device
KR102030628B1 (en) * 2019-04-04 2019-10-10 (주)아이엠시티 Recognizing method and system of vehicle license plate based convolutional neural network
WO2020005731A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Text entity detection and recognition from images

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062547B (en) * 2017-12-13 2021-03-09 北京小米移动软件有限公司 Character detection method and device
CN110287960B (en) * 2019-07-02 2021-12-10 中国科学院信息工程研究所 Method for detecting and identifying curve characters in natural scene image
CN111553349B (en) * 2020-04-26 2023-04-18 佛山市南海区广工大数控装备协同创新研究院 Scene text positioning and identifying method based on full convolution network
CN111553347B (en) * 2020-04-26 2023-04-18 佛山市南海区广工大数控装备协同创新研究院 Scene text detection method oriented to any angle
CN111724401A (en) * 2020-05-08 2020-09-29 华中科技大学 Image segmentation method and system based on boundary constraint cascade U-Net
CN112149620A (en) * 2020-10-14 2020-12-29 南昌慧亦臣科技有限公司 Method for constructing natural scene character region detection model based on no anchor point

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794246A (en) * 2015-05-15 2015-07-22 百度在线网络技术(北京)有限公司 Information search method and information search device
CN108470172A (en) * 2017-02-23 2018-08-31 阿里巴巴集团控股有限公司 A kind of text information identification method and device
CN108062531A (en) * 2017-12-25 2018-05-22 南京信息工程大学 A kind of video object detection method that convolutional neural networks are returned based on cascade
WO2020005731A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Text entity detection and recognition from images
KR102030628B1 (en) * 2019-04-04 2019-10-10 (주)아이엠시티 Recognizing method and system of vehicle license plate based convolutional neural network

Also Published As

Publication number Publication date
CN112560857A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
CN111242088A (en) Target detection method and device, electronic equipment and storage medium
EP3101594A1 (en) Saliency information acquisition device and saliency information acquisition method
JP5298831B2 (en) Image processing apparatus and program
CN113255915B (en) Knowledge distillation method, device, equipment and medium based on structured instance graph
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN107578424B (en) Dynamic background difference detection method, system and device based on space-time classification
CN115375917B (en) Target edge feature extraction method, device, terminal and storage medium
CN112101317A (en) Page direction identification method, device, equipment and computer readable storage medium
CN112396047B (en) Training sample generation method and device, computer equipment and storage medium
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN114494775A (en) Video segmentation method, device, equipment and storage medium
CN112560857B (en) Character area boundary detection method, equipment, storage medium and device
CN111738272A (en) Target feature extraction method and device and electronic equipment
CN113128604A (en) Page element identification method and device, electronic equipment and storage medium
US10354409B2 (en) Image processing device, image processing method, and non-transitory computer-readable recording medium
CN110442719B (en) Text processing method, device, equipment and storage medium
CN116030472A (en) Text coordinate determining method and device
CN113537158B (en) Image target detection method, device, equipment and storage medium
CN115775386A (en) User interface component identification method and device, computer equipment and storage medium
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN111325194B (en) Character recognition method, device and equipment and storage medium
CN115035552B (en) Fall detection method and device, equipment terminal and readable storage medium
CN115631493B (en) Text region determining method, system and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant