CN118334634A - Image text detection method, system, equipment and storage medium - Google Patents
- Publication number: CN118334634A (application CN202410085002.9A)
- Authority: CN (China)
- Prior art keywords: contour, text, module, iteration, feature map
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an image text detection method, system, device, and storage medium, in which progressive learning is introduced to achieve more accurate and efficient text detection. By combining contour transformation with progressive learning, the method can adaptively handle oblique, curved, and irregularly shaped text. The addition of progressive learning also allows the method to self-optimize and adjust when processing different scenes and text types, greatly reducing the need for manual intervention and improving the stability and reliability of detection. Overall, the invention provides a more universal, efficient, and adaptive image text detection solution for users and researchers.
Description
Technical Field
The present invention relates to the field of image text detection technologies, and in particular, to an image text detection method, system, device, and storage medium.
Background
With the rapid development of information technology, research in image processing and computer vision has steadily deepened. In text detection and recognition in particular, these techniques have been widely applied in a variety of contexts, such as invoice verification, autonomous driving, and unmanned aerial vehicle inspection. Although much research has been devoted to the problem, many challenges remain in practical use.
Traditional text detection methods are mainly based on techniques such as image segmentation, edge detection, and pattern matching. Although these methods can achieve good results in some controlled scenes, their performance tends to be unsatisfactory under complex backgrounds, low resolution, or low contrast between text and background colors. Furthermore, these conventional methods typically require a large amount of manual parameter adjustment to accommodate different scenarios and applications.
In recent years, with the rise of deep learning, many neural-network-based text detection methods have been proposed. These methods can generally learn to extract image features automatically and exhibit good generalization across various complex scenarios. However, even these advanced methods often have difficulty handling text boundaries, details, and shapes. For example, many methods struggle to detect text accurately when it is similar in color to the background or unevenly distributed. Likewise, existing methods often perform poorly on the fine contours of text, which is unacceptable in some high-precision applications. Specifically, current solutions mainly suffer from the following problems:
1) Traditional image text detection methods were designed primarily to handle horizontal, regular text. Such designs often make them inadequate for detecting oblique, curved, or irregularly shaped text. In complex scenes in particular, such as book pages with handwritten annotations, document images of historical records, or billboards in urban streets, the detection capability of these methods is often greatly limited. In practical applications, it is therefore difficult for these techniques to meet the requirements of diversified and dynamic text detection.
2) Most of the prior art processes complex image scenes, such as nested, overlapping or background noisy scenes, with relatively large computational effort, resulting in inefficient processing. Particularly, in the occasions needing real-time processing, such as live caption extraction, street view navigation and the like, the methods often have difficulty in meeting the requirement of real-time performance. Moreover, these methods often suffer from performance bottlenecks in the face of large-scale, high-resolution, or large amounts of dynamically changing image data, severely impacting user experience and application versatility.
3) Many conventional approaches lack efficient adaptive learning capabilities. In practice, operators often spend a great deal of time making manual parameter adjustments and model fine-tuning whenever facing new text types or different scenes. This operation not only increases the technological use threshold, but also greatly reduces the stability and reliability of the method. For example, when the detection system is deployed in a new environment, the conventional technology is likely to be unable to be directly applied due to the differences in language, culture and writing style, and a great deal of pre-adjustment work is required.
Disclosure of Invention
The invention aims to provide an image character detection method, an image character detection system, an image character detection device and a storage medium, which can more accurately and efficiently detect character information in an image, especially in the situations of low contrast between characters and a background or uneven character distribution.
The invention aims at realizing the following technical scheme:
an image text detection method, comprising:
step 1, extracting features of an original image to obtain a multi-scale feature map;
step 2, performing preliminary detection on the text region by utilizing the multi-scale feature map to obtain an initial contour of the text region;
step 3, using the multi-scale feature map and the initial contour of the text region, continuously iterating and optimizing the contour shape with a progressive learning mechanism, wherein each iteration refines and adjusts the contour on the basis of the previous iteration, finally obtaining a contour that covers each piece of text.
An image text detection system comprising:
the feature extraction module is used for extracting features of the original image to obtain a multi-scale feature map;
The outline initialization module is used for carrying out preliminary detection on the text area by utilizing the multi-scale feature map to obtain an initial outline of the text area;
The progressive contour optimization module is used for utilizing the multi-scale feature map and the initial contour of the text region, using a progressive learning mechanism to continuously iterate and optimize the contour shape, refining and adjusting the contour on the basis of the previous iteration every time, and finally iterating to obtain the contour capable of covering each text.
A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.
A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
According to the technical scheme provided by the invention, progressive learning is introduced to achieve more accurate and efficient text detection. By combining contour transformation with progressive learning, the invention can adaptively handle oblique, curved, and irregularly shaped text. The addition of progressive learning also allows the method to self-optimize and adjust when processing different scenes and text types, greatly reducing the need for manual intervention and improving the stability and reliability of detection. Overall, the invention provides a more universal, efficient, and adaptive image text detection solution for users and researchers.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an image text detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a frame of an image text detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image text detection system according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The terms that may be used herein will first be described as follows:
The terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.
The term "consisting of … …" is meant to exclude any technical feature element not explicitly listed. If such term is used in a claim, the term will cause the claim to be closed, such that it does not include technical features other than those specifically listed, except for conventional impurities associated therewith. If the term is intended to appear in only a clause of a claim, it is intended to limit only the elements explicitly recited in that clause, and the elements recited in other clauses are not excluded from the overall claim.
Aiming at the problems of incomplete detection or unmatched detection results possibly occurring when the existing image and text detection method processes complex background, small characters or text dense areas, the invention aims to provide an image and text detection scheme based on progressive contour transformation. The scheme can more accurately and efficiently detect the text information in the image, especially in the situations of low contrast between the text and the background or uneven text distribution.
The following describes the image text detection scheme provided by the invention in detail. Anything not described in detail in the embodiments of the present invention belongs to the prior art known to those skilled in the art. Any specific conditions not noted in the examples of the present invention follow conditions conventional in the art or suggested by the manufacturer.
Example 1
The embodiment of the invention provides an image text detection method, which mainly comprises the following steps as shown in fig. 1:
and step 1, extracting features of the original image to obtain a multi-scale feature map.
In the embodiment of the invention, the input image can be processed through the stacked convolution layer, the pooling layer and the normalization layer to obtain the multi-scale feature map.
And step 2, performing preliminary detection on the text region by using the multi-scale feature map to obtain an initial contour of the text region.
In the embodiment of the invention, a circumscribed rectangular detection frame of the text region is obtained from the multi-scale feature map using any existing object detector; an octagon is then initialized inside the circumscribed rectangular detection frame, with its vertices at the 1/4 and 3/4 positions of the rectangle's sides, yielding a polygonal contour fitted to the text shape that serves as the initial contour of the text region.
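As a concrete illustration of the octagon initialization above, the following minimal numpy sketch (not taken from the patent; the function name and the clockwise vertex ordering are our own assumptions) places the eight vertices at the 1/4 and 3/4 positions of each side of a circumscribed rectangle:

```python
import numpy as np

def init_octagon(x0, y0, x1, y1):
    """Initialize an octagonal contour inside an axis-aligned box.

    Vertices are placed at the 1/4 and 3/4 positions of each side of
    the rectangle, as described for the contour initialization step.
    Returns an (8, 2) array of (x, y) vertices in clockwise order.
    """
    w, h = x1 - x0, y1 - y0
    return np.array([
        (x0 + 0.25 * w, y0), (x0 + 0.75 * w, y0),  # top side
        (x1, y0 + 0.25 * h), (x1, y0 + 0.75 * h),  # right side
        (x0 + 0.75 * w, y1), (x0 + 0.25 * w, y1),  # bottom side
        (x0, y0 + 0.75 * h), (x0, y0 + 0.25 * h),  # left side
    ], dtype=np.float64)

# a 100x40 detection box yields an octagon hugging the box edges
octagon = init_octagon(0.0, 0.0, 100.0, 40.0)
```

The resulting polygon is a better starting point than the raw rectangle because its vertices can later move independently toward the true text boundary.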
And step 3, using the multi-scale feature map and the initial contour of the text region, continuously iterating and optimizing the contour shape with a progressive learning mechanism, wherein each iteration refines and adjusts the contour on the basis of the previous iteration, finally obtaining a contour that covers each piece of text.
This step is an iterative process; the contour output by the last iteration is the contour covering each piece of text. The k-th iteration proceeds as follows:

1) Feature sampling is performed on the contour C_{k-1} from the (k-1)-th iteration using the multi-scale feature map to obtain the vertex features f_{k-1}, expressed as:

f_{k-1} = Sample(F, C_{k-1})

where Sample(·) is a sampling function based on bilinear interpolation, and C_{k-1} is the initial contour of the text region when k = 1.
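The Sample(·) step can be sketched as plain bilinear interpolation over a feature map. This is a hedged illustration only: the (H, W, C) tensor layout and the clamping at the image border are assumptions not fixed by the description above.

```python
import numpy as np

def bilinear_sample(feature_map, points):
    """Sample a feature map at fractional (x, y) vertex positions.

    feature_map: (H, W, C) array; points: (N, 2) array of (x, y).
    Returns an (N, C) array of interpolated vertex features.
    """
    H, W, _ = feature_map.shape
    x = np.clip(points[:, 0], 0, W - 1)
    y = np.clip(points[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(y).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    # blend the four surrounding grid cells by their fractional weights
    top = (1 - wx) * feature_map[y0, x0] + wx * feature_map[y0, x1]
    bot = (1 - wx) * feature_map[y1, x0] + wx * feature_map[y1, x1]
    return (1 - wy) * top + wy * bot

# tiny 2x2 single-channel map whose value equals x + y, so the
# interpolated value at the cell center is exactly 1.0
F = np.array([[[0.0], [1.0]], [[1.0], [2.0]]])
feats = bilinear_sample(F, np.array([[0.5, 0.5]]))
```

Because contour vertices have sub-pixel coordinates, interpolation (rather than rounding) keeps the sampled features consistent with the continuous vertex positions.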
2) The vertex features are aggregated to obtain the aggregate features g_{k-1}, expressed as:

g_{k-1} = CircConv(f_{k-1})

where CircConv(·) is a circular convolution function used for feature aggregation.
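CircConv(·) can be illustrated as a 1-D convolution that wraps around the closed vertex sequence, so the first and last vertices are neighbors. The 3-tap averaging kernel below is a hypothetical stand-in for the learned kernel:

```python
import numpy as np

def circ_conv(vertex_feats, kernel):
    """1-D circular convolution along the contour's vertex dimension.

    Because a contour is closed, the vertex sequence is treated as
    cyclic: each output aggregates a vertex's neighbors with wrap-around.
    vertex_feats: (N, C); kernel: 1-D aggregation weights of odd length.
    """
    r = len(kernel) // 2
    out = np.zeros_like(vertex_feats)
    for offset, w in zip(range(-r, r + 1), kernel):
        # np.roll implements the cyclic neighborhood indexing
        out += w * np.roll(vertex_feats, -offset, axis=0)
    return out

# 3-tap averaging kernel over an 8-vertex, 1-channel contour
feats = np.arange(8, dtype=np.float64).reshape(8, 1)
agg = circ_conv(feats, np.array([1 / 3, 1 / 3, 1 / 3]))
```

With this kernel, each aggregated vertex feature is the mean of itself and its two cyclic neighbors, which is the context-mixing effect the aggregation step relies on.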
3) The displacement vector ΔC_{k-1} of the contour is predicted from the aggregate features g_{k-1}, expressed as:

ΔC_{k-1} = Updater(g_{k-1})

where Updater is a displacement prediction module formed by stacking convolution layers and rectified linear units.
4) The contour C_{k-1} of the (k-1)-th iteration is updated with the displacement vector ΔC_{k-1} to obtain the contour C_k of the k-th iteration, expressed as:

C_k = C_{k-1} + ΔC_{k-1}
In the above scheme provided by the embodiment of the invention, step 2 is realized by the contour initialization module and step 3 by the progressive contour optimization module; the two modules are trained in the following manner.

For the contour initialization module, the initialization loss function L_init is calculated from the distance between the initial contour C_0 output by the module and the actual contour C_gt of the text region, expressed as:

L_init = ||C_0 - C_gt||

where ||·|| denotes the L1 norm.
For the progressive contour optimization module, the distance between the contour of each iteration and the actual contour C_gt is used to calculate the iteration loss function L_evolve, expressed as:

L_evolve = Σ_{i=1}^{K} ||C_i - C_gt||

where C_i is the contour of the i-th iteration and K is the number of iterations.
The total loss function L_total is:

L_total = λ_init·L_init + λ_evolve·L_evolve

where λ_init and λ_evolve are two weight factors.
The contour initialization module and the progressive contour optimization module are then trained with the total loss function.
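The loss terms above can be sketched as follows. The unweighted sum over iterations in L_evolve is an assumption where the description leaves the per-iteration weighting open, and the default weight factors are hypothetical:

```python
import numpy as np

def l1_dist(c_pred, c_gt):
    """L1 distance between corresponding vertices of two contours."""
    return np.abs(c_pred - c_gt).sum()

def total_loss(c0, contours, c_gt, lam_init=1.0, lam_evolve=1.0):
    """Combined loss: initialization term plus per-iteration terms.

    contours: list of the K contours produced by the iterations.
    NOTE: the equal weighting of iterations is an assumption.
    """
    l_init = l1_dist(c0, c_gt)                            # L_init
    l_evolve = sum(l1_dist(c, c_gt) for c in contours)    # L_evolve
    return lam_init * l_init + lam_evolve * l_evolve      # L_total

c_gt = np.zeros((8, 2))
c0 = np.ones((8, 2))            # init term: 8 * 2 * 1.0 = 16
iters = [np.full((8, 2), 0.5)]  # evolve term: 8 * 2 * 0.5 = 8
loss = total_loss(c0, iters, c_gt)
```

Supervising every intermediate contour (not just the last one) is what pushes each iteration toward a meaningful refinement step.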
In the embodiment of the present invention, the step serial numbers are only used to identify different steps, and do not represent the sequence of the steps, and the specific sequence of the steps can be embodied by specific technical content.
The scheme provided by the embodiment of the invention has the advantage that characters in various forms can be comprehensively and accurately detected. Firstly, not only can the traditional transverse characters be effectively detected, but also the characters with oblique, bending and irregular shapes can be accurately detected. This provides a solid technical support for processing modern and complex text image scenes such as billboards, handwriting annotations, document images and the like. Secondly, the invention combines the powerful technology of deep learning, and can perform self-learning and optimization on the detection method. This means that the detection effect of the method will gradually increase with the increase of the use time and the accumulation of the data amount, and can adapt to more scenes and cope with more complicated character forms. This adaptive learning capability is difficult to achieve in many conventional approaches. In addition, high efficiency can be maintained when processing a large amount of image data. The three modules of feature extraction, contour initialization and progressive contour optimization are combined, so that the accuracy of detection is ensured while the calculation redundancy is reduced. This balance is difficult to achieve in many prior art methods. In short, the image text detection method of the invention is an innovative and beneficial solution to the text detection field by virtue of the comprehensive detection capability, self-learning advantage and high-efficiency performance.
In order to more clearly demonstrate the technical scheme and the technical effects provided by the invention, the method provided by the embodiment of the invention is described in detail below by using specific embodiments.
1. The scheme is introduced in whole.
The core idea of the invention is to accurately identify the characters in the image by utilizing advanced algorithm of deep learning and combining progressive contour optimization strategy. As shown in fig. 2, it mainly comprises the following three main modules:
1) And the feature extraction module is used for: extracting features of the original image, and outputting the obtained features in two paths, wherein one path of the features is output to the progressive profile optimizing module, and the other path of the features is output to the profile initializing module.
2) Profile initialization module: in the module, based on the extracted feature map, preliminary text detection is performed, and a preliminary contour including a text region is generated. This process involves not only the detection of text regions, but also the further refinement of the detected rectangular box into a polygonal outline that more closely fits the actual text shape. These polygonal contours are initial contours, which lay the foundation for progressive contour tuning.
3) Progressive profile tuning module: the module, as a core part of the invention, accepts as input the features from the feature extraction module and the initial profile of the profile initialization module. By introducing a progressive learning mechanism, the module continuously iterates and optimizes the contour shape, and each iteration refines and adjusts the contour on the basis of the previous iteration so as to finally obtain the contour prediction capable of accurately covering each character. The iterative learning method can continuously optimize the detection effect, approach the word boundary and remarkably improve the accuracy and the robustness of word detection.
The invention realizes the high-precision detection of the characters in the image through the three mutually-cooperated modules, and particularly in complex scenes which are difficult to deal with by the traditional method, the invention has excellent performance and practical value.
2. The specific scheme is introduced.
1. And the characteristic extraction module.
The input to the feature extraction module is the original image, for example an RGB image I ∈ R^{H×W×3}, where 3 represents the three RGB channels and H and W are the height and width of the image, respectively.
The feature extraction module may be a feature extractor comprising a series of convolution layers (Conv), pooling layers (Pool), and normalization layers (Norm); passing the input image through these layers yields the multi-scale feature map F.
The deep features can help effectively capture text information in an image, strengthen the contrast between text and background and provide powerful feature support for subsequent text detection.
2. And a contour initializing module.
The contour initialization module mainly comprises two parts: a detector, responsible for detecting rectangular frames of the text regions, and a contour refiner, which refines the rectangular frames into polygonal contours closely fitting the actual text shapes.
The detector predicts a rectangular detection frame for each text region and can be realized with any published object detection algorithm. The contour refiner then initializes an octagon inside each rectangular detection frame as the initial contour of the text region, with each vertex at the 1/4 and 3/4 positions of the sides of the rectangular frame, thereby providing the necessary starting point for subsequent contour refinement.
3. And a progressive profile optimizing module.
In the k-th iteration of the progressive contour optimization module, the contour C_{k-1} undergoes a fine-tuning step to fit the text outline in the image more accurately. The process first performs feature sampling on C_{k-1} from the multi-scale feature map F to obtain the associated vertex features f_{k-1}. This step is achieved by bilinear sampling, ensuring that the features extracted from the multi-scale feature map match the vertex positions of the contour. Next, a feature aggregation operation is performed: the sampled vertex features f_{k-1} are aggregated into the aggregate features g_{k-1} through the circular convolution CircConv, so as to obtain richer context information and detail features. Feature aggregation strengthens the feature representation of the contour vertices, providing a solid basis for further deformation of the contour. The deformation and adjustment of the contour are completed by the updater module in the progressive contour optimization module. Specifically, a hyperbolic tangent function converts the initial contour features g_0 into an initial hidden state h_0, providing a starting point for vertex coordinate updates in the iterative process. The subsequent hidden states h_1 to h_{k-1} are updated step by step by a GRU (gated recurrent unit) based on the previous state and the current contour information; the GRU receives the aggregate features g_{k-1} and computes a displacement vector ΔC_{k-1}, which indicates how to adjust the current contour vertices to move closer to the actual text edges. The iterative update module may employ a gated recurrent unit or another learnable structure to gradually approach the optimal contour. Finally, the updater module applies ΔC_{k-1} to the current contour C_{k-1} by element-wise addition to obtain the updated contour C_k. The overall process can be summarized by the following formulas:
f_{k-1} = Sample(F, C_{k-1})

g_{k-1} = CircConv(f_{k-1})

ΔC_{k-1} = Updater(g_{k-1})

C_k = C_{k-1} + ΔC_{k-1}
In the above formulas, the displacement prediction module computes the displacement vector ΔC_{k-1} of the contour, and the contour deformation module applies ΔC_{k-1} to obtain the updated contour C_k; the displacement prediction module and the contour deformation module together form the Updater module.
The above process is repeated until the set number of iterations K is reached, and the final contour, i.e., the image text detection result, is obtained.
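The repeated refinement can be sketched as the loop below. The sample, aggregate, and updater callables are toy stand-ins for the learned Sample, CircConv, and GRU-based Updater modules; only the loop structure mirrors the process described above.

```python
import numpy as np

def refine_contour(contour, sample, aggregate, updater, K=3):
    """Progressive contour optimization loop (structure only).

    Each iteration repeats the four formulas:
      f = Sample(F, C);  g = CircConv(f);  dC = Updater(g);  C = C + dC.
    The callables stand in for the learned modules.
    Returns the contour after every iteration (C_0 .. C_K).
    """
    history = [contour]
    for _ in range(K):
        f = sample(contour)             # vertex features from the feature map
        g = aggregate(f)                # circular aggregation along the contour
        contour = contour + updater(g)  # apply the predicted displacements
        history.append(contour)
    return history

# toy stand-ins: features are the coordinates themselves, aggregation is
# the identity, and the "updater" moves each vertex halfway to the origin
sample = lambda c: c
aggregate = lambda f: f
updater = lambda g: -0.5 * g

start = np.array([[8.0, 0.0], [0.0, 8.0], [-8.0, 0.0], [0.0, -8.0]])
contours = refine_contour(start, sample, aggregate, updater, K=3)
```

With these stand-ins, each iteration halves the contour's size, so the sequence visibly converges toward a target; in the real module, the learned displacements instead pull the vertices toward the text edges.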
3. Training protocols.
In the embodiment of the invention, the optimization training is mainly performed aiming at the contour initialization module and the progressive contour optimization module.
For the contour initialization module, the initialization loss function L_init computes the distance between the predicted initial contour C_0 and the actual contour C_gt:

L_init = ||C_0 - C_gt||

where ||·|| denotes the L1 norm.
For the progressive contour optimization module, the iteration loss function L_evolve computes the distance between the predicted contours and the actual contour:

L_evolve = Σ_{i=1}^{K} ||C_i - C_gt||

where C_i is the contour of the i-th iteration.
Finally, the total loss function L_total is:

L_total = λ_init·L_init + λ_evolve·L_evolve

where λ_init and λ_evolve are two weight factors balancing the initialization and iteration losses.
From the description of the above embodiments, it will be apparent to those skilled in the art that the above embodiments may be implemented in software, or may be implemented by means of software plus a necessary general hardware platform. With such understanding, the technical solutions of the foregoing embodiments may be embodied in a software product, where the software product may be stored in a nonvolatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and include several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the methods of the embodiments of the present invention.
Example two
The invention also provides an image text detection system, which is mainly used for realizing the method provided by the previous embodiment, as shown in fig. 3, and mainly comprises:
the feature extraction module is used for extracting features of the original image to obtain a multi-scale feature map;
The outline initialization module is used for carrying out preliminary detection on the text area by utilizing the multi-scale feature map to obtain an initial outline of the text area;
The progressive contour optimization module is used for utilizing the multi-scale feature map and the initial contour of the text region, using a progressive learning mechanism to continuously iterate and optimize the contour shape, refining and adjusting the contour on the basis of the previous iteration every time, and finally iterating to obtain the contour capable of covering each text.
In view of the above, the details of the main processing of each module have been described in the previous embodiments, and will not be described in detail.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the system is divided into different functional modules to perform all or part of the functions described above.
Example III
The present invention also provides a processing apparatus, as shown in fig. 4, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected through buses.
In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:
The input device can be a touch screen, an image acquisition device, a physical key or a mouse and the like;
The output device may be a display terminal;
the memory may be random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as disk memory.
Example IV
The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as a memory in the processing apparatus. The readable storage medium may be any of various media capable of storing a program code, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, and an optical disk.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (10)
1. An image text detection method is characterized by comprising the following steps:
step 1, extracting features of an original image to obtain a multi-scale feature map;
step 2, performing preliminary detection on the text region by utilizing the multi-scale feature map to obtain an initial contour of the text region;
step 3, using the multi-scale feature map and the initial contour of the text region, iteratively optimizing the contour shape with a progressive learning mechanism, wherein each iteration refines and adjusts the contour obtained in the previous iteration, finally yielding, through iteration, a contour that covers each text instance.
2. The method for detecting image text according to claim 1, wherein the feature extraction of the original image to obtain a multi-scale feature map comprises:
And processing the input image through the stacked convolution layer, the pooling layer and the normalization layer to obtain a multi-scale feature map.
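As a minimal illustration of the stacked backbone of claim 2 (a sketch only: the real convolution and normalization layers are learned, and are replaced here by plain pooling), the following builds a multi-scale pyramid from a single-channel image:

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling with stride 2 (H and W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def multiscale_pyramid(img, levels=3):
    """Stand-in for the stacked conv/pool/norm backbone: returns
    feature maps at progressively coarser scales."""
    maps = [img.astype(float)]
    for _ in range(levels - 1):
        maps.append(max_pool2(maps[-1]))
    return maps

pyr = multiscale_pyramid(np.arange(64.0).reshape(8, 8))
print([m.shape for m in pyr])  # [(8, 8), (4, 4), (2, 2)]
```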
3. The method for detecting image text according to claim 1, wherein the preliminary detection of text regions by using the multi-scale feature map includes:
obtaining a circumscribed rectangular detection frame of the text region from the multi-scale feature map using any target detector;
initializing an octagon inside the circumscribed rectangular detection frame, the vertices of the octagon being located at the 1/4 and 3/4 points of the sides of the rectangle, thereby obtaining a polygonal contour that fits the shape of the text, which is taken as the initial contour of the text region.
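The octagon construction of claim 3 can be sketched directly. `init_octagon` below is a hypothetical helper (not from the patent) that places the eight vertices at the 1/4 and 3/4 points of each side of a box `(x0, y0, x1, y1)`:

```python
def init_octagon(x0, y0, x1, y1):
    """Eight contour vertices at the 1/4 and 3/4 points of each side
    of the circumscribed rectangle, listed clockwise from the top side."""
    w, h = x1 - x0, y1 - y0
    return [
        (x0 + 0.25 * w, y0), (x0 + 0.75 * w, y0),  # top side
        (x1, y0 + 0.25 * h), (x1, y0 + 0.75 * h),  # right side
        (x0 + 0.75 * w, y1), (x0 + 0.25 * w, y1),  # bottom side
        (x0, y0 + 0.75 * h), (x0, y0 + 0.25 * h),  # left side
    ]

print(init_octagon(0, 0, 4, 8))
```

Compared with the rectangle itself, this octagon cuts the four corners, so the initial contour already lies closer to the body of the text.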
4. The image text detection method according to claim 1, wherein the process at the k-th iteration is as follows:
performing feature sampling on the (k-1)-th iteration contour C_{k-1} from the multi-scale feature map to obtain vertex features f_{k-1};
aggregating the vertex features to obtain an aggregated feature g_{k-1};
predicting a displacement vector ΔC_{k-1} of the contour using the aggregated feature g_{k-1};
updating the (k-1)-th contour C_{k-1} with the displacement vector ΔC_{k-1} to obtain the k-th iteration contour C_k.
5. The method for detecting image text according to claim 4, wherein the feature sampling is expressed as:
f_{k-1} = Sample(F, C_{k-1})
wherein Sample(·) is the sampling function, F is the multi-scale feature map, and when k = 1, C_{k-1} is the initial contour of the text region;
aggregating the vertex features is expressed as:
g_{k-1} = CircConv(f_{k-1})
wherein CircConv(·) is a circular convolution function used for feature aggregation.
6. The method for detecting image text according to claim 4, wherein the displacement vector of the contour is computed as:
ΔC_{k-1} = Updater(g_{k-1})
wherein Updater(·) is a displacement prediction module formed by stacking convolution layers and rectified linear units;
updating the (k-1)-th contour C_{k-1} with the displacement vector ΔC_{k-1} is expressed as:
C_k = C_{k-1} + ΔC_{k-1}.
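The iteration of claims 4-6 can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the learned Updater network is replaced by a fixed rule, and bilinear feature sampling by nearest-neighbour lookup.

```python
import numpy as np

def sample(feature_map, contour):
    """f_{k-1} = Sample(F, C_{k-1}): nearest-neighbour lookup of one
    feature value per contour vertex (bilinear sampling in practice)."""
    h, w = feature_map.shape
    xs = np.clip(np.round(contour[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(contour[:, 1]).astype(int), 0, h - 1)
    return feature_map[ys, xs]

def circ_conv(f, kernel=(0.25, 0.5, 0.25)):
    """g_{k-1} = CircConv(f_{k-1}): circular 1-D convolution over the
    closed sequence of vertex features."""
    return sum(k * np.roll(f, j - 1) for j, k in enumerate(kernel))

def evolve(contour, feature_map, steps=3, lr=0.1):
    """C_k = C_{k-1} + ΔC_{k-1}; the learned Updater is replaced here by
    a toy rule that shifts each vertex in proportion to g_{k-1}."""
    C = np.asarray(contour, dtype=float)
    for _ in range(steps):
        g = circ_conv(sample(feature_map, C))
        C = C + lr * g[:, None]  # same toy offset applied to x and y
    return C
```

The circular convolution is what lets each vertex see its neighbours on the closed contour; a plain 1-D convolution would break the loop at the first and last vertex.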
7. The image text detection method according to claim 1, wherein step 2 is implemented by a contour initialization module and step 3 is implemented by a progressive contour optimization module, the two modules being trained as follows:
for the contour initialization module, an initialization loss function L_init is calculated using the distance between the initial contour C_0 output by the contour initialization module and the actual contour C_gt of the text region, expressed as:
L_init = ||C_0 - C_gt||
wherein ||·|| denotes the L1 norm;
for the progressive contour optimization module, the distance between the contour of each iteration and the actual contour C_gt is used to calculate an iteration loss function L_evolve, expressed as:
L_evolve = (1/K) Σ_{i=1}^{K} ||C_i - C_gt||
wherein C_i is the contour of the i-th iteration and K is the number of iterations;
the total loss function L_total is:
L_total = λ_init · L_init + λ_evolve · L_evolve
wherein λ_init and λ_evolve are two weighting factors;
the contour initialization module and the progressive contour optimization module are trained using the total loss function.
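The loss of claim 7 is straightforward to compute once the contours are vertex arrays. A minimal sketch, assuming L_evolve is averaged over the K refinement iterations (the exact normalization is not fixed by the claim):

```python
import numpy as np

def l1(a, b):
    """||A - B||: L1 distance between two vertex arrays."""
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())

def total_loss(C0, C_iters, C_gt, lam_init=1.0, lam_evolve=1.0):
    """L_total = lam_init * L_init + lam_evolve * L_evolve, where
    L_evolve is taken here as the mean per-iteration distance (assumption)."""
    L_init = l1(C0, C_gt)  # initialization loss on the first contour
    L_evolve = sum(l1(Ci, C_gt) for Ci in C_iters) / len(C_iters)
    return lam_init * L_init + lam_evolve * L_evolve
```

Because later iterations should sit closer to C_gt, their terms shrink as training progresses, so the evolve term naturally emphasizes whichever refinement step is still inaccurate.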
8. An image text detection system, comprising:
the feature extraction module is used for extracting features of the original image to obtain a multi-scale feature map;
The outline initialization module is used for carrying out preliminary detection on the text area by utilizing the multi-scale feature map to obtain an initial outline of the text area;
the progressive contour optimization module is used for iteratively optimizing the contour shape with a progressive learning mechanism, using the multi-scale feature map and the initial contour of the text region, wherein each iteration refines and adjusts the contour obtained in the previous iteration, finally yielding, through iteration, a contour that covers each text instance.
9. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-7 is implemented when the computer program is executed by a processor.
Publications (1)
Publication Number | Publication Date |
---|---|
CN118334634A true CN118334634A (en) | 2024-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108288088B (en) | Scene text detection method based on end-to-end full convolution neural network | |
CN102282572B (en) | Method and system for representing image patches | |
US8391613B2 (en) | Statistical online character recognition | |
CN110163239B (en) | Weak supervision image semantic segmentation method based on super-pixel and conditional random field | |
CN109903331B (en) | Convolutional neural network target detection method based on RGB-D camera | |
CN109086777B (en) | Saliency map refining method based on global pixel characteristics | |
CN108830279B (en) | Image feature extraction and matching method | |
CN111738055B (en) | Multi-category text detection system and bill form detection method based on same | |
CN111242221B (en) | Image matching method, system and storage medium based on image matching | |
CN110969129A (en) | End-to-end tax bill text detection and identification method | |
CN111583279A (en) | Super-pixel image segmentation method based on PCBA | |
WO2012070474A1 (en) | Object or form information expression method | |
CN113971809A (en) | Text recognition method and device based on deep learning and storage medium | |
CN111161300B (en) | Niblack image segmentation method based on improved Otsu method | |
CN114283431B (en) | Text detection method based on differentiable binarization | |
CN115147932A (en) | Static gesture recognition method and system based on deep learning | |
CN114862925A (en) | Image registration method, device and system based on SIFT and storage medium | |
CN114943754A (en) | Image registration method, system and storage medium based on SIFT | |
CN108647605B (en) | Human eye gaze point extraction method combining global color and local structural features | |
CN112633070A (en) | High-resolution remote sensing image building extraction method and system | |
CN111724428A (en) | Depth map sampling and reconstructing method based on-map signal model | |
CN118334634A (en) | Image text detection method, system, equipment and storage medium | |
CN110570450A (en) | Target tracking method based on cascade context-aware framework | |
CN113515661B (en) | Image retrieval method based on filtering depth convolution characteristics | |
CN112836594B (en) | Three-dimensional hand gesture estimation method based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication |