CN113807336A - Semi-automatic labeling method, system, computer equipment and medium for image text detection - Google Patents
- Publication number
- CN113807336A (application CN202110906651.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- candidate
- recognizer
- identification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a semi-automatic labeling method, a system, computer equipment and a medium for image text detection, wherein the method comprises the following steps: acquiring a text image; acquiring a text center line from a text image; generating N candidate bounding boxes around a text centerline; inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer; comparing the N content identification results with the estimated text content, and respectively calculating identification losses to obtain N identification losses; obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss in all the identification losses, and further obtaining the final text box label; and optimizing the text box label by taking the recognition loss as a guide, and finally obtaining a compact text box label. The invention can improve the text detection labeling efficiency and labeling effect.
Description
Technical Field
The invention relates to a semi-automatic labeling method, a semi-automatic labeling system, computer equipment and a storage medium for image text detection, and belongs to the technical field of artificial intelligence and OCR.
Background
With the development of artificial intelligence technology, text detection has made great progress as a fundamental computer vision task. Text detection refers to locating the text regions in an image, and the technology can be widely applied in industries such as unmanned driving, robot navigation, and assistance for the blind. As data-driven machine learning algorithms such as deep learning have achieved great success in fields such as natural language processing, computer vision, and speech recognition, image text detection technology based on deep learning has developed considerably, and the performance of image text detection algorithms has improved remarkably. However, these methods rely on a large amount of detection annotation data.
To date, the acquisition of detection annotation data has been largely manual: time-consuming, labor-intensive, and expensive. Especially for irregular text regions, more points are usually needed for labeling, so efficiency is extremely low, and human subjectivity reduces precision. Therefore, a semi-automatic or automatic labeling algorithm is needed to replace manual labeling and thereby improve labeling efficiency and accuracy. Automatic algorithms have so far been implemented with detection algorithms, so-called pre-labeling. Because the performance of detection algorithms is limited, pre-labeling cannot generate high-quality labels; considerable manual checking is still needed to obtain truly usable labels, and the difficulty of image text detection labeling is not substantially solved.
Disclosure of Invention
In view of the above, the present invention provides a semi-automatic labeling method, system, computer device and storage medium for image text detection, which can improve the text detection labeling efficiency and labeling effect.
The invention aims to provide a semi-automatic image text detection labeling method.
The invention also provides a semi-automatic image text detection labeling system.
It is a third object of the invention to provide a computer device.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a semi-automatic labeling method for image text detection, the method comprising:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions;
inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer;
comparing the N content identification results with the estimated text content, and respectively calculating identification losses to obtain N identification losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss in all the identification losses, and further obtaining the final text box label;
and optimizing the text box label by taking the recognition loss as a guide, and finally obtaining a compact text box label.
Further, the generating N candidate bounding boxes around the text center line specifically includes:
determining K+1 normals, wherein each normal n_i intersects the text center line at the point c_i and is perpendicular to the tangent of the text center line at c_i;
on each normal n_i, determining a line segment l_i of length h_j that is bisected by the point c_i; taking the endpoints of all the line segments as the vertices of a polygon, and connecting all the vertices in sequence to obtain a candidate bounding box B_j;
determining N different values of the variable h_j to obtain N candidate bounding boxes.
Further, the variable h_j is determined by a formula (not reproduced here), where j = 1, 2, …, N.
Further, obtaining the estimated text content by recognizing the N candidate text regions with the loose recognizer specifically includes:
recognizing the N candidate text regions with the loose recognizer to obtain N recognition results {T_j | j = 1, 2, …, N}, wherein each recognition result T_j is a matrix of shape L×C;
calculating the difference d_j between adjacent recognition results T_j and T_{j-1}, and obtaining the estimated text content T* from the recognition result T_{j*} with the minimum difference d_j;
the estimated text content T* is obtained from the minimum-difference recognition result T_{j*} by a formula (not reproduced here), wherein the component T_j(u, v) represents the probability that the u-th character of the recognition result T_j belongs to the v-th class.
Further, the loose recognizer is structured as an image text recognizer based on a convolutional neural network, comprising a corrector, a first encoder, a first sequence model and a first decoder;
the corrector is used for correcting the shape of the text image in the input image region R_j;
the first encoder is used for extracting features from the corrected text image;
the first sequence model is used for extracting context-dependent features;
the first decoder is used for translating the context-dependent features and outputting a recognition result T_j;
the loose recognizer is trained using loose text regions of synthesized images, wherein a loose text region is an image region that, in addition to the text, introduces a proper amount of background interference.
Further, the strict recognizer is structured as an image text recognizer based on a convolutional neural network, comprising a second encoder, a second sequence model and a second decoder;
the second encoder is used for extracting features from the input image region R_j;
the second sequence model is used for extracting context-dependent features;
the second decoder is used for decoding the context-dependent features and outputting a recognition result s_j;
the strict recognizer is trained using compact text regions of synthesized images, wherein a compact text region is an image region that contains the text without background interference.
Further, the text box label is optimized with the recognition loss as a guide, by a gradient-descent update (formula not reproduced here) in which the box parameter is decreased by μ times the gradient of the recognition loss with respect to it, where μ denotes the update step.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a semi-automatic annotation system for image text detection, the system comprising:
the text image acquisition module is used for acquiring a text image;
the text center line acquisition module is used for acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments;
the candidate bounding box generation module is used for generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions;
the identification module is used for simultaneously inputting the N candidate text regions into the loose recognizer and the strict recognizer, obtaining estimated text contents by identifying the N candidate text regions through the loose recognizer, and predicting the content identification result of each candidate text region through the strict recognizer;
the identification loss calculation module is used for comparing the N content identification results with the estimated text content, and respectively calculating identification loss to obtain N identification losses;
the text box label acquisition module is used for acquiring the index of the most accurate candidate boundary box by determining the index of the minimum loss in all the identification losses so as to obtain the final text box label;
and the optimization module is used for optimizing the text box label by taking the identification loss as a guide to finally obtain a compact text box label.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprises a processor and a memory for storing a program executable by the processor, wherein when the processor executes the program stored in the memory, the semi-automatic labeling method for detecting the image text is realized.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program, and when the program is executed by a processor, the semi-automatic labeling method for image text detection is realized.
Compared with the prior art, the invention has the following beneficial effects:
the method acquires a text center line from a text image, generates candidate bounding boxes surrounding the text center line, and inputs them into a loose recognizer and a strict recognizer; the loose recognizer yields the estimated text content and the strict recognizer predicts the content recognition results, from which the recognition losses are calculated; by determining the index of the minimum loss among all recognition losses, the index of the most accurate candidate bounding box is obtained, and thus the final text box label; the text box label is then optimized with the recognition loss as a guide to obtain a compact text box label. This realizes semi-automatic labeling, which lies between manual labeling and automatic labeling algorithms and takes account of both labeling efficiency and labeling effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a simple flowchart of a semi-automatic labeling method for image text detection in embodiment 1 of the present invention.
Fig. 2 is a specific flowchart of the image text detection semi-automatic labeling method according to embodiment 1 of the present invention.
Fig. 3 is a schematic flowchart of candidate boundary generation in the image text detection semi-automatic labeling method according to embodiment 1 of the present invention.
Fig. 4 is a schematic flowchart of compact boundary estimation in the image text detection semi-automatic labeling method according to embodiment 1 of the present invention.
Fig. 5 is a block diagram of a semi-automatic image text detection annotation system according to embodiment 2 of the present invention.
Fig. 6 is a block diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1:
as shown in fig. 1 and fig. 2, the present embodiment provides a semi-automatic labeling method for image text detection, which includes the following steps:
s201, acquiring a text image.
The text image of this embodiment is a scene text image. It can be acquired by collection, for example by shooting the scene with a camera, or retrieved from a database in which scene text images are stored in advance. The acquired text image serves as the input.
S202, acquiring a text center line from the text image.
In this embodiment, the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments, wherein the K+1 points are denoted {c_i | i = 1, 2, …, K+1}. The text center line serves as the input to the following steps.
S203, generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions.
In this embodiment, the N candidate bounding boxes are denoted {B_j | j = 1, 2, …, N} and the N candidate text regions are denoted {R_j | j = 1, 2, …, N}, with N = 30.
With reference to fig. 3, step S203 is the candidate bounding box generation step, which specifically includes:
S2031, determining K+1 normals, wherein each normal n_i intersects the text center line at the point c_i and is perpendicular to the tangent of the text center line at c_i.
S2032, on each normal n_i, determining a line segment l_i of length h_j that is bisected by the point c_i; taking the endpoints of all the line segments as the vertices of a polygon, and connecting all the vertices in sequence to obtain a candidate bounding box B_j.
S2033, determining N different values of the variable h_j to obtain N candidate bounding boxes.
In this embodiment, different values of the variable h_j determine different polygon bounding boxes, and choosing N different values of h_j yields N candidate bounding boxes. The variable h_j is determined by a formula (not reproduced here), where j = 1, 2, …, N.
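The candidate box generation of steps S2031-S2033 can be sketched as follows. This is a minimal illustration and not the patented implementation: the tangent estimate uses central differences on neighboring centerline points, and the caller supplies the N heights h_j (the patent derives h_j from a formula not reproduced in this text).

```python
import math

def candidate_boxes(centerline, heights):
    """Generate one polygon per height h_j around a polyline centerline.

    centerline: list of (x, y) points c_1 .. c_{K+1}
    heights: list of N candidate segment lengths h_j (hypothetical sampling)
    Returns: list of N polygons, each a list of 2*(K+1) vertices.
    """
    boxes = []
    k1 = len(centerline)
    for h in heights:
        top, bottom = [], []
        for i, (x, y) in enumerate(centerline):
            # Tangent direction at c_i: central difference on neighbors.
            x0, y0 = centerline[max(i - 1, 0)]
            x1, y1 = centerline[min(i + 1, k1 - 1)]
            tx, ty = x1 - x0, y1 - y0
            norm = math.hypot(tx, ty) or 1.0
            # Normal n_i is perpendicular to the tangent at c_i.
            nx, ny = -ty / norm, tx / norm
            # Segment l_i of length h is bisected by c_i.
            top.append((x + nx * h / 2, y + ny * h / 2))
            bottom.append((x - nx * h / 2, y - ny * h / 2))
        # Connect all segment endpoints in sequence into the polygon B_j.
        boxes.append(top + bottom[::-1])
    return boxes
```

For a horizontal centerline the normals point straight up and down, so each polygon is a rectangle of the requested height centered on the line.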
The following steps S204 to S206 are semantic boundary decision steps:
s204, inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer.
In this embodiment, obtaining the estimated text content by recognizing the N candidate text regions with the loose recognizer specifically includes:
1) recognizing the N candidate text regions {R_j | j = 1, 2, …, N} with the loose recognizer to obtain N recognition results {T_j | j = 1, 2, …, N}, wherein each recognition result T_j is a matrix of shape L×C.
2) calculating the difference d_j between adjacent recognition results T_j and T_{j-1}, and obtaining the estimated text content T* from the recognition result T_{j*} with the minimum difference d_j.
Methods for calculating the difference d_j between adjacent recognition results include, but are not limited to, the cross-entropy loss function, the CTC (Connectionist Temporal Classification) loss function, and the edit distance. The estimated text content T* is obtained from the minimum-difference recognition result T_{j*} by a formula (not reproduced here), wherein the component T_j(u, v) represents the probability that the u-th character of the recognition result T_j belongs to the v-th class.
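Step 2) above can be sketched with a small example. The difference measure used here is a plain sum of absolute differences, standing in for the cross-entropy/CTC/edit-distance options the embodiment names; treat it as a hypothetical choice, not the patented one.

```python
def estimate_text_content(results):
    """Pick the stable recognition result T* from N candidate outputs.

    results: list of N matrices T_j, each an L x C list of lists of
    per-class probabilities. d_j compares adjacent results T_j and
    T_{j-1}; T* is the T_j whose d_j is minimal. Assumes N >= 2.
    """
    def diff(a, b):
        # Elementwise L1 difference between two L x C matrices.
        return sum(abs(pa - pb)
                   for ra, rb in zip(a, b)
                   for pa, pb in zip(ra, rb))

    best_j, best_d = None, float("inf")
    for j in range(1, len(results)):
        d = diff(results[j], results[j - 1])
        if d < best_d:
            best_j, best_d = j, d
    return results[best_j]
```

The intuition is that once the candidate box is roughly right, enlarging it slightly barely changes the loose recognizer's output, so the minimum adjacent difference marks a reliable estimate.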
The loose recognizer of this embodiment is structured as an image text recognizer based on a convolutional neural network and comprises a corrector, a first encoder, a first sequence model and a first decoder, described as follows:
the corrector is used for correcting the shape of the text image in the input image region R_j;
the first encoder is used for extracting features from the corrected text image;
the first sequence model is used for extracting context-dependent features;
the first decoder is used for translating the context-dependent features and outputting a recognition result T_j;
the loose recognizer is trained using loose text regions of synthesized images, wherein a loose text region is an image region that, in addition to the text, introduces a proper amount of background interference.
The strict recognizer of this embodiment is structured as an image text recognizer based on a convolutional neural network and comprises a second encoder, a second sequence model and a second decoder, described as follows:
the second encoder is used for extracting features from the input image region R_j;
the second sequence model is used for extracting context-dependent features;
the second decoder is used for decoding the context-dependent features and outputting a recognition result s_j;
the strict recognizer is trained using compact text regions of synthesized images, wherein a compact text region is an image region that contains the text without background interference.
S205, comparing the N content recognition results with the estimated text content, respectively calculating recognition losses, and obtaining N recognition losses.
In this embodiment, the recognition losses are denoted {l_j | j = 1, 2, …, N}. Methods for calculating the recognition loss include, but are not limited to, the cross-entropy loss function, the CTC (Connectionist Temporal Classification) loss function, and the edit distance.
S206, obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss in all the identification losses, and further obtaining the final text box label.
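Steps S205 and S206 — computing one recognition loss per candidate region and taking the index of the minimum — can be sketched as below. The edit distance is one of the loss options the embodiment lists; the string-valued predictions are an illustrative assumption (in the patent the strict recognizer produces the results s_j).

```python
def edit_distance(a, b):
    """Levenshtein distance, one of the loss options named in S205."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # delete a[i-1]
                        dp[j - 1] + 1,                       # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))       # substitute
            prev = cur
    return dp[n]

def best_box_index(predictions, estimated, loss_fn=edit_distance):
    """S206: return the index j* of the minimum recognition loss l_j.

    predictions: the N strict-recognizer results s_j; estimated: the
    loose recognizer's estimated text content T*.
    """
    losses = [loss_fn(s, estimated) for s in predictions]
    return min(range(len(losses)), key=losses.__getitem__)
```

The candidate box whose strict-recognizer output best matches the estimated content is taken as the most accurate, and its polygon becomes the text box label.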
S207, optimizing the text box label with the recognition loss as a guide, and finally obtaining a compact text box label.
Referring to fig. 4, step S207 is the tight boundary estimation step: the text box label is optimized with the recognition loss as a guide, by a gradient-descent update (formula not reproduced here) in which the box parameter is decreased by μ times the gradient of the recognition loss with respect to it, where μ denotes the update step.
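The tight-boundary optimization of S207 descends the gradient of the recognition loss with step size μ; the exact update formula is an image not reproduced in this text. A hedged sketch, which approximates that gradient with a central finite difference over a hypothetical scalar box parameter h:

```python
def refine_parameter(h, loss, mu=0.1, steps=100, eps=1e-4):
    """Gradient-descent refinement guided by a recognition loss.

    h: scalar box parameter (e.g. the box height); loss: maps h to the
    recognition loss; mu: the update step from the patent's update rule.
    The true gradient is replaced by a central finite difference here,
    which is an assumption of this sketch, not the patented computation.
    """
    for _ in range(steps):
        grad = (loss(h + eps) - loss(h - eps)) / (2 * eps)
        h = h - mu * grad  # h <- h - mu * dl/dh
    return h
```

With a loss that is minimal at the tight box size, the iteration shrinks or grows the box until the strict recognizer's loss stops improving.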
In the above embodiment, the strict recognizer is trained using compact text regions of synthesized images and is therefore sensitive to the background region of the text image, so the obtained text detection box is compact and highly accurate.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
as shown in fig. 5, the present embodiment provides a semi-automatic labeling system for image text detection, which includes a text image obtaining module 501, a text centerline obtaining module 502, a candidate bounding box generating module 503, a recognition module 504, a recognition loss calculating module 505, a text box labeling obtaining module 506, and an optimizing module 507, where the specific functions of each module are as follows:
a text image obtaining module 501, configured to obtain a text image.
The text center line obtaining module 502 is configured to obtain a text center line from the text image, where the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments.
The candidate bounding box generating module 503 is configured to generate N candidate bounding boxes surrounding the text center line, where each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions.
The recognition module 504 is configured to input the N candidate text regions into the loose recognizer and the strict recognizer simultaneously, obtain the estimated text content by recognizing the N candidate text regions with the loose recognizer, and predict the content recognition result of each candidate text region with the strict recognizer.
And the recognition loss calculating module 505 is configured to compare the N content recognition results with the estimated text content, and calculate recognition losses respectively to obtain N recognition losses.
The text box label obtaining module 506 is configured to obtain an index of the most accurate candidate bounding box by determining an index of the smallest loss among all the identification losses, and further obtain a final text box label.
And the optimizing module 507 is configured to optimize the text box label by using the recognition loss as a guide, so as to finally obtain a compact text box label.
For the specific implementation of each module in this embodiment, reference may be made to embodiment 1, which is not repeated here; it should be noted that the system provided in this embodiment is only illustrated by the division of the above functional modules, and in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above.
Example 3:
the present embodiment provides a computer device, which may be a computer. As shown in fig. 6, it comprises a system bus 601 connecting a processor 602, a memory, an input device 603, a display device 604 and a network interface 605. The processor provides computing and control capabilities; the memory comprises a nonvolatile storage medium 606 and an internal memory 607; the nonvolatile storage medium 606 stores an operating system, a computer program and a database, and the internal memory 607 provides an environment for running the operating system and the computer program in the nonvolatile storage medium. When the processor 602 executes the computer program stored in the memory, the semi-automatic labeling method for image text detection of embodiment 1 is implemented as follows:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions;
inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer;
comparing the N content identification results with the estimated text content, and respectively calculating identification losses to obtain N identification losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss in all the identification losses, and further obtaining the final text box label;
and optimizing the text box label by taking the recognition loss as a guide, and finally obtaining a compact text box label.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the computer program is executed by a processor, the method for semi-automatically labeling image text detection in embodiment 1 is implemented as follows:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions;
inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer;
comparing the N content identification results with the estimated text content, and respectively calculating identification losses to obtain N identification losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss in all the identification losses, and further obtaining the final text box label;
and optimizing the text box label by taking the recognition loss as a guide, and finally obtaining a compact text box label.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In summary, the invention acquires a text center line from a text image, generates candidate bounding boxes surrounding the text center line, and inputs them into a loose recognizer and a strict recognizer; the loose recognizer yields the estimated text content and the strict recognizer predicts the content recognition results, from which the recognition losses are calculated; by determining the index of the minimum loss among all recognition losses, the index of the most accurate candidate bounding box is obtained, and thus the final text box label; the text box label is then optimized with the recognition loss as a guide to obtain a compact text box label. This realizes semi-automatic labeling, which lies between manual labeling and automatic labeling algorithms and takes account of both labeling efficiency and labeling effect.
The above description is only for the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto, and any person skilled in the art can substitute or change the technical solution and the inventive concept of the present invention within the scope of the present invention.
Claims (10)
1. A semi-automatic labeling method for image text detection is characterized by comprising the following steps:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, and the N candidate bounding boxes enclose N candidate text regions;
inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer;
comparing the N content identification results with the estimated text content, and respectively calculating identification losses to obtain N identification losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss in all the identification losses, and further obtaining the final text box label;
and optimizing the text box label by taking the recognition loss as a guide, and finally obtaining a compact text box label.
2. The image text detection semi-automatic labeling method according to claim 1, wherein the generating N candidate bounding boxes around the text center line specifically comprises:
determining K+1 normals, wherein each normal n_i intersects the text centerline at a point c_i and is perpendicular to the tangent of the text centerline at the point c_i;
on each normal n_i, determining a line segment l_i of length h_j, the line segment l_i being bisected by the point c_i; taking the endpoints of all the line segments as the vertices of a polygon and connecting all the vertices in sequence to obtain a candidate bounding box B_j;
obtaining N candidate bounding boxes by choosing N different values of the variable h_j.
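Under the construction of claim 2, the polygon vertices follow directly from the centerline tangents. A minimal plain-Python sketch is given below; estimating the tangent by forward/backward differences at the endpoints is an assumption, since the claim does not specify how the tangent is obtained:

```python
import math

def candidate_box(centerline, h):
    """Build one polygonal candidate bounding box B_j around a text centerline.

    centerline: list of (x, y) points (K+1 points forming K segments).
    h: the height h_j; each normal segment l_i has length h and is
       bisected by its centerline point c_i.
    """
    top, bottom = [], []
    n = len(centerline)
    for i, (x, y) in enumerate(centerline):
        # Tangent direction at c_i (forward/backward difference at the ends).
        x0, y0 = centerline[max(i - 1, 0)]
        x1, y1 = centerline[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty) or 1.0
        # Unit normal n_i, perpendicular to the tangent.
        nx, ny = -ty / norm, tx / norm
        top.append((x + nx * h / 2, y + ny * h / 2))
        bottom.append((x - nx * h / 2, y - ny * h / 2))
    # Vertices in sequence: top side left-to-right, bottom side right-to-left.
    return top + bottom[::-1]

def candidate_boxes(centerline, heights):
    # N candidate boxes from N different height values h_j.
    return [candidate_box(centerline, h) for h in heights]
```

For a horizontal centerline, the box reduces to an axis-aligned rectangle of height h centered on the line, which is a quick sanity check for the geometry.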
4. The semi-automatic labeling method for image text detection according to claim 1, wherein obtaining the estimated text content by recognizing the N candidate text regions through the loose recognizer specifically comprises:
recognizing the N candidate text regions through the loose recognizer to obtain N recognition results {T_j | j = 1, 2, ..., N}, each recognition result T_j being a matrix of shape L x C;
calculating the difference d_j between adjacent recognition results T_j and T_{j-1}, and obtaining the estimated text content T* from the recognition result with the minimum difference d_j;
wherein the estimated text content T* is obtained from the recognition result with the minimum difference d_j as T* = T_{j*}, where j* = argmin_j d_j.
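The stability-based estimate of claim 4 can be sketched in a few lines of plain Python. The claim does not fix the metric for d_j; a Frobenius-style norm of the element-wise difference is assumed here for illustration:

```python
def diff(a, b):
    """Frobenius-style difference between two L x C result matrices
    (an assumed metric; the claim only requires some difference d_j)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5

def estimate_text(results):
    """Return (T*, j*), where d_{j*} = ||T_{j*} - T_{j*-1}|| is minimal.

    results: list of N recognition results T_j, each an L x C matrix
    (list of lists) from the loose recognizer.
    """
    d = [diff(results[j], results[j - 1]) for j in range(1, len(results))]
    j_star = min(range(len(d)), key=d.__getitem__) + 1  # d[0] compares T_1 with T_0
    return results[j_star], j_star
```

The intuition is that as the candidate box height sweeps past the true text height, consecutive recognition results stop changing, so the result with the smallest adjacent difference is taken as the estimate T*.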
5. The semi-automatic labeling method for image text detection according to any one of claims 1 to 4, wherein the loose recognizer is a convolutional-neural-network-based image text recognizer comprising a corrector, a first encoder, a first sequence model and a first decoder;
the corrector is used for rectifying the shape of the text image in the input image region R_j;
the first encoder is used for extracting features from the rectified text image;
the first sequence model is used for extracting context-dependent features;
the first decoder is used for decoding the context-dependent features and outputting a recognition result T_j;
the loose recognizer is trained using loose text regions of synthesized images, a loose text region being a region of the region image that contains, in addition to the text, a moderate amount of background interference.
6. The semi-automatic labeling method for image text detection according to any one of claims 1 to 4, wherein the strict recognizer is a convolutional-neural-network-based image text recognizer comprising a second encoder, a second sequence model and a second decoder;
the second encoder is used for extracting features from the input image region R_j;
the second sequence model is used for extracting context-dependent features;
the second decoder is used for decoding the context-dependent features and outputting a recognition result s_j;
the strict recognizer is trained using compact text regions of synthesized images, a compact text region being an image region within the region image that contains no background interference other than the text.
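The recognition loss compared in claims 1 and 8 is not given in closed form in this text. A common choice, assumed here purely for illustration, is the average cross-entropy of the strict recognizer's per-character probabilities s_j against the characters of the estimated text T*:

```python
import math

def recognition_loss(pred_probs, target_ids):
    """Average per-character cross-entropy (an assumed loss form).

    pred_probs: strict recognizer output s_j as an L x C list of
                per-character class probabilities.
    target_ids: L character class indices decoded from the estimate T*.
    """
    eps = 1e-12  # guard against log(0)
    return -sum(math.log(p[t] + eps)
                for p, t in zip(pred_probs, target_ids)) / len(target_ids)
```

A candidate box whose strict recognition agrees with the estimated text yields a small loss, so the argmin over the N losses identifies the most accurate box.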
7. The semi-automatic labeling method for image text detection according to any one of claims 1 to 4, wherein the text box label is optimized with the recognition loss as guidance, as follows:
8. A semi-automatic annotation system for image text detection, the system comprising:
the text image acquisition module is used for acquiring a text image;
the text centerline acquisition module is used for acquiring a text centerline from the text image, wherein the text centerline is a polyline passing through the center of the text, formed by sequentially connecting K+1 points into K straight line segments;
the candidate bounding box generation module is used for generating N candidate bounding boxes surrounding the text centerline, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
the recognition module is used for inputting the N candidate text regions into the loose recognizer and the strict recognizer simultaneously, obtaining the estimated text content by recognizing the N candidate text regions through the loose recognizer, and predicting a content recognition result for each candidate text region through the strict recognizer;
the recognition loss calculation module is used for comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
the text box label acquisition module is used for obtaining the index of the most accurate candidate bounding box by finding the index of the minimum among all recognition losses, thereby obtaining the final text box label;
and the optimization module is used for optimizing the text box label with the recognition loss as guidance, finally obtaining a compact text box label.
9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the image text detection semi-automatic labeling method of any one of claims 1 to 7.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the image text detection semi-automatic labeling method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110906651.7A CN113807336B (en) | 2021-08-09 | 2021-08-09 | Semi-automatic labeling method, system, computer equipment and medium for image text detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113807336A true CN113807336A (en) | 2021-12-17 |
CN113807336B CN113807336B (en) | 2023-06-30 |
Family
ID=78942853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110906651.7A Active CN113807336B (en) | 2021-08-09 | 2021-08-09 | Semi-automatic labeling method, system, computer equipment and medium for image text detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113807336B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549893A (en) * | 2018-04-04 | 2018-09-18 | 华中科技大学 | A kind of end-to-end recognition methods of the scene text of arbitrary shape |
CN108805131A (en) * | 2018-05-22 | 2018-11-13 | 北京旷视科技有限公司 | Text line detection method, apparatus and system |
CN110147786A (en) * | 2019-04-11 | 2019-08-20 | 北京百度网讯科技有限公司 | For text filed method, apparatus, equipment and the medium in detection image |
CN110837835A (en) * | 2019-10-29 | 2020-02-25 | 华中科技大学 | End-to-end scene text identification method based on boundary point detection |
CN110929665A (en) * | 2019-11-29 | 2020-03-27 | 河海大学 | Natural scene curve text detection method |
CN111898411A (en) * | 2020-06-16 | 2020-11-06 | 华南理工大学 | Text image labeling system, method, computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113807336B (en) | 2023-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175527B (en) | Pedestrian re-identification method and device, computer equipment and readable medium | |
US20190272438A1 (en) | Method and apparatus for detecting text | |
Rahul et al. | Automatic information extraction from piping and instrumentation diagrams | |
CN116956929B (en) | Multi-feature fusion named entity recognition method and device for bridge management text data | |
CN111680753A (en) | Data labeling method and device, electronic equipment and storage medium | |
Shrivastava et al. | Deep learning model for text recognition in images | |
US11948078B2 (en) | Joint representation learning from images and text | |
CN115544303A (en) | Method, apparatus, device and medium for determining label of video | |
CN111985209A (en) | Text sentence recognition method, device, equipment and storage medium combining RPA and AI | |
CN112052819A (en) | Pedestrian re-identification method, device, equipment and storage medium | |
CN116954113B (en) | Intelligent robot driving sensing intelligent control system and method thereof | |
CN113780040A (en) | Lip key point positioning method and device, storage medium and electronic equipment | |
CN110287970B (en) | Weak supervision object positioning method based on CAM and covering | |
CN117152504A (en) | Space correlation guided prototype distillation small sample classification method | |
CN111914822A (en) | Text image labeling method and device, computer readable storage medium and equipment | |
CN113807336A (en) | Semi-automatic labeling method, system, computer equipment and medium for image text detection | |
CN115205649A (en) | Convolution neural network remote sensing target matching method based on fusion local features | |
CN111310442B (en) | Method for mining shape-word error correction corpus, error correction method, device and storage medium | |
CN113657364A (en) | Method, device, equipment and storage medium for recognizing character mark | |
CN108021918B (en) | Character recognition method and device | |
CN112487811A (en) | Cascading information extraction system and method based on reinforcement learning | |
CN116543389B (en) | Character recognition method, device, equipment and medium based on relational network | |
CN113673336B (en) | Character cutting method, system and medium based on alignment CTC | |
CN117076596B (en) | Data storage method, device and server applying artificial intelligence | |
US11227186B2 (en) | Method and device for training image recognition model and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |