CN113807336A - Semi-automatic labeling method, system, computer equipment and medium for image text detection - Google Patents


Info

Publication number
CN113807336A
CN113807336A (application CN202110906651.7A; granted as CN113807336B)
Authority
CN
China
Prior art keywords
text
image
candidate
recognizer
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110906651.7A
Other languages
Chinese (zh)
Other versions
CN113807336B (en)
Inventor
Huang Shuangping (黄双萍)
Liu Zonghao (刘宗昊)
Wang Qingfeng (王庆丰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou, South China University of Technology SCUT filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Priority to CN202110906651.7A priority Critical patent/CN113807336B/en
Publication of CN113807336A publication Critical patent/CN113807336A/en
Application granted granted Critical
Publication of CN113807336B publication Critical patent/CN113807336B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a semi-automatic labeling method, system, computer device and medium for image text detection. The method comprises: acquiring a text image; obtaining a text center line from the text image; generating N candidate bounding boxes surrounding the text center line; inputting the N candidate text regions simultaneously into a loose recognizer and a strict recognizer, where the loose recognizer recognizes the N candidate text regions to obtain an estimated text content and the strict recognizer predicts a content recognition result for each candidate text region; comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses; obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all recognition losses, and thereby the final text box label; and optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label. The invention improves both the efficiency and the quality of text detection labeling.

Description

Semi-automatic labeling method, system, computer equipment and medium for image text detection
Technical Field
The invention relates to a semi-automatic labeling method, a semi-automatic labeling system, computer equipment and a storage medium for image text detection, and belongs to the technical field of artificial intelligence and OCR.
Background
With the development of artificial intelligence, text detection has advanced considerably as a fundamental computer vision task. Text detection refers to locating text regions in an image, and the technique is widely applicable in industries such as autonomous driving, robot navigation, and assistance for the visually impaired. As data-driven machine learning algorithms such as deep learning have achieved great success in natural language processing, computer vision, speech recognition, and related fields, deep-learning-based image text detection has developed rapidly and its performance has improved markedly. However, these methods rely on large amounts of detection annotation data.
To date, detection annotation data has been acquired largely by hand, which is time-consuming, labor-intensive, and expensive. Irregular text regions are especially costly: they usually require more labeled points per region, so efficiency is extremely low, and human subjectivity reduces precision. A semi-automatic or automatic labeling algorithm is therefore needed to replace manual labeling and improve both efficiency and accuracy. Automatic approaches so far have applied a detection algorithm to produce so-called pre-labels. Because detector performance is limited, such pre-labeling cannot produce high-quality labels; substantial manual checking is still required to obtain truly usable labels, so the core difficulty of image text detection labeling remains unsolved.
Disclosure of Invention
In view of the above, the present invention provides a semi-automatic labeling method, system, computer device and storage medium for image text detection, which can improve the text detection labeling efficiency and labeling effect.
The invention aims to provide a semi-automatic image text detection labeling method.
The invention also provides a semi-automatic image text detection labeling system.
It is a third object of the invention to provide a computer device.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a semi-automatic labeling method for image text detection, the method comprising:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight-line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
inputting the N candidate text regions simultaneously into a loose recognizer and a strict recognizer, recognizing the N candidate text regions with the loose recognizer to obtain an estimated text content, and predicting the content recognition result of each candidate text region with the strict recognizer;
comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all recognition losses, and thereby obtaining the final text box label;
and optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label.
Further, the generating N candidate bounding boxes around the text center line specifically includes:
determining K+1 normals, where each normal n_i intersects the text center line at a point c_i and is perpendicular to the tangent of the text center line at c_i;
on each normal n_i, determining a line segment l_i of length h_j that is bisected by the point c_i; taking the endpoints of all the line segments as the vertices of a polygon and connecting all the vertices in sequence to obtain a candidate bounding box B_j;
obtaining N candidate bounding boxes by determining N different values of the variable h_j.
Further, the variable h_j is determined by the following formula (given only as an image in the source text and not reproduced here), where j = 1, 2, ..., N.
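As an illustration of the construction just described, the sketch below builds polygonal candidate boxes around a polyline center line with simple 2-D vector math. All names are hypothetical, and the linear spacing of the heights h_j is an assumed choice, since the patent's exact formula for h_j is given only as an image.

```python
import numpy as np

def candidate_box(centerline, h):
    """Build one polygonal candidate box of height h around a polyline center line.

    centerline: (K+1, 2) array of points c_i. Returns a (2*(K+1), 2) vertex array:
    the endpoints of each normal segment l_i (bisected by c_i), walked along one
    side of the center line and back along the other.
    """
    pts = np.asarray(centerline, dtype=float)
    # tangent at each point c_i (central differences; one-sided at the ends)
    tang = np.gradient(pts, axis=0)
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    normals = np.stack([-tang[:, 1], tang[:, 0]], axis=1)  # tangent rotated 90 degrees
    upper = pts + normals * (h / 2.0)   # each segment l_i is bisected by c_i
    lower = pts - normals * (h / 2.0)
    return np.vstack([upper, lower[::-1]])  # polygon vertices connected in sequence

def candidate_boxes(centerline, n=30, h_min=4.0, h_max=64.0):
    """N boxes for N different heights h_j (linear spacing is an assumed choice)."""
    heights = np.linspace(h_min, h_max, n)
    return [candidate_box(centerline, h) for h in heights]

line = np.array([[0, 0], [10, 2], [20, 0]])  # K+1 = 3 points, K = 2 segments
boxes = candidate_boxes(line, n=5)
```

The default n=30 mirrors the N = 30 used in embodiment 1; h_min and h_max are made-up bounds.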
Further, recognizing the N candidate text regions with the loose recognizer to obtain the estimated text content specifically includes:
recognizing the N candidate text regions with the loose recognizer to obtain N recognition results {T_j | j = 1, 2, ..., N}, where each recognition result T_j is a matrix of shape L x C;
calculating the difference d_j between adjacent recognition results T_j and T_{j-1}, and obtaining the estimated text content T* from the recognition result T_{j*} with the minimum difference d_j, as follows:

j* = argmin_j d_j

T*_u = argmax_v T_{j*}^{(u,v)}

where the component T_j^{(u,v)} represents the probability that the u-th character of the recognition result T_j belongs to the v-th class.
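The selection of the estimated text content T* described above (choose the candidate whose loose recognition result differs least from its neighbor, then take a per-character argmax over the C classes) can be sketched numerically. The L1 difference used for d_j and the toy probability matrices are illustrative assumptions, not the patent's choices.

```python
import numpy as np

def estimated_text(results):
    """results: list of L x C probability matrices T_j.

    d_j compares adjacent results T_j and T_{j-1}; a plain L1 difference is used
    here (the patent also allows cross-entropy, CTC loss, or edit distance).
    """
    d = [np.abs(results[j] - results[j - 1]).sum() for j in range(1, len(results))]
    j_star = int(np.argmin(d)) + 1            # index of the most stable result
    t_star = results[j_star].argmax(axis=1)   # u-th character -> most probable class v
    return j_star, t_star

# three candidates, L = 2 characters, C = 3 classes; T[1] and T[2] nearly agree
T = [np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]),
     np.array([[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]),
     np.array([[0.1, 0.8, 0.1], [0.1, 0.3, 0.6]])]
j_star, t_star = estimated_text(T)
```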
Further, the loose recognizer is a convolutional-neural-network-based image text recognizer comprising a corrector, a first encoder, a first sequence model and a first decoder;
the corrector is used for correcting the shape of the text image in the input image region R_j;
the first encoder is used for extracting features from the corrected text image;
the first sequence model is used for extracting context-dependent features;
the first decoder is used for translating the context-dependent features and outputting a recognition result T_j.
The loose recognizer is trained on loose text regions of synthesized images, where a loose text region is an image region that contains, in addition to the text, a moderate amount of background interference.
Further, the strict recognizer is a convolutional-neural-network-based image text recognizer comprising a second encoder, a second sequence model and a second decoder;
the second encoder is used for extracting features from the input image region R_j;
the second sequence model is used for extracting context-dependent features;
the second decoder is used for decoding the context-dependent features and outputting a recognition result s_j.
The strict recognizer is trained on compact text regions of synthesized images, where a compact text region is an image region that contains no background interference other than the text.
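Both recognizers share an encoder, sequence model, and decoder pipeline, with the loose recognizer adding a corrector in front. A minimal compositional skeleton is sketched below; the stub stages are placeholders for real convolutional, recurrent, and decoding layers, not the patent's actual networks.

```python
import numpy as np

class Recognizer:
    """Skeleton of the CNN-based text recognizer described above.

    Stages are injected as callables; the corrector is optional (loose
    recognizer only). Real implementations would use trained layers instead.
    """
    def __init__(self, encoder, seq_model, decoder, corrector=None):
        self.corrector = corrector
        self.encoder = encoder
        self.seq_model = seq_model
        self.decoder = decoder

    def __call__(self, region):
        x = self.corrector(region) if self.corrector else region
        feats = self.encoder(x)        # feature extraction
        ctx = self.seq_model(feats)    # context-dependent features
        return self.decoder(ctx)       # L x C recognition result

# stub stages: flatten image -> fake sequence features -> uniform class probabilities
encoder = lambda img: img.reshape(4, -1).mean(axis=1)   # 4 "time steps"
seq_model = lambda f: np.tanh(f)
decoder = lambda c: np.full((len(c), 6), 1.0 / 6)       # L = 4, C = 6

strict = Recognizer(encoder, seq_model, decoder)
loose = Recognizer(encoder, seq_model, decoder, corrector=lambda img: img / 255.0)
out = strict(np.ones((8, 8)))
```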
Further, optimizing the text box label with the recognition loss as a guide is performed as follows:

h_{j*} <- h_{j*} - μ · ∂l_{j*}/∂h_{j*}

where ∂l_{j*}/∂h_{j*} denotes the gradient of the recognition loss l_{j*} with respect to h_{j*}, and μ denotes the update step size.
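The update rule described here is plain gradient descent on the box-height variable. The sketch below approximates the gradient by finite differences, treating the recognizer's loss as a black box; the quadratic loss is a made-up stand-in for the strict recognizer's recognition loss, not the patent's actual objective.

```python
def refine_height(h, loss, mu=0.1, eps=1e-4, steps=50):
    """h <- h - mu * d(loss)/dh, with a finite-difference gradient estimate."""
    for _ in range(steps):
        grad = (loss(h + eps) - loss(h - eps)) / (2 * eps)
        h = h - mu * grad
    return h

# stand-in loss: minimized when the box height matches an (unknown) tight height of 12
loss = lambda h: (h - 12.0) ** 2
h_star = refine_height(20.0, loss)
```

With μ = 0.1 the deviation from the optimum shrinks by a factor 0.8 per step, so 50 steps are ample for this toy loss.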
The second purpose of the invention can be achieved by adopting the following technical scheme:
a semi-automatic annotation system for image text detection, the system comprising:
the text image acquisition module is used for acquiring a text image;
the text center line acquisition module is used for acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight-line segments;
the candidate bounding box generation module is used for generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
the recognition module is used for inputting the N candidate text regions simultaneously into the loose recognizer and the strict recognizer, obtaining an estimated text content by recognizing the N candidate text regions with the loose recognizer, and predicting the content recognition result of each candidate text region with the strict recognizer;
the recognition loss calculation module is used for comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
the text box label acquisition module is used for obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all recognition losses, thereby obtaining the final text box label;
and the optimization module is used for optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprises a processor and a memory for storing a program executable by the processor, wherein when the processor executes the program stored in the memory, the semi-automatic labeling method for detecting the image text is realized.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program, and when the program is executed by a processor, the semi-automatic labeling method for image text detection is realized.
Compared with the prior art, the invention has the following beneficial effects:
the method comprises the steps of obtaining a text center line from a text image, generating a candidate boundary box surrounding the text center line, inputting the candidate boundary box into a loose recognizer and a strict recognizer, recognizing and obtaining estimated text contents through the loose recognizer, predicting a content recognition result through the strict recognizer, further calculating recognition loss, obtaining an index of the most accurate candidate boundary box through determining an index with the minimum loss in all recognition losses, further obtaining final text box labeling, optimizing the text box labeling by taking the recognition loss as a guide, obtaining compact text box labeling, achieving semi-automatic labeling, enabling the semi-automatic labeling to be between manual labeling and automatic labeling algorithms, and considering both labeling efficiency and labeling effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a simple flowchart of a semi-automatic labeling method for image text detection in embodiment 1 of the present invention.
Fig. 2 is a specific flowchart of the image text detection semi-automatic labeling method according to embodiment 1 of the present invention.
Fig. 3 is a schematic flowchart of candidate boundary generation in the image text detection semi-automatic labeling method according to embodiment 1 of the present invention.
Fig. 4 is a schematic flowchart of compact boundary estimation in the image text detection semi-automatic labeling method according to embodiment 1 of the present invention.
Fig. 5 is a block diagram of a semi-automatic image text detection annotation system according to embodiment 2 of the present invention.
Fig. 6 is a block diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1:
as shown in fig. 1 and fig. 2, the present embodiment provides a semi-automatic labeling method for image text detection, which includes the following steps:
s201, acquiring a text image.
The text image in this embodiment is a scene text image. It can be acquired by capture, for example by photographing the scene with a camera, or retrieved from a database in which scene text images have been stored in advance. The acquired text image serves as the input.
S202, acquiring a text center line from the text image.
In this embodiment, the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight-line segments, where the K+1 points are denoted {c_i | i = 1, 2, ..., K+1}. The text center line serves as input to the next step.
S203, generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions.
In this embodiment, the N candidate bounding boxes are denoted {B_j | j = 1, 2, ..., N} and the N candidate text regions are denoted {R_j | j = 1, 2, ..., N}, with N = 30.
With reference to fig. 3, step S203 is the candidate bounding box generation step, which specifically includes:
S2031, determining K+1 normals, where each normal n_i intersects the text center line at a point c_i and is perpendicular to the tangent of the text center line at c_i.
S2032, on each normal n_i, determining a line segment l_i of length h_j that is bisected by the point c_i; taking the endpoints of all the line segments as the vertices of a polygon and connecting all the vertices in sequence to obtain a candidate bounding box B_j.
S2033, determining N different values of the variable h_j to obtain N candidate bounding boxes.
In this embodiment, different values of the variable h_j determine different polygonal bounding boxes, so choosing N different values of h_j yields N candidate bounding boxes. The variable h_j is determined by the following formula (given only as an image in the source text and not reproduced here), where j = 1, 2, ..., N.
The following steps S204 to S206 constitute the semantic boundary decision:
s204, inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, recognizing the N candidate text regions through the loose recognizer to obtain estimated text contents, and predicting the content recognition result of each candidate text region through the strict recognizer.
In this embodiment, recognizing the N candidate text regions with the loose recognizer to obtain the estimated text content specifically includes:
1) Recognizing the N candidate text regions {R_j | j = 1, 2, ..., N} with the loose recognizer to obtain N recognition results {T_j | j = 1, 2, ..., N}, where each recognition result T_j is a matrix of shape L x C.
2) Calculating the difference d_j between adjacent recognition results T_j and T_{j-1}, and obtaining the estimated text content T* from the recognition result T_{j*} with the minimum difference d_j. The difference d_j can be calculated with, but is not limited to, a cross-entropy loss function, a CTC (Connectionist Temporal Classification) loss function, or an edit distance. The estimated text content T* is obtained from T_{j*} as follows:

j* = argmin_j d_j

T*_u = argmax_v T_{j*}^{(u,v)}

where the component T_j^{(u,v)} represents the probability that the u-th character of the recognition result T_j belongs to the v-th class.
The loose recognizer of this embodiment is a convolutional-neural-network-based image text recognizer comprising a corrector, a first encoder, a first sequence model and a first decoder, described as follows:
the corrector is used for correcting the shape of the text image in the input image region R_j;
the first encoder is used for extracting features from the corrected text image;
the first sequence model is used for extracting context-dependent features;
the first decoder is used for translating the context-dependent features and outputting a recognition result T_j.
The loose recognizer is trained on loose text regions of synthesized images, where a loose text region is an image region that contains, in addition to the text, a moderate amount of background interference.
The strict recognizer of this embodiment is a convolutional-neural-network-based image text recognizer comprising a second encoder, a second sequence model and a second decoder, described as follows:
the second encoder is used for extracting features from the input image region R_j;
the second sequence model is used for extracting context-dependent features;
the second decoder is used for decoding the context-dependent features and outputting a recognition result s_j.
The strict recognizer is trained on compact text regions of synthesized images, where a compact text region is an image region that contains no background interference other than the text.
S205, comparing the N content recognition results with the estimated text content, respectively calculating recognition losses, and obtaining N recognition losses.
In this embodiment, the recognition losses are denoted {l_j | j = 1, 2, ..., N}. The recognition loss can be calculated with, but is not limited to, a cross-entropy loss function, a CTC (Connectionist Temporal Classification) loss function, or an edit distance.
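Of the loss candidates named here, the edit distance has a standard algorithm; a conventional Levenshtein dynamic program (a textbook implementation, not taken from the patent) looks like this:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute, cost 1)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))                 # dp[j] = distance(a[:0], b[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i              # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # delete a[i-1]
                        dp[j - 1] + 1,      # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))  # substitute (free if equal)
            prev = cur
    return dp[n]
```

It works on strings or on lists of class indices, so it can compare a strict recognizer's decoded output against the estimated text content T*.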
S206, obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all the recognition losses, thereby obtaining the final text box label.
S207, optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label.
Referring to fig. 4, step S207 is the compact boundary estimation step; the text box label is optimized with the recognition loss as a guide, as follows:

h_{j*} <- h_{j*} - μ · ∂l_{j*}/∂h_{j*}

where ∂l_{j*}/∂h_{j*} denotes the gradient of the recognition loss l_{j*} with respect to h_{j*}, and μ denotes the update step size.
In the above embodiment, the strict recognizer is trained on compact text regions of synthesized images and is therefore sensitive to the background region of a text image, so the resulting text detection box is compact and highly accurate.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
as shown in fig. 5, the present embodiment provides a semi-automatic labeling system for image text detection, which includes a text image obtaining module 501, a text centerline obtaining module 502, a candidate bounding box generating module 503, a recognition module 504, a recognition loss calculating module 505, a text box labeling obtaining module 506, and an optimizing module 507, where the specific functions of each module are as follows:
a text image obtaining module 501, configured to obtain a text image.
The text center line acquisition module 502 is used for acquiring a text center line from the text image, where the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight-line segments.
The candidate bounding box generation module 503 is used for generating N candidate bounding boxes surrounding the text center line, where each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions.
The recognition module 504 is used for inputting the N candidate text regions simultaneously into the loose recognizer and the strict recognizer, obtaining the estimated text content by recognizing the N candidate text regions with the loose recognizer, and predicting the content recognition result of each candidate text region with the strict recognizer.
The recognition loss calculation module 505 is used for comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses.
The text box label acquisition module 506 is used for obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all the recognition losses, thereby obtaining the final text box label.
The optimization module 507 is used for optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label.
For the specific implementation of each module in this embodiment, refer to embodiment 1; it is not repeated here. It should be noted that the system provided in this embodiment is illustrated only by the division of the above functional modules; in practical applications, the functions may be distributed among different functional modules as needed, that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above.
Example 3:
This embodiment provides a computer device, which may be a computer. As shown in fig. 6, it comprises a system bus 601 connecting a processor 602, a memory, an input device 603, a display device 604 and a network interface 605. The processor provides computing and control capabilities. The memory comprises a nonvolatile storage medium 606 and an internal memory 607; the nonvolatile storage medium 606 stores an operating system, a computer program and a database, and the internal memory 607 provides an environment in which the operating system and the computer program in the nonvolatile storage medium run. When the processor 602 executes the computer program stored in the memory, the semi-automatic labeling method for image text detection of embodiment 1 is implemented as follows:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight-line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
inputting the N candidate text regions simultaneously into a loose recognizer and a strict recognizer, recognizing the N candidate text regions with the loose recognizer to obtain an estimated text content, and predicting the content recognition result of each candidate text region with the strict recognizer;
comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all recognition losses, and thereby obtaining the final text box label;
and optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label.
Example 4:
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the semi-automatic labeling method for image text detection of embodiment 1 is implemented as follows:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by connecting K+1 points in sequence into K straight-line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
inputting the N candidate text regions simultaneously into a loose recognizer and a strict recognizer, recognizing the N candidate text regions with the loose recognizer to obtain an estimated text content, and predicting the content recognition result of each candidate text region with the strict recognizer;
comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all recognition losses, and thereby obtaining the final text box label;
and optimizing the text box label with the recognition loss as a guide, finally obtaining a compact text box label.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In summary, the invention obtains a text center line from a text image, generates candidate bounding boxes surrounding the text center line, and inputs them into a loose recognizer and a strict recognizer: the loose recognizer produces an estimated text content, the strict recognizer predicts a content recognition result, and recognition losses are then calculated. The index of the minimum loss among all recognition losses gives the index of the most accurate candidate bounding box and thus the final text box label, which is optimized with the recognition loss as a guide to obtain a compact text box label. This realizes semi-automatic labeling, which sits between manual labeling and fully automatic labeling algorithms and balances labeling efficiency against labeling quality.
The above description covers only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; equivalent substitutions or changes made by any person skilled in the art according to the technical solution and inventive concept of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A semi-automatic labeling method for image text detection is characterized by comprising the following steps:
acquiring a text image;
acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by sequentially connecting K+1 points into K straight line segments;
generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
inputting the N candidate text regions into a loose recognizer and a strict recognizer simultaneously, obtaining the estimated text content by recognizing the N candidate text regions through the loose recognizer, and predicting the content recognition result of each candidate text region through the strict recognizer;
comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all the recognition losses, and further obtaining the final text box label;
and optimizing the text box label with the recognition loss as a guide to finally obtain a compact text box label.
2. The image text detection semi-automatic labeling method according to claim 1, wherein the generating N candidate bounding boxes around the text center line specifically comprises:
determining K+1 normals, each normal n_i intersecting the text center line at a point c_i and being perpendicular to the tangent of the text center line at c_i;
on each normal n_i, determining a line segment l_i of length h_j, the segment l_i being bisected by the point c_i; taking the end points of all the line segments as the vertices of a polygon and connecting all the vertices in sequence to obtain a candidate bounding box B_j;
obtaining N candidate bounding boxes by taking N different values of the variable h_j.
3. The image text detection semi-automatic labeling method according to claim 2, wherein the variable h_j is determined as follows:
Figure FDA0003201922820000011
where j = 1, 2, …, N.
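The bounding-box construction of claim 2 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the patent's implementation: tangents are approximated by finite differences of the polyline, and `candidate_box` is a hypothetical name.

```python
import numpy as np


def candidate_box(centerline: np.ndarray, h: float) -> np.ndarray:
    """Build one candidate bounding box B_j around a text center line.

    centerline: (K+1, 2) array of points c_i. At each c_i a unit normal
    n_i (perpendicular to the local tangent) carries a segment l_i of
    length h, bisected by c_i; the segment endpoints, connected forward
    along one side and backward along the other, form the polygon.
    """
    pts = np.asarray(centerline, dtype=float)
    # Approximate the tangent at each point by finite differences.
    tangents = np.gradient(pts, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    # Rotate each tangent by 90 degrees to get the unit normal.
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    top = pts + normals * (h / 2.0)      # one set of segment endpoints
    bottom = pts - normals * (h / 2.0)   # the opposite endpoints
    # Walk the top edge forward and the bottom edge backward to close the polygon.
    return np.concatenate([top, bottom[::-1]], axis=0)
```

For a straight horizontal center line the polygon degenerates to an axis-aligned rectangle of height h, which makes the construction easy to check.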
4. The image text detection semi-automatic labeling method according to claim 1, wherein the obtaining of the estimated text content by recognizing the N candidate text regions through the loose recognizer specifically comprises:
recognizing the N candidate text regions through the loose recognizer to obtain N recognition results {T_j | j = 1, 2, …, N}, each recognition result T_j being a matrix of shape L × C;
calculating the difference d_j between adjacent recognition results T_j and T_{j−1}, where d_j is given by the formula
Figure FDA0003201922820000023
and obtaining the estimated text content T* from the recognition result with the minimum difference d_j, as follows:
j* = argmin_j d_j,  T* = T_{j*}
where the component T_j^(u,v) represents the probability that the u-th character of the recognition result T_j belongs to the v-th class.
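The selection of T* described in claim 4 can be sketched as follows. This is a hedged sketch: the patent's exact difference formula is rendered only as an image in the source, so the elementwise L1 difference used below is an assumption, and `estimate_text` is a hypothetical name.

```python
import numpy as np


def estimate_text(results: list[np.ndarray]) -> np.ndarray:
    """Pick T* from N recognition results {T_j}, each an (L, C) matrix whose
    entry T_j[u, v] is the probability that character u belongs to class v.

    d_j is taken here as the elementwise L1 difference between adjacent
    results (an assumed choice); T* is the result whose d_j is minimal,
    i.e. the result most stable under perturbation of the candidate box.
    """
    # d_j compares T_j with its neighbour T_{j-1}, for j = 1 .. N-1.
    d = [np.abs(results[j] - results[j - 1]).sum() for j in range(1, len(results))]
    j_star = int(np.argmin(d)) + 1  # shift by 1: d[0] corresponds to j = 1
    return results[j_star]
```

The intuition is that near the correct box size, enlarging or shrinking the region slightly barely changes the recognizer's output, so the adjacent-difference d_j bottoms out there.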
5. The image text detection semi-automatic labeling method according to any one of claims 1 to 4, wherein the loose recognizer is a convolutional-neural-network-based image text recognizer comprising a corrector, a first encoder, a first sequence model and a first decoder;
the corrector is used for rectifying the shape of the text image in the input image region R_j;
the first encoder is used for extracting features from the corrected text image;
the first sequence model is used for extracting context-dependent features;
the first decoder is used for translating the context-dependent features and outputting a recognition result T_j;
the loose recognizer is trained using loose text regions of synthesized images, a loose text region being an image region that contains, in addition to the text, a moderate amount of background interference.
6. The image text detection semi-automatic labeling method according to any one of claims 1 to 4, wherein the strict recognizer is a convolutional-neural-network-based image text recognizer comprising a second encoder, a second sequence model and a second decoder;
the second encoder is used for extracting features from the input image region R_j;
the second sequence model is used for extracting context-dependent features;
the second decoder is used for decoding the context-dependent features and outputting a recognition result s_j;
the strict recognizer is trained using compact text regions of synthesized images, a compact text region being an image region that contains no background interference other than the text.
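The distinction between the two recognizers' training regions (claims 5 and 6) comes down to how the text is cropped from the synthesized image. A minimal sketch, assuming axis-aligned crops and an arbitrary `margin` of 8 pixels (both assumptions; the patent does not specify these values or function names):

```python
import numpy as np


def compact_region(image: np.ndarray, x0: int, y0: int, x1: int, y1: int) -> np.ndarray:
    """Tight crop containing only the text: training data for the strict recognizer."""
    return image[y0:y1, x0:x1]


def loose_region(image: np.ndarray, x0: int, y0: int, x1: int, y1: int,
                 margin: int = 8) -> np.ndarray:
    """Crop expanded by a margin so some background leaks in:
    training data for the loose recognizer. The margin value is an assumption."""
    h, w = image.shape[:2]
    return image[max(0, y0 - margin):min(h, y1 + margin),
                 max(0, x0 - margin):min(w, x1 + margin)]
```

Training the strict recognizer only on tight crops makes its loss sensitive to background leakage, which is what lets it score how compact each candidate box is.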
7. The image text detection semi-automatic labeling method according to any one of claims 1 to 4, wherein the text box label is optimized with the recognition loss as a guide, as follows:
B ← B − μ ∂ℓ/∂B
where ∂ℓ/∂B denotes the gradient of the recognition loss ℓ with respect to the text box label B, and μ denotes the update step.
8. A semi-automatic annotation system for image text detection, the system comprising:
the text image acquisition module is used for acquiring a text image;
the text center line acquisition module is used for acquiring a text center line from the text image, wherein the text center line is a curved polyline passing through the center of the text, formed by sequentially connecting K+1 points into K straight line segments;
the candidate bounding box generation module is used for generating N candidate bounding boxes surrounding the text center line, wherein each candidate bounding box is a polygonal region outline, the N boxes enclosing N candidate text regions;
the recognition module is used for inputting the N candidate text regions into the loose recognizer and the strict recognizer simultaneously, obtaining the estimated text content by recognizing the N candidate text regions through the loose recognizer, and predicting the content recognition result of each candidate text region through the strict recognizer;
the recognition loss calculation module is used for comparing the N content recognition results with the estimated text content and calculating a recognition loss for each, obtaining N recognition losses;
the text box label acquisition module is used for obtaining the index of the most accurate candidate bounding box by determining the index of the minimum loss among all the recognition losses, so as to obtain the final text box label;
and the optimization module is used for optimizing the text box label with the recognition loss as a guide to finally obtain a compact text box label.
9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the image text detection semi-automatic labeling method of any one of claims 1 to 7.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the image text detection semi-automatic labeling method of any one of claims 1 to 7.
CN202110906651.7A 2021-08-09 2021-08-09 Semi-automatic labeling method, system, computer equipment and medium for image text detection Active CN113807336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110906651.7A CN113807336B (en) 2021-08-09 2021-08-09 Semi-automatic labeling method, system, computer equipment and medium for image text detection


Publications (2)

Publication Number Publication Date
CN113807336A true CN113807336A (en) 2021-12-17
CN113807336B CN113807336B (en) 2023-06-30

Family

ID=78942853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110906651.7A Active CN113807336B (en) 2021-08-09 2021-08-09 Semi-automatic labeling method, system, computer equipment and medium for image text detection

Country Status (1)

Country Link
CN (1) CN113807336B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN110147786A (en) * 2019-04-11 2019-08-20 北京百度网讯科技有限公司 For text filed method, apparatus, equipment and the medium in detection image
CN110837835A (en) * 2019-10-29 2020-02-25 华中科技大学 End-to-end scene text identification method based on boundary point detection
CN110929665A (en) * 2019-11-29 2020-03-27 河海大学 Natural scene curve text detection method
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium


Also Published As

Publication number Publication date
CN113807336B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN110175527B (en) Pedestrian re-identification method and device, computer equipment and readable medium
US20190272438A1 (en) Method and apparatus for detecting text
Rahul et al. Automatic information extraction from piping and instrumentation diagrams
CN116956929B (en) Multi-feature fusion named entity recognition method and device for bridge management text data
CN111680753A (en) Data labeling method and device, electronic equipment and storage medium
Shrivastava et al. Deep learning model for text recognition in images
US11948078B2 (en) Joint representation learning from images and text
CN115544303A (en) Method, apparatus, device and medium for determining label of video
CN111985209A (en) Text sentence recognition method, device, equipment and storage medium combining RPA and AI
CN112052819A (en) Pedestrian re-identification method, device, equipment and storage medium
CN116954113B (en) Intelligent robot driving sensing intelligent control system and method thereof
CN113780040A (en) Lip key point positioning method and device, storage medium and electronic equipment
CN110287970B (en) Weak supervision object positioning method based on CAM and covering
CN117152504A (en) Space correlation guided prototype distillation small sample classification method
CN111914822A (en) Text image labeling method and device, computer readable storage medium and equipment
CN113807336A (en) Semi-automatic labeling method, system, computer equipment and medium for image text detection
CN115205649A (en) Convolution neural network remote sensing target matching method based on fusion local features
CN111310442B (en) Method for mining shape-word error correction corpus, error correction method, device and storage medium
CN113657364A (en) Method, device, equipment and storage medium for recognizing character mark
CN108021918B (en) Character recognition method and device
CN112487811A (en) Cascading information extraction system and method based on reinforcement learning
CN116543389B (en) Character recognition method, device, equipment and medium based on relational network
CN113673336B (en) Character cutting method, system and medium based on alignment CTC
CN117076596B (en) Data storage method, device and server applying artificial intelligence
US11227186B2 (en) Method and device for training image recognition model and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant