CN111353458A - Text box marking method and device and storage medium - Google Patents

Text box marking method and device and storage medium

Info

Publication number
CN111353458A
Authority
CN
China
Prior art keywords
text
text box
image
boxes
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010161011.3A
Other languages
Chinese (zh)
Other versions
CN111353458B (en)
Inventor
彭梅英 (Peng Meiying)
鲁四喜 (Lu Sixi)
农高明 (Nong Gaoming)
唐嘉龙 (Tang Jialong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010161011.3A
Publication of CN111353458A
Application granted
Publication of CN111353458B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a text box annotation method and apparatus, and a storage medium, belonging to the technical field of information. The method comprises the following steps: acquiring position information of a plurality of text boxes in an image; determining attribute names of the plurality of text boxes according to the position information of the plurality of text boxes; and using the position information and the attribute names of the plurality of text boxes as text box annotation information of the image. Because the text box annotation information of the image is acquired automatically, annotation efficiency can be improved, annotation time can be reduced, and subjective errors caused by manual annotation can be avoided, so that the yield and quality of text box annotation work can be improved.

Description

Text box marking method and device and storage medium
Technical Field
The present application relates to the field of information technology, and in particular, to a method and an apparatus for annotating a text box, and a storage medium.
Background
OCR (Optical Character Recognition) refers to a technique of converting the characters of bills, newspapers, books, documents, certificates, and other printed matter into image information by an optical input method such as scanning, and then converting the image information into usable computer input by using a character recognition technique.
OCR may be implemented by means of deep learning. Specifically, an image to be recognized is provided to a deep-learning-based text box detection model to obtain the position information of the text boxes in the image; the image is then cropped according to this position information to obtain image blocks to be recognized, and each image block is provided to a text content recognition model to obtain the text content in that image block.
The performance of the text box detection model greatly affects the recognition accuracy of OCR, so parameter tuning and verification testing of the text box detection model are very important, and a large amount of text box annotation information is needed in this process. At present, text boxes are annotated manually by technicians, which involves a large workload and is very time-consuming, hindering the parameter tuning and verification testing of the text box detection model.
Disclosure of Invention
The application provides a text box labeling method, a text box labeling device and a storage medium, which can improve the yield and quality of text box labeling work.
In one aspect, a method for annotating a text box is provided, and the method includes:
acquiring position information of a plurality of text boxes in an image;
determining attribute names of the text boxes according to the position information of the text boxes;
and using the position information and the attribute names of the plurality of text boxes as text box annotation information of the image.
In one aspect, an apparatus for labeling a text box is provided, the apparatus comprising:
the acquisition module is used for acquiring the position information of a plurality of text boxes in the image;
the determining module is used for determining the attribute names of the text boxes according to the position information of the text boxes;
and the annotation module is used for using the position information and the attribute names of the plurality of text boxes as the text box annotation information of the image.
In one aspect, a text box annotation apparatus is provided. The apparatus includes a processor and a memory, the memory being used to store a program that enables the apparatus to execute the text box annotation method described above, as well as the data involved in implementing that method. The processor is configured to execute the programs stored in the memory. The apparatus may further include a communication bus for establishing a connection between the processor and the memory.
In one aspect, a computer-readable storage medium is provided, which stores instructions that, when executed by a processor, implement the steps of the text box annotation method described above.
In one aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the above-described text box annotation method.
The technical scheme provided by the application can at least bring the following beneficial effects:
after the position information of the plurality of text boxes in the image is acquired, the attribute names of the plurality of text boxes are determined according to that position information, which improves the speed and accuracy of acquiring the attribute names. Finally, the position information and the attribute names of the plurality of text boxes are used as the text box annotation information of the image. Because the text box annotation information of the image is acquired automatically, annotation efficiency can be improved, annotation time can be reduced, and subjective errors caused by manual annotation can be avoided, so that the yield and quality of text box annotation work can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an ID card image provided by an embodiment of the present application;
FIG. 2 is a flowchart of a text box annotation method provided in an embodiment of the present application;
FIG. 3 is a flowchart of another method for annotating a text box according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a method for annotating a text box according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a text box annotation apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that reference to "a plurality" in this application means two or more. In the description of the present application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, for clarity in describing the technical solutions of the present application, the terms "first", "second", and the like are used to distinguish between identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that these terms do not limit quantity or execution order, nor do they denote importance.
Before explaining the embodiments of the present application in detail, an application scenario of the embodiments of the present application will be described.
OCR is a technique of converting the text of bills, newspapers, books, documents, certificates, and other printed matter into image information by means of optical input such as scanning, and then converting the image information into usable computer input by means of a text recognition technique. In the field of OCR, text detection is a precondition for scene text recognition, and the problem it must solve is how to accurately locate the text boxes matching the characters in cluttered, unfamiliar, and complex scenes.
OCR generally includes steps such as image preprocessing, text box detection, and character recognition. Text box detection detects the position range and layout of the characters, including layout analysis, text line detection, and the like; the detected image blocks corresponding to the text boxes containing characters are then sent to a character recognition model for character recognition. The accuracy of text box detection therefore greatly influences the recognition accuracy of OCR, and parameter tuning and verification testing of the text box detection algorithm are particularly important.
However, in the process of parameter tuning and verification test of the text box detection algorithm, a large amount of text box marking information is needed to continuously optimize the text box detection algorithm, and the accuracy of the text box detection algorithm is promoted to be improved. Therefore, the text box labeling work in the text box detection algorithm is very important.
The following describes the related contents labeled in the text box:
the text box label mainly comprises a text box position label and a text box attribute name label. The position information of the text box is used to indicate the position of the text box in the image. The attribute name of the text box is used to indicate the meaning of the text content in the text box.
For example, the identity card image shown in fig. 1 includes a plurality of text contents. The text box marking information of the identity card image comprises position information and attribute names of text boxes corresponding to each text content in the plurality of text contents.
Fig. 2 is a flowchart of a text box annotation method according to an embodiment of the present application. Referring to fig. 2, the method includes:
step 201: position information of each of a plurality of text boxes in an image is acquired.
The image is an image to which text box annotation is to be applied. The image may be an image taken of an object in need of text recognition. For example, the image may be an image of an object such as a certificate (including but not limited to an identity card, an exit-entry permit for Hong Kong and Macau, a driver's license, etc.) or a ticket.
Additionally, the text box label may include a text box location information label and a text box attribute name label. That is, the position information and the attribute name of the text box in the image may be used as the text box label information of the image. Wherein the position information of the text box is used for indicating the position of the text box in the image. The attribute name of the text box is used to indicate the meaning of the text content in the text box.
Further, the position information of the text box is determined in the image coordinate system. That is, the image coordinate system is a coordinate system for determining the position information of the text box. The text box is generally a quadrilateral box and includes four corner points, specifically, two upper corner points and two lower corner points, where the two upper corner points are an upper left corner point and an upper right corner point, and the two lower corner points are a lower left corner point and a lower right corner point. The position information of the text box may include positions of four corner points of the text box, and the positions of the four corner points of the text box may be expressed by coordinates of the four corner points of the text box in an image coordinate system.
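As a concrete illustration (not part of the original disclosure), such position information can be represented in Python as four corner coordinates; this list-of-tuples layout is an assumption reused in the later sketches:

```python
# Position information of one text box: the coordinates of its four corner
# points in the image coordinate system, in the order upper left, upper
# right, lower left, lower right (assumed ordering).
text_box = [(120, 40), (380, 40), (120, 80), (380, 80)]

upper_left, upper_right, lower_left, lower_right = text_box
```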
Specifically, the image may be input into a text box detection model to obtain the position information output by the model for a plurality of text boxes, and the position information of the plurality of text boxes in the image is then obtained from this output.
The text box detection model is used to identify a text region containing text content in the image, and output an edge position of the text region as text box position information. The textbox detection model can be a neural network model such as a convolutional neural network, a cyclic neural network, a deep neural network, and the like.
The text box detection model may be trained using a large number of image samples that include different text box annotation information. For example, a plurality of training samples may be determined in advance, for any one of the plurality of training samples, the sample data of the training sample is an image, and the sample label of the training sample is text box annotation information of the image. Then, the plurality of training samples may be used for model training, specifically, the sample data in the plurality of training samples may be used as input, the sample labels of the plurality of training samples may be used as expected output, and model training is performed to obtain the textbox detection model.
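A minimal sketch of the inference step, assuming a trained model callable on an image; the `load_model` helper and the calling convention are illustrative placeholders, not an API defined by the patent:

```python
def detect_text_boxes(model, image):
    """Run the text box detection model on an image and return one box per
    detected text region, each as the four corner points described above."""
    return model(image)  # e.g. a CNN-, RNN-, or DNN-based detector

# Hypothetical usage:
# boxes = detect_text_boxes(load_model("detector.weights"), image)
```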
The operation of acquiring the position information of the plurality of text boxes in the image according to the model output may be: directly using the output position information as the position information of the plurality of text boxes in the image. Alternatively, the text box indicated by each piece of output position information may be displayed in the image; the position of a displayed text box is adjusted in response to an adjustment operation on that text box; in response to a frame selection operation performed in the image, the selection frame corresponding to that operation is displayed in the image as a text box; and the position information of all text boxes displayed in the image is acquired.
It should be noted that the adjusting operation may be an operation triggered by a technician to adjust the position of the text box, such as an operation of lengthening or compressing the text box. The technician can adjust the positions of the four corner points of the text box through the adjusting operation to adjust the overall position of the text box.
Additionally, the box operation may be a technician-triggered operation to add a new text box. The technician may directly frame out a new text box in the image through a frame selection operation.
In addition, in the embodiment of the application, after the text box detected by the text box detection model is displayed in the image, a technician may adjust the position of the displayed text box through an adjustment operation, and may add a new text box in the image through a frame selection operation, so that the technician may review and correct the text box detected by the text box detection model.
In this embodiment of the present application, in one mode, the position information of the text boxes output by the text box detection model may be used directly as the position information of the text boxes in the image, which increases the speed of determining the text box positions. In another mode, a technician reviews and corrects the position information output by the model, and the reviewed and corrected position information is used as the position information of the text boxes in the image, which improves the accuracy of the determined positions.
Step 202: and determining attribute names of the text boxes according to the position information of the text boxes.
Specifically, the text boxes may be sorted according to the position information of the text boxes and a specified sorting rule to obtain the serial numbers of the text boxes; and acquiring the attribute name corresponding to the sequence number of the first text box from the corresponding relation between the sequence number and the attribute name as the attribute name of the first text box, wherein the first text box is one text box in the plurality of text boxes.
It should be noted that after the plurality of text boxes are sorted, the plurality of text boxes have an order, and thus each text box in the plurality of text boxes also has a sequence number representing the order. The ordering of the plurality of text boxes may represent the positional relationship of the plurality of text boxes in the object to which they pertain. For example, the ordering of the plurality of text boxes is from top to bottom and from left to right in the object.
In addition, the specified sorting rule can be preset, which means that sorting is performed according to a certain rule. For example, the specified sort rule may be a rule that sorts by positional relationship from top to bottom and from left to right.
In general, the object contained in an image may be at 0 degrees or at a non-zero angle. Being at 0 degrees means that the pose of the object in the image is relatively upright, i.e., an edge line of the object is roughly parallel to the horizontal or vertical axis of the image coordinate system. Being at a non-zero angle means that the pose of the object in the image is relatively skewed, i.e., the edge line of the object is parallel to neither the horizontal nor the vertical axis of the image coordinate system. In the embodiment of the present application, whether the plurality of text boxes can be sorted directly according to their position information may be determined according to whether the object contained in the image is at 0 degrees.
The operation of sorting the plurality of text boxes according to the specified sorting rule according to the position information of the plurality of text boxes may include the following steps (1) to (3):
(1) Take the text box with the largest long-side length among the plurality of text boxes as the second text box.
It should be noted that the long side of a text box is the side between its two upper corner points or its two lower corner points. The text box with the largest long-side length is simply the longest of the plurality of text boxes. It is taken as the second text box so that the posture of the object to which it belongs can subsequently be estimated from it.
In addition, the length of the long side of a text box is the difference between the abscissas of its two upper corner points or its two lower corner points. For example, if the long side of the text box is the side between the two upper corner points, and the positions of the upper left, upper right, lower left, and lower right corner points of a certain text box are represented by (x1, y1), (x2, y2), (x3, y3), and (x4, y4) respectively, then the length of the long side of the text box is the difference between x2 and x1 (i.e., x2 - x1).
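For illustration only, step (1) can be sketched in Python as follows, reusing the corner layout assumed earlier (this is a sketch of the patent's definition, not a prescribed implementation):

```python
def long_side_length(box):
    # box = [(x1, y1), (x2, y2), (x3, y3), (x4, y4)], corners in the order
    # upper left, upper right, lower left, lower right (assumed layout).
    # Following the patent's definition, the long-side length is the
    # difference of the abscissas of the two upper corner points: x2 - x1.
    (x1, _y1), (x2, _y2) = box[0], box[1]
    return x2 - x1

def second_text_box(boxes):
    # Step (1): the text box with the largest long-side length.
    return max(boxes, key=long_side_length)
```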
(2) Acquire the angle between the straight line on which the long side of the second text box lies and the horizontal axis of the image coordinate system as the target angle.
It should be noted that, since the second text box is the longest text box among all text boxes of the object to which it belongs, its long side tends to be consistent with an edge line of that object. In this case, the angle between the straight line on which the long side of the second text box lies and the horizontal axis of the image coordinate system (i.e., the target angle) may represent the angle between the edge line of the object and the horizontal axis of the image coordinate system, so the target angle can reflect the posture of the object in the image. That is, when the target angle is small, the pose of the object in the image is relatively upright; when the target angle is large, the object is relatively skewed in the image.
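A minimal sketch of step (2), reusing the box layout assumed above (the use of atan2 and degrees is an implementation choice, not mandated by the patent):

```python
import math

def target_angle(second_box):
    # Step (2): angle between the long side of the second text box and the
    # horizontal axis of the image coordinate system, in degrees.
    (x1, y1), (x2, y2) = second_box[0], second_box[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))
```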
(3) Rotate the plurality of text boxes by the target angle about one corner point of the second text box to obtain a plurality of corresponding target boxes, and sort the plurality of text boxes according to the specified sorting rule based on the position information of the plurality of target boxes.
It should be noted that the target angle may represent the angle between an edge line of the object to which the second text box belongs and the horizontal axis of the image coordinate system. Therefore, after each of the plurality of text boxes is rotated by the target angle about a corner point of the second text box to obtain the corresponding target box, the postures of the target boxes in the image are relatively upright. In this case, the plurality of text boxes corresponding one-to-one to the plurality of target boxes can be sorted according to the specified sorting rule based on the position information of the target boxes, which improves the sorting accuracy of the text boxes.
The operation of sorting the plurality of text boxes according to the specified sorting rule based on the position information of the plurality of target boxes may be: sorting the text boxes corresponding one-to-one to the target boxes in ascending order of the ordinate of each target box's target corner point; if the ordinates of the target corner points of at least two target boxes are the same, sorting the corresponding text boxes in ascending order of the abscissa of those target corner points.
It should be noted that the target corner point may be a fixed corner point in the quadrilateral frame. For example, one of the upper left corner point, the upper right corner point, the lower left corner point, and the lower right corner point may be defined in advance as the target corner point.
In this embodiment of the present application, the target boxes may be sorted in ascending order of ordinate first and ascending order of abscissa second, and the resulting order of the target boxes is then used as the order of the text boxes corresponding to them one-to-one. In this way, the plurality of text boxes are sorted from top to bottom and from left to right.
It is noted that step (3) may be performed directly after the target angle is obtained in step (2). Alternatively, after the target angle is obtained in step (2), it may be determined whether the target angle is greater than or equal to a specified angle. If the target angle is greater than or equal to the specified angle, step (3) is executed. If the target angle is smaller than the specified angle, the plurality of text boxes may be sorted in ascending order of the ordinate of each text box's target corner point; if the ordinates of the target corner points of at least two text boxes are the same, those text boxes are sorted in ascending order of the abscissa of their target corner points.
It should be noted that the specified angle may be set in advance, and may be set to a small value. When the target angle is greater than or equal to the specified angle, the object is at a non-zero angle, i.e., its posture in the image is relatively skewed, so step (3) needs to be performed: the text boxes are rotated to obtain the target boxes, and the text boxes are then sorted according to the target boxes, which improves sorting accuracy. When the target angle is smaller than the specified angle, the object is at 0 degrees, i.e., its posture in the image is relatively upright, so the text boxes can be sorted directly according to their own position information, which improves sorting speed.
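Putting steps (1) to (3) together, a hedged Python sketch of the sorting procedure might look as follows. The 2-degree threshold, the choice of the upper-left corner (box[0]) as both the rotation point and the target corner, and the sign convention for undoing the skew are all assumptions for illustration; the patent leaves them open:

```python
import math

def rotate_point(point, pivot, angle_deg):
    # Rotate one corner point about the pivot by -angle_deg, i.e. counter-
    # rotate the skew measured by the target angle (image coordinates:
    # x to the right, y downward).
    theta = math.radians(-angle_deg)
    px, py = pivot
    x, y = point
    return (px + (x - px) * math.cos(theta) - (y - py) * math.sin(theta),
            py + (x - px) * math.sin(theta) + (y - py) * math.cos(theta))

def sort_text_boxes(boxes, threshold_deg=2.0):
    # Steps (1)-(3): find the second text box, measure the target angle,
    # rotate every box about one of the second box's corner points if the
    # skew reaches the specified angle, then sort top-to-bottom and
    # left-to-right by the target corner point (assumed to be box[0]).
    second = second_text_box(boxes)   # from the earlier sketch
    angle = target_angle(second)      # from the earlier sketch
    if abs(angle) >= threshold_deg:
        pivot = second[0]
        targets = [[rotate_point(p, pivot, angle) for p in b] for b in boxes]
    else:
        targets = boxes               # pose is nearly upright; sort directly
    order = sorted(range(len(boxes)),
                   key=lambda i: (targets[i][0][1], targets[i][0][0]))
    return [boxes[i] for i in order]
```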
Generally, the meaning of the text content at each position in an object (such as an identity card, a Hong Kong and Macau pass, or a driver's license) is relatively fixed. For example, for the identity card shown in fig. 1, the attribute names of the text boxes from top to bottom and from left to right may be: first text box: resident identification card; second text box: Chinese name; third text box: English name; fourth text box: number; fifth text box: date of birth; sixth text box: date 1; seventh text box: gender; eighth text box: issue date; ninth text box: date 2.
In this way, the correspondence between the sequence numbers representing the order of the text boxes and the corresponding attribute names can be set in advance for each object. For example, for the identity card shown in fig. 1, the correspondence between the set sequence number and the attribute name may be as shown in table 1 below:
TABLE 1
Serial number    Attribute name
1                Resident identification card
2                Chinese name
3                English name
4                Number
5                Date of birth
6                Date 1
7                Gender
8                Issue date
9                Date 2
Note that, in the embodiments of the present application, the correspondence between the serial numbers and the attribute names is described only by taking table 1 as an example, and table 1 does not limit the embodiments of the present application.
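For illustration, the correspondence of Table 1 can be held in a simple Python dict; the English attribute-name strings follow the table above, and the 1-based sequence numbers come from the sorting step:

```python
# Correspondence between sequence numbers and attribute names (Table 1).
SEQ_TO_ATTR = {
    1: "Resident identification card",
    2: "Chinese name",
    3: "English name",
    4: "Number",
    5: "Date of birth",
    6: "Date 1",
    7: "Gender",
    8: "Issue date",
    9: "Date 2",
}

def attribute_names(sorted_boxes):
    # Look up each text box's attribute name by its 1-based sequence number;
    # assumes the image contains exactly the text boxes listed in Table 1.
    return [SEQ_TO_ATTR[i + 1] for i in range(len(sorted_boxes))]
```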
In the embodiment of the application, the attribute name of each text box in the plurality of text boxes can be accurately obtained according to the sequence number obtained after the plurality of text boxes are sequenced, so that the speed and the accuracy of obtaining the attribute name of the text box are improved.
Step 203: use the position information and the attribute names of the plurality of text boxes as the text box annotation information of the image.
It should be noted that, for any one image in a large number of images, the text box annotation information of the image may be obtained according to the text box annotation method provided in the embodiment of the present application.
In addition, after the text box annotation information of an image is obtained, the image and its text box annotation information can be used as a training sample for parameter tuning of the text box detection algorithm, or as a test sample for verification testing of the text box detection algorithm.
In the embodiment of the application, after the position information of the plurality of text boxes in the image is acquired, the attribute names of the plurality of text boxes are determined according to that position information, which improves the speed and accuracy of acquiring the attribute names. Finally, the position information and the attribute names of the plurality of text boxes are used as the text box annotation information of the image. Because the text box annotation information of the image is acquired automatically, annotation efficiency can be improved, annotation time can be reduced, and subjective errors caused by manual annotation can be avoided, so that the yield and quality of text box annotation work can be improved.
For ease of understanding, the text box labeling method provided in the embodiment of the present application is illustrated in the following with reference to fig. 3 and 4.
Referring to fig. 3, the text box annotation method may include steps 301 to 303 as follows.
Step 301: automatically generate a base annotation file containing text box position information and attribute names.
Each image (such as a certificate photo) among the plurality of images to be annotated is input into a text box detection model, and the position information of the text boxes in each image output by the model is obtained. In addition, a uniform attribute name (which may be a value customized in advance) may be set for all output text boxes. In this way, a base annotation file containing the text box position information and text box attribute names is generated for each image. The format of the base annotation file must support loading by the annotation software, and may be defined in advance, such as JSON (JavaScript Object Notation) or XML (Extensible Markup Language).
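One plausible layout for such a base annotation file, written as JSON from Python, is sketched below; the field names, the placeholder attribute value, and the corner-point order are assumptions for illustration (the patent only requires a format the annotation software can load):

```python
import json

base_annotation = {
    "image": "id_card_001.jpg",
    "text_boxes": [
        {
            # Corner points in the order upper left, upper right,
            # lower left, lower right (assumed layout).
            "points": [[120, 40], [380, 40], [120, 80], [380, 80]],
            "attribute_name": "TBD",  # uniform placeholder for every box
        },
        # ... one entry per detected text box
    ],
}

with open("id_card_001.json", "w", encoding="utf-8") as f:
    json.dump(base_annotation, f, ensure_ascii=False, indent=2)
```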
Step 302: manually review and correct the text box position information.
All the base annotation files output in step 301 and the plurality of images are loaded using the annotation software so as to display the images, and the text box indicated by each piece of position information is displayed in each image. The technician reviews and corrects the text box position information (position information output by the text box detection model that is incorrect needs correcting; position information that is correct does not), for example by adjusting the displayed text boxes. The technician may also add a new text box, and the attribute name of the added text box may be kept as the uniform attribute name set in step 301. A new annotation file containing the text box position information and text box attribute names is then regenerated for each image.
Step 303: automatically add and modify the text box attribute names.
For the annotation file of each image output in step 302, the text boxes in the image are sorted according to a specified sorting rule, such as from top to bottom and from left to right. For each of the plurality of text boxes in an image, the attribute name corresponding to the text box's sequence number is acquired as its attribute name, and the originally set uniform attribute name is modified into the acquired attribute name. In this way, the position information and attribute name of each text box in each image are obtained as the text box annotation information. With this annotation information, parameter tuning and verification testing can be performed on the text box detection algorithm.
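A sketch of step 303 end to end, reusing the helper functions and the SEQ_TO_ATTR table from the earlier sketches (the annotation-file schema is the assumed one shown after step 301):

```python
def relabel(annotation):
    # Sort the boxes of one annotation file and replace the uniform
    # placeholder with the attribute name looked up by sequence number;
    # assumes the boxes match the Table 1 layout.
    boxes = [tb["points"] for tb in annotation["text_boxes"]]
    ordered = sort_text_boxes(boxes)  # steps (1)-(3) from the earlier sketch
    annotation["text_boxes"] = [
        {"points": box, "attribute_name": SEQ_TO_ATTR[i + 1]}
        for i, box in enumerate(ordered)
    ]
    return annotation
```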
The embodiment of fig. 3 will be described in detail with reference to fig. 4. Referring to fig. 4, the following three parts may be included:
First part: a software module for automatically generating the base annotation file
Python is used to load an open-source or existing text box detection model, realizing the function of automatically generating text box position information for large batches of input images. After the text box position information of each image is generated, an attribute name is automatically added to each text box; this attribute name can be uniformly set to a single value. The annotation file containing the text box position information and text box attribute names generated by this software module is called the base annotation file. One base annotation file is generated for each image, and its format meets the loading requirements of the annotation software.
Second part: manually reviewing and correcting the text box positions through the annotation software
And loading all the basic annotation files output by the first part and the plurality of images by using annotation software to display the plurality of images, and displaying the text box indicated by the position information of each text box in each image in the plurality of images.
The technician reviews and corrects the text box position information (position information output by the text box detection model that is incorrect needs correcting; position information that is correct does not), for example by adjusting the displayed text boxes.
The technician can also add a new text box, and may keep the attribute name of the new text box as the uniform attribute name set in the first part.
After the loaded images have been reviewed and corrected, a new annotation file containing the text box position information and text box attribute names is regenerated for each image. Finally, the annotation files of all images are exported again.
Third part: a software module for automatically adding and modifying text box attribute names
(1) Text box attribute name definition
The meaning represented by each text box in an image containing a specific object can be known in advance; for example, the meaning of each text box on an identity card can be known when the text boxes are read from top to bottom and from left to right. Accordingly, a set of attribute names can be preset according to the order of the text boxes, that is, the correspondence between sequence numbers and attribute names is set.
(2) Text box sorting algorithm implementation
For the plurality of text boxes in each image, the text box with the largest long-side length is taken as the second text box; the angle between the straight line on which its long side lies and the horizontal axis of the image coordinate system is acquired as the target angle; each text box is rotated by the target angle about one corner point of the second text box to obtain the corresponding target box; and the plurality of text boxes are sorted according to the specified sorting rule based on the position information of the plurality of target boxes.
(3) Implementation of automatically adding and modifying text box attribute names
The attribute name corresponding to each text box's sequence number is acquired from the correspondence between sequence numbers and attribute names, and the originally set uniform attribute name is modified into the acquired attribute name. In this way, the position information and attribute name of each of the plurality of text boxes in each image are obtained as the text box annotation information of that image.
Fig. 5 is a schematic structural diagram of a text box annotation apparatus according to an embodiment of the present application. Referring to fig. 5, the apparatus includes:
an obtaining module 501, configured to obtain position information of multiple text boxes in an image;
a determining module 502, configured to determine attribute names of the text boxes according to the location information of the text boxes;
and the annotation module 503 is configured to use the position information and the attribute names of the multiple text boxes as the text box annotation information of the image.
Optionally, the obtaining module 501 is configured to:
inputting the image into a text box detection model to obtain position information of a plurality of text boxes in the image;
and acquiring the position information of the plurality of text boxes in the image according to the position information of the plurality of text boxes.
Optionally, the obtaining module 501 is configured to:
displaying a text box indicated by the position information of the plurality of text boxes in the image;
adjusting the position of a text box in response to an adjusting operation of the text box displayed in the image;
responding to the frame selection operation executed in the image, and displaying a frame selection frame corresponding to the frame selection operation in the image as a text frame;
position information of all text boxes displayed in the image is acquired.
Optionally, the determining module 502 is configured to:
sequencing the plurality of text boxes according to the position information of the plurality of text boxes and a specified sequencing rule to obtain the sequence numbers of the plurality of text boxes;
and acquiring the attribute name corresponding to the sequence number of the first text box from the corresponding relation between the sequence number and the attribute name as the attribute name of the first text box, wherein the first text box is one text box in the plurality of text boxes.
Optionally, the determining module 502 is configured to:
taking one text box with the largest length of the long sides of the plurality of text boxes as a second text box, wherein the long sides of the text boxes are the sides between two upper corner points or two lower corner points;
acquiring an angle of an included angle between a straight line where a long edge of the second text box is located and a transverse axis of an image coordinate system as a target angle, wherein the image coordinate system is used for determining position information of the text box;
rotating the plurality of text boxes by a target angle by taking one corner point of the second text box as a rotating point to obtain a plurality of corresponding target boxes;
and sequencing the plurality of text boxes according to the specified sequencing rule according to the position information of the plurality of target boxes.
Optionally, the determining module 502 is configured to:
sorting the plurality of text boxes corresponding one-to-one to the plurality of target boxes in ascending order of the ordinate of each target box's target corner point; and if the ordinates of the target corner points of at least two target boxes are the same, sorting the corresponding text boxes in ascending order of the abscissa of those target corner points.
In the embodiment of the application, after the position information of the plurality of text boxes in the image is acquired, the attribute names of the plurality of text boxes are determined according to that position information, which improves the speed and accuracy of acquiring the attribute names. Finally, the position information and the attribute names of the plurality of text boxes are used as the text box annotation information of the image. Because the text box annotation information of the image is acquired automatically, annotation efficiency can be improved, annotation time can be reduced, and subjective errors caused by manual annotation can be avoided, so that the yield and quality of text box annotation work can be improved.
It should be noted that: in the text box labeling apparatus provided in the above embodiment, when displaying a page, only the division of the above functional modules is used for illustration, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the text box labeling device provided by the above embodiment and the text box labeling method embodiment belong to the same concept, and the specific implementation process thereof is detailed in the method embodiment and will not be described herein again.
Fig. 6 is a schematic structural diagram of a server 600 according to an embodiment of the present application. Server 600 may be a server in a background server cluster. Specifically, the method comprises the following steps:
the server 600 includes a CPU (Central Processing Unit) 601, a system Memory 604 including a RAM (Random Access Memory) 602 and a ROM (Read-Only Memory) 603, and a system bus 605 connecting the system Memory 604 and the Central Processing Unit 601. The server 600 also includes a basic I/O (Input/Output) system 606, which facilitates the transfer of information between devices within the computer, and a mass storage device 607, which stores an operating system 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609 such as a mouse, keyboard, etc. for user input of information. Wherein a display 608 and an input device 609 are connected to the central processing unit 601 through an input/output controller 610 connected to the system bus 605. The basic input/output system 606 may also include an input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, an input/output controller 610 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the server 600. That is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state memory technology, CD-ROM, DVD (Digital Versatile Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 604 and the mass storage device 607 may collectively be referred to as memory.
According to various embodiments of the present application, the server 600 may also be operated through a remote computer connected to a network such as the Internet. That is, the server 600 may be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 611.
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU. The one or more programs include instructions for performing the operations performed in the text box annotation methods provided by the method embodiments herein.
Fig. 7 is a schematic structural diagram of a terminal 700 according to an embodiment of the present application. The terminal 700 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to perform operations performed in a text box annotation method provided by method embodiments herein.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, touch screen display 705, camera 706, audio circuitry 707, positioning components 708, and power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or both of processor 701, memory 702, and peripherals interface 703 may be implemented on separate chips or circuit boards, which is not limited in this application.
The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, etc. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 705 may be one, disposed on a front panel of the terminal 700; in other embodiments, the display 705 can be at least two, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The Display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic position of the terminal 700 to implement navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 709 is provided to supply power to various components of terminal 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power source 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side bezel of terminal 700 and/or an underlying layer of touch display 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a user's grip signal on the terminal 700 may be detected, and the processor 701 performs right-left hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the touch display 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 is used to collect a user's fingerprint, and the processor 701 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 itself identifies the user's identity from the collected fingerprint. When the user's identity is identified as trusted, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the terminal 700. When a physical button or a vendor logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical button or the vendor logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display 705 based on the ambient light intensity collected by the optical sensor 715: when the ambient light intensity is high, the display brightness of the touch display 705 is increased; when the ambient light intensity is low, the display brightness of the touch display 705 is decreased. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 715.
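For illustration only, a minimal Python sketch of such a brightness mapping; the linear curve, the 10000-lux ceiling, and the 5% floor are assumptions made for this example, not values from this application:

    def adjust_backlight(ambient_lux: float, max_lux: float = 10000.0) -> float:
        # Clamp the ambient reading and map it linearly onto [0.05, 1.0],
        # keeping a small floor so the screen stays readable in darkness.
        level = min(max(ambient_lux, 0.0), max_lux) / max_lux
        return max(level, 0.05)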
The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually decreases, the processor 701 controls the touch display 705 to switch from the bright screen state to the dark screen state; when the proximity sensor 716 detects that the distance gradually increases, the processor 701 controls the touch display 705 to switch from the dark screen state to the bright screen state.
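For illustration only, a minimal sketch of this switching logic over consecutive distance readings (the function and state names are hypothetical; the embodiment specifies only the trend-based behavior):

    def next_screen_state(prev_distance: float, distance: float, state: str) -> str:
        # Approaching the front surface dims the screen; moving away
        # restores it. Equal readings leave the state unchanged.
        if distance < prev_distance:
            return "dark"
        if distance > prev_distance:
            return "bright"
        return state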
Those skilled in the art will appreciate that the structure shown in fig. 7 does not constitute a limitation of the terminal 700, which may include more or fewer components than those shown, combine certain components, or adopt a different arrangement of components.
In some embodiments, a computer-readable storage medium is also provided, in which a computer program is stored, which when executed by a processor implements the steps of the text box annotation method provided in the embodiment of fig. 2 above. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium referred to in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
In some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the text box annotation method provided in the embodiment of fig. 2 above.
The above description is only an exemplary embodiment of the present application and is not intended to be limiting; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for annotating a text box, the method comprising:
acquiring position information of a plurality of text boxes in an image;
determining attribute names of the text boxes according to the position information of the text boxes;
and using the position information and the attribute names of the plurality of text boxes as text box mark information of the image.
2. The method of claim 1, wherein the obtaining position information of a plurality of text boxes in an image comprises:
inputting the image into a text box detection model to obtain position information of a plurality of text boxes in the image;
and acquiring the position information of the plurality of text boxes in the image according to the position information of the plurality of text boxes.
3. The method of claim 2, wherein the obtaining the position information of the plurality of text boxes in the image according to the position information of the plurality of text boxes comprises:
displaying, in the image, the text boxes indicated by the position information of the plurality of text boxes;
in response to an adjustment operation on one text box displayed in the image, adjusting the position of the text box;
in response to a frame selection operation performed in the image, displaying a frame selection box corresponding to the frame selection operation in the image as a text box;
and acquiring the position information of all the text boxes displayed in the image.
4. The method of any of claims 1-3, wherein said determining attribute names of the plurality of text boxes based on the position information of the plurality of text boxes comprises:
sorting the plurality of text boxes according to the position information of the plurality of text boxes and a specified sorting rule to obtain sequence numbers of the plurality of text boxes;
and acquiring, from the correspondence between sequence numbers and attribute names, an attribute name corresponding to the sequence number of a first text box as the attribute name of the first text box, wherein the first text box is one of the plurality of text boxes.
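For illustration only, a minimal Python sketch of this sequence-number-to-attribute-name lookup; the SEQ_TO_ATTR table and its field names are hypothetical, since the actual correspondence is configured per document template:

    # Hypothetical correspondence between sequence numbers and attribute
    # names for an identity-card-like layout.
    SEQ_TO_ATTR = {0: "name", 1: "gender", 2: "id_number", 3: "address"}

    def label_boxes(sorted_boxes):
        # Pair each text box, in its sorted order, with the attribute
        # name registered for that sequence number; unregistered ranks
        # fall back to a generic field name.
        return [
            {"position": box, "attribute": SEQ_TO_ATTR.get(i, f"field_{i}")}
            for i, box in enumerate(sorted_boxes)
        ]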
5. The method of claim 4, wherein said sorting the plurality of text boxes according to a specified sorting rule based on the position information of the plurality of text boxes comprises:
taking the text box having the longest long side among the plurality of text boxes as a second text box, wherein the long side of a text box is the side between its two upper corner points or its two lower corner points;
acquiring, as a target angle, the angle between the straight line on which the long side of the second text box lies and the horizontal axis of an image coordinate system, wherein the image coordinate system is used for determining the position information of the text boxes;
rotating the plurality of text boxes by the target angle about one corner point of the second text box as the rotation point to obtain a plurality of corresponding target boxes;
and sorting the plurality of text boxes according to the specified sorting rule based on the position information of the plurality of target boxes.
6. The method of claim 5, wherein said sorting the plurality of text boxes according to a specified sorting rule based on the position information of the plurality of target boxes comprises:
sorting the plurality of text boxes corresponding one by one to the plurality of target boxes in ascending order of the vertical coordinates of the target corner points of the plurality of target boxes; and if the vertical coordinates of the target corner points of at least two target boxes are the same, sorting the at least two text boxes corresponding one by one to the at least two target boxes in ascending order of the horizontal coordinates of the target corner points of the at least two target boxes.
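For illustration only, a minimal Python sketch combining claims 5 and 6, assuming each box is given as four (x, y) corner points ordered top-left, top-right, bottom-right, bottom-left, and taking the top-left corner as the target corner point (an assumption made here; the claims do not fix which corner is used):

    import math

    def sort_text_boxes(boxes):
        # Second text box: the one whose long (top) side is longest.
        ref = max(boxes, key=lambda b: math.dist(b[0], b[1]))
        (x0, y0), (x1, y1) = ref[0], ref[1]
        # Target angle: the tilt of that long side against the image
        # x axis; rotating every box by its negative levels them out.
        theta = -math.atan2(y1 - y0, x1 - x0)
        cos_t, sin_t = math.cos(theta), math.sin(theta)

        def rotate(pt):
            # Rotate a point about the reference corner (x0, y0).
            dx, dy = pt[0] - x0, pt[1] - y0
            return (x0 + dx * cos_t - dy * sin_t, y0 + dx * sin_t + dy * cos_t)

        # Sort by the rotated target corner: ascending y, ties broken
        # by ascending x, as specified in claim 6.
        return sorted(boxes, key=lambda b: (rotate(b[0])[1], rotate(b[0])[0]))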
7. A text box annotation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring the position information of a plurality of text boxes in the image;
the determining module is used for determining the attribute names of the text boxes according to the position information of the text boxes;
and the marking module is used for taking the position information and the attribute names of the text boxes as the text box marking information of the image.
8. The apparatus of claim 7, wherein the acquisition module is to:
inputting the image into a text box detection model to obtain position information of a plurality of text boxes in the image;
and acquiring the position information of the plurality of text boxes in the image according to the position information of the plurality of text boxes.
9. The apparatus of claim 7 or 8, wherein the determination module is to:
sorting the plurality of text boxes according to the position information of the plurality of text boxes and a specified sorting rule to obtain sequence numbers of the plurality of text boxes;
and acquiring, from the correspondence between sequence numbers and attribute names, an attribute name corresponding to the sequence number of a first text box as the attribute name of the first text box, wherein the first text box is one of the plurality of text boxes.
10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method of any of claims 1-6.
CN202010161011.3A 2020-03-10 2020-03-10 Text box labeling method, device and storage medium Active CN111353458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010161011.3A CN111353458B (en) 2020-03-10 2020-03-10 Text box labeling method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111353458A true CN111353458A (en) 2020-06-30
CN111353458B CN111353458B (en) 2023-08-18

Family

ID=71197546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010161011.3A Active CN111353458B (en) 2020-03-10 2020-03-10 Text box labeling method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111353458B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08221506A (en) * 1995-02-16 1996-08-30 Toshiba Corp Device and method for recognizing business document
CN108549843A (en) * 2018-03-22 2018-09-18 南京邮电大学 A kind of VAT invoice recognition methods based on image procossing
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109308476A (en) * 2018-09-06 2019-02-05 邬国锐 Billing information processing method, system and computer readable storage medium
CN109492635A (en) * 2018-09-20 2019-03-19 第四范式(北京)技术有限公司 Obtain method, apparatus, equipment and the storage medium of labeled data
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN109635627A (en) * 2018-10-23 2019-04-16 中国平安财产保险股份有限公司 Pictorial information extracting method, device, computer equipment and storage medium
CN109948533A (en) * 2019-03-19 2019-06-28 讯飞智元信息科技有限公司 A kind of Method for text detection, device, equipment and readable storage medium storing program for executing
CN110533079A (en) * 2019-08-05 2019-12-03 贝壳技术有限公司 Form method, apparatus, medium and the electronic equipment of image pattern
CN110689010A (en) * 2019-09-27 2020-01-14 支付宝(杭州)信息技术有限公司 Certificate identification method and device
CN111950397A (en) * 2020-07-27 2020-11-17 腾讯科技(深圳)有限公司 Text labeling method, device and equipment for image and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhu Yingying; Zhang Zheng; Zhang Chengquan; Zhang Zhaoxiang; Bai Xiang; Liu Wenyu: "A candidate box extraction algorithm for text detection" (适用于文字检测的候选框提取算法), Journal of Data Acquisition and Processing (数据采集与处理), no. 06 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985465A (en) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, device, equipment and storage medium
CN112016438A (en) * 2020-08-26 2020-12-01 北京嘀嘀无限科技发展有限公司 Method and system for identifying certificate based on graph neural network
CN112200107A (en) * 2020-10-16 2021-01-08 深圳市华付信息技术有限公司 Invoice text detection method
CN112580735A (en) * 2020-12-25 2021-03-30 南方电网深圳数字电网研究院有限公司 Picture online labeling method and device and computer readable storage medium
CN113239227A (en) * 2021-06-02 2021-08-10 泰康保险集团股份有限公司 Image data structuring method and device, electronic equipment and computer readable medium
CN113239227B (en) * 2021-06-02 2023-11-17 泰康保险集团股份有限公司 Image data structuring method, device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN111353458B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN108415705B (en) Webpage generation method and device, storage medium and equipment
CN111353458B (en) Text box labeling method, device and storage medium
CN109684980B (en) Automatic scoring method and device
WO2022042425A1 (en) Video data processing method and apparatus, and computer device and storage medium
CN111192005A (en) Government affair service processing method and device, computer equipment and readable storage medium
CN110321126B (en) Method and device for generating page code
CN111078521A (en) Abnormal event analysis method, device, equipment, system and storage medium
CN110647881A (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN108491748B (en) Graphic code identification and generation method and device and computer readable storage medium
CN112230908A (en) Method and device for aligning components, electronic equipment and storage medium
CN110738185B (en) Form object identification method, form object identification device and storage medium
CN113627413A (en) Data labeling method, image comparison method and device
CN111753606A (en) Intelligent model upgrading method and device
CN112396076A (en) License plate image generation method and device and computer storage medium
CN112115748B (en) Certificate image recognition method, device, terminal and storage medium
CN110163192B (en) Character recognition method, device and readable medium
CN111898535A (en) Target identification method, device and storage medium
CN111354378A (en) Voice endpoint detection method, device, equipment and computer storage medium
CN110990728A (en) Method, device and equipment for managing point of interest information and storage medium
CN113535039B (en) Method and device for updating page, electronic equipment and computer readable storage medium
CN113051485B (en) Group searching method, device, terminal and storage medium
CN114238859A (en) Data processing system, method, electronic device, and storage medium
CN109816047B (en) Method, device and equipment for providing label and readable storage medium
CN111859549A (en) Method for determining weight and gravity center information of single-configuration whole vehicle and related equipment
CN111444945A (en) Sample information filtering method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40024806)
SE01 Entry into force of request for substantive examination
GR01 Patent grant