CN111353458B - Text box labeling method, device and storage medium - Google Patents

Text box labeling method, device and storage medium Download PDF

Info

Publication number
CN111353458B
CN111353458B CN202010161011.3A CN202010161011A CN111353458B CN 111353458 B CN111353458 B CN 111353458B CN 202010161011 A CN202010161011 A CN 202010161011A CN 111353458 B CN111353458 B CN 111353458B
Authority
CN
China
Prior art keywords
text
text box
boxes
image
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010161011.3A
Other languages
Chinese (zh)
Other versions
CN111353458A (en
Inventor
彭梅英
鲁四喜
农高明
唐嘉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010161011.3A priority Critical patent/CN111353458B/en
Publication of CN111353458A publication Critical patent/CN111353458A/en
Application granted granted Critical
Publication of CN111353458B publication Critical patent/CN111353458B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The application discloses a text box labeling method, a text box labeling device and a storage medium, and belongs to the technical field of information. The method comprises the following steps: acquiring position information of a plurality of text boxes in an image; determining attribute names of the text boxes according to the position information of the text boxes; and taking the position information and attribute names of the text boxes as text box marking information of the image. According to the method, the text box marking information of the image is automatically obtained, so that marking efficiency can be improved, marking time is reduced, subjective errors caused by manual marking are avoided, and the yield and quality of text box marking work can be improved.

Description

Text box labeling method, device and storage medium
Technical Field
The present application relates to the field of information technologies, and in particular, to a text box labeling method, a text box labeling device, and a storage medium.
Background
OCR (Optical Character Recognition ) is a technology for converting characters of notes, newspapers, books, manuscripts, certificates, and other printed matters into image information by an optical input method such as scanning, and converting the image information into usable computer input by a character recognition technology.
OCR may be implemented by way of deep learning. Specifically, an image to be identified is provided to a text box detection model based on deep learning to obtain text box position information in the image, then the image is sheared according to the text box position information to obtain an image block to be identified, and the image block is provided to a text content identification model to obtain text content in the image block.
The performance of the text box detection model greatly influences the recognition accuracy of OCR, so that parameter tuning and verification testing for the text box detection model are particularly important, and a large amount of text box labeling information is needed in the process. At present, text boxes are all marked manually by technicians, so that the method has the defects of large workload and very long time consumption, and is very unfavorable for parameter tuning and verification testing of a text box detection model.
Disclosure of Invention
The application provides a text box labeling method, a text box labeling device and a storage medium, which can improve the yield and the quality of text box labeling work.
In one aspect, a text box labeling method is provided, the method comprising:
acquiring position information of a plurality of text boxes in an image;
determining attribute names of the text boxes according to the position information of the text boxes;
And taking the position information and attribute names of the text boxes as text box marking information of the image.
In one aspect, there is provided a text box labeling apparatus, the apparatus comprising:
the acquisition module is used for acquiring the position information of a plurality of text boxes in the image;
a determining module, configured to determine attribute names of the text boxes according to the location information of the text boxes;
and the labeling module is used for taking the position information and the attribute names of the text boxes as text box labeling information of the image.
In one aspect, a text box labeling device is provided, which includes a processor and a memory, where the memory is configured to store a program that supports the text box labeling device to execute the text box labeling method described above, and store data related to implementing the text box labeling method described above. The processor is configured to execute a program stored in the memory. The text box labeling device may further include a communication bus for establishing a connection between the processor and the memory.
In one aspect, a computer-readable storage medium having instructions stored thereon that when executed by a processor perform the steps of the text box marking method described above is provided.
In one aspect, a computer program product is provided comprising instructions that, when executed on a computer, cause the computer to perform the text box marking method described above.
The technical scheme provided by the application has at least the following beneficial effects:
after the position information of a plurality of text boxes in the image is acquired, the attribute names of the text boxes are determined according to the position information of the text boxes, so that the speed and accuracy of acquiring the attribute names of the text boxes are improved. Finally, the position information and the attribute names of the text boxes are used as text box marking information of the image. Therefore, the text box marking information of the image is automatically obtained, so that the marking efficiency can be improved, the marking time is reduced, subjective errors caused by manual marking are avoided, and the yield and quality of the text box marking work can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an identification card image provided by an embodiment of the present application;
FIG. 2 is a flowchart of a text box labeling method according to an embodiment of the present application;
FIG. 3 is a flow chart of another text box labeling method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a text box labeling method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a text box labeling device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
It should be understood that references to "a plurality" in this disclosure refer to two or more. In the description of the present application, "/" means or, unless otherwise indicated, for example, A/B may represent A or B; "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, in order to facilitate the clear description of the technical solution of the present application, the words "first", "second", etc. are used to distinguish the same item or similar items having substantially the same function and function. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the amount and order of execution, and that the words "first," "second," and the like do not necessarily differ.
Before explaining the embodiment of the present application in detail, an application scenario of the embodiment of the present application is described.
OCR is a technique of converting characters of notes, newspapers, books, manuscripts, certificates, and other printed matters into image information by an optical input method such as scanning, and converting the image information into usable computer input by a character recognition technique. In the field of OCR, text detection is a precondition for scene text recognition, and the problem to be solved is how to accurately locate the position of a text box matched with characters in a disordered and thousand-odd complex scene.
OCR generally includes image preprocessing, text box detection, text recognition, and the like. The text box detection is to detect the position range of the text and the layout thereof, including layout analysis, text line detection and the like, and then send the detected image block corresponding to the text box containing the text to the text recognition model for text recognition. Therefore, the accuracy of text box detection greatly influences the recognition accuracy of OCR, and the parameter tuning and verification test for the text box detection algorithm is particularly important.
However, in the parameter tuning and verification test process of the text box detection algorithm, a large amount of text box marking information is required to be used for continuously optimizing the text box detection algorithm, so that the improvement of the accuracy of the text box detection algorithm is promoted. The text box labeling effort in the text box detection algorithm is very important.
The following describes the text box labeling related content:
the text box labels mainly comprise text box position labels and text box attribute name labels. The location information of the text box is used to indicate the location of the text box in the image. The attribute name of a text box is used to indicate the meaning of the text content in the text box.
For example, the identification card image shown in fig. 1 includes a plurality of text contents. The text box marking information of the identity card image comprises position information and attribute names of text boxes corresponding to each text content in the plurality of text contents.
Fig. 2 is a flowchart of a text box labeling method according to an embodiment of the present application. Referring to fig. 2, the method includes:
step 201: position information of each of a plurality of text boxes in an image is acquired.
The image is an image to be subjected to text box labeling. The image may be an image taken of an object requiring text recognition. For example, the image may be an image of an object such as a certificate (including but not limited to an identification card, a port-australian pass, a driver license, etc.), a ticket, etc.
Additionally, the text box labels may include text box positional information labels and text box attribute name labels. That is, the location information and attribute name of a text box in an image may be taken as text box annotation information for the image. Wherein the location information of the text box is used to indicate the location of the text box in the image. The attribute name of a text box is used to indicate the meaning of the text content in the text box.
Furthermore, the position information of the text box is determined in the image coordinate system. That is, the image coordinate system is a coordinate system for determining position information of the text box. The text box is generally a quadrilateral box and comprises four corner points, specifically two upper corner points and two lower corner points, wherein the two upper corner points are an upper left corner point and an upper right corner point, and the two lower corner points are a lower left corner point and a lower right corner point. The location information of the text box may include locations of four corner points of the text box, which may be represented by coordinates of the four corner points of the text box in an image coordinate system.
Specifically, the image may be input into a text box detection model to obtain a plurality of text box position information in the image, and then the position information of the plurality of text boxes in the image may be obtained according to the plurality of text box position information.
The text box detection model is used for identifying a text region containing text content in an image, and outputting the edge position of the text region as text box position information. The text box detection model may be a neural network model such as a convolutional neural network, a recurrent neural network, a deep neural network, and the like.
The text box detection model may be trained using a large number of image samples including different text box annotation information. For example, a plurality of training samples may be determined in advance, and for any one of the plurality of training samples, the sample data of this training sample is an image, and the sample label of this training sample is text box information of this image. The plurality of training samples may then be used for model training, and specifically, sample data in the plurality of training samples may be used as input, and sample labels of the plurality of training samples may be used as desired output, and model training may be performed to obtain the text box detection model.
The operation of obtaining the position information of the text boxes in the image according to the position information of the text boxes may be: the plurality of text box position information is directly used as the position information of a plurality of text boxes in the image. Alternatively, a text box indicated by each text box position information of the plurality of text box position information may be displayed in the image; adjusting the position of one text box displayed in the image in response to an adjustment operation of the one text box; responding to a frame selection operation executed in the image, and displaying a frame selection frame corresponding to the frame selection operation in the image as a text frame; and acquiring the position information of all text boxes displayed in the image.
The adjustment operation may be an operation triggered by a technician to adjust the position of the text box, for example, an operation of stretching or compressing the text box. The technician can adjust the positions of the four corner points of the text box through the adjustment operation to adjust the overall position of the text box.
Additionally, the box selection operation may be a technician-triggered operation for adding a new text box. The technician may directly box a new text box in the image by a box selection operation.
In addition, after the text box detected by the text box detection model is displayed in the image in the embodiment of the application, a technician can adjust the position of the displayed text box through adjustment operation, and a text box can be newly added in the image through box selection operation, so that the technician can review the text box detected by the corrected text box detection model.
In one mode of the embodiment of the application, the text box position information output by the text box detection model can be directly used as the position information of the text box in the image, so that the speed of determining the position of the text box can be improved. In another mode, based on the text box position information output by the text box detection model, a technician performs review correction on the text box position information output by the text box detection model, and the reviewed and corrected text box position information is used as the position information of the text box in the image, so that the accuracy of the determined text box position can be improved.
Step 202: and determining attribute names of the text boxes according to the position information of the text boxes.
Specifically, the text boxes can be ordered according to the position information of the text boxes and the specified ordering rule to obtain the sequence numbers of the text boxes; and acquiring the attribute name corresponding to the serial number of the first text box from the corresponding relation between the serial number and the attribute name as the attribute name of the first text box, wherein the first text box is one text box in the plurality of text boxes.
It should be noted that, after the plurality of text boxes are ordered, the plurality of text boxes have an order, and thus each text box in the plurality of text boxes has a sequence number representing the order thereof. The ordering of the plurality of text boxes may represent the positional relationship of the plurality of text boxes in the object to which they belong. For example, the ordering of the plurality of text boxes is a top-to-bottom, left-to-right ordering in the object.
In addition, the specified ordering rule may be preset, which means that the ordering is performed according to a certain rule. For example, the specified ordering rule may be a rule that orders by positional relationship from top to bottom and from left to right.
In general, the objects contained in the image may be at 0 degrees or at non-0 degrees. At 0 degrees means that the pose of the object in the image is relatively positive, i.e. the boundary of the object is relatively parallel to the horizontal or vertical axis of the image coordinate system. At a non-0 degree means that the pose of the object in the image is relatively skewed, i.e. the edges of the object are not parallel to both the horizontal and vertical axes of the image coordinate system. In the embodiment of the application, whether the text boxes are to be ordered according to the position information of each text box in the text boxes can be determined according to whether the object contained in the image is at 0 degree.
Wherein, according to the position information of the text boxes, the operation of ordering the text boxes according to the specified ordering rule may include the following steps (1) -step (3):
(1) And taking one text box with the largest length of the long side of the plurality of text boxes as a second text box.
It should be noted that the long side of the text box is the side between the two upper corner points or the two lower corner points. The text box with the largest length of the long sides in the text boxes is the longest text box in the text boxes. At this time, the one text box is used as a second text box, so that the gesture of an object to which the second text box belongs can be conveniently studied through the second text box.
In addition, the length of the long side of a text box is the difference between the abscissas of its two upper or lower corner points. For example, the long side of a text box is the side between two upper corner points, and assuming that the positions of the upper left corner point, the upper right corner point, the lower left corner point and the lower right corner point of a certain text box are respectively represented by (x 1, y 1), (x 2, y 2), (x 3, y 3) and (x 4, y 4), the length of the long side of the text box is the difference between x2 and x1 (i.e., x2-x 1).
(2) And acquiring an angle of an included angle between a straight line where the long side of the second text box is positioned and a transverse axis of the image coordinate system as a target angle.
It should be noted that, since the second text box is the longest text box among all text boxes of the object to which the second text box belongs, the long side of the second text box is often consistent with the edge line of the object to which the second text box belongs. In this case, the angle (i.e., the target angle) of the included angle between the straight line where the long side of the second text box is located and the horizontal axis of the image coordinate system may represent the angle of the included angle between the edge of the object to which the second text box belongs and the horizontal axis of the image coordinate system, so that the target angle can reflect the posture of the object in the image. That is, when the target angle is small, it indicates that the posture of the object in the image is relatively correct. When the target angle is large, this indicates that the pose of the object in the image is relatively skewed.
(3) Rotating the text boxes by a target angle by taking one corner point of the second text box as a rotating point to obtain a plurality of corresponding target boxes; and sorting the text boxes according to the position information of the target boxes and the designated sorting rule.
It should be noted that the target angle may represent an angle of an included angle between a border of the object to which the second text box belongs and a horizontal axis of the image coordinate system. And thus, taking one corner point of the second text box as a rotation point, and rotating each text box in the plurality of text boxes by a target angle to obtain a corresponding target box, wherein the gestures of the plurality of target boxes in the image are correct. In this case, the plurality of text boxes corresponding to the plurality of target boxes one by one may be sorted according to the specified sorting rule based on the position information of the plurality of target boxes, so that the sorting accuracy of the plurality of text boxes may be improved.
The operation of sorting the text boxes according to the specified sorting rule according to the position information of the target boxes may be: ordering the text boxes corresponding to the target boxes one by one according to the sequence from small to large of the ordinate of the target corner points of the target boxes; and if the ordinate of the target corner points of the at least two target frames are the same, sequencing at least two text boxes corresponding to the at least two target frames one by one according to the order of the abscissa of the target corner points of the at least two target frames from small to large.
It should be noted that the target corner point may be a fixed corner point in the quadrangular frame. For example, one of the upper left corner, the upper right corner, the lower left corner, and the lower right corner may be predefined as the target corner.
In the embodiment of the application, the plurality of target frames can be sequenced preferentially from small to large according to the ordinate and from small to large according to the abscissa, and then the sequencing of the plurality of target frames is regarded as the sequencing of the plurality of text frames corresponding to the plurality of target frames one by one. Thus, the plurality of text boxes are ordered in a top-to-bottom and left-to-right order.
It is noted that after the target angle is obtained in step (2), step (3) may be directly performed. Or, after the target angle is obtained in the step (2), whether the target angle is greater than or equal to the designated angle can be judged first; if the target angle is greater than or equal to the specified angle, executing the step (3); if the target angle is smaller than the designated angle, the text boxes can be ordered according to the order of the ordinate of the target angular points of the text boxes from small to large; and if the ordinate of the target corner points of the at least two text boxes are the same, sequencing the at least two text boxes according to the order of the abscissa of the target corner points of the at least two text boxes from small to large.
It should be noted that the specified angle may be set in advance, and the specified angle may be set smaller. When the target angle is greater than or equal to the specified angle, the object is indicated to be at a non-0 degree, that is, the gesture of the object in the image is relatively askew, so that the step (3) needs to be executed to rotate the text boxes to obtain a plurality of target boxes, and then the text boxes are ranked according to the target boxes, so that the ranking accuracy of the text boxes can be improved. When the target angle is smaller than the specified angle, the object is 0 degrees, namely the gesture of the object in the image is correct, so that the text boxes can be ordered directly according to the position information of the text boxes, and the ordering speed of the text boxes can be improved.
In general, the meaning of text content at various locations in an object (e.g., an identification card, a port-and-australian pass, a driver's license, etc.) is generally relatively fixed. For example, for the identity card shown in fig. 1, the attribute names of the text boxes from top to bottom and from left to right may be: attribute name of first text box: resident identification card, attribute name of second text box: chinese name, attribute name of third text box: english name, attribute name of fourth text box: number, attribute name of fifth text box: birth date, attribute name of sixth text box: date 1, attribute name of seventh text box: gender, attribute name of eighth text box: issue date, attribute name of ninth text box: date 2.
In this way, the correspondence relationship between the sequence number representing the sequence of the text boxes and the corresponding attribute names can be set in advance for each object. For example, for the identity card shown in fig. 1, the correspondence between the set serial number and the attribute name may be as shown in the following table 1:
TABLE 1
Sequence number Attribute names
1 Resident identification card
2 Chinese name
3 English name
4 Number of number
5 Birth date
6 Date 1
7 Sex (sex)
8 Issue date
9 Date 2
The correspondence between sequence numbers and attribute names is described by taking the above table 1 as an example in the embodiment of the present application, and the above table 1 does not limit the embodiment of the present application.
According to the method and the device for obtaining the attribute names of the text boxes, the attribute names of each text box in the text boxes can be accurately obtained according to the sequence numbers obtained after the text boxes are sequenced, and therefore the speed and the accuracy for obtaining the attribute names of the text boxes are improved.
Step 203: the position information and attribute names of the plurality of text boxes are taken as text box marking information of the image.
It should be noted that, for any image in a large batch of images, text box marking information of the image may be obtained according to the text box marking method provided in the embodiment of the present application.
In addition, after the text box label information of the image is obtained, the image and the text box label information of the image can be used as training samples to carry out parameter tuning on the text box detection algorithm, or the image and the text box label information of the image can be used as test samples to carry out verification test on the text box detection algorithm.
In the embodiment of the application, after the position information of a plurality of text boxes in the image is acquired, the attribute names of the text boxes are determined according to the position information of the text boxes, so that the speed and accuracy of acquiring the attribute names of the text boxes are improved. Finally, the position information and the attribute names of the text boxes are used as text box marking information of the image. Therefore, the text box marking information of the image is automatically obtained, so that the marking efficiency can be improved, the marking time is reduced, subjective errors caused by manual marking are avoided, and the yield and quality of the text box marking work can be improved.
For ease of understanding, the text box labeling method provided by the embodiment of the present application is illustrated in conjunction with fig. 3 and 4.
Referring to fig. 3, the text box labeling method may include the following steps 301-303.
Step 301: and automatically generating a basic annotation file containing the text box position information and the attribute name.
And inputting each image (such as credentials) in the plurality of images to be subjected to text box labeling into a text box detection model, and obtaining the position information of the plurality of text boxes in each image output by the text box detection model. In addition, a unified attribute name (which may be a value customized in advance) may be set to all text boxes that are output. In this way, a base annotation file containing the text box location information and the text box attribute name is generated for each image. The format of the basic annotation file needs to support loading of annotation software, and the format of the basic annotation file can be predefined, such as json (JavaScript Object Notation, JS object numbered musical notation), xml (Extensible Markup Language) and the like.
Step 302: and manually reviewing the corrected text box position information.
All the basic annotation files and the plurality of images output in step 301 are loaded with annotation software to display the plurality of images, and a text box indicated by each text box position information is displayed in each of the plurality of images. The technician performs review correction on the text box position information (the text box position information output by the text detection model is incorrect and needs to be corrected, and the correction is not needed if the text detection model is correct), for example, the text box position information can be corrected through adjustment operation on the displayed text box. The technician may also add a new text box and may continue to set the attribute name of the newly added text box to the attribute name that was uniformly set in step 301. Then, a new annotation file is regenerated for each image, wherein the annotation file contains text box position information and text box attribute names.
Step 303: the changed text box attribute name is automatically added.
For each image annotation file output in step 302, a specified ordering rule is provided to order the text boxes in each image, e.g., may order in a top-to-bottom, left-to-right order. And, for any one of a plurality of text boxes ordered in a certain image, acquiring the attribute name corresponding to the serial number of the text box as the attribute name of the text box, and modifying the originally set uniform attribute name into the acquired attribute name of each text box. In this way, the position information and the attribute name of each of the plurality of text boxes in each image are obtained as text box annotation information. By using the text box annotation information, parameter tuning and test verification can be performed for a text box detection algorithm.
The embodiment of fig. 3 described above is described in detail below in conjunction with fig. 4. Referring to fig. 4, the following three parts may be included:
a first part: software module for defining automatic generation of basic annotation file
Python loads an open source or an existing text box detection model, and aims at inputting a large amount of images, so that the function of automatically generating text box position information is realized. After generating the text box position information of each image, attribute names are automatically added to each text box, and the attribute names can be uniformly set to a value. The annotation file generated by the software module and containing the text box position information and the text box attribute name is called a basic annotation file. Each image can generate a basic annotation file, and the format of the basic annotation file accords with the loading requirement of annotation software.
A second part: manual review correction of text box positions by marking software
And loading all basic annotation files and the images output by the first part by using annotation software to display the images, and displaying the text box indicated by the position information of each text box in each image in the images.
The technician performs review correction on the text box position information (the text box position information output by the text detection model is incorrect and needs to be corrected, and the correction is not needed if the text detection model is correct), for example, the text box position information can be corrected through adjustment operation on the displayed text box.
The technician may further add a text box, and may continue to set the attribute names of the newly added text box to the attribute names uniformly set in the first portion.
And after the loaded multiple images are subjected to review and correction, regenerating a new annotation file for each image, wherein the annotation file contains text box position information and text box attribute names. Thereafter, the annotation files for all images are exported again.
Third section: software module for defining automatic adding and changing text box attribute name
(1) Text box attribute name definition
The meaning represented by each text box on an image containing a particular object may be known in advance, such as by looking at the text boxes on an identity card in a top-to-bottom, left-to-right order, the meaning represented by each text box may be known. Accordingly, a set of attribute names may be preset in the order of the text boxes, that is, a correspondence relationship between the sequence numbers and the attribute names is set.
(2) Text box ordering algorithm implementation
Regarding a plurality of text boxes in each image, taking one text box with the largest length of a long side in the plurality of text boxes as a second text box; acquiring an angle of an included angle between a straight line where a long side of the second text box is positioned and a transverse axis of an image coordinate system as a target angle; rotating each text box in the plurality of text boxes by a target angle by taking one corner point of the second text box as a rotating point to obtain a corresponding target box; and ordering the text boxes according to the specified ordering rule according to the position information of the target boxes.
(3) Automatic addition and modification of text box attribute names
And acquiring the attribute name corresponding to the sequence number of the text box from the corresponding relation between the sequence number and the attribute name as the attribute name of the text box, and modifying the originally set unified attribute name into the acquired attribute name of each text box. In this way, the position information and the attribute name of each of the plurality of text boxes in each image are obtained as the text box annotation information of each image.
Fig. 5 is a schematic structural diagram of a text box labeling device according to an embodiment of the present application. Referring to fig. 5, the apparatus includes:
An obtaining module 501, configured to obtain location information of a plurality of text boxes in an image;
a determining module 502, configured to determine attribute names of the text boxes according to the location information of the text boxes;
and the labeling module 503 is configured to take the location information and attribute names of the multiple text boxes as text box labeling information of the image.
Optionally, the obtaining module 501 is configured to:
inputting the image into a text box detection model to obtain position information of a plurality of text boxes in the image;
and acquiring the position information of the text boxes in the image according to the position information of the text boxes.
Optionally, the obtaining module 501 is configured to:
displaying a text box indicated by the plurality of text box position information in the image;
adjusting the position of one text box in response to an adjustment operation for the one text box displayed in the image;
responding to a frame selection operation executed in the image, and displaying a frame selection frame corresponding to the frame selection operation in the image as a text frame;
and acquiring the position information of all text boxes displayed in the image.
Optionally, the determining module 502 is configured to:
according to the position information of the text boxes, ordering the text boxes according to a specified ordering rule to obtain serial numbers of the text boxes;
And acquiring the attribute name corresponding to the serial number of the first text box from the corresponding relation between the serial number and the attribute name as the attribute name of the first text box, wherein the first text box is one text box in a plurality of text boxes.
Optionally, the determining module 502 is configured to:
taking one text box with the largest length of the long sides in the text boxes as a second text box, wherein the long sides of the text boxes are sides between two upper corner points or two lower corner points;
acquiring an angle of an included angle between a straight line where a long side of the second text box is positioned and a transverse axis of an image coordinate system as a target angle, wherein the image coordinate system is a coordinate system for determining position information of the text box;
rotating the text boxes by target angles by taking one corner point of the second text box as a rotating point to obtain a plurality of corresponding target boxes;
and ordering the text boxes according to the specified ordering rule according to the position information of the target boxes.
Optionally, the determining module 502 is configured to:
ordering a plurality of text boxes corresponding to the plurality of target boxes one by one according to the sequence from small to large of the ordinate of the target corner points of the plurality of target boxes; and if the ordinate of the target corner points of the at least two target frames are the same, sequencing at least two text frames corresponding to the at least two target frames one by one according to the order of the abscissa of the target corner points of the at least two target frames from small to large.
In the embodiment of the application, after the position information of a plurality of text boxes in the image is acquired, the attribute names of the text boxes are determined according to the position information of the text boxes, so that the speed and accuracy of acquiring the attribute names of the text boxes are improved. Finally, the position information and the attribute names of the text boxes are used as text box marking information of the image. Therefore, the text box marking information of the image is automatically obtained, so that the marking efficiency can be improved, the marking time is reduced, subjective errors caused by manual marking are avoided, and the yield and quality of the text box marking work can be improved.
It should be noted that: when the text box labeling device provided in the above embodiment displays a page, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the text box labeling device provided in the above embodiment and the text box labeling method embodiment belong to the same concept, and detailed implementation processes of the text box labeling device are detailed in the method embodiment, and are not repeated here.
Fig. 6 is a schematic structural diagram of a server 600 according to an embodiment of the present application. The server 600 may be a server in a backend server cluster. Specifically, the present application relates to a method for manufacturing a semiconductor device.
The server 600 includes a CPU (Central Processing Unit ) 601, a system Memory 604 including a RAM (Random Access Memory ) 602 and a ROM (Read-Only Memory) 603, and a system bus 605 connecting the system Memory 604 and the central processing unit 601. The server 600 also includes a basic I/O (Input/Output) system 606 for facilitating the transfer of information between various devices within the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse, keyboard, etc., for a user to input information. Wherein both the display 608 and the input device 609 are coupled to the central processing unit 601 via an input/output controller 610 coupled to the system bus 605. The basic input/output system 606 may also include an input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 610 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the server 600. That is, the mass storage device 607 may include a computer readable medium (not shown) such as a hard disk or CD-ROM (Compact Disc Read-Only Memory) drive.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Electrically Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other solid state Memory technology, as well as CD-ROM, DVD (Digital Versatile Disc, digital versatile disk) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the ones described above. The system memory 604 and mass storage 607 may be collectively referred to as memory.
The server 600 may also operate by a remote computer connected to the network through a network such as the internet, according to various embodiments of the present application. I.e., server 600 may be connected to network 612 through a network interface unit 611 coupled to system bus 605, or other types of networks or remote computer systems (not shown) may be coupled to using network interface unit 611.
The memory also includes one or more programs, one or more programs stored in the memory and configured to be executed by the CPU. The one or more programs include instructions for performing the operations performed in the text box labeling method provided by the method embodiments of the present application.
Fig. 7 is a schematic structural diagram of a terminal 700 according to an embodiment of the present application. The terminal 700 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, motion picture expert compression standard audio plane 3), an MP4 (Moving Picture Experts Group Audio Layer IV, motion picture expert compression standard audio plane 4) player, a notebook computer, or a desktop computer. Terminal 700 may also be referred to by other names of user devices, portable terminals, laptop terminals, desktop terminals, etc.
In general, the terminal 700 includes: a processor 701 and a memory 702.
Processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ). The processor 701 may also include a main processor, which is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit, image processor) for taking care of rendering and drawing of content that the display screen is required to display. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence ) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to perform the operations performed in the text box labeling method provided by the method embodiments of the present application.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, touch display 705, camera 706, audio circuitry 707, positioning component 708, and power supply 709.
A peripheral interface 703 may be used to connect I/O (Input/Output) related at least one peripheral device to the processor 701 and memory 702. In some embodiments, the processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on separate chips or circuit boards, as the application is not limited in this regard.
The Radio Frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuitry 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, etc. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 704 may also include NFC (Near Field Communication ) related circuitry, which is not limiting of the application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 705 is a touch display, the display 705 also has the ability to collect touch signals at or above the surface of the display 705. The touch signal may be input to the processor 701 as a control signal for processing. At this time, the display 705 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 705 may be one and disposed on the front panel of the terminal 700; in other embodiments, the display 705 may be at least two, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved surface or a folded surface of the terminal 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The display 705 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, the at least two rear cameras are any one of a main camera, a depth camera, a wide-angle camera and a tele camera, so as to realize that the main camera and the depth camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize a panoramic shooting and Virtual Reality (VR) shooting function or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a single-color temperature flash lamp or a double-color temperature flash lamp. The dual-color temperature flash lamp refers to a combination of a warm light flash lamp and a cold light flash lamp, and can be used for light compensation under different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing, or inputting the electric signals to the radio frequency circuit 704 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.
The location component 708 is operative to locate the current geographic location of the terminal 700 for navigation or LBS (Location Based Service, location-based services).
A power supply 709 is used to power the various components in the terminal 700. The power supply 709 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 700 further includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyroscope sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch display screen 705 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may collect a 3D motion of the user to the terminal 700 in cooperation with the acceleration sensor 711. The processor 701 may implement the following functions based on the data collected by the gyro sensor 712: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 713 may be disposed at a side frame of the terminal 700 and/or at a lower layer of the touch display screen 705. When the pressure sensor 713 is disposed at a side frame of the terminal 700, a grip signal of the user to the terminal 700 may be detected, and the processor 701 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at the lower layer of the touch display screen 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used to collect a fingerprint of the user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 714 may be provided on the front, back, or side of the terminal 700. When a physical key or vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical key or vendor Logo.
The optical sensor 715 collects the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display screen 705 based on the ambient light intensity collected by the optical sensor 715: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 715.
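A minimal sketch of such brightness control follows, assuming a simple linear mapping between two illustrative lux thresholds; the values and names are assumptions, not anything specified in this application.

def display_brightness(ambient_lux, low=10.0, high=1000.0):
    # Illustrative only: map ambient light intensity (lux) to a display
    # brightness level in [0.1, 1.0]; the thresholds are assumed.
    if ambient_lux <= low:
        return 0.1
    if ambient_lux >= high:
        return 1.0
    # Linear interpolation between the assumed low and high thresholds.
    return 0.1 + 0.9 * (ambient_lux - low) / (high - low)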
The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 700 and collects the distance between the user and the front face of the terminal 700. In one embodiment, when the proximity sensor 716 detects that this distance is gradually decreasing, the processor 701 controls the touch display screen 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that the distance is gradually increasing, the processor 701 controls the touch display screen 705 to switch from the screen-off state to the screen-on state.
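A toy sketch of that proximity-driven switching, assuming a fixed distance threshold; a real implementation would track the trend of successive readings rather than a single sample, and the threshold value is a guess.

def screen_should_be_on(distance_cm, threshold_cm=5.0):
    # Illustrative only: keep the screen on while the user is farther from
    # the front panel than an assumed threshold distance.
    return distance_cm >= threshold_cm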
Those skilled in the art will appreciate that the structure shown in fig. 7 does not limit the terminal 700, which may include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
In some embodiments, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the text box labeling method provided by the embodiment of fig. 2 described above. For example, the computer readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It should be noted that the computer readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps implementing the above embodiments may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized, in whole or in part, in the form of a computer program product. The computer program product includes one or more computer instructions, which may be stored in the computer-readable storage medium described above.
In some embodiments, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the text box labeling method provided by the embodiment of fig. 2 described above.
The foregoing describes preferred embodiments of the present application and is not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (9)

1. A text box labeling method, the method comprising:
acquiring position information of a plurality of text boxes in an image;
taking, as a second text box, the text box whose long side is longest among the plurality of text boxes, wherein the long side of a text box is the side between its two upper corner points or its two lower corner points;
acquiring, as a target angle, the angle between the straight line on which the long side of the second text box lies and the horizontal axis of an image coordinate system, wherein the image coordinate system is the coordinate system used to determine the position information of the text boxes;
rotating the plurality of text boxes by the target angle, with one corner point of the second text box as the rotation point, to obtain a plurality of corresponding target boxes;
sorting the plurality of text boxes according to a specified sorting rule based on the position information of the plurality of target boxes, to obtain sequence numbers of the plurality of text boxes;
acquiring, from the correspondence between sequence numbers and attribute names, the attribute name corresponding to the sequence number of a first text box as the attribute name of the first text box, wherein the first text box is one of the plurality of text boxes; and
taking the position information and attribute names of the plurality of text boxes as the text box labeling information of the image.
2. The method of claim 1, wherein acquiring the position information of the plurality of text boxes in the image comprises:
inputting the image into a text box detection model to obtain a plurality of pieces of text box position information for the image; and
acquiring the position information of the plurality of text boxes in the image according to the plurality of pieces of text box position information.
3. The method of claim 2, wherein acquiring the position information of the plurality of text boxes in the image according to the plurality of pieces of text box position information comprises:
displaying, in the image, the text boxes indicated by the plurality of pieces of text box position information;
in response to an adjustment operation on a text box displayed in the image, adjusting the position of that text box;
in response to a box selection operation performed in the image, displaying the selection box corresponding to the box selection operation in the image as a text box; and
acquiring the position information of all text boxes displayed in the image.
4. The method of claim 1, wherein sorting the plurality of text boxes according to the specified sorting rule based on the position information of the plurality of target boxes comprises:
sorting the text boxes corresponding one-to-one to the plurality of target boxes in ascending order of the ordinates of the target corner points of the target boxes; and, if the ordinates of the target corner points of at least two target boxes are the same, sorting the at least two text boxes corresponding one-to-one to those target boxes in ascending order of the abscissas of their target corner points.
5. A text box labeling device, the device comprising:
an acquisition module, configured to acquire the position information of a plurality of text boxes in an image;
a determining module, configured to: take, as a second text box, the text box whose long side is longest among the plurality of text boxes, wherein the long side of a text box is the side between its two upper corner points or its two lower corner points; acquire, as a target angle, the angle between the straight line on which the long side of the second text box lies and the horizontal axis of an image coordinate system, wherein the image coordinate system is the coordinate system used to determine the position information of the text boxes; rotate the plurality of text boxes by the target angle, with one corner point of the second text box as the rotation point, to obtain a plurality of corresponding target boxes; sort the plurality of text boxes according to a specified sorting rule based on the position information of the plurality of target boxes, to obtain sequence numbers of the plurality of text boxes; and acquire, from the correspondence between sequence numbers and attribute names, the attribute name corresponding to the sequence number of a first text box as the attribute name of the first text box, wherein the first text box is one of the plurality of text boxes; and
a labeling module, configured to take the position information and attribute names of the plurality of text boxes as the text box labeling information of the image.
6. The apparatus of claim 5, wherein the acquisition module is configured to:
input the image into a text box detection model to obtain a plurality of pieces of text box position information for the image; and
acquire the position information of the plurality of text boxes in the image according to the plurality of pieces of text box position information.
7. The apparatus of claim 6, wherein the acquisition module is configured to:
display, in the image, the text boxes indicated by the plurality of pieces of text box position information;
in response to an adjustment operation on a text box displayed in the image, adjust the position of that text box;
in response to a box selection operation performed in the image, display the selection box corresponding to the box selection operation in the image as a text box; and
acquire the position information of all text boxes displayed in the image.
8. The apparatus of claim 5, wherein the determining module is configured to:
sort the text boxes corresponding one-to-one to the plurality of target boxes in ascending order of the ordinates of the target corner points of the target boxes; and, if the ordinates of the target corner points of at least two target boxes are the same, sort the at least two text boxes corresponding one-to-one to those target boxes in ascending order of the abscissas of their target corner points.
9. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the steps of the text box labeling method of any of claims 1-4.
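For a concrete, non-authoritative picture of the flow recited in claims 1 and 4, the following minimal Python sketch rotates the detected boxes, sorts them, and maps sequence numbers to attribute names. The corner-point ordering, the rotation direction, the ordinate tolerance, and all identifiers are illustrative assumptions, not part of the claims.

import math

def label_text_boxes(boxes, attr_names):
    # Hypothetical sketch, not the patented implementation.
    # boxes: list of text boxes, each a list of four (x, y) corner points,
    # assumed ordered [top-left, top-right, bottom-right, bottom-left]
    # in the image coordinate system.
    # attr_names: assumed mapping from sequence number to attribute name.

    def long_side_length(box):
        (x1, y1), (x2, y2) = box[0], box[1]  # side between the two upper corners
        return math.hypot(x2 - x1, y2 - y1)

    # The "second text box" of claim 1: the box with the longest long side.
    ref = max(boxes, key=long_side_length)

    # Target angle: angle between the reference long side and the horizontal axis.
    (x1, y1), (x2, y2) = ref[0], ref[1]
    theta = math.atan2(y2 - y1, x2 - x1)

    # Rotate every corner of every box around one corner of the reference box
    # so that the reference long side becomes horizontal (rotation by -theta).
    px, py = ref[0]
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)

    def rotate(point):
        dx, dy = point[0] - px, point[1] - py
        return (px + dx * cos_t - dy * sin_t, py + dx * sin_t + dy * cos_t)

    targets = [[rotate(p) for p in box] for box in boxes]

    # Per claim 4: sort by the target corner point (top-left here, an
    # assumption), ascending ordinate first, ascending abscissa on ties.
    order = sorted(range(len(boxes)),
                   key=lambda i: (round(targets[i][0][1], 1), targets[i][0][0]))

    # Sequence number -> attribute name; pair each box's position with its name.
    return [{"position": boxes[i], "attribute": attr_names[seq]}
            for seq, i in enumerate(order)]

With a two-field template such as attr_names = {0: "name", 1: "id_number"} (names hypothetical), each detected box's original position would be returned tagged with the attribute that its sorted rank maps to.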
CN202010161011.3A 2020-03-10 2020-03-10 Text box labeling method, device and storage medium Active CN111353458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010161011.3A CN111353458B (en) 2020-03-10 2020-03-10 Text box labeling method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111353458A (en) 2020-06-30
CN111353458B (en) 2023-08-18

Family

ID=71197546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010161011.3A Active CN111353458B (en) 2020-03-10 2020-03-10 Text box labeling method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111353458B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985465A (en) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, device, equipment and storage medium
CN112016438B (en) * 2020-08-26 2021-08-10 北京嘀嘀无限科技发展有限公司 Method and system for identifying certificate based on graph neural network
CN112200107A (en) * 2020-10-16 2021-01-08 深圳市华付信息技术有限公司 Invoice text detection method
CN112580735A (en) * 2020-12-25 2021-03-30 南方电网深圳数字电网研究院有限公司 Picture online labeling method and device and computer readable storage medium
CN113239227B (en) * 2021-06-02 2023-11-17 泰康保险集团股份有限公司 Image data structuring method, device, electronic equipment and computer readable medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492635A (en) * 2018-09-20 2019-03-19 第四范式(北京)技术有限公司 Obtain method, apparatus, equipment and the storage medium of labeled data
CN109948533B (en) * 2019-03-19 2021-02-09 讯飞智元信息科技有限公司 Text detection method, device and equipment and readable storage medium
CN111950397B (en) * 2020-07-27 2021-10-22 腾讯科技(深圳)有限公司 Text labeling method, device and equipment for image and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08221506A (en) * 1995-02-16 1996-08-30 Toshiba Corp Device and method for recognizing business document
CN108549843A (en) * 2018-03-22 2018-09-18 南京邮电大学 A kind of VAT invoice recognition methods based on image procossing
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109308476A (en) * 2018-09-06 2019-02-05 邬国锐 Billing information processing method, system and computer readable storage medium
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN109635627A (en) * 2018-10-23 2019-04-16 中国平安财产保险股份有限公司 Pictorial information extracting method, device, computer equipment and storage medium
CN110533079A (en) * 2019-08-05 2019-12-03 贝壳技术有限公司 Form method, apparatus, medium and the electronic equipment of image pattern
CN110689010A (en) * 2019-09-27 2020-01-14 支付宝(杭州)信息技术有限公司 Certificate identification method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Candidate box extraction algorithm for text detection; Zhu Yingying; Zhang Zheng; Zhang Chengquan; Zhang Zhaoxiang; Bai Xiang; Liu Wenyu; Journal of Data Acquisition and Processing (Issue 06); all pages *

Also Published As

Publication number Publication date
CN111353458A (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN111353458B (en) Text box labeling method, device and storage medium
CN108415705B (en) Webpage generation method and device, storage medium and equipment
CN110059685B (en) Character area detection method, device and storage medium
CN109684980B (en) Automatic scoring method and device
KR102240279B1 (en) Content processing method and electronic device thereof
US9734591B2 (en) Image data processing method and electronic device supporting the same
JP2021524957A (en) Image processing methods and their devices, terminals and computer programs
WO2022042425A1 (en) Video data processing method and apparatus, and computer device and storage medium
CN110321126B (en) Method and device for generating page code
CN112578971B (en) Page content display method and device, computer equipment and storage medium
CN112230908B (en) Method and device for aligning components, electronic equipment and storage medium
CN109948581B (en) Image-text rendering method, device, equipment and readable storage medium
CN110442521B (en) Control unit detection method and device
WO2021027890A1 (en) License plate image generation method and device, and computer storage medium
CN110738185B (en) Form object identification method, form object identification device and storage medium
CN110647881A (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN111325220A (en) Image generation method, device, equipment and storage medium
CN110990728B (en) Method, device, equipment and storage medium for managing interest point information
CN110163192B (en) Character recognition method, device and readable medium
CN113051485B (en) Group searching method, device, terminal and storage medium
CN113535039B (en) Method and device for updating page, electronic equipment and computer readable storage medium
CN112905328B (en) Task processing method, device and computer readable storage medium
CN115187987A (en) Method and device for recognizing text outside special area, electronic equipment and storage medium
CN109816047B (en) Method, device and equipment for providing label and readable storage medium
CN112214115A (en) Input mode identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40024806; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant