CN111881916A - Character positioning method, device and equipment - Google Patents


Info

Publication number
CN111881916A
CN111881916A
Authority
CN
China
Prior art keywords
matrix
value
region
character
binary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010692775.5A
Other languages
Chinese (zh)
Other versions
CN111881916B (en)
Inventor
卢健
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202010692775.5A
Publication of CN111881916A
Application granted
Publication of CN111881916B
Active legal status
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G06V30/40 - Document-oriented image-based pattern recognition
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of this specification disclose a character positioning method, apparatus, and device. The method includes: acquiring an original image containing characters; generating a plurality of intermediate images from the original image, each comprising a predicted character region and a predicted background region; and generating a binary matrix for each intermediate image, in which elements corresponding to the predicted character region take a first value and elements corresponding to the predicted background region take a second value. The binary matrix of the intermediate image whose predicted character region has the smallest area serves as the reference matrix, and the binary matrices of the other intermediate images serve as constraint matrices. Constraint matrices are selected in order of increasing predicted-character-region area and used to expand the number of first-value elements in the reference matrix, and the first-value elements of the resulting target binary matrix determine the positions of the characters in the original image. The embodiments of this specification can improve the efficiency of character positioning.

Description

Character positioning method, device and equipment
Technical Field
The present disclosure relates to the field of text recognition technologies, and in particular, to a text positioning method, apparatus, and device.
Background
Character recognition extracts text characters from images, and character positioning is a key step in any character recognition task. Two approaches to character positioning currently dominate: detection-box-based object detection and pixel-based instance segmentation. In instance segmentation, once the segmented image is obtained, the characters can be located more precisely with methods such as progressive scale expansion, allowing accurate recognition. However, current instance segmentation methods mostly perform the expansion with a queue, which must visit every pixel adjacent to the text region individually, so the expansion is slow. As character recognition is applied ever more widely and the volume of data grows, further improving the efficiency of character positioning, and thereby of character recognition, has become an urgent technical problem.
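For contrast, the queue-based expansion described above can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the function name and set-based representation are assumptions made for brevity. It visits every pixel adjacent to the text region one at a time, which is the per-pixel cost the matrix-based method of this specification avoids:

```python
from collections import deque

def queue_expand(kernel, mask):
    """Breadth-first expansion of a seed `kernel` constrained by `mask`.

    kernel: set of (row, col) seed pixels belonging to the shrunken text
    region; mask: set of (row, col) pixels allowed to join the region.
    Every neighbouring pixel is dequeued and inspected individually,
    which is why this baseline scales poorly with image size.
    """
    region = set(kernel)
    q = deque(kernel)
    while q:
        r, c = q.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (nr, nc) in mask and (nr, nc) not in region:
                region.add((nr, nc))
                q.append((nr, nc))
    return region
```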
Disclosure of Invention
An object of the embodiments of the present specification is to provide a method, an apparatus, and a device for locating a text, which can further improve efficiency of text locating.
This specification provides a character positioning method, apparatus, and device, implemented as follows:
a character positioning method, applied to a server, comprises the following steps. An original image including characters is acquired. A plurality of intermediate images are generated based on the original image; each intermediate image comprises a color representing a predicted character region and a color representing a predicted background region, and the areas of the predicted character regions differ across the intermediate images. A plurality of binary matrices are generated corresponding to the plurality of intermediate images; the binary matrix corresponding to the intermediate image whose predicted character region has the smallest area is taken as a reference matrix, and the binary matrices corresponding to the other intermediate images are taken as constraint matrices. In each binary matrix, the element corresponding to the predicted character region takes a first value and the element corresponding to the predicted background region takes a second value. Constraint matrices are selected in order of increasing predicted-character-region area, and the number of first-value elements in the reference matrix is expanded to obtain a target binary matrix; the region of the original image corresponding to the first-value elements of the target binary matrix is the character region.
In other embodiments of the method provided in this specification, the expanding the number of the first valued elements in the reference matrix includes: and determining first position information of the outer boundary of the corresponding expected character area in the reference matrix. Performing the following expansion steps based on the first location information: obtaining the value of an element corresponding to the first position information in the selected constraint matrix; and under the condition that a first value exists in the obtained values, expanding the number of elements of the first value in the reference matrix according to the obtained values.
In other embodiments of the method provided in this specification, the expanding the number of the first valued elements in the reference matrix includes: and determining second position information corresponding to the element of the first value in the obtained values. And updating the value of the element corresponding to the second position information in the reference matrix to be the first value to obtain the expanded reference matrix.
In other embodiments of the method provided in this specification, the expanding the number of the first valued elements in the reference matrix includes: and updating the value of the element corresponding to the first position information in the reference matrix to the value of the element corresponding to the corresponding first position information in the obtained values.
In other embodiments of the method provided in this specification, the determining first position information of an outer boundary of the corresponding expected text region in the reference matrix includes: and shifting the positions of the elements of the first value in the reference matrix to the periphery by one element unit respectively by taking the dimension range of the reference matrix as a limit. And performing matrix operation on the translated reference matrix and the reference matrix before translation to obtain a boundary matrix. And taking the position information of the first value-taking element in the boundary matrix as the first position information of the outer boundary of the corresponding predicted character area in the reference matrix.
In other embodiments of the method provided herein, the method further comprises: taking the expanded reference matrix as the new reference matrix and repeating the expansion step until no first value exists among the values of the elements corresponding to the first position information in the selected constraint matrix.
In other embodiments of the method provided herein, the method further comprises: when no first value exists among the values of the elements corresponding to the first position information in the selected constraint matrix, obtaining the next constraint matrix after the selected one, and repeating the first-position-information determination and expansion steps on the most recently expanded reference matrix under the newly selected constraint matrix, until no constraint matrix remains, thereby obtaining the target binary matrix.
In other embodiments of the method provided in this specification, the expanding of the number of first-value elements in the reference matrix includes: calling GPU resources by using the PyTorch tool and expanding the number of first-value elements in the reference matrix in parallel.
On the other hand, an embodiment of the present specification further provides a text positioning apparatus, which is applied to a server, and includes: and the original image acquisition module is used for acquiring an original image comprising characters. An intermediate image generation module for generating a plurality of intermediate images based on the original image; wherein the intermediate image comprises a color representing a predicted text region and a color representing a predicted background region; the areas of the intended text regions of the plurality of intermediate images are different. A binary matrix generation module for generating a plurality of binary matrices corresponding to the plurality of intermediate images, respectively; taking a binary matrix corresponding to the intermediate image with the minimum area of the predicted character region as a reference matrix, and taking a binary matrix corresponding to the intermediate image except the intermediate image with the minimum area of the predicted character region as a constraint matrix; in the binary matrix corresponding to the intermediate image, the element corresponding to the expected text region is a first value, and the element corresponding to the expected background region is a second value. The extension processing module is used for selecting a constraint matrix according to the sequence that the area of the expected character area of the intermediate image is increased in sequence, and performing extension processing on the number of the first-valued elements in the reference matrix by using the selected constraint matrix to obtain a target binary matrix; and the region in the original image corresponding to the element of the first value in the target binary matrix is a character region.
In another aspect, an embodiment of the present specification further provides a text-positioning apparatus, where the apparatus includes at least one processor and a memory for storing processor-executable instructions, and the instructions, when executed by the processor, implement steps of a method including any one or more of the above embodiments.
In the text positioning method, apparatus, and device provided in one or more embodiments of this specification, after an original image to be text-positioned is received, the predicted text region corresponding to the text and the predicted background region outside it may be initially segmented to obtain a plurality of intermediate images with different predicted-text-region areas. Each intermediate image is then converted into a binary matrix in which the predicted text region and the predicted background region are represented by different values, and the pixel area occupied by the text is progressively expanded using these binary matrices. Because the expansion operates on binary matrices, the data processing can be carried out with matrix operations and indexing, which reduces the time complexity, greatly improves the efficiency of the pixel expansion, and thus improves text positioning efficiency to meet the demands of text recognition applications.
Drawings
To illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some of the embodiments described in this specification; those skilled in the art can derive other drawings from them without creative effort. In the drawings:
fig. 1 is a schematic flow chart of an embodiment of a text positioning method provided in the present specification;
FIG. 2 is a flow diagram illustrating progressive expansion in one embodiment provided herein;
FIG. 3 is a flow diagram illustrating progressive expansion in one embodiment provided herein;
FIG. 4 is a schematic representation of a binary matrix of an intermediate image before panning in one embodiment provided by the present specification;
FIG. 5 is a schematic representation of a binary matrix of the translated intermediate image in one embodiment provided herein;
FIG. 6 is a schematic diagram of a binary matrix of outer boundaries in one embodiment provided herein;
FIG. 7 is a schematic illustration of an intermediate image before expansion in one embodiment provided by the present specification;
FIG. 8 is a boundary line schematic of an intermediate image before expansion in one embodiment provided herein;
FIG. 9 is a schematic illustration of an expanded intermediate image in one embodiment provided herein;
FIG. 10 is a diagram illustrating an original image to be text-oriented in one embodiment provided in the present specification;
fig. 11 is a schematic structural diagram of a module of a text positioning device provided in this specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on one or more embodiments of the present specification without making any creative effort shall fall within the protection scope of the embodiments of the present specification.
In a scenario example provided in the embodiment of the present specification, the text positioning method may be applied to a server that performs text recognition, and may also be applied to a server that only implements text positioning. After receiving the original image including the characters, the server may first segment the original image to preliminarily locate an expected character region where the characters are located, distinguish the expected character region where the characters are located from an expected background region, and obtain a plurality of intermediate images with different areas of the expected character region. Then, a plurality of binary matrices are generated corresponding to the plurality of intermediate images, and the pixel area occupied by the characters is positioned based on the binary matrices. The binary matrix-based positioning of the pixels where the characters are located can greatly reduce the complexity of data processing, so that the character positioning efficiency can be greatly improved, and the requirements of practical application are met.
Fig. 1 is a schematic flow chart of an embodiment of the text positioning method provided in this specification. Although the present specification provides the method steps or apparatus structures as shown in the following examples or figures, more or less steps or modules may be included in the method or apparatus structures based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiments or the drawings of the present specification. When the described method or module structure is applied to a device, a server or an end product in practice, the method or module structure according to the embodiment or the figures may be executed sequentially or in parallel (for example, in a parallel processor or multi-thread processing environment, or even in an implementation environment including distributed processing and server clustering). Fig. 1 shows a specific embodiment, and in an embodiment of the text positioning method provided in this specification, the method may be applied to the server, and the method may include the following steps:
s20: an original image including text is acquired.
The server may obtain an original image that includes text. The original image refers to the image before the predicted text region and predicted background region have been segmented. It may be, for example, an image captured by an information acquisition device photographing a physical carrier bearing the text, an image obtained by preprocessing such an originally acquired image, or a text-bearing document produced on an electronic device and converted into an image format.
The server can acquire an original image including characters sent by the equipment, so that the characters in the original image are processed by using methods such as character positioning and recognition, and the characters in the original image are converted into characters which can be edited by a computer.
S22: generating a plurality of intermediate images based on the original image; wherein each intermediate image comprises a color representing a predicted text region and a color representing a predicted background region, and the areas of the predicted text regions of the plurality of intermediate images differ.
The intended text region may refer to an image region containing a partial pixel region or a full pixel region of text. The expected background region may refer to an image region other than the expected text region in the original image. The server can preliminarily position the area where the characters in the original image are located, the preliminarily positioned area where the characters are located is used as a predicted character area, the image area outside the predicted character area is used as a predicted background area, and an intermediate image containing the predicted character area and the predicted background area is obtained. The server may also generate a plurality of intermediate images containing the expected text regions of different areas.
In some embodiments, the server may determine, based on different segmentation scales, a region where the text in the original image is located, with a central point of the text preliminarily located in the original image as a reference point, and segment the region where the text is located and an image region outside the region where the text is located, to obtain an intermediate image obtained by segmentation processing of the original image with respect to different segmentation scales. It should be noted that the center point of the text is a reference point of text positioning preliminarily determined based on an algorithm, and is not an actual center point of the text character, and may or may not coincide with the actual center point. The segmentation scale may represent a ratio of a pixel range value adopted when the region where the text in the image is located to a pixel range value of the complete text with the center point of the text as a reference. Accordingly, the predicted text region in the portion of the intermediate image may include only a portion of the pixel region of the text, rather than the entire pixel region, and the predicted text region in the portion of the intermediate image may include the entire pixel region of the text. Of course, in practical application, other manners may also be adopted to preliminarily determine the region where the characters in the original image are located, and generate a plurality of intermediate images, which is not limited herein.
When the expected character area is divided from the expected background area outside the expected character area, different pixel values can be respectively assigned to the expected character area and the expected background area, so that the expected character area and the expected background area can be distinguished more simply and conveniently. Accordingly, the intermediate image may include a color representing the expected text region and a color representing the expected background region. In some embodiments, the intermediate image may, for example, take the form of a binary image. For example, the pixel value of the expected text region in the intermediate image can be assigned a fixed value, and the expected background region can be assigned another fixed value. And the expected character image and the expected background image in the intermediate image are represented in a binarization mode, so that the processing efficiency of subsequent character positioning can be further improved. Of course, if the original image includes a plurality of characters, different pixel values may be assigned to the expected character regions corresponding to different characters, and another pixel value may be assigned to the expected background region outside the expected character region.
The original image may be processed with the PSENet method to obtain a plurality of intermediate images. As shown in fig. 2, a, e, and f in fig. 2 are intermediate images obtained by segmenting an original image with the PSENet algorithm. The white areas are predicted text regions and the black areas are predicted background regions. The segmentation scales corresponding to a, e, and f increase in turn, that is, the character kernel grows larger, and the area of the corresponding segmented predicted text region grows accordingly. The predicted text regions in a and e cover only part of the pixel region of the text, while the predicted text region in f covers all of it.
S24: generating a plurality of binary matrices corresponding to the plurality of intermediate images, respectively; taking a binary matrix corresponding to the intermediate image with the minimum area of the predicted character region as a reference matrix, and taking a binary matrix corresponding to the intermediate image except the intermediate image with the minimum area of the predicted character region as a constraint matrix; in the binary matrix corresponding to the intermediate image, the element corresponding to the expected text region is a first value, and the element corresponding to the expected background region is a second value.
After obtaining the intermediate image, the server may generate a plurality of binary matrices corresponding to the plurality of intermediate images, respectively. Correspondingly, elements in the binary matrix correspond to the pixel points of the intermediate image, and the value of each element can be determined according to the value of the corresponding pixel point. For example, the value of each element in the binary matrix can be directly the value of the corresponding pixel. In some embodiments, a binary matrix corresponding to the intermediate image may be configured, where an element corresponding to the expected text region is a first value, and an element corresponding to the expected background region is a second value. If the first value corresponding to the element of the expected text region is 1, and the second value corresponding to the element of the expected background region is 0. Of course, other values may be used, and are not limited herein.
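As a minimal sketch of this step, the conversion from an intermediate image to a binary matrix can be written as follows. This is illustrative NumPy code, not from the patent; the function name and the assumption that the predicted-text color is the pixel value 255 (white) are examples, with 1 as the first value and 0 as the second value per the text above:

```python
import numpy as np

def to_binary_matrix(intermediate, text_value=255):
    """Map an intermediate (binarized) image to a 0/1 matrix.

    Pixels carrying the predicted-text colour (`text_value`, assumed
    here to be 255) become the first value 1; all other pixels, i.e.
    the predicted background, become the second value 0.
    """
    return (np.asarray(intermediate) == text_value).astype(np.uint8)

# The matrix with the fewest 1s (smallest predicted-text area) would then
# serve as the reference matrix and the rest as constraint matrices, e.g.:
#   matrices = sorted(binary_mats, key=lambda m: int(m.sum()))
#   reference, constraints = matrices[0], matrices[1:]
```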
For convenience of description, the intermediate image with the smallest area of the predicted character region may be used as the reference image, and accordingly, the binary matrix corresponding to the reference image may be used as the reference matrix, the intermediate images except the intermediate image with the smallest area of the predicted character region may be used as the constrained images, and accordingly, the binary matrix corresponding to the constrained images may be used as the constrained matrix.
S26: selecting a constraint matrix in the sequence that the area of the predicted character area of the intermediate image is sequentially increased, and expanding the number of the first-valued elements in the reference matrix by using the selected constraint matrix to obtain a target binary matrix; and the region in the original image corresponding to the element of the first value in the target binary matrix is a character region.
The server may select the constraint matrices in order that the areas of the expected text regions of the intermediate images increase in order. Then, the selected constraint matrix can be used as a constraint, and the number of the elements of the first value in the reference matrix is gradually expanded to obtain a target binary matrix.
For example, the intermediate images may be sorted by the area of the predicted text region from small to large. The binary matrix corresponding to the first-ranked intermediate image is taken as the reference matrix, and, starting from the second-ranked intermediate image, the binary matrices corresponding to the subsequent intermediate images are selected in turn as constraint matrices. For instance, the binary matrix of the second-ranked intermediate image is taken as the constraint matrix, and with its first-value elements as the constraint, the number of first-value elements in the reference matrix is expanded until no further expansion is possible. Then, it is determined whether a third-ranked intermediate image exists; if so, its binary matrix is taken as the constraint matrix, and the already-expanded reference matrix is further expanded under the constraint of the third-ranked intermediate image.
And repeating the steps, and sequentially determining whether the next intermediate image exists or not until the intermediate image does not exist any more. And outputting the expanded reference matrix under the last constraint as a target binary matrix. The region where the corresponding character in the original image is located can be located by utilizing the target binary matrix output after the expansion under the last constraint.
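The loop described above can be sketched end to end as follows. This is an illustrative reconstruction under stated assumptions, not the patent's implementation: matrices use 1 as the first value and 0 as the second, constraints are assumed pre-sorted by increasing predicted-text area, and NumPy stands in for the PyTorch GPU tensors the specification mentions (the same bitwise operations apply to both):

```python
import numpy as np

def shift(m, dy, dx):
    """Translate a 0/1 matrix by one element unit, zero-filled at the
    edges (bounded by the matrix's own dimension range, no wrap-around)."""
    h, w = m.shape
    out = np.zeros_like(m)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        m[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def progressive_expand(reference, constraints):
    """Expand the reference matrix under each constraint matrix in turn."""
    ref = reference.copy()
    for con in constraints:  # assumed sorted by increasing text area
        while True:
            # union of the four one-unit translations = region dilated by one
            dil = (shift(ref, 1, 0) | shift(ref, -1, 0)
                   | shift(ref, 0, 1) | shift(ref, 0, -1))
            border = dil & (1 - ref)   # outer boundary of the text region
            grow = border & con        # boundary elements that are 1 in the constraint
            if not grow.any():         # no first value left: next constraint
                break
            ref |= grow                # take the expanded matrix as the new reference
    return ref
```

In this sketch each round of expansion is a handful of whole-matrix operations rather than a per-pixel queue traversal, which is the efficiency argument the specification makes.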
The scheme provided by the embodiment takes the value of the element corresponding to the predicted character area in the binary matrix corresponding to the intermediate image as the constraint, and gradually expands the pixel position occupied by the characters, so that the data processing complexity can be effectively reduced, and the character area corresponding to the characters in the original image can be more simply and effectively positioned.
In some embodiments, the server may perform expansion processing on the number of elements of the first value in the reference matrix in the following manner:
and determining first position information of the outer boundary of the corresponding expected character area in the reference matrix. And performing the following expansion steps based on the first location information: obtaining the value of an element corresponding to the first position information in the selected constraint matrix; and under the condition that a first value exists in the obtained values, expanding the number of elements of the first value in the reference matrix according to the obtained values.
In some embodiments, the dimension range of the reference matrix may be used as a limit, the position of the first valued element in the reference matrix is respectively translated to the periphery by one element unit, and the translated reference matrix and the reference matrix before translation are subjected to matrix operation to obtain the boundary matrix. Then, the position information of the element with the first value in the boundary matrix may be used as the first position information of the outer boundary of the corresponding predicted text region in the reference matrix.
The following description will be given by taking an example in which the position of the first valued element in the reference matrix is shifted downward by one element unit. The position of the first valued element in the reference matrix may be shifted downward by one element unit to obtain a shifted reference matrix, and then, matrix operation may be performed on the shifted reference matrix and the reference matrix before shifting to obtain a boundary matrix. The specific operation form of the matrix operation can be determined according to the setting mode of the value of each element in the binary matrix corresponding to the intermediate image, so that the position information of the element corresponding to the outer boundary of the expected character area can be accurately and efficiently determined. For example, the matrix operation may include an operation between all corresponding elements in the reference matrix after the translation and the reference matrix before the translation, or may include an operation between some elements in the reference matrix after the translation and the reference matrix before the translation. The matrix operation may be, for example, a dot product operation, an addition operation, or the like. Based on the above scheme, the position of the element of the first value in the reference matrix may be respectively translated upwards, leftwards or rightwards.
Translating toward the periphery by one element unit with the dimension range of the reference matrix as a limit means that if the predicted text region reaches a boundary of the intermediate image, that is, one or more first-value elements are located in a boundary row or column of the reference matrix, there is no row or column to translate into in the corresponding direction; in that case the translation operation need not be performed in that direction and is performed only in the directions in which translation remains possible. Translating by one element unit means that an element is moved by one element position up, down, left, or right. The position information can be characterized, for example, by a row number and a column number.
In this embodiment, when determining the boundary of the predicted text region, the outer boundary can be determined directly with a single matrix operation, which can greatly improve the efficiency of boundary determination and, in turn, the efficiency of the expansion.
Of course, in practical applications, the reference matrix may also be subjected to translation processing in other manners, which is not limited herein.
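As an illustrative sketch only (not part of the embodiments), the translation within the dimension range and the boundary comparison described above can be expressed with NumPy arrays; the `shift` helper and the sample matrix are assumptions for demonstration, and PyTorch tensors, mentioned later in this specification, behave analogously:

```python
import numpy as np

def shift(A, dr, dc):
    """Translate A by (dr, dc) within its own dimension range: elements
    pushed past an edge are dropped, and vacated cells are zero-filled."""
    rows, cols = A.shape
    B = np.zeros_like(A)
    B[max(dr, 0):rows + min(dr, 0), max(dc, 0):cols + min(dc, 0)] = \
        A[max(-dr, 0):rows + min(-dr, 0), max(-dc, 0):cols + min(-dc, 0)]
    return B

A = np.array([[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]])

# A direction's outer boundary: cells covered by the shifted text region
# (value > 0 after the shift) that are still background (value 0) in A.
boundaries = {name: (shift(A, dr, dc) > 0) & (A == 0)
              for name, (dr, dc) in {"down": (1, 0), "up": (-1, 0),
                                     "right": (0, 1), "left": (0, -1)}.items()}
```

Here `boundaries["down"]` marks the two cells directly below the sample text region, and the other three entries mark its remaining outer edges.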
Then, following the order in which the areas of the predicted text regions of the intermediate images sequentially increase, the binary matrix corresponding to the intermediate image ranked second may be selected as the constraint matrix. The values of the elements corresponding to the first position information in the selected constraint matrix may then be obtained, and whether a first value exists among them may be determined. Where it is determined that a first value exists, the number of first-value elements in the reference matrix may be expanded based on the obtained values.
In some embodiments, for example, the value of the element corresponding to the first position information in the reference matrix may be updated to the value of the element at the corresponding first position information among the obtained values. For example, following a matrix element index scheme, whether a first value exists among the values of the elements at the row and column numbers corresponding to the first position information in the selected constraint matrix may be obtained and determined; if a first value exists, the value of the element at the corresponding row and column numbers in the reference matrix may be updated to the value of the element at those row and column numbers among the obtained values.
Alternatively, in other embodiments, second position information corresponding to the first-value elements among the obtained values may be determined in the selected constraint matrix, and the values of the elements corresponding to the second position information in the reference matrix may be updated to the first value to obtain the expanded reference matrix. By determining the second position information and then replacing only the values of the elements at those positions in the reference matrix with the first value, the number of first-value elements in the reference matrix can be expanded more simply and efficiently.
In the above embodiment, the extent to which the pixels occupied by the characters may be expanded is determined by matrix element values, and the expansion of those pixels is performed through matrix element indexing, which reduces data processing complexity and makes the expansion of the pixels occupied by the characters simpler and more efficient.
Then, the expanded reference matrix may replace the reference matrix before expansion, the first position information corresponding to the outer boundary of the predicted text region in the expanded reference matrix may be determined again, and the expansion step may be repeated based on the newly determined first position information until no first value exists among the values of the elements corresponding to the first position information in the currently selected constraint matrix. In this way, the number of first-value elements in the reference matrix is gradually expanded under the constraint of a single constraint matrix.
Then, when no first value exists among the values of the elements corresponding to the first position information in the selected constraint matrix, the next constraint matrix after the currently selected one is obtained in the order of sequentially increasing predicted-text-region area, and the first-position-information determination and expansion steps are repeated on the most recently expanded reference matrix under the constraint of the newly selected constraint matrix, until no constraint matrix remains. The expansion under the constraint of each intermediate image is thus completed, yielding a target binary matrix.
In this embodiment, the matrix operation and matrix indexing make the range judgment and the expansion processing of a single expansion simple and efficient, effectively reducing the data processing complexity of each expansion of the pixels corresponding to the characters and thereby improving single-expansion efficiency. Moreover, because the expansion is performed through matrix operations and indexing, GPU resources can be effectively invoked for parallel processing, further improving the efficiency of the expansion and, in turn, the efficiency of character positioning.
Correspondingly, in some embodiments, a PyTorch tool may be used to invoke GPU resources and perform the expansion of the number of first-value elements in the reference matrix, further improving the efficiency of data processing.
Based on the solutions provided in the above embodiments, in some embodiments, the following method may be used for the extension process:
S240: respectively converting the plurality of intermediate images into binary matrices, where the elements of a binary matrix represent the values of the corresponding pixels in the intermediate image.
S241: sorting the plurality of intermediate images in order of increasing predicted-text-region area; taking the binary matrix corresponding to the intermediate image with the smallest predicted-text-region area as the reference matrix, and the binary matrices corresponding to the other intermediate images as constraint matrices; and performing the following iterative steps S242 to S249.
S242: and shifting the positions of the elements of the first value in the reference matrix to the periphery by one element unit respectively by taking the dimension range of the reference matrix as a limit.
S243: and performing matrix operation on the translated reference matrix and the reference matrix before translation to obtain a boundary matrix.
S244: and determining the position information of the first value-taking element in the boundary matrix as first position information.
S245: and acquiring the value of the element corresponding to the first position information in the selected constraint matrix.
S246: judging whether a first value exists among the values acquired in step S245; if so, executing step S247; if not, proceeding to step S249.
S247: and performing expansion processing on the number of the first-valued elements in the reference matrix according to the values obtained in the step S245.
S248: and replacing the reference matrix before the expansion with the expanded reference matrix, and repeating the steps from the step S242 to the step S247.
S249: and acquiring the next constraint matrix of the selected constraint matrix, repeating the steps from S242 to S248, and continuing to expand the reference matrix expanded for the last time under the constraint of the selected constraint matrix until no constraint matrix exists, so as to obtain a target binary matrix.
Then, the pixel position of the corresponding character in the original image can be determined according to the first value-taking element in the target binary matrix, and the character positioning is realized.
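The iterative steps S240 to S249 above can be sketched end to end. The following is a simplified, illustrative NumPy sketch under the assumption of a single text region per image; the names `shift` and `expand` and the sample masks are hypothetical, not taken from the embodiments:

```python
import numpy as np

def shift(A, dr, dc):
    """Translate A by (dr, dc), zero-filling vacated cells (the dimension
    range of A is the limit, as in step S242)."""
    rows, cols = A.shape
    B = np.zeros_like(A)
    B[max(dr, 0):rows + min(dr, 0), max(dc, 0):cols + min(dc, 0)] = \
        A[max(-dr, 0):rows + min(-dr, 0), max(-dc, 0):cols + min(-dc, 0)]
    return B

def expand(masks):
    """Progressive expansion over binary masks sorted by increasing text
    region area: masks[0] is the reference matrix, the rest constraints."""
    ref = masks[0].copy()
    for constraint in masks[1:]:                    # S249: next constraint
        while True:                                 # S242-S248: iterate
            grown = False
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                boundary = (shift(ref, dr, dc) > 0) & (ref == 0)  # S242-S244
                allowed = boundary & (constraint > 0)             # S245-S246
                if allowed.any():
                    ref[allowed] = 1                              # S247
                    grown = True
            if not grown:            # no first value left under this constraint
                break
    return ref                       # target binary matrix

small = np.array([[0, 0, 0],
                  [0, 1, 0],
                  [0, 0, 0]])
large = np.ones((3, 3), dtype=int)
target = expand([small, large])      # grows until the constraint is filled
```

With the single-pixel seed and an all-ones constraint, the region spreads outward one element unit per pass until the whole constrained area is covered.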
As shown in fig. 3, the expansion process described above is illustrated below, taking only downward expansion as an example and assuming that downward expansion remains possible throughout.
S301: all intermediate images are stored as tensors, i.e., binary matrices. Each element of a matrix corresponds to a pixel of the intermediate image. Assume that elements with value 1 correspond to pixels of the predicted text region, and elements with value 0 correspond to pixels of the predicted background region, as shown in fig. 4, which shows the distribution of the elements of reference matrix A.
The following iterative steps are performed:
S302: the reference matrix A is shifted down by one unit, and the first row is padded with zeros, forming the shifted reference matrix B shown in fig. 5. A matrix operation is then performed on A and B to obtain the boundary matrix C shown in fig. 6. The pseudo code for the matrix operation can be expressed as:
Down=(B>0)×(A==0)
In this way, with only one matrix dot multiplication, judging the positions that equal 1 in B and equal 0 in A directly determines the position information of the lower boundary in the intermediate image. The dashed box in fig. 6 marks the elements of the determined lower boundary. The time complexity of this matrix operation is only O(1), and GPU parallel computing resources can be used, so processing efficiency can be greatly improved. In some embodiments, the above operations may be performed using the PyTorch framework. If a queue-based method were adopted instead, every pixel adjacent to the predicted text region would have to be traversed, with time complexity O(n); for example, the queue that would need to be established for the intermediate image corresponding to fig. 4 has length n = 20, which clearly results in low processing efficiency.
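For illustration, the shift-and-multiply pseudo code above can be reproduced on a small sample matrix (a NumPy sketch; the values are assumptions for demonstration, not the exact matrices of figs. 4 to 6):

```python
import numpy as np

# Reference matrix A: value 1 marks the predicted text region.
A = np.array([[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]])

# B: A shifted down by one unit, with the first row padded with zeros.
B = np.zeros_like(A)
B[1:, :] = A[:-1, :]

# Down = (B > 0) x (A == 0): the elementwise product of the two comparison
# masks leaves 1 exactly on the lower boundary of the text region.
Down = ((B > 0) * (A == 0)).astype(int)
```

Here the only nonzero entries of `Down` are the two cells in the bottom row directly below the text region.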
As shown in fig. 7 and 8, fig. 7 shows an intermediate image in which the area of the predicted character region is the smallest. The white area in fig. 8 indicates the lower boundary of the intermediate image in which the area of the expected character area is smallest.
S303: determining the position information of the elements with values greater than 0 in the boundary matrix as the first position information, and determining whether the selected constraint matrix has elements greater than 0 at the positions given by the first position information.
S304: if elements with values greater than 0 exist, performing one expansion of the reference matrix. Specifically, the second position information corresponding to the elements with values greater than 0 in the selected constraint matrix may be determined, and the values of the elements at the second position information in the reference matrix may be updated to 1. The pseudo code is as follows:
Down=Down×B
Origin[Down>0]=Down
The high-level tensor index operation Origin[Down > 0] likewise has time complexity O(1), so data processing efficiency can be further improved. In some embodiments, the tensor indexing operation of the PyTorch framework may be used.
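A minimal sketch of one such indexed update (step S304) follows, with NumPy boolean indexing standing in for the tensor index operation; the sample matrices are assumptions, and the pseudo code's Origin[Down>0]=Down is written out as an explicit masked assignment:

```python
import numpy as np

# Origin: current reference matrix (one text pixel so far).
Origin = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [0, 0, 0]])
# B: the selected constraint matrix, with a larger predicted text region.
B = np.array([[0, 1, 0],
              [0, 1, 0],
              [0, 1, 0]])
# Down: lower-boundary matrix of Origin, as obtained in step S302.
Down = np.array([[0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]])

Down = Down * B                     # keep only boundary cells the constraint permits
Origin[Down > 0] = Down[Down > 0]   # indexed update: the text region grows downward
```

After the update, the text region in `Origin` has grown from one pixel to two, still inside the white region of the constraint.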
The reference matrix before expansion is replaced with the expanded reference matrix, S302 is repeated to re-determine the first position information of the elements corresponding to the outer boundary of the predicted text region in the expanded reference matrix, and S303 to S304 are repeated until the constraint matrix has no element greater than 0 at the first position information, that is, until no further expansion is possible under the constraint of this intermediate image.
S305: determining whether a next constraint matrix exists; if so, proceeding to S306; if not, proceeding to S307.
S306: taking the next constraint matrix as the selected constraint matrix, repeating steps S302 to S304, and continuing the expansion of the expanded reference matrix; when no constraint matrix remains, proceeding to S307.
S307: and outputting the reference matrix after the last expansion to obtain a target binary matrix.
As shown in fig. 9, fig. 9 shows an intermediate image corresponding to the expanded target binary matrix, where a white portion is a position where a text pixel is located, that is, a position where a date in fig. 10 is located.
In some embodiments, the original image may be binarized using PSENet (Progressive Scale Expansion Network) to obtain the plurality of intermediate images. PSENet starts from the smallest scale, the region at the center of the characters, and gradually expands the character region outward, finally determining the complete character region and producing a plurality of binary intermediate images. As shown in fig. 2, black in an intermediate image represents the predicted background region and white represents the predicted text region; the white area increases from small to large across the different intermediate images.
Then, with the binary maps e and f as constraints, the expansion method provided by the above embodiments of this specification may be used to sequentially expand the character pixels in a. As shown in diagram g of fig. 2, the left side shows the initial pixel regions corresponding to the two characters in binary map a; with binary map e as the constraint, the character initially occupying 2 pixels occupies 12 pixels after expansion (Scale Expansion), and the character initially occupying 1 pixel occupies 5 pixels after expansion. Diagram b of fig. 2 is a schematic diagram of the text region located based on map a; diagram c of fig. 2 is a schematic diagram of the text region located after expansion with map e as the constraint; diagram d of fig. 2 is a schematic diagram of the text region located after expansion with map f as the constraint.
The pixel expansion range on the current binary map a is determined by the area of the corresponding predicted text region on the next binary map e: the area of the character pixels in a, after expansion under the constraint of e, does not exceed the area of the white region of binary map e. Once no further expansion is possible with e as the constraint, the version of a expanded under e is further expanded with f as the constraint. In the above example, if f is the last binary map, then once expansion with f as the constraint is no longer possible, the last expanded binary map can be output; if other binary maps follow f, expansion continues with them as constraints. Proceeding in this way, the pixel region corresponding to each character ultimately does not exceed the character region of the largest-scale binary map.
Compared with a queue-based mode, the expansion mode provided by the embodiment of the specification has the advantages that the processing time complexity is greatly reduced, and GPU resources can be called for parallel processing, so that the expansion processing efficiency can be greatly improved, the character positioning efficiency is improved, and the requirements of character recognition application are met.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. For details, reference may be made to the description of the related embodiments of the related processing, and details are not repeated herein.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the text positioning method provided in one or more embodiments of the present specification, after receiving an original image to be text-positioned, a predicted text region corresponding to a text in the original image and a predicted background region outside the text may be initially segmented to obtain a plurality of intermediate images for different predicted text region areas. And then, converting the intermediate image into a binary matrix, and representing the predicted character area and the predicted background area in the two-dimensional matrix by using different values. And then, carrying out progressive expansion processing on the pixel area occupied by the characters by using the binary matrix. The expansion processing is carried out based on the binary matrix, the data processing in the expansion process can be effectively carried out by utilizing matrix operation and indexes, the complexity of processing time is reduced, the efficiency of pixel expansion processing is greatly improved, the character positioning efficiency is further improved, and the requirement of character recognition application is met.
Based on the above text positioning method, one or more embodiments of the present specification further provide a text positioning device. The apparatus may include systems, software (applications), modules, components, servers, etc. that utilize the methods described in the embodiments of the present specification in conjunction with hardware implementations as necessary. Based on the same innovative conception, embodiments of the present specification provide an apparatus as described in the following embodiments. Since the implementation scheme of the apparatus for solving the problem is similar to that of the method, the specific implementation of the apparatus in the embodiment of the present specification may refer to the implementation of the foregoing method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated. Specifically, fig. 11 is a schematic diagram of a module structure of an embodiment of a text positioning device provided in the specification, and as shown in fig. 11, the text positioning device is applied to a server, and the text positioning device includes:
an original image acquisition module 102, configured to acquire an original image including text;
an intermediate image generation module 104 operable to generate a plurality of intermediate images based on the original image; wherein the intermediate image comprises a color representing a predicted text region and a color representing a predicted background region; the areas of the expected character areas of the plurality of intermediate images are different;
a binary matrix generating module 106, configured to generate a plurality of binary matrices corresponding to the plurality of intermediate images, respectively; taking a binary matrix corresponding to the intermediate image with the minimum area of the predicted character region as a reference matrix, and taking a binary matrix corresponding to the intermediate image except the intermediate image with the minimum area of the predicted character region as a constraint matrix; in a binary matrix corresponding to the intermediate image, elements corresponding to the expected text region are first values, and elements corresponding to the expected background region are second values;
the expansion processing module 108 may be configured to select a constraint matrix in an order that areas of expected text regions of the intermediate image sequentially increase, and perform expansion processing on the number of first-valued elements in the reference matrix by using the selected constraint matrix to obtain a target binary matrix; and the region in the original image corresponding to the element of the first value in the target binary matrix is a character region.
In other embodiments, the expansion processing module 108 may include:
the position information determining unit may be configured to determine first position information corresponding to an outer boundary of the expected text region in the reference matrix.
An extension processing unit operable to perform the following extension steps based on the first location information: obtaining the value of an element corresponding to the first position information in the selected constraint matrix; and under the condition that a first value exists in the obtained values, expanding the number of elements of the first value in the reference matrix according to the obtained values.
In other embodiments, the expansion processing unit may be further configured to determine second location information corresponding to an element of a first value in the obtained values; and updating the value of the element corresponding to the second position information in the reference matrix to be the first value to obtain the expanded reference matrix.
In other embodiments, the extended processing unit may be further configured to update a value of an element corresponding to the first location information in the reference matrix to a value of an element corresponding to the corresponding first location information in the obtained values.
In other embodiments, the position information determining unit may be further configured to translate, with the dimension range of the reference matrix as a limit, the positions of the first-value elements in the reference matrix toward the four sides by one element unit each; perform a matrix operation on the translated reference matrix and the reference matrix before translation to obtain a boundary matrix; and take the position information of the first-value elements in the boundary matrix as the first position information of the outer boundary of the corresponding predicted text region in the reference matrix.
In other embodiments, the expansion processing module 108 may be further configured to perform translation processing on the expanded reference matrix, and repeat the expansion step based on the first position information corresponding to the expanded reference matrix until there is no first value in the values of the elements corresponding to the first position information in the selected constraint matrix.
In other embodiments, the expansion processing module 108 may be further configured to, in a case that there is no first value in the values of the elements corresponding to the first location information in the selected constraint matrix, obtain a next constraint matrix of the selected constraint matrix, and repeat the translation processing and the expansion steps on the reference matrix expanded last time under the constraint of the selected constraint matrix until there is no constraint matrix, so as to obtain a target binary matrix.
In other embodiments, the expansion processing module 108 may be further configured to invoke, through a PyTorch tool, GPU resources and perform the expansion of the number of first-value elements in the reference matrix in parallel.
It should be noted that the above-described apparatus may also include other embodiments according to the description of the method embodiment. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
In the text positioning device provided in one or more embodiments of the present specification, after receiving an original image to be text-positioned, a predicted text region corresponding to a text in the original image and a predicted background region outside the text may be initially segmented to obtain a plurality of intermediate images for different predicted text region areas. And then, converting the intermediate image into a binary matrix, and representing the predicted character area and the predicted background area in the two-dimensional matrix by using different values. And then, carrying out progressive expansion processing on the pixel area occupied by the characters by using the binary matrix. The expansion processing is carried out based on the binary matrix, the data processing in the expansion process can be effectively carried out by utilizing matrix operation and indexes, the complexity of processing time is reduced, the efficiency of pixel expansion processing is greatly improved, the character positioning efficiency is further improved, and the requirement of character recognition application is met.
This specification also provides a text positioning apparatus, which may be a single text positioning apparatus, or may be applied to a variety of computer data processing systems. The system may be a single server, or may include a server cluster, a system (including a distributed system), software (applications), an actual operating device, a logic gate device, a quantum computer, etc. using one or more of the methods or one or more of the example devices of the present specification, in combination with a terminal device implementing hardware as necessary. In some embodiments, the apparatus may include at least one processor and a memory for storing processor-executable instructions that, when executed by the processor, perform steps comprising the method of any one or more of the embodiments described above.
The memory may include physical means for storing information, typically by digitizing the information and storing it on a medium using electrical, magnetic, or optical means. The storage medium may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, tapes, core memories, bubble memories, and USB disks; and devices that store information optically, such as CDs or DVDs. Of course, there are other readable storage media, such as quantum memory and graphene memory.
It should be noted that the above-mentioned system may also include other implementation manners according to the description of the method or apparatus embodiment, and specific implementation manners may refer to the description of the related method embodiment, which is not described in detail herein.
After receiving the original image to be subjected to character positioning, the character positioning device in the above embodiment may initially segment the expected character region corresponding to the characters in the original image and the expected background region outside the characters, so as to obtain a plurality of intermediate images for different expected character region areas. And then, converting the intermediate image into a binary matrix, and representing the predicted character area and the predicted background area in the two-dimensional matrix by using different values. And then, carrying out progressive expansion processing on the pixel area occupied by the characters by using the binary matrix. The expansion processing is carried out based on the binary matrix, the data processing in the expansion process can be effectively carried out by utilizing matrix operation and indexes, the complexity of processing time is reduced, the efficiency of pixel expansion processing is greatly improved, the character positioning efficiency is further improved, and the requirement of character recognition application is met.
It should be noted that the embodiments of this specification are not limited to cases that necessarily comply with a standard data model/template or with the situations described in the embodiments of this specification. Implementations modified slightly on the basis of certain industry standards, or of the described embodiments using custom approaches, may also achieve the same, equivalent, or similar effects as the above embodiments, or other effects expectable after such variation. Embodiments applying such modified or varied means of data acquisition, storage, judgment, and processing may still fall within the scope of the optional implementations of this specification.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (10)

1. A character positioning method, applied to a server, the method comprising:
acquiring an original image comprising characters;
generating a plurality of intermediate images based on the original image; wherein each intermediate image comprises a color representing a predicted character region and a color representing a predicted background region, and the areas of the predicted character regions of the plurality of intermediate images differ from one another;
generating a plurality of binary matrices respectively corresponding to the plurality of intermediate images; wherein, in the binary matrix corresponding to an intermediate image, elements corresponding to the predicted character region take a first value and elements corresponding to the predicted background region take a second value; taking the binary matrix corresponding to the intermediate image whose predicted character region has the smallest area as a reference matrix, and taking the binary matrices corresponding to the remaining intermediate images as constraint matrices;
selecting the constraint matrices in order of increasing area of the predicted character region of the corresponding intermediate image, and using each selected constraint matrix to expand the number of first-valued elements in the reference matrix, to obtain a target binary matrix; wherein the region of the original image corresponding to the first-valued elements in the target binary matrix is the character region.
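The expansion procedure of claim 1 can be illustrated with a short sketch. This is a minimal NumPy illustration by the editor, not code from the patent; the function name, the use of 4-connectivity for the one-element shifts, and the representation of the masks as 0/1 arrays are all assumptions:

```python
import numpy as np

def locate_characters(binary_masks):
    """Sketch of the claimed expansion.

    binary_masks: 0/1 matrices ordered by increasing area of the
    predicted character region; binary_masks[0] plays the role of
    the reference matrix, the rest are the constraint matrices.
    """
    target = binary_masks[0].astype(bool)            # reference matrix
    for constraint in [m.astype(bool) for m in binary_masks[1:]]:
        while True:
            # Shift the region by one element in each direction and
            # take the union (the translation idea of claim 5).
            grown = np.zeros_like(target)
            grown[1:, :] |= target[:-1, :]
            grown[:-1, :] |= target[1:, :]
            grown[:, 1:] |= target[:, :-1]
            grown[:, :-1] |= target[:, 1:]
            border = grown & ~target                 # outer boundary
            # Admit only boundary elements that take the first value
            # in the selected constraint matrix.
            newly = border & constraint
            if not newly.any():                      # stop condition (claim 6)
                break
            target |= newly                          # update (claim 3)
    return target.astype(np.uint8)                   # target binary matrix
```

With nested masks, the reference region grows outward one element per iteration but never beyond the currently selected constraint region, so disjoint constraint components that do not touch the growing region are never absorbed.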
2. The method according to claim 1, wherein expanding the number of first-valued elements in the reference matrix comprises:
determining first position information of the outer boundary of the corresponding predicted character region in the reference matrix;
performing the following expansion step based on the first position information: obtaining the values of the elements corresponding to the first position information in the selected constraint matrix; and, when a first value exists among the obtained values, expanding the number of first-valued elements in the reference matrix according to the obtained values.
3. The method according to claim 2, wherein expanding the number of first-valued elements in the reference matrix comprises:
determining second position information corresponding to the first-valued elements among the obtained values; and
updating the elements corresponding to the second position information in the reference matrix to the first value, to obtain an expanded reference matrix.
4. The method according to claim 2, wherein expanding the number of first-valued elements in the reference matrix comprises:
updating the values of the elements corresponding to the first position information in the reference matrix to the values of the elements at the corresponding first position information among the obtained values.
5. The method according to claim 2, wherein determining the first position information of the outer boundary of the corresponding predicted character region in the reference matrix comprises:
shifting the first-valued elements in the reference matrix outward by one element unit in each direction, bounded by the dimensions of the reference matrix;
performing a matrix operation on the shifted reference matrix and the reference matrix before shifting, to obtain a boundary matrix; and
taking the position information of the first-valued elements in the boundary matrix as the first position information of the outer boundary of the corresponding predicted character region in the reference matrix.
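The translation-based boundary computation of claim 5 can be written directly with array slicing. A small NumPy sketch (editor's illustration; the choice of four-neighbour shifts and of logical AND-NOT as the "matrix operation" are assumptions):

```python
import numpy as np

def outer_boundary(ref):
    """Return a boundary matrix whose first-valued elements mark the
    positions just outside the first-valued region of `ref`."""
    r = ref.astype(bool)
    shifted = np.zeros_like(r)
    # Four one-element translations, clipped at the matrix dimensions.
    shifted[1:, :] |= r[:-1, :]   # down
    shifted[:-1, :] |= r[1:, :]   # up
    shifted[:, 1:] |= r[:, :-1]   # right
    shifted[:, :-1] |= r[:, 1:]   # left
    # The union of the translations minus the region itself leaves
    # only the outer boundary.
    boundary = shifted & ~r
    return boundary.astype(np.uint8)
```

For a 3x3 block inside a 5x5 matrix, the result is the 12-element ring of positions 4-adjacent to the block.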
6. The method according to claim 2, further comprising:
replacing the reference matrix with the expanded reference matrix, and repeating the expansion step until no first value exists among the values of the elements corresponding to the first position information in the selected constraint matrix.
7. The method according to claim 6, further comprising:
when no first value exists among the values of the elements corresponding to the first position information in the selected constraint matrix, obtaining the next constraint matrix after the selected one, and repeating the first-position-information determination and expansion steps on the most recently expanded reference matrix under the constraint of the newly selected constraint matrix, until no constraint matrix remains, to obtain the target binary matrix.
8. The method according to claim 1, wherein expanding the number of first-valued elements in the reference matrix comprises:
invoking GPU resources through the PyTorch tool to expand the number of first-valued elements in the reference matrix in parallel.
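The parallel variant of claim 8 maps naturally onto tensor operations: the four one-element shifts collapse into a single convolution with a cross-shaped kernel, which PyTorch executes on the GPU when one is available. A hedged sketch (editor's illustration; the kernel choice and function name are assumptions, not the patent's code):

```python
import torch
import torch.nn.functional as F

def expand_step(target, constraint):
    """One parallel expansion step over a 0/1 target matrix under a
    0/1 constraint matrix, using GPU resources when available."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    t = target.to(device).float()[None, None]        # shape (N, C, H, W)
    c = constraint.to(device).float()[None, None]
    # A 3x3 cross kernel performs all four one-element shifts at once.
    kernel = torch.tensor([[0., 1., 0.],
                           [1., 1., 1.],
                           [0., 1., 0.]], device=device)[None, None]
    grown = (F.conv2d(t, kernel, padding=1) > 0).float()
    # Admit only the boundary elements permitted by the constraint.
    expanded = t + (grown - t) * c
    return expanded[0, 0].cpu()
```

Iterating this step until the output stops changing, then moving to the next constraint matrix, reproduces the sequential expansion of claims 6 and 7 without per-element Python loops.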
9. A character positioning device, applied to a server, the device comprising:
an original image acquisition module, configured to acquire an original image comprising characters;
an intermediate image generation module, configured to generate a plurality of intermediate images based on the original image; wherein each intermediate image comprises a color representing a predicted character region and a color representing a predicted background region, and the areas of the predicted character regions of the plurality of intermediate images differ from one another;
a binary matrix generation module, configured to generate a plurality of binary matrices respectively corresponding to the plurality of intermediate images; wherein, in the binary matrix corresponding to an intermediate image, elements corresponding to the predicted character region take a first value and elements corresponding to the predicted background region take a second value; the binary matrix corresponding to the intermediate image whose predicted character region has the smallest area serves as a reference matrix, and the binary matrices corresponding to the remaining intermediate images serve as constraint matrices; and
an expansion processing module, configured to select the constraint matrices in order of increasing area of the predicted character region of the corresponding intermediate image, and to use each selected constraint matrix to expand the number of first-valued elements in the reference matrix, to obtain a target binary matrix; wherein the region of the original image corresponding to the first-valued elements in the target binary matrix is the character region.
10. Character positioning equipment, applied to a server, comprising at least one processor and a memory storing processor-executable instructions which, when executed by the processor, implement the steps of the method according to any one of claims 1 to 8.
CN202010692775.5A 2020-07-17 2020-07-17 Text positioning method, device and equipment Active CN111881916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010692775.5A CN111881916B (en) 2020-07-17 2020-07-17 Text positioning method, device and equipment


Publications (2)

Publication Number Publication Date
CN111881916A true CN111881916A (en) 2020-11-03
CN111881916B CN111881916B (en) 2023-07-25

Family

ID=73156018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010692775.5A Active CN111881916B (en) 2020-07-17 2020-07-17 Text positioning method, device and equipment

Country Status (1)

Country Link
CN (1) CN111881916B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117270717A (en) * 2023-09-06 2023-12-22 北京酷讯科技有限公司 Man-machine interaction method, device, equipment and storage medium based on user interface

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503732A (en) * 2016-10-13 2017-03-15 北京云江科技有限公司 Text image and the sorting technique and categorizing system of non-textual image




Similar Documents

Publication Publication Date Title
Grana et al. Optimized block-based connected components labeling with decision trees
US11704537B2 (en) Octree-based convolutional neural network
US9697423B1 (en) Identifying the lines of a table
JPH07226844A (en) Method and system for transformation of image resolution using plurality of statistically generated morphological filters
KR102596989B1 (en) Method and apparatus for recognizing key identifier in video, device and storage medium
US20180232888A1 (en) Removal of background information from digital images
CN114266945B (en) Training method of target detection model, target detection method and related device
US20020076107A1 (en) Document image segmentation using loose gray scale template matching
JPH07203199A (en) Designing method for automatizing template for printing reinforcement
US20160048728A1 (en) Method and system for optical character recognition that short circuit processing for non-character containing candidate symbol images
CN115731313A (en) SVG format picture processing method, device, equipment, medium and product
CN111881916B (en) Text positioning method, device and equipment
WO2015190593A1 (en) Information processing method, information processing device, and program therefor
CN113506305A (en) Image enhancement method, semantic segmentation method and device for three-dimensional point cloud data
CN108876701B (en) Run-length-based single-scanning connected domain marking method and hardware structure thereof
JP2012043437A (en) Image processing method and image processing device
WO2023115814A1 (en) Fpga hardware architecture, data processing method therefor and storage medium
JP2006011967A (en) Character recognition device and character recognition program
CN114399708A (en) Video motion migration deep learning system and method
RU2582064C1 (en) Methods and systems for effective automatic recognition of symbols using forest solutions
CN112612427A (en) Vehicle stop data processing method and device, storage medium and terminal
CN111667573B (en) Map slice generation method and device, storage medium and electronic equipment
JPH106561A (en) Method for increasing gray level image
US20220044370A1 (en) Image processing methods
CN115578243A (en) Sparse matrix-oriented expansion processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant