CN110807454A - Character positioning method, device and equipment based on image segmentation and storage medium - Google Patents

Character positioning method, device and equipment based on image segmentation and storage medium

Info

Publication number
CN110807454A
CN110807454A (application CN201910884634.0A)
Authority
CN
China
Prior art keywords
image
segmentation
distorted
original
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910884634.0A
Other languages
Chinese (zh)
Other versions
CN110807454B (en)
Inventor
孙强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910884634.0A
Priority to PCT/CN2019/117036
Publication of CN110807454A
Application granted
Publication of CN110807454B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/146 - Aligning or centring of the image pick-up or image-field
    • G06V30/1475 - Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 - Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/02 - Affine transformations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/80 - Geometric correction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a character positioning method, device and equipment based on image segmentation, and a storage medium. The character positioning method based on image segmentation comprises the following steps: acquiring an original image, wherein the original image is a bill image or a certificate image acquired against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is a bill image or a certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, wherein the characters in the distortion-corrected image are forward characters; and performing character positioning on the distortion-corrected image to obtain a positioning result. According to the invention, an image captured against a complex character background is processed by the image segmentation network to obtain an accurate image foreground, and character positioning is performed on the image foreground to obtain the positioning result, which improves the accuracy of image character positioning and enhances robustness to complex backgrounds.

Description

Character positioning method, device and equipment based on image segmentation and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a character positioning method, a character positioning device, character positioning equipment and a storage medium based on image segmentation.
Background
Optical Character Recognition (OCR) refers to the process in which an electronic device, such as a scanner or a digital camera, examines characters printed on paper and translates their shapes into computer text using a character recognition method; that is, text material is scanned and the resulting image files are analyzed to obtain the characters and layout information. OCR includes text positioning and text recognition, where text positioning means precisely locating the position of text in an image, mainly by extracting relevant text features.
In the prior art, a dedicated scanner is usually used to scan bills and certificates, converting the characters on them into image information and yielding bill and certificate images of high quality, whose information is then converted into computer text by OCR. With this approach, however, the accuracy of character positioning is low for bill and certificate images collected against a complex background.
Disclosure of Invention
The invention mainly aims to solve the technical problem of low accuracy of character positioning in an image with a complex character background.
In order to achieve the above object, a first aspect of the present invention provides a text positioning method based on image segmentation, including: acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background; carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image; carrying out affine transformation on the distorted image to obtain an image after distortion correction, wherein characters in the image after distortion correction are forward characters; and carrying out character positioning on the image after the distortion correction to obtain a positioning result.
Optionally, in a first implementation manner of the first aspect of the present invention, the image segmentation is performed on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the ticket image or the certificate image and includes: inputting the original image into a preset image segmentation network model; performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type; and segmenting the original image according to the segmentation label image to obtain a distorted image, wherein the distorted image is the bill image or the certificate image.
Optionally, in a second implementation manner of the first aspect of the present invention, the segmenting the original image according to the segmentation label image to obtain a distorted image, where the distorted image is the ticket image or the certificate image includes: determining a region to be segmented according to the segmentation label image, setting a pixel value in the region to be segmented as 1, and setting a pixel value outside the region to be segmented as 0 to obtain a mask image; and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating the bill image or the certificate image separated from the text background in the original image.
Optionally, in a third implementation manner of the first aspect of the present invention, the performing affine transformation on the distorted image to obtain a distortion-corrected image in which the characters are forward characters includes: determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image; determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; calculating an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and performing affine transformation on the distorted image according to the affine transformation matrix to obtain a distortion-corrected image, wherein the characters in the distortion-corrected image are forward characters.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing character positioning on the image after distortion correction to obtain a positioning result includes: determining a template corresponding to the image after the distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward characters are located according to preset coordinate values; performing character positioning on the image after the distortion correction according to a preset algorithm and the template to obtain a positioning result; and storing the positioning result into a preset file.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the acquiring an original image, where the original image is a ticket image or a certificate image acquired in a text background, includes: receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image; setting the name of the original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image; and writing the storage path of the original image and the name of the original image into a target data table.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after performing text positioning on the image after the distortion correction to obtain a positioning result, the text positioning method based on image segmentation includes: determining a bill image or certificate image of a newly added type; setting the newly added bill image or certificate image as a sample image to be trained; and performing iterative optimization on the preset image segmentation network model according to the sample image to be trained.
The second aspect of the present invention provides a character positioning device based on image segmentation, including: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an original image, and the original image is a bill image or a certificate image acquired under a text background; the segmentation unit is used for carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image; the transformation unit is used for carrying out affine transformation on the distorted image to obtain an image after distortion correction, and characters in the image after distortion correction are forward characters; and the positioning unit is used for carrying out character positioning on the image after the distortion correction to obtain a positioning result.
Optionally, in a first implementation manner of the second aspect of the present invention, the dividing unit further includes: the input subunit is used for inputting the original image into a preset image segmentation network model; the first segmentation subunit is used for performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type; and the second segmentation subunit is used for segmenting the original image according to the segmentation label image to obtain a distorted image, wherein the distorted image is the bill image or the certificate image.
Optionally, in a second implementation manner of the second aspect of the present invention, the second segmentation subunit is specifically configured to: determining a region to be segmented according to the segmentation label image, setting a pixel value in the region to be segmented as 1, and setting a pixel value outside the region to be segmented as 0 to obtain a mask image; and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating the bill image or the certificate image separated from the text background in the original image.
Optionally, in a third implementation manner of the second aspect of the present invention, the transformation unit is specifically configured to: determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image; determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; calculating to obtain an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and carrying out affine transformation on the distorted image according to the affine transformation matrix to obtain an image after distortion correction, wherein characters in the image after distortion correction are forward characters.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the positioning unit is specifically configured to: determining a template corresponding to the image after the distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward characters are located according to preset coordinate values; performing character positioning on the image after the distortion correction according to a preset algorithm and the template to obtain a positioning result; and storing the positioning result into a preset file.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the obtaining unit is specifically configured to: receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image; setting the name of the original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image; and writing the storage path of the original image and the name of the original image into a target data table.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the text positioning apparatus based on image segmentation further includes: the determining unit is used for determining the bill image or the certificate image of the newly added type; the setting unit is used for setting the newly added bill image or certificate image as a sample image to be trained; and the iteration unit is used for performing iterative optimization on the preset image segmentation network model according to the sample image to be trained.
The third aspect of the present invention provides a character positioning apparatus based on image segmentation, comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor invokes the instructions in the memory to cause the image segmentation based text positioning apparatus to perform the method of the first aspect.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In the technical scheme provided by the invention, an original image is obtained, wherein the original image is a bill image or a certificate image collected against a text background; image segmentation is performed on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image; affine transformation is performed on the distorted image to obtain a distortion-corrected image, wherein the characters in the distortion-corrected image are forward characters; and character positioning is performed on the distortion-corrected image to obtain a positioning result. In the embodiment of the invention, an accurate image foreground is obtained by performing image segmentation network processing on an image captured against a complex background, and character positioning is performed on the image foreground according to a preset template to obtain the positioning result, thereby improving the accuracy of image character positioning and enhancing robustness to complex backgrounds.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a text positioning method based on image segmentation according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a text positioning method based on image segmentation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a text positioning apparatus based on image segmentation according to the present invention;
FIG. 4 is a schematic diagram of another embodiment of a text positioning device based on image segmentation according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of a text positioning apparatus based on image segmentation in an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a character positioning method, device, equipment and storage medium based on image segmentation, which perform image segmentation network processing on an image captured against a complex background to obtain an accurate image foreground, and perform character positioning on the image foreground according to a preset template to obtain a positioning result, thereby improving the accuracy of image character positioning and enhancing robustness to complex backgrounds.
To enable those skilled in the art to better understand the solution of the invention, the embodiments of the invention are described below in conjunction with the accompanying drawings.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a detailed flow of an embodiment of the present invention is described below, and referring to fig. 1, an embodiment of a text positioning method based on image segmentation in an embodiment of the present invention includes:
101. acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
the server acquires an original image, wherein the original image is a bill image or a certificate image acquired under a text background. The character background with strong interference exists in the original image, and the character background with strong interference means that character targets, especially handwritten numbers and printed characters, exist in the background of the original image, so that the difficulty of directly positioning the characters in the original image is increased. Specifically, the server receives a bill image or a certificate image collected under a text background, and sets the bill image or the certificate image as an original image; the server stores the original image into a preset path according to a preset format and records the storage path of the original image in a data table.
It can be understood that the server stores the original image under the preset path according to the preset format, and obtains the storage path and the name of the original image. The preset format includes a preset naming rule and a picture format; the picture format is jpg, png or another type of picture format, which is not limited here. After naming the original image according to the preset format, the server places it under a preset path, that is, a file directory agreed in advance. For example, the server receives an original image that is an identity card image, names it card1.jpg, and stores card1.jpg under the directory /var/www/html/ID.
102. Carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is a bill image or a certificate image;
and the server performs image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is a bill image or a certificate image. Specifically, the server performs image segmentation on an original image according to a preset image segmentation network model to obtain a segmentation label image; the server determines a mask image according to the segmentation label image and processes the original image according to the mask image to obtain a distorted image, wherein the distorted image is a local image obtained after the server separates a complex background in the original image, the local image is in a trapezoid shape, and the local image comprises a bill image or a certificate image.
It can be understood that the server trains the image segmentation network model according to the preset samples, determines parameters in the image segmentation network model, and obtains a preset image segmentation network model, and the preset image segmentation network model is used for performing image segmentation on the original image.
103. Carrying out affine transformation on the distorted image to obtain a distortion corrected image, wherein characters in the distortion corrected image are forward characters;
and the server performs affine transformation on the distorted image to obtain a distortion-corrected image, wherein characters in the distortion-corrected image are forward characters. The forward character is a character which is forward on the horizontal reference and is not turned upside down, that is, distorted images of 90 degrees, 180 degrees and 270 degrees deviated from the horizontal reference are corrected to be 0 degree deviated from the horizontal reference, so that the character in the image after the distortion correction is the forward character. Specifically, the server determines an affine transformation rule corresponding to the distorted image; and the server performs affine transformation on the distorted image according to the mapping rule and the preset size to obtain the image after distortion correction. It can be understood that the distorted image is a trapezoid image, the server performs distortion correction on the distorted image according to affine transformation to obtain a distortion-corrected image, characters in the distortion-corrected image are forward, and the size of the distortion-corrected image is a preset fixed value and is consistent with the size of the template corresponding to the distorted image.
It should be noted that the affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v), that is, a point on the original image is mapped to a corresponding point on the target image, including rotation, translation, scaling and shearing of the original image.
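The three-point affine construction described above can be sketched as follows; this is an illustrative NumPy reconstruction, not code from the patent, and the sample points are invented for the check (OpenCV's cv2.getAffineTransform performs the same computation from three point pairs).

```python
import numpy as np

# Minimal sketch of the affine-matrix step: given three pixel reference
# points and their counterparts, solve for the 2x3 matrix M with
# (u, v)^T = M @ (x, y, 1)^T.
def affine_from_three_points(src, dst):
    src = np.asarray(src, dtype=float)     # three points in the distorted image
    dst = np.asarray(dst, dtype=float)     # three corresponding standard points
    A = np.hstack([src, np.ones((3, 1))])  # rows are (x, y, 1)
    # Solving A @ M.T = dst yields both output rows of the affine matrix.
    return np.linalg.solve(A, dst).T       # shape (2, 3)

# Invented sample points: a pure 90-degree rotation, (x, y) -> (-y, x).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(0.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
M = affine_from_three_points(src, dst)
pt = M @ np.array([1.0, 1.0, 1.0])         # maps (1, 1) to (-1, 1)
```

In practice, cv2.getAffineTransform(src, dst) followed by cv2.warpAffine(distorted, M, size) would compute and apply the same matrix to the whole distorted image.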
104. And carrying out character positioning on the image after the distortion correction to obtain a positioning result.
The server performs character positioning on the distortion-corrected image to obtain a positioning result. Specifically, the server performs character positioning on the distortion-corrected image according to a preset algorithm and a template. The template comprises at least one rectangular frame, which marks, by preset coordinate values, the position area where the forward characters are located; the positioning result is the character positioning coordinate information selected from the distortion-corrected image, and the number of pieces of coordinate information equals the number of rectangular frames. For example, the template corresponding to a transfer check of a certain rural commercial bank contains two rectangular frames, one for the bank name "certain rural commercial bank" and one for the words "transfer check"; the server determines the positioning result according to the preset coordinate values of these two rectangular frames.
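The template-based positioning just described can be sketched in a few lines; this is an illustrative reconstruction, not code from the patent, and the field names and rectangle coordinates below are invented assumptions.

```python
import numpy as np

# Each rectangular frame in the template is a preset (x, y, width, height);
# the positioning result is that coordinate information applied to the
# distortion-corrected image. Field names and coordinates are illustrative.
template = {
    "bank_name":  (10, 5, 120, 20),    # x, y, w, h in template coordinates
    "check_type": (10, 30, 80, 20),
}

corrected = np.zeros((100, 200), dtype=np.uint8)  # stand-in corrected image

positioning_result = {}
for field, (x, y, w, h) in template.items():
    positioning_result[field] = {
        "box": (x, y, w, h),                  # character positioning coordinates
        "crop": corrected[y:y + h, x:x + w],  # the marked character region
    }
# One entry per rectangular frame, matching the count described above.
```

Because the corrected image has a fixed preset size consistent with the template, the same preset coordinates apply to every corrected image of that type.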
It can be understood that if the original image were annotated directly, every character in the original image area would need to be labeled, and, to resist character-background interference, a large number of original images containing different character backgrounds would have to be collected, with further labeling whenever a new bill type is added. For example, if a bank bill has n characters and m backgrounds, n × m annotations were previously required, whereas the workload is now only n + m. The larger m is, the stronger the adaptability of positioning to complex backgrounds and the stronger the robustness, where m relates to the image segmentation processing and a large number of sample images are used for augmented training.
In the embodiment of the invention, an accurate image foreground is obtained by performing image segmentation network processing on an image captured against a complex background, and character positioning is performed on the image foreground according to the preset template to obtain the positioning result, thereby improving the accuracy of image character positioning and enhancing robustness to complex backgrounds.
Referring to fig. 2, another embodiment of the text positioning method based on image segmentation in the embodiment of the present invention includes:
201. acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
the server acquires an original image, wherein the original image is a bill image or a certificate image acquired under a text background. Specifically, the server receives a bill image or a certificate image collected under a text background, and sets the bill image or the certificate image as an original image; the server sets the name of an original image according to a preset format, and stores the original image into a preset path to obtain a storage path of the original image, wherein the preset path is a preset file directory, the preset format comprises a preset naming rule and a picture format, the picture format is jpg, png or other types of picture formats, and the specific situation is not limited herein; the server writes the storage path of the original image and the name of the original image into the target data table.
For example, the server receives a bank bill image, sets it as the original image, names it bank1.jpg, and stores bank1.jpg under the directory /var/www/html/bankimage; the server then writes the storage path and the name of the original image into the target data table. For example, given the name bank1.jpg and the storage path /var/www/html/bankimage/bank1.jpg, the server generates a Structured Query Language (SQL) insert statement from the storage path and the name and writes it into the target data table.
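A minimal sketch of this data-table write, using Python's sqlite3 as a stand-in database; the table and column names are assumptions, and a parameterized insert stands in for the generated SQL insert statement described above.

```python
import sqlite3

# Record the original image's name and storage path in a target data table.
# Table/column names ("original_images", "name", "path") are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_images (name TEXT, path TEXT)")

name = "bank1.jpg"
path = "/var/www/html/bankimage/bank1.jpg"
# Parameterized insert rather than building the SQL string by hand.
conn.execute("INSERT INTO original_images (name, path) VALUES (?, ?)",
             (name, path))
conn.commit()

row = conn.execute("SELECT name, path FROM original_images").fetchone()
print(row)  # ('bank1.jpg', '/var/www/html/bankimage/bank1.jpg')
```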
It should be noted that a text background with strong interference exists in the original image, where the text background with strong interference means that text objects, especially handwritten numbers and printed text, exist in the background of the original image, and if the text in the original image is directly located, the locating difficulty is large.
202. Inputting an original image into a preset image segmentation network model, and performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type;
the server inputs the original image into a preset image segmentation network model, and performs image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type. Further, the server performs image semantic segmentation on the original image by using a preset deplab 3+ model, and it can be understood that the preset deplab 3+ model is a preset image segmentation network model. The server performs semantic image segmentation on the original image through a preset depeplabv 3+ model, and the main purpose of the semantic image segmentation is to assign a semantic label to each pixel of the original image, that is, the numerical value of each pixel in the segmentation label image represents the type of the pixel.
It should be noted that DeepLabV3+ is a state-of-the-art deep learning model for image semantic segmentation, with the goal of assigning a semantic label to each pixel of the input image, and that DeepLabV3+ includes a simple and effective decoder module that refines the segmentation results.
203. Segmenting the original image according to the segmentation label image to obtain a distorted image, wherein the distorted image is a bill image or a certificate image;
And the server segments the original image according to the segmentation label image to obtain a distorted image, where the distorted image is a bill image or a certificate image. Specifically, the server determines a region to be segmented according to the segmentation label image, sets the pixel values inside the region to be segmented to 1 and the pixel values outside it to 0 to obtain a mask image; the server then multiplies the original image by the mask image to obtain the distorted image, where the distorted image represents the bill image or certificate image separated from the text background of the original image.
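The mask-multiplication step can be illustrated with a minimal pure-Python sketch on a toy 3x3 grayscale image; a real implementation would operate element-wise on full image arrays in the same way:

```python
def apply_mask(image, mask):
    """Multiply an image by a binary mask element-wise.

    Pixels inside the region to be segmented (mask value 1) are kept;
    pixels outside it (mask value 0) are zeroed, leaving only the
    bill/certificate foreground.
    """
    return [
        [pixel * m for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

# Toy 3x3 grayscale image and a mask covering only the centre pixel.
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]

print(apply_mask(image, mask))  # [[0, 0, 0], [0, 50, 0], [0, 0, 0]]
```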
Optionally, the server compares the original image with the segmentation label image to obtain a comparison result, and determines the region to be segmented in the original image according to the comparison result; the server segments the region to be segmented to obtain a distorted image, where the distorted image is a bill image or a certificate image; the server stores the distorted image.
It should be noted that, because a plurality of certificates may exist in the original image, the result is finally saved as a file, with the same name as the original image, recording the four corner coordinates of each foreground region. For example, the server performs image segmentation on the certificate image named image1.png to obtain eight coordinate points for two certificate foreground regions, and the server saves the two certificate foregrounds numerically, with file content as follows:
1| coordinate 1, coordinate 2, coordinate 3, coordinate 4
2| coordinate 1, coordinate 2, coordinate 3, coordinate 4
204. Carrying out affine transformation on the distorted image to obtain a distortion corrected image, wherein characters in the distortion corrected image are forward characters;
And the server performs affine transformation on the distorted image to obtain a distortion-corrected image, where the characters in the distortion-corrected image are forward characters. A forward character is a character that is upright with respect to the horizontal reference and not upside down; that is, distorted images deviating 90, 180 or 270 degrees from the horizontal reference are corrected to deviate 0 degrees, so that the characters in the distortion-corrected image are forward. Specifically, the server determines a standard image corresponding to the distorted image according to the image type, and determines three pixel reference point coordinates from the standard image; the server determines the corresponding pixel coordinates in the distorted image according to the three pixel reference point coordinates; the server calculates an affine transformation matrix from the three pixel reference point coordinates and the corresponding pixel coordinates; and the server performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the characters are forward. For example, the server determines three pixel reference point coordinates D(x1, y1), E(x2, y2) and F(x3, y3) from a standard identity card image, the server determines the corresponding pixel coordinates D'(x'1, y'1), E'(x'2, y'2) and F'(x'3, y'3) in the distorted image based on the three reference points D, E and F, and the server calculates according to the homogeneous coordinate formula, which is as follows:
    [u]   [a  b  c] [x]
    [v] = [d  e  f] [y]
    [1]   [0  0  1] [1]

that is, u = a*x + b*y + c and v = d*x + e*y + f
wherein (x, y) are pixel coordinates in the distorted image and (u, v) are the corresponding reference point coordinates in the standard identity card image. The server substitutes D'(x'1, y'1), E'(x'2, y'2), F'(x'3, y'3) and D(x1, y1), E(x2, y2), F(x3, y3) into the homogeneous coordinate formula in turn to obtain the affine transformation matrix, that is, the server determines the values of the variables a, b, c, d, e and f of the affine transformation matrix. The server then performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected identity card image, whose corresponding size is 85.6 mm x 54 mm. It is understood that, when affine-transforming the distorted image, the server also determines the rotation direction and rotation angle so that the text in the distortion-corrected image is forward.
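As an illustrative sketch of this calculation, the six unknowns a to f can be recovered from the three point correspondences by solving two 3x3 linear systems, one for (a, b, c) and one for (d, e, f). In practice a library routine such as OpenCV's getAffineTransform would perform this step; the standard-library version below makes the computation explicit, and the point values are hypothetical:

```python
def solve3(A, b):
    """Solve a 3x3 linear system A.x = b by Cramer's rule."""
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det3(A)
    xs = []
    for col in range(3):
        Ac = [row[:] for row in A]      # replace one column with b
        for r in range(3):
            Ac[r][col] = b[r]
        xs.append(det3(Ac) / d)
    return xs

def affine_from_points(src, dst):
    """Recover (a, b, c, d, e, f) with u = a*x + b*y + c, v = d*x + e*y + f
    from three point correspondences src[i] = (x, y) -> dst[i] = (u, v)."""
    A = [[x, y, 1.0] for x, y in src]
    abc = solve3(A, [u for u, v in dst])
    def_ = solve3(A, [v for u, v in dst])
    return abc + def_

# Hypothetical distorted-image points D', E', F' and their standard-image
# counterparts D, E, F (scale x by 2, y by 3, translate by (2, 3)).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
print(affine_from_points(src, dst))  # [2.0, 0.0, 2.0, 0.0, 3.0, 3.0]
```

The three reference points must not be collinear, otherwise the determinant is zero and the system has no unique solution.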
It should be noted that an affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v). The distorted image is an irregular quadrilateral such as a trapezoid; the affine transformation maps each point on the original image to a corresponding point on the target image, and includes rotation, translation, scaling and shearing of the original image, so that the distorted image is finally transformed from an irregular quadrilateral into a rectangle.
205. Determining a template corresponding to the image after distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward characters are located according to a preset coordinate value mark;
And the server determines a template corresponding to the distortion-corrected image according to the image type, where the template includes at least one rectangular frame, and each rectangular frame indicates, by preset coordinate values, the position area where forward characters are located. A rectangular frame is a rectangular area defined by 4 point coordinates. For example, the template corresponding to the horizontal forward front image of an identity card includes 6 rectangular frames: name, gender, ethnicity, date of birth, address and identity card number; the template corresponding to the horizontal forward front image of a bank card includes 1 rectangular frame: the bank card number.
It should be noted that the size of the template corresponding to the image after distortion correction is the same as the size of the image after distortion correction, the template includes a rectangular frame indicating a position area where the forward text is located according to a preset coordinate value, and after the server matches the image after distortion correction according to the image type to obtain the template, the server further determines the text of the image after distortion correction according to the rectangular frame in the template.
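A template of this kind can be sketched as a simple mapping from field names to rectangles. The coordinate values below are purely illustrative assumptions, not taken from any actual identity card or bank card template:

```python
# Hypothetical templates: each field maps to a rectangle given as
# (left, top, right, bottom) preset coordinate values.
ID_CARD_FRONT_TEMPLATE = {
    "name":          (13, 14, 744, 49),
    "gender":        (22, 52, 645, 88),
    "ethnicity":     (12, 94, 446, 130),
    "date_of_birth": (28, 135, 775, 170),
    "address":       (13, 177, 544, 212),
    "id_number":     (22, 217, 348, 252),
}

BANK_CARD_FRONT_TEMPLATE = {
    "card_number": (28, 135, 775, 170),
}

def template_for(image_type):
    """Select the template matching the image type produced by segmentation."""
    templates = {
        "id_card_front": ID_CARD_FRONT_TEMPLATE,
        "bank_card_front": BANK_CARD_FRONT_TEMPLATE,
    }
    return templates[image_type]

print(len(template_for("id_card_front")))   # 6
print(len(template_for("bank_card_front"))) # 1
```

Keeping the templates as plain data makes adding a new document type a matter of registering one more dictionary, matching the iterative-extension idea described later in the embodiment.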
206. Performing character positioning on the image subjected to distortion correction according to a preset algorithm and a template to obtain a positioning result;
And the server performs character positioning on the distortion-corrected image according to the preset algorithm and the template to obtain a positioning result. Specifically, the server determines, according to the preset algorithm and the template, the position information of each strip-shaped region to be segmented in the distortion-corrected image, where the position information includes the upper-left point coordinates, the lower-right point coordinates and the characters of the corresponding region. The character positioning rule proceeds from the upper-left coordinates to the lower-right coordinates, scanning the distortion-corrected image row by row, and information of the same type on the same row is located simultaneously. The server sets the upper-left point coordinates, the lower-right point coordinates and the corresponding characters as the positioning result. For example, the server performs character positioning on the name area of an identity card, and the obtained positioning result includes the upper-left point coordinates (13, 14), the lower-right point coordinates (744, 49) and the name.
Optionally, the server detects text region boxes in the distortion-corrected image using the PixelLink algorithm. PixelLink casts text detection as instance segmentation and, based on a deep neural network (DNN), makes two kinds of pixel-level predictions: text/non-text prediction and link prediction. Specifically, the server marks text pixels in the distortion-corrected image as positive and non-text pixels as negative according to the PixelLink algorithm; the server judges whether a given pixel and an adjacent pixel belong to the same instance; if a given pixel and a neighboring pixel are in the same instance, the server marks the link between them as positive; if they are not in the same instance, the server marks the link between them as negative, each pixel having 8 neighbors. The predicted positive pixels are joined into connected components (CCs) through the predicted positive links, each CC representing one detected text; the server finally obtains the bounding box of each connected component as the final detection result, and sets the coordinate information of the final detection result as the positioning result.
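The final grouping step can be sketched as 8-connected component labeling over the positive-pixel mask, each component yielding one bounding box. This is a simplification: PixelLink itself predicts eight per-pixel links and joins only positively linked pixels, whereas the sketch below treats every pair of adjacent positive pixels as linked:

```python
from collections import deque

def text_boxes(positive):
    """Group positive (text) pixels into 8-connected components and return
    the bounding box (left, top, right, bottom) of each component."""
    h, w = len(positive), len(positive[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not positive[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            left, top, right, bottom = sx, sy, sx, sy
            while queue:                       # breadth-first flood fill
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dx in (-1, 0, 1):          # visit the 8 neighbors
                    for dy in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and positive[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes

# Two separate text blobs -> two bounding boxes.
mask = [[1, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 1, 1]]
print(text_boxes(mask))  # [(0, 0, 1, 1), (3, 1, 4, 2)]
```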
207. And storing the positioning result into a preset file.
And the server stores the positioning result into a preset file. Specifically, the server positions the distortion-corrected image to obtain a plurality of positioning rectangular areas, records the upper-left and lower-right point coordinates of each positioning rectangular area, and stores the positioning results in txt format. For example, the server performs character positioning on a bill of a certain rural commercial bank; the obtained positioning result includes 6 rectangular boxes and the text information located in them, and the server stores them into the file sds_0.txt, with content as follows:
standard_build/sds_0.png|13 14 744 49|
standard_build/sds_0.png|22 52 645 88|
standard_build/sds_0.png|12 94 446 130|
standard_build/sds_0.png|28 135 775 170|
standard_build/sds_0.png|13 177 544 212|
standard_build/sds_0.png|22 217 348 252|
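Writing this file is straightforward; the sketch below emits the "image|x1 y1 x2 y2|" line format shown above (the function name and output path are illustrative):

```python
import os
import tempfile

def save_results(image_path, boxes, out_path):
    """Write positioning results, one rectangle per line, in the
    'image|x1 y1 x2 y2|' format used by the sds_0.txt example."""
    with open(out_path, "w", encoding="utf-8") as f:
        for x1, y1, x2, y2 in boxes:
            f.write(f"{image_path}|{x1} {y1} {x2} {y2}|\n")

boxes = [(13, 14, 744, 49), (22, 52, 645, 88)]
out_path = os.path.join(tempfile.gettempdir(), "sds_0.txt")
save_results("standard_build/sds_0.png", boxes, out_path)

with open(out_path, encoding="utf-8") as f:
    content = f.read()
print(content)
```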
it should be noted that the positioning result in the sds _0.txt file can be further used for text recognition, and the positioning result includes a preset flag for prompting the text recognition to discard the line, for example, for the positioning result standard _ built/sds _0.png | 131474449 | XXXX, where XXXX is a preset flag for indicating that the server does not perform text recognition, and the positioning result can also be marked with other types of preset flags, which is not limited herein.
Optionally, the server determines a bill image or certificate image of a newly added type; the server sets the newly added bill image or certificate image as a sample image to be trained; and the server performs iterative optimization on the preset image segmentation network according to the sample image to be trained. For example, the current bill types include types 1 to 10; when an 11th type is detected to have been added, the bill images of the newly added type are set as sample images to be trained, and the image segmentation network is iteratively optimized according to the type-11 bill images. It can be understood that, before the iterative optimization is performed, the existing parameters in the preset image segmentation network are frozen, and the iterative optimization is then performed.
In the embodiment of the invention, an accurate image foreground is obtained by processing the image under a complex background with the image segmentation network, and character positioning is performed on the image foreground according to the preset template to obtain the positioning result, thereby improving the accuracy of image character positioning and enhancing robustness to complex backgrounds.
The text positioning method based on image segmentation in the embodiment of the present invention has been described above; the text positioning device based on image segmentation in the embodiment of the present invention is described below. Referring to fig. 3, an embodiment of the text positioning device based on image segmentation in the embodiment of the present invention includes:
an acquiring unit 301, configured to acquire an original image, where the original image is a ticket image or a certificate image acquired in a text background;
the segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
a transformation unit 303, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, where characters in the distortion-corrected image are forward characters;
and the positioning unit 304 is configured to perform character positioning on the image after the distortion correction to obtain a positioning result.
In the embodiment of the invention, an accurate image foreground is obtained by processing the image under a complex background with the image segmentation network, and character positioning is performed on the image foreground according to the preset template to obtain the positioning result, thereby improving the accuracy of image character positioning and enhancing robustness to complex backgrounds.
Referring to fig. 4, another embodiment of the text positioning device based on image segmentation in the embodiment of the present invention includes:
an acquiring unit 301, configured to acquire an original image, where the original image is a ticket image or a certificate image acquired in a text background;
the segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
a transformation unit 303, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, where characters in the distortion-corrected image are forward characters;
and the positioning unit 304 is configured to perform character positioning on the image after the distortion correction to obtain a positioning result.
Optionally, the segmentation unit 302 may further include:
an input subunit 3021, configured to input an original image into a preset image segmentation network model;
the first segmentation subunit 3022 is configured to perform image semantic segmentation on the original image through a preset image segmentation network model to obtain a segmentation label image and an image type;
and the second dividing subunit 3023 is configured to divide the original image according to the division label image to obtain a distorted image, where the distorted image is a bill image or a certificate image.
Optionally, the second dividing subunit 3023 may be further specifically configured to:
determining a region to be segmented according to the segmentation label image, setting a pixel value in the region to be segmented as 1, and setting a pixel value outside the region to be segmented as 0 to obtain a mask image;
and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating a bill image or a certificate image separated from the text background in the original image.
Optionally, the transformation unit 303 may be further specifically configured to:
determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image;
determining corresponding pixel coordinates from the distorted image according to the coordinates of the three pixel reference points;
calculating to obtain an affine transformation matrix according to the coordinates of the three pixel reference points and the corresponding pixel coordinates;
and carrying out affine transformation on the distorted image according to the affine transformation matrix to obtain the image after distortion correction.
Optionally, the positioning unit 304 may be further specifically configured to:
determining a template corresponding to the image after distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward characters are located according to a preset coordinate value mark;
performing character positioning on the image subjected to distortion correction according to a preset algorithm and a template to obtain a positioning result;
and storing the positioning result into a preset file.
Optionally, the obtaining unit 301 may be further specifically configured to:
receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image;
setting the name of an original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image;
and writing the storage path of the original image and the name of the original image into the target data table.
Optionally, the text positioning apparatus based on image segmentation may further include:
a determination unit 305 for determining a ticket image or a certificate image of the newly added type;
a setting unit 306, configured to set a new type of ticket image or certificate image as a sample image to be trained;
and the iteration unit 307 is configured to perform iterative optimization on the preset image segmentation network model according to the sample image to be trained.
In the embodiment of the invention, an accurate image foreground is obtained by processing the image under a complex background with the image segmentation network, and character positioning is performed on the image foreground according to the preset template to obtain the positioning result, thereby improving the accuracy of image character positioning and enhancing robustness to complex backgrounds.
The text positioning device based on image segmentation in the embodiment of the present invention is described in detail in the above fig. 3 and fig. 4 from the perspective of the modular functional entity, and the text positioning device based on image segmentation in the embodiment of the present invention is described in detail in the following from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a text positioning apparatus based on image segmentation according to an embodiment of the present invention. The text positioning apparatus 500 based on image segmentation may vary considerably in configuration or performance, and may include one or more processors (CPUs) 501 (e.g., one or more processors), a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) storing application programs or data. The memory 509 and the storage medium 508 may be transient storage or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a sequence of instruction operations for text positioning based on image segmentation. Still further, the processor 501 may be configured to communicate with the storage medium 508 to execute a series of instruction operations in the storage medium 508 on the text positioning apparatus 500 based on image segmentation.
The image segmentation based text positioning apparatus 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input-output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will appreciate that the image segmentation based text positioning device architecture shown in fig. 5 does not constitute a limitation of image segmentation based text positioning devices and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A character positioning method based on image segmentation is characterized by comprising the following steps:
acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image;
carrying out affine transformation on the distorted image to obtain an image after distortion correction, wherein characters in the image after distortion correction are forward characters;
and carrying out character positioning on the image after the distortion correction to obtain a positioning result.
2. The method of claim 1, wherein the performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image, comprises:
inputting the original image into a preset image segmentation network model;
performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type;
and segmenting the original image according to the segmentation label image to obtain a distorted image, wherein the distorted image is the bill image or the certificate image.
3. The method of claim 2, wherein the segmenting the original image according to the segmentation label image to obtain a distorted image, the distorted image being the bill image or the certificate image, comprises:
determining a region to be segmented according to the segmentation label image, setting a pixel value in the region to be segmented as 1, and setting a pixel value outside the region to be segmented as 0 to obtain a mask image;
and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating the bill image or the certificate image separated from the text background in the original image.
4. The method according to claim 2, wherein the performing affine transformation on the distorted image to obtain a distortion-corrected image, characters in the distortion-corrected image being forward characters, comprises:
determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image;
determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates;
calculating to obtain an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates;
and carrying out affine transformation on the distorted image according to the affine transformation matrix to obtain an image after distortion correction, wherein characters in the image after distortion correction are forward characters.
5. The method of claim 4, wherein the performing text positioning on the distortion-corrected image to obtain a positioning result comprises:
determining a template corresponding to the image after distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward characters are located according to preset coordinate values;
performing character positioning on the image after the distortion correction according to a preset algorithm and the template to obtain a positioning result;
and storing the positioning result into a preset file.
6. The method of claim 1, wherein the acquiring an original image, the original image being a bill image or a certificate image acquired in a text background, comprises:
receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image;
setting the name of the original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image;
and writing the storage path of the original image and the name of the original image into a target data table.
7. The method according to any one of claims 1 to 6, wherein, after the distortion-corrected image is subjected to character positioning to obtain a positioning result, the method further comprises:
determining a new bill image or certificate image;
setting the newly added bill image or certificate image as a sample image to be trained;
and performing iterative optimization on the preset image segmentation network model according to the sample image to be trained.
8. A character positioning device based on image segmentation is characterized in that the character positioning device based on image segmentation comprises:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an original image, and the original image is a bill image or a certificate image acquired under a text background;
the segmentation unit is used for carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image;
the transformation unit is used for carrying out affine transformation on the distorted image to obtain an image after distortion correction, and characters in the image after distortion correction are forward characters;
and the positioning unit is used for carrying out character positioning on the image after the distortion correction to obtain a positioning result.
9. An image segmentation-based text positioning device, the image segmentation-based text positioning device comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the image segmentation based text positioning device to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program realizing the steps of the method according to any one of claims 1-7 when executed by a processor.
CN201910884634.0A 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation Active CN110807454B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910884634.0A CN110807454B (en) 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation
PCT/CN2019/117036 WO2021051527A1 (en) 2019-09-19 2019-11-11 Image segmentation-based text positioning method, apparatus and device, and storage medium

Publications (2)

Publication Number Publication Date
CN110807454A true CN110807454A (en) 2020-02-18
CN110807454B CN110807454B (en) 2024-05-14

Family

ID=69487698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910884634.0A Active CN110807454B (en) 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation

Country Status (2)

Country Link
CN (1) CN110807454B (en)
WO (1) WO2021051527A1 (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111880B (en) * 2021-05-12 2023-10-17 中国平安人寿保险股份有限公司 Certificate image correction method, device, electronic equipment and storage medium
CN113687823B (en) * 2021-07-30 2023-08-01 稿定(厦门)科技有限公司 Quadrilateral block nonlinear transformation method and system based on HTML
CN114140811A (en) * 2021-11-04 2022-03-04 北京中交兴路信息科技有限公司 Certificate sample generation method and device, electronic equipment and storage medium
CN114565915B (en) * 2022-04-24 2023-02-10 深圳思谋信息科技有限公司 Sample text image acquisition method, text recognition model training method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070171288A1 (en) * 2004-03-25 2007-07-26 Yasuaki Inoue Image correction apparatus and method, image correction database creating method, information data provision apparatus, image processing apparatus, information terminal, and information database apparatus
CN101458770A (en) * 2008-12-24 2009-06-17 北京文通科技有限公司 Character recognition method and system
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN109993160A (en) * 2019-02-18 2019-07-09 北京联合大学 Image correction and text and position recognition method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4902568B2 (en) * 2008-02-19 2012-03-21 キヤノン株式会社 Electronic document generation apparatus, electronic document generation method, computer program, and storage medium
US9576348B2 (en) * 2014-11-14 2017-02-21 Adobe Systems Incorporated Facilitating text identification and editing in images
CN108885699B (en) * 2018-07-11 2020-06-26 深圳前海达闼云端智能科技有限公司 Character recognition method, device, storage medium and electronic equipment


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343965A (en) * 2020-03-02 2021-09-03 北京有限元科技有限公司 Image tilt correction method, apparatus and storage medium
CN111768345A (en) * 2020-05-12 2020-10-13 北京奇艺世纪科技有限公司 Method, device and equipment for correcting back image of identity card and storage medium
CN111768345B (en) * 2020-05-12 2023-07-14 北京奇艺世纪科技有限公司 Correction method, device, equipment and storage medium for identity card back image
CN113963339A (en) * 2021-09-02 2022-01-21 泰康保险集团股份有限公司 Information extraction method and device

Also Published As

Publication number Publication date
WO2021051527A1 (en) 2021-03-25
CN110807454B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN110807454B (en) Text positioning method, device, equipment and storage medium based on image segmentation
US20190304066A1 (en) 2019-10-03 Synthesis method of Chinese printed character images and device thereof
CN112528863A (en) Identification method and device of table structure, electronic equipment and storage medium
CN107491730A (en) 2017-12-19 Laboratory test report recognition method based on image processing
US9916499B2 (en) Method and system for linking printed objects with electronic content
Kumar et al. Efficient feature extraction techniques for offline handwritten Gurmukhi character recognition
Khare et al. Arbitrarily-oriented multi-lingual text detection in video
CN110738238B (en) Classification positioning method and device for certificate information
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN114549603B (en) Method, system, equipment and medium for converting labeling coordinate of cytopathology image
CN113901933B (en) Electronic invoice information extraction method, device and equipment based on artificial intelligence
CN111145124A (en) Image tilt correction method and device
EP3786844A1 (en) Image processing system, image processing method, and program
EP3796218A1 (en) Image processing system, image processing method, and program
CN112580499A (en) Text recognition method, device, equipment and storage medium
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN112668580A (en) Text recognition method, text recognition device and terminal equipment
CN114092938A (en) Image recognition processing method and device, electronic equipment and storage medium
CN106056575B (en) 2019-01-11 Image matching method based on an objectness proposal algorithm
Lu et al. A partition approach for the restoration of camera images of planar and curled document
CN110689063B (en) Training method and device for certificate recognition based on neural network
Yang et al. Effective geometric restoration of distorted historical document for large‐scale digitisation
CN116030472A (en) Text coordinate determining method and device
CN113537216A (en) Dot matrix font text line inclination correction method and device
CN113705571A (en) Method and device for removing red seal based on RGB threshold, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant