CN110807454B - Text positioning method, device, equipment and storage medium based on image segmentation - Google Patents

Text positioning method, device, equipment and storage medium based on image segmentation

Info

Publication number
CN110807454B
CN110807454B
Authority
CN
China
Prior art keywords
image
distorted
text
segmentation
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910884634.0A
Other languages
Chinese (zh)
Other versions
CN110807454A (en)
Inventor
孙强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910884634.0A priority Critical patent/CN110807454B/en
Priority to PCT/CN2019/117036 priority patent/WO2021051527A1/en
Publication of CN110807454A publication Critical patent/CN110807454A/en
Application granted granted Critical
Publication of CN110807454B publication Critical patent/CN110807454B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1475Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/02Affine transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/80Geometric correction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a text positioning method, device, equipment and storage medium based on image segmentation. The text positioning method based on image segmentation comprises the following steps: acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is a bill image or a certificate image; performing affine transformation on the distorted image to obtain an image after distortion correction, wherein characters in the image after distortion correction are forward characters; and performing text positioning on the image after distortion correction to obtain a positioning result. According to the invention, an accurate image foreground is obtained by performing image segmentation network processing on an image under a complex text background, and text positioning is then performed on the image foreground to obtain a positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.

Description

Text positioning method, device, equipment and storage medium based on image segmentation
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a text positioning method, apparatus, device, and storage medium based on image segmentation.
Background
Optical character recognition (OCR) refers to the process by which an electronic device, such as a scanner or digital camera, examines characters printed on paper and translates their shapes into computer text using a character recognition method; that is, text material is scanned and the resulting image file is analyzed and processed to obtain text and layout information. OCR involves text localization and text recognition, where text localization is the accurate localization of text positions in an image, based mainly on extracting relevant text features.
In the prior art, a dedicated scanner is generally used to scan bills and certificates, converting the characters on them into image information so as to obtain high-quality bill images and certificate images, and the information in these images is then converted into computer characters through OCR technology. With this approach, the accuracy of character positioning on bill images and certificate images acquired under a complex background is low.
Disclosure of Invention
The invention mainly aims to solve the technical problem of low text positioning accuracy in images with a complex text background.
In order to achieve the above object, a first aspect of the present invention provides a text positioning method based on image segmentation, including: acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image; affine transformation is carried out on the distorted image to obtain a distorted and corrected image, wherein characters in the distorted and corrected image are forward characters; and performing text positioning on the image after distortion correction to obtain a positioning result.
Optionally, in a first implementation manner of the first aspect of the present invention, the image segmentation is performed on the original image by using a preset image segmentation network model to obtain a distorted image, where the distorted image is the ticket image or the certificate image, and the method includes: inputting the original image into a preset image segmentation network model; performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type; and dividing the original image according to the division tag image to obtain a distorted image, wherein the distorted image is the bill image or the certificate image.
Optionally, in a second implementation manner of the first aspect of the present invention, the dividing the original image according to the division tag image to obtain a distorted image, where the distorted image is the ticket image or the certificate image includes: determining a region to be segmented according to the segmentation tag image, setting the pixel value in the region to be segmented to be 1, and setting the pixel value outside the region to be segmented to be 0 to obtain a mask image; and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating the bill image or the certificate image which is separated from the text background in the original image.
Optionally, in a third implementation manner of the first aspect of the present invention, affine transforming the distorted image to obtain a distortion corrected image, where characters in the distortion corrected image are forward characters includes: determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image; determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; calculating affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and carrying out affine transformation on the distorted image according to the affine transformation matrix to obtain a distorted and corrected image, wherein characters in the distorted and corrected image are forward characters.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing text positioning on the image after distortion correction to obtain a positioning result includes: determining a template corresponding to the distortion corrected image according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward text is identified according to a preset coordinate value; performing text positioning on the image after distortion correction according to a preset algorithm and the template to obtain a positioning result; and storing the positioning result into a preset file.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the acquiring an original image, where the original image is a ticket image or a certificate image acquired in a text background includes: receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image; setting the name of the original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image; and writing the storage path of the original image and the name of the original image into a target data table.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after performing text positioning on the image after distortion correction to obtain a positioning result, the text positioning method based on image segmentation includes: determining a newly added type bill image or certificate image; setting the newly added bill image or certificate image as a sample image to be trained; and carrying out iterative optimization on the preset image segmentation network model according to the sample image to be trained.
The second aspect of the present invention provides a text positioning device based on image segmentation, comprising: the acquisition unit is used for acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background; the segmentation unit is used for carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image; the transformation unit is used for carrying out affine transformation on the distorted image to obtain a distorted and corrected image, wherein characters in the distorted and corrected image are forward characters; and the positioning unit is used for performing text positioning on the image after distortion correction to obtain a positioning result.
Optionally, in a first implementation manner of the second aspect of the present invention, the dividing unit further includes: an input subunit, configured to input the original image into a preset image segmentation network model; the first segmentation subunit is used for carrying out image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation tag image and an image type; and the second segmentation subunit is used for segmenting the original image according to the segmentation tag image to obtain a distorted image, wherein the distorted image is the bill image or the certificate image.
Optionally, in a second implementation manner of the second aspect of the present invention, the second dividing subunit is specifically configured to: determining a region to be segmented according to the segmentation tag image, setting the pixel value in the region to be segmented to be 1, and setting the pixel value outside the region to be segmented to be 0 to obtain a mask image; and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating the bill image or the certificate image which is separated from the text background in the original image.
Optionally, in a third implementation manner of the second aspect of the present invention, the transformation unit is specifically configured to: determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image; determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; calculating affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and carrying out affine transformation on the distorted image according to the affine transformation matrix to obtain a distorted and corrected image, wherein characters in the distorted and corrected image are forward characters.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the positioning unit is specifically configured to: determining a template corresponding to the distortion corrected image according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the forward text is identified according to a preset coordinate value; performing text positioning on the image after distortion correction according to a preset algorithm and the template to obtain a positioning result; and storing the positioning result into a preset file.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the acquiring unit is specifically configured to: receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image; setting the name of the original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image; and writing the storage path of the original image and the name of the original image into a target data table.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the text positioning device based on image segmentation further includes: the determining unit is used for determining the bill image or the certificate image of the newly added type; the setting unit is used for setting the newly added bill image or the certificate image as a sample image to be trained; and the iteration unit is used for carrying out iteration optimization on the preset image segmentation network model according to the sample image to be trained.
A third aspect of the present invention provides a text positioning apparatus based on image segmentation, comprising: the system comprises a memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory to cause the image segmentation based text positioning device to perform the method of the first aspect described above.
A fourth aspect of the invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In the technical scheme provided by the invention, an original image is acquired, wherein the original image is a bill image or a certificate image acquired under a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image; affine transformation is carried out on the distorted image to obtain a distorted and corrected image, wherein characters in the distorted and corrected image are forward characters; and performing text positioning on the image after distortion correction to obtain a positioning result. In the embodiment of the invention, the accurate image foreground image is obtained by carrying out image segmentation network processing on the image under the complex background, and the text positioning processing is carried out on the image foreground image according to the preset template, so that the positioning result is obtained, the accuracy of the text positioning of the image is improved, and the robustness of the complex background is enhanced.
Drawings
FIG. 1 is a diagram of an embodiment of a text positioning method based on image segmentation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a text positioning method based on image segmentation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a text positioning device based on image segmentation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a text positioning device based on image segmentation according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a text positioning method, device, equipment and storage medium based on image segmentation, which are used for obtaining an accurate image foreground image by carrying out image segmentation network processing on an image under a complex background, carrying out text positioning processing on the image foreground image according to a preset template to obtain a positioning result, improving the accuracy of image text positioning and enhancing the robustness of the complex background.
In order to enable those skilled in the art to better understand the present invention, embodiments of the present invention will be described below with reference to the accompanying drawings.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to fig. 1, and an embodiment of a text positioning method based on image segmentation in an embodiment of the present invention includes:
101. Acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
The server acquires an original image, wherein the original image is a bill image or a certificate image acquired under a text background. The original image contains a text background with strong interference, which means that text targets, especially handwritten numbers and printed text, exist in the background of the original image, increasing the difficulty of directly positioning the text in the original image. Specifically, the server receives a bill image or a certificate image collected under a text background and sets it as the original image; the server stores the original image in a preset path according to a preset format, and records the storage path of the original image in a data table.
It can be understood that the server stores the original image in the preset path according to the preset format, and obtains the storage path of the original image and the name of the original image. The preset format includes a preset naming rule and a picture format, where the picture format is jpg, png or another type of picture format, which is not limited herein. After naming the original image according to the preset format, the server places the original image in a preset path, wherein the preset path is a file directory specified in advance. For example, the server receives an original image that is an identification card image, names the identification card image card1.jpg, and stores card1.jpg under the directory /var/www/html/ID.
102. Image segmentation is carried out on an original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is a bill image or a certificate image;
The server performs image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is a bill image or a certificate image. Specifically, the server performs image segmentation on an original image according to a preset image segmentation network model to obtain a segmentation label image; the server determines a mask image according to the segmentation tag image, and processes the original image according to the mask image to obtain a distorted image, wherein the distorted image is a local image obtained after the server separates a complex background in the original image, the local image is in a trapezoid shape, and the local image comprises a bill image or a certificate image.
It can be understood that the server trains the image segmentation network model according to the preset sample, determines parameters in the image segmentation network model, and obtains the preset image segmentation network model, wherein the preset image segmentation network model is used for carrying out image segmentation on the original image.
103. Affine transformation is carried out on the distorted image to obtain an image after distortion correction, and characters in the image after distortion correction are forward characters;
The server performs affine transformation on the distorted image to obtain an image after distortion correction, and the characters in the image after distortion correction are forward characters. Forward text refers to text that is oriented along the horizontal reference and is not upside down; that is, a distorted image that deviates from the horizontal reference by 90 degrees, 180 degrees or 270 degrees is corrected so that the deviation from the horizontal reference becomes 0 degrees, and the text in the image after distortion correction is therefore forward text. Specifically, the server determines the affine transformation rule corresponding to the distorted image, and performs affine transformation on the distorted image according to the mapping rule and a preset size to obtain the image after distortion correction. It can be understood that the distorted image is a trapezoidal image; the server performs distortion correction on the distorted image through affine transformation to obtain the image after distortion correction, in which the characters are forward and whose size is a preset fixed value consistent with the size of the template corresponding to the distorted image.
It should be noted that affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v), that is, mapping a point on the original to a corresponding point on the target, including rotation, translation, scaling, and shearing of the original.
104. And performing text positioning on the image after distortion correction to obtain a positioning result.
The server performs text positioning on the image after distortion correction to obtain a positioning result. Specifically, the server performs text positioning processing on the image after distortion correction according to a preset algorithm and a template to obtain the positioning result. The template comprises at least one rectangular frame, and the rectangular frame is used for indicating, according to preset coordinate values, the position area where the forward text is located; the positioning result is the text positioning coordinate information selected from the image after distortion correction, and the number of pieces of text positioning coordinate information is equal to the number of rectangular frames. For example, for a bill image after distortion correction containing the text of a rural commercial bank and a transfer check, the server matches a corresponding template in which two rectangular frames indicate the rural commercial bank text and the transfer check text respectively; the server then determines the positioning result according to the preset coordinate values of the two rectangular frames, and the positioning result comprises the rural commercial bank text, the transfer check text and the preset coordinate values of the two rectangular frames.
It can be understood that if the original image were labeled directly, every piece of text in the original image area would have to be labeled; meanwhile, in order to avoid text background interference, a large number of original images containing different text backgrounds would have to be collected, and labeling would have to continue whenever a new bill variety is added. For example, for bank bills with n text varieties and m backgrounds, a conventional approach requires n x m labels, whereas here the labeling workload is only n + m. Moreover, the stronger the adaptability of the positioning to complex backgrounds, the stronger the robustness, where m is related to the image segmentation processing, for which enhancement training is carried out on a large number of sample images.
In the embodiment of the invention, the accurate image foreground image is obtained by carrying out image segmentation network processing on the image under the complex background, and the text positioning processing is carried out on the image foreground image according to the preset template, so that the positioning result is obtained, the accuracy of the text positioning of the image is improved, and the robustness of the complex background is enhanced.
Referring to fig. 2, another embodiment of a text positioning method based on image segmentation in an embodiment of the present invention includes:
201. acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
The server acquires an original image, wherein the original image is a bill image or a certificate image acquired under a text background. Specifically, the server receives a bill image or a certificate image collected under a text background and sets the bill image or the certificate image as an original image; the server sets the name of the original image according to a preset format, and stores the original image in a preset path to obtain a storage path of the original image, wherein the preset path is a preset file directory, the preset format comprises a preset naming rule and a picture format, the picture format is jpg, png or other types of picture formats, and the method is not limited in detail herein; the server writes the storage path of the original image and the name of the original image into the target data table.
For example, the server receives a bank bill image and sets the bank bill image as the original image, naming it bank1.jpg; the server then stores bank1.jpg under the directory /var/www/html/bankimage. The server writes the storage path of the original image and the name of the original image into the target data table; for example, if the name of the original image is bank1.jpg and its storage path is /var/www/html/bankimage/bank1.jpg, the server generates a structured query language (SQL) insert statement from the storage path and the name of the original image, and writes them into the target data table according to the SQL insert statement.
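The patent does not specify the database behind the target data table; as a minimal sketch, assuming a SQLite-style table with hypothetical names (original_images, name, storage_path), writing the storage path and name could look like this in Python:

import sqlite3

# Sketch only: the table and column names are assumptions; the text only states
# that the path and the name are written into a target data table via an SQL
# insert statement.
conn = sqlite3.connect("images.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS original_images (name TEXT, storage_path TEXT)"
)
conn.execute(
    "INSERT INTO original_images (name, storage_path) VALUES (?, ?)",
    ("bank1.jpg", "/var/www/html/bankimage/bank1.jpg"),
)
conn.commit()
conn.close()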
It should be noted that the original image contains a text background with strong interference, which means that text targets, especially handwritten numbers and printed text, exist in the background of the original image; if text in the original image is located directly, the positioning difficulty is high.
202. Inputting an original image into a preset image segmentation network model, and performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type;
The server inputs the original image into a preset image segmentation network model, and performs image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type. Further, the server performs image semantic segmentation on the original image by using a preset deeplabv3+ model; it can be understood that the preset deeplabv3+ model is the preset image segmentation network model. The main purpose of the server performing semantic image segmentation on the original image through the preset deeplabv3+ model is to assign a semantic label to each pixel of the original image, that is, the value of each pixel point in the segmentation label image represents the type of that pixel point.
It should be noted that deeplabv3+ is a leading deep learning model for semantic segmentation of images, whose goal is to assign a semantic label to each pixel of the input image; deeplabv3+ includes a simple and efficient decoder module that improves the segmentation results.
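As an illustrative sketch only, such a semantic segmentation pass could be run as follows in Python with PyTorch/torchvision; the pretrained torchvision network here merely stands in for the preset deeplabv3+ model, which in practice would be trained on bill and certificate foreground classes, and deriving the image type from the dominant foreground label is likewise an assumption:

import numpy as np
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in for the preset image segmentation network model (see note above).
model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("bank1.jpg").convert("RGB")
x = preprocess(image).unsqueeze(0)

with torch.no_grad():
    logits = model(x)["out"]                       # (1, num_classes, H, W)
label_image = logits.argmax(dim=1)[0].numpy()      # per-pixel class id, i.e. the segmentation label image

# Assumed rule: take the dominant non-background class id as the image type.
foreground = label_image[label_image > 0]
image_type = int(np.bincount(foreground).argmax()) if foreground.size else 0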
203. Dividing the original image according to the divided tag image to obtain a distorted image, wherein the distorted image is a bill image or a certificate image;
The server segments the original image according to the segmentation tag image to obtain a distorted image, wherein the distorted image is a bill image or a certificate image. Specifically, the server determines a region to be segmented according to the segmentation tag image, sets the pixel value in the region to be segmented to be 1, and sets the pixel value outside the region to be segmented to be 0, so as to obtain a mask image; the server multiplies the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating a bill image or a certificate image separated from a text background in the original image.
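A minimal sketch of this mask-and-multiply step in Python with OpenCV, assuming the segmentation label image has the same height and width as the original image and that any non-zero label marks the region to be segmented:

import cv2
import numpy as np

def segment_foreground(original: np.ndarray, label_image: np.ndarray):
    """Build a 0/1 mask from the segmentation label image and multiply it with
    the original image to separate the bill/certificate from the text background."""
    mask = (label_image > 0).astype(np.uint8)      # pixel value 1 inside the region, 0 outside
    distorted = original * mask[:, :, None]        # zero out the text background pixels

    # Four-point foreground coordinates, in the spirit of the saved
    # "1|coordinate 1, coordinate 2, ..." file shown below.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boxPoints(cv2.minAreaRect(c)).astype(int) for c in contours]
    return distorted, boxes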
Optionally, the server compares the original image with the segmentation label image to obtain a comparison result, and determines the comparison result as the region to be segmented; the server then segments the region to be segmented to obtain a distorted image, wherein the distorted image is a bill image or a certificate image; the server stores the distorted image.
It should be noted that, because there may be multiple certificates in the original image, the finally saved file records the foreground four-point coordinates under the same name as the original image. For example, the server performs image segmentation processing on a certificate image named image1.png to obtain the eight coordinate points of two certificate foreground images, and saves the two certificate foregrounds by number, where the file content is as follows:
1|coordinate 1, coordinate 2, coordinate 3, coordinate 4
2|Coordinate 1, coordinate 2, coordinate 3, coordinate 4
204. Affine transformation is carried out on the distorted image to obtain an image after distortion correction, and characters in the image after distortion correction are forward characters;
The server performs affine transformation on the distorted image to obtain an image after distortion correction, and the characters in the image after distortion correction are forward characters. Forward text refers to text that is oriented along the horizontal reference and is not upside down; that is, a distorted image that deviates from the horizontal reference by 90 degrees, 180 degrees or 270 degrees is corrected so that the deviation from the horizontal reference becomes 0 degrees, and the text in the image after distortion correction is therefore forward text. Specifically, the server determines a standard image corresponding to the distorted image according to the image type, and determines three pixel reference point coordinates from the standard image; the server determines corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; the server calculates an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and the server performs affine transformation on the distorted image according to the affine transformation matrix to obtain the image after distortion correction, wherein the characters in the image after distortion correction are forward characters. For example, the server determines three pixel reference point coordinates D(x1, y1), E(x2, y2) and F(x3, y3) from the standard image of the identity card, determines the corresponding pixel coordinates D'(x'1, y'1), E'(x'2, y'2) and F'(x'3, y'3) from the distorted image according to the three pixel reference point coordinates D, E and F, and calculates according to the homogeneous coordinate formula:
u = a·x + b·y + c
v = d·x + e·y + f
that is, the homogeneous affine matrix [[a, b, c], [d, e, f], [0, 0, 1]] applied to (x, y, 1).
Wherein (x, y) corresponds to the pixel coordinates of the distorted image and (u, v) corresponds to the coordinates of the three pixel reference points of the standard image of the identity card. The server substitutes D'(x'1, y'1), E'(x'2, y'2), F'(x'3, y'3) and D(x1, y1), E(x2, y2), F(x3, y3) into the homogeneous coordinate formula in sequence to calculate the affine transformation matrix, that is, the server determines the values of the variables a, b, c, d, e and f of the affine transformation matrix; the server then performs affine transformation on the distorted image according to the affine transformation matrix to obtain the identity card image after distortion correction, where the size corresponding to the identity card image after distortion correction is 85.6 mm × 54 mm. It will be appreciated that when affine transforming a distorted image, the server also determines the direction and angle of rotation such that the text in the image after distortion correction is forward.
It should be noted that affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v); the distorted image is a trapezoidal image, and the affine transformation maps a point on the original image to a corresponding point on the target image, including rotation, translation, scaling and shearing of the original image, finally transforming the distorted image from a trapezoid into a rectangle.
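A short sketch of this correction in Python with OpenCV, assuming the three reference points of the standard image and their counterparts in the distorted image have already been determined, and that the output size is a pixel rendering of the template (here hypothetically 856 × 540 pixels for the 85.6 mm × 54 mm identity card):

import cv2
import numpy as np

def correct_distortion(distorted, ref_points_standard, ref_points_distorted,
                       out_size=(856, 540)):
    """Compute the affine matrix [[a, b, c], [d, e, f]] from the three point
    correspondences (D, E, F) <-> (D', E', F') and warp the distorted image."""
    matrix = cv2.getAffineTransform(
        np.float32(ref_points_distorted),   # points in the distorted image (x, y)
        np.float32(ref_points_standard),    # reference points in the standard image (u, v)
    )
    # warpAffine applies u = a*x + b*y + c, v = d*x + e*y + f to every pixel.
    return cv2.warpAffine(distorted, matrix, out_size)

Three point correspondences determine the six unknowns a to f exactly, which is why three pixel reference points are sufficient in the step above.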
205. Determining a template corresponding to the image after distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating, according to preset coordinate values, the position area where the forward text to be recognized is located;
The server determines a template corresponding to the image after distortion correction according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the identification forward text is located according to preset coordinate values. The rectangular frame is a rectangular area formed by 4 point coordinates, for example, a template corresponding to the front horizontal forward image of the identity card comprises 6 rectangular frames of name, gender, ethnicity, birth date, address and citizen identity card number; the corresponding template of the horizontal forward image of the front side of the bank card comprises 1 rectangular frame of the bank card number.
The template corresponding to the image after distortion correction is consistent with the size of the image after distortion correction, the template comprises a rectangular frame indicating a position area where the forward text is located according to a preset coordinate value, the server obtains the template by matching the image after distortion correction according to the image type, and further, the server determines the text of the image after distortion correction according to the rectangular frame in the template.
206. Performing text positioning on the image after distortion correction according to a preset algorithm and a template to obtain a positioning result;
The server performs text positioning on the image after distortion correction according to a preset algorithm and the template to obtain a positioning result. Specifically, the server determines the position information of the strip-shaped text regions to be segmented in the image after distortion correction according to the preset algorithm and the template, wherein the position information of a strip-shaped region comprises the upper-left point coordinate, the lower-right point coordinate and the characters of the corresponding area; the character positioning rule follows the order from the upper-left coordinate to the lower-right coordinate, the image after distortion correction is scanned line by line in sequence, and information of the same row and the same category is positioned at the same time. The server sets the coordinates of the upper-left point, the coordinates of the lower-right point and the corresponding characters as the positioning result. For example, the server performs text positioning on the name area of the identity card, and the obtained text positioning result includes the coordinates (13, 14) of the upper-left point, the coordinates (744, 49) of the lower-right point, and the name.
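The concrete template format is not given in the text; as a sketch, assuming the template is simply a mapping from field name to a preset rectangle (upper-left and lower-right coordinates) on the image after distortion correction, the template-based positioning could be expressed as:

# Hypothetical template: one preset rectangle (x1, y1, x2, y2) per field; the
# name rectangle reuses the example coordinates (13, 14) and (744, 49) above.
ID_CARD_TEMPLATE = {
    "name": (13, 14, 744, 49),
    # ... further rectangular frames (gender, ethnicity, address, ...) omitted
}

def locate_text(corrected, template):
    """Return one positioning entry per rectangular frame, ordered from the
    upper-left coordinate to the lower-right coordinate."""
    entries = sorted(template.items(), key=lambda kv: (kv[1][1], kv[1][0]))
    results = []
    for field, (x1, y1, x2, y2) in entries:
        region = corrected[y1:y2, x1:x2]           # image area covered by the frame
        results.append((x1, y1, x2, y2, field, region))
    return results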
Optionally, the server adopts the PixelLink algorithm to select the text region frames of the image after distortion correction. PixelLink performs text detection through instance segmentation: based on a deep neural network (DNN), two pixel-level predictions are made, namely text/non-text prediction and link prediction. Specifically, according to the PixelLink algorithm, the server marks text pixels in the image after distortion correction as positive and marks non-text pixels as negative; the server judges whether a given pixel and a neighbouring pixel are located in the same instance; if a given pixel and one of its neighbours are in the same instance, the server marks the link between them as positive; if they are not in the same instance, the server marks the link between them as negative, with 8 neighbours per pixel. The predicted positive pixels are joined into connected components (CCs) through the predicted positive links, and each CC represents a detected text; the server finally takes the bounding box obtained from each connected component as the final detection result, and sets the coordinate information of the final detection result as the positioning result.
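The link-prediction branch of PixelLink is not reproduced here; as a simplified sketch, OpenCV's 8-connected component labelling stands in for joining positively linked text pixels, after which the bounding box of each component is taken as a detected text region:

import cv2
import numpy as np

def boxes_from_text_scores(text_score: np.ndarray, threshold: float = 0.5):
    """Threshold the per-pixel text/non-text prediction, group positive pixels
    into 8-connected components, and return one bounding box per component."""
    positive = (text_score > threshold).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(positive, connectivity=8)
    boxes = []
    for cc in range(1, num_labels):                # label 0 is the background
        ys, xs = np.nonzero(labels == cc)
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return boxes                                   # (x1, y1, x2, y2) per detected text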
207. And storing the positioning result into a preset file.
The server stores the positioning result in the preset file. Specifically, the server positions the image after distortion correction to obtain a plurality of positioning rectangular areas, records the coordinates of the upper-left and lower-right points of each positioning rectangular area, and stores the plurality of positioning results in txt format. For example, the server performs text positioning on a rural commercial bank bill; the positioning result includes 6 rectangular frames and the text information obtained by positioning the rectangular frames, and the server stores the information in the sds_0.txt file, where the file content is as follows:
standard_build/sds_0.png|13 14 744 49|
standard_build/sds_0.png|22 52 645 88|
standard_build/sds_0.png|12 94 446 130|
standard_build/sds_0.png|28 135 775 170|
standard_build/sds_0.png|13 177 544 212|
standard_build/sds_0.png|22 217 348 252|;
It should be noted that the positioning results in the sds_0.txt file may be further used for text recognition, and a positioning result may include a preset identifier that prompts the text recognition to discard that line. For example, in the positioning result standard_build/sds_0.png|13 14 744 49|XXXX, XXXX is a preset identifier indicating that the server does not perform text recognition on that line; other types of preset identifiers may also be used to mark the positioning result, which is not limited herein.
Optionally, the server determines a new type of ticket image or certificate image; the server sets the newly added bill image or certificate image as a sample image to be trained; and the server performs iterative optimization on the preset image segmentation network according to the sample image to be trained. For example, the current bill type includes 1 to 10 types, when an increase to 11 types is detected, a newly added bill image is set as a sample image to be trained, and the image segmentation network is iteratively optimized according to the 11-type bill image. It can be appreciated that before performing iterative optimization on the preset image segmentation network, parameters in the preset image segmentation network are frozen, and then iterative optimization is performed.
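A minimal sketch of this iterative optimization in Python with PyTorch, assuming a deeplabv3-style model whose backbone parameters are the ones frozen before fine-tuning on the newly added sample images (the text does not state which parameters are frozen):

import torch

def finetune_on_new_samples(model, train_loader, epochs=5, lr=1e-4):
    """Freeze the backbone and fine-tune the remaining parameters on the
    newly added bill/certificate sample images."""
    for name, param in model.named_parameters():
        if name.startswith("backbone"):
            param.requires_grad = False            # frozen parameters

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:        # sample images to be trained
            optimizer.zero_grad()
            logits = model(images)["out"]          # deeplabv3-style output dict
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
    return model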
In the embodiment of the invention, the accurate image foreground image is obtained by carrying out image segmentation network processing on the image under the complex background, and the text positioning processing is carried out on the image foreground image according to the preset template, so that the positioning result is obtained, the accuracy of the text positioning of the image is improved, and the robustness of the complex background is enhanced.
The above description is given of the text positioning method based on image segmentation in the embodiment of the present invention, and the following description is given of the text positioning device based on image segmentation in the embodiment of the present invention, referring to fig. 3, one embodiment of the text positioning device based on image segmentation in the embodiment of the present invention includes:
an acquiring unit 301, configured to acquire an original image, where the original image is a ticket image or a document image acquired under a text background;
The segmentation unit 302 is configured to perform image segmentation on an original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a ticket image or a certificate image;
a transformation unit 303, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, where characters in the distortion-corrected image are forward characters;
And the positioning unit 304 is used for performing text positioning on the image after distortion correction to obtain a positioning result.
In the embodiment of the invention, the accurate image foreground image is obtained by carrying out image segmentation network processing on the image under the complex background, and the text positioning processing is carried out on the image foreground image according to the preset template, so that the positioning result is obtained, the accuracy of the text positioning of the image is improved, and the robustness of the complex background is enhanced.
Referring to fig. 4, another embodiment of a text positioning device based on image segmentation according to an embodiment of the present invention includes:
an acquiring unit 301, configured to acquire an original image, where the original image is a ticket image or a document image acquired under a text background;
The segmentation unit 302 is configured to perform image segmentation on an original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a ticket image or a certificate image;
a transformation unit 303, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, where characters in the distortion-corrected image are forward characters;
And the positioning unit 304 is used for performing text positioning on the image after distortion correction to obtain a positioning result.
Optionally, the segmentation unit 302 may further include:
An input subunit 3021 for inputting an original image into a preset image segmentation network model;
a first segmentation subunit 3022, configured to perform image semantic segmentation on an original image through a preset image segmentation network model to obtain a segmentation label image and an image type;
The second segmentation subunit 3023 is configured to segment the original image according to the segmentation tag image, so as to obtain a distorted image, where the distorted image is a ticket image or a document image.
Optionally, the second splitting subunit 3023 may be further specifically configured to:
Determining a region to be segmented according to the segmentation tag image, setting the pixel value in the region to be segmented as 1, and setting the pixel value outside the region to be segmented as 0 to obtain a mask image;
and multiplying the original image and the mask image to obtain a distorted image, wherein the distorted image is used for indicating the bill image or the certificate image separated from the text background in the original image.
Optionally, the transforming unit 303 may be further specifically configured to:
determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image;
determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates;
calculating to obtain an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates;
and carrying out affine transformation on the distorted image according to the affine transformation matrix to obtain a distortion corrected image.
Optionally, the positioning unit 304 may be further specifically configured to:
Determining a template corresponding to the distortion corrected image according to the image type, wherein the template comprises at least one rectangular frame, and the rectangular frame is used for indicating a position area where the identification forward text is located according to preset coordinate values;
performing text positioning on the image after distortion correction according to a preset algorithm and a template to obtain a positioning result;
And storing the positioning result into a preset file.
Optionally, the obtaining unit 301 may be further specifically configured to:
receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image;
Setting the name of an original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image;
And writing the storage path of the original image and the name of the original image into a target data table.
Optionally, the text positioning device based on image segmentation may further include:
A determining unit 305 for determining a ticket image or a document image of a newly added type;
A setting unit 306, configured to set the newly added type of ticket image or certificate image as a sample image to be trained;
And the iteration unit 307 is configured to perform iterative optimization on the preset image segmentation network model according to the sample image to be trained.
In the embodiment of the invention, the accurate image foreground image is obtained by carrying out image segmentation network processing on the image under the complex background, and the text positioning processing is carried out on the image foreground image according to the preset template, so that the positioning result is obtained, the accuracy of the text positioning of the image is improved, and the robustness of the complex background is enhanced.
The text positioning device based on image segmentation in the embodiment of the present invention is described in detail from the point of view of modularized functional entities in fig. 3 and fig. 4, and the text positioning device based on image segmentation in the embodiment of the present invention is described in detail from the point of view of hardware processing.
Fig. 5 is a schematic structural diagram of an image segmentation-based text positioning device according to an embodiment of the present invention. The image segmentation-based text positioning device 500 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 501 (e.g., one or more processors), a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) for storing application programs 509 or data 509. The memory 509 and the storage medium 508 may be transitory or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a series of instruction operations for text positioning based on image segmentation. Still further, the processor 501 may be configured to communicate with the storage medium 508 and execute the series of instruction operations in the storage medium 508 on the image segmentation-based text positioning device 500.
The image segmentation-based text positioning device 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the image segmentation-based text positioning device structure shown in fig. 5 does not constitute a limitation of the image segmentation-based text positioning device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. The character positioning method based on image segmentation is characterized by comprising the following steps:
acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image;
performing affine transformation on the distorted image to obtain a distortion-corrected image, wherein characters in the distortion-corrected image are forward characters;
performing text positioning on the distortion-corrected image to obtain a positioning result;
the performing text positioning on the distortion-corrected image to obtain a positioning result comprises: predicting text pixels and non-text pixels of the distortion-corrected image through a deep neural network (DNN) algorithm; establishing pixel links for the text pixels by using the PixelLink algorithm, and connecting positive links in the pixel link results of the text pixels into connected components (CCs) to obtain a bounding box of each connected component, wherein one CC represents one detected text; and setting the coordinate information of the bounding boxes of the connected components as the positioning result;
the performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image, comprises: inputting the original image into the preset image segmentation network model; performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type; and segmenting the original image according to the segmentation label image to obtain the distorted image, wherein the distorted image is the bill image or the certificate image;
the performing affine transformation on the distorted image to obtain a distortion-corrected image, wherein characters in the distortion-corrected image are forward characters, comprises: determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image; determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; calculating an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and performing affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, wherein characters in the distortion-corrected image are forward characters.
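By way of non-limiting illustration only, the three-point affine correction recited in claim 1 can be sketched in a few lines of Python with OpenCV. The function name, the example landmark coordinates, and the choice of keeping the distorted image's own size for the output canvas are assumptions made for this sketch, not details taken from the patent.

```python
import cv2
import numpy as np

def correct_distortion(distorted, std_pts, distorted_pts):
    """Warp the segmented (distorted) document image so that three reference
    points line up with their positions in the standard template image.

    std_pts / distorted_pts: three (x, y) pairs, e.g. three corner landmarks
    chosen from the standard image and located in the distorted image."""
    src = np.float32(distorted_pts)              # points in the distorted image
    dst = np.float32(std_pts)                    # where they should end up
    m = cv2.getAffineTransform(src, dst)         # 2x3 affine matrix from 3 point pairs
    h, w = distorted.shape[:2]                   # a real system might use the template size instead
    return cv2.warpAffine(distorted, m, (w, h))  # distortion-corrected image

# usage with three hypothetical landmark pairs for an ID-card template:
# corrected = correct_distortion(card,
#                                std_pts=[(0, 0), (856, 0), (0, 540)],
#                                distorted_pts=[(35, 60), (880, 20), (12, 585)])
```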
2. The text positioning method based on image segmentation according to claim 1, wherein the segmenting the original image according to the segmentation label image to obtain the distorted image, the distorted image being the bill image or the certificate image, comprises:
determining a region to be segmented according to the segmentation label image, setting pixel values inside the region to be segmented to 1 and pixel values outside the region to be segmented to 0, to obtain a mask image;
and multiplying the original image by the mask image to obtain the distorted image, wherein the distorted image indicates the bill image or the certificate image separated from the text background of the original image.
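As a non-limiting sketch of the mask step in claim 2, assuming the segmentation label image marks the bill/certificate region with class index 1 (the class index and function name are assumptions of this sketch):

```python
import numpy as np

def crop_with_label(original, label_img, target_class=1):
    """Build a 0/1 mask from the segmentation label image and multiply it
    into the original image, leaving only the bill/certificate region."""
    mask = (label_img == target_class).astype(original.dtype)  # 1 inside region, 0 outside
    if original.ndim == 3:                                      # broadcast over color channels
        mask = mask[..., None]
    return original * mask                                      # distorted image on a black background
```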
3. The text positioning method based on image segmentation according to claim 1, wherein the acquiring an original image, the original image being a bill image or a certificate image acquired under a text background, comprises:
receiving a bill image or a certificate image collected under a text background, and setting the bill image or the certificate image as an original image;
setting the name of the original image according to a preset format, and storing the original image into a preset path to obtain a storage path of the original image;
and writing the storage path of the original image and the name of the original image into a target data table.
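A non-limiting sketch of the archiving step in claim 3; the naming pattern, directory, SQLite database, and table name are all illustrative placeholders, since the claim does not specify them.

```python
import os
import sqlite3
import time
import cv2

def archive_original(image, base_dir="originals", db_path="images.db"):
    """Name the image with a timestamp-based pattern, save it under a preset
    directory, and record (path, name) in a data table."""
    name = time.strftime("orig_%Y%m%d_%H%M%S") + ".png"   # preset name format (assumed)
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, name)                   # storage path of the original image
    cv2.imwrite(path, image)

    conn = sqlite3.connect(db_path)                       # target data table (assumed schema)
    conn.execute("CREATE TABLE IF NOT EXISTS original_images (path TEXT, name TEXT)")
    conn.execute("INSERT INTO original_images VALUES (?, ?)", (path, name))
    conn.commit()
    conn.close()
    return path
```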
4. The text positioning method based on image segmentation according to any one of claims 1 to 3, wherein after the text positioning is performed on the distortion-corrected image to obtain a positioning result, the method further comprises:
determining a newly added type of bill image or certificate image;
setting the newly added bill image or certificate image as a sample image to be trained;
and performing iterative optimization on the preset image segmentation network model according to the sample image to be trained.
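A non-limiting sketch of the iterative optimization in claim 4, assuming a PyTorch segmentation model that maps an N x 3 x H x W batch to per-pixel class logits; the framework, hyper-parameters, and tensor layout are assumptions of this sketch, not details from the patent.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def finetune_segmentation_model(model: nn.Module, images, label_maps,
                                epochs=5, lr=1e-4):
    """One round of iterative optimization: fine-tune the existing segmentation
    network on newly added bill/certificate samples.  `images` is an N x 3 x H x W
    float tensor, `label_maps` an N x H x W long tensor of class indices."""
    loader = DataLoader(TensorDataset(images, label_maps), batch_size=4, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            logits = model(x)            # expected shape: N x num_classes x H x W
            loss = criterion(logits, y)
            loss.backward()
            optimizer.step()
    return model
```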
5. A text positioning device based on image segmentation, characterized in that the text positioning device based on image segmentation comprises:
the acquisition unit is used for acquiring an original image, wherein the original image is a bill image or a certificate image acquired under a text background;
the segmentation unit is used for carrying out image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image;
the transformation unit is used for performing affine transformation on the distorted image to obtain a distortion-corrected image, wherein characters in the distortion-corrected image are forward characters;
the positioning unit is used for performing text positioning on the distortion-corrected image to obtain a positioning result;
the positioning unit is specifically used for: predicting text pixels and non-text pixels of the distortion-corrected image through a deep neural network (DNN) algorithm; establishing pixel links for the text pixels by using the PixelLink algorithm, and connecting positive links in the pixel link results of the text pixels into connected components (CCs) to obtain a bounding box of each connected component, wherein one CC represents one detected text; and setting the coordinate information of the bounding boxes of the connected components as the positioning result;
the segmentation unit is specifically used for: inputting the original image into the preset image segmentation network model; performing image semantic segmentation on the original image through the preset image segmentation network model to obtain a segmentation label image and an image type; and segmenting the original image according to the segmentation label image to obtain the distorted image, wherein the distorted image is the bill image or the certificate image;
the transformation unit is specifically used for: determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image; determining corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; calculating an affine transformation matrix according to the three pixel reference point coordinates and the corresponding pixel coordinates; and performing affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, wherein characters in the distortion-corrected image are forward characters.
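For the positioning unit of claim 5 (and the corresponding step of claim 1), the following is a simplified, non-limiting sketch of PixelLink-style grouping: text pixels are joined through positive links into connected components and each component is boxed. The thresholds, the 8-neighbor channel ordering of the link map, and the use of cv2.minAreaRect for the bounding box are assumptions of this sketch.

```python
import numpy as np
import cv2

# 8-neighbor offsets, assumed to match the channel order of the link predictions
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def locate_text(text_prob, link_prob, pixel_thr=0.7, link_thr=0.7):
    """Group text pixels into connected components via positive links and
    return one rotated bounding box (4 corner points) per component.
    text_prob: (H, W) text/non-text scores; link_prob: (8, H, W) link scores."""
    h, w = text_prob.shape
    text_mask = text_prob >= pixel_thr          # text / non-text decision
    parent = np.arange(h * w)                   # union-find forest over pixels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    ys, xs = np.nonzero(text_mask)
    for y, x in zip(ys, xs):
        for k, (dy, dx) in enumerate(NEIGHBORS):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and text_mask[ny, nx] \
                    and link_prob[k, y, x] >= link_thr:   # positive link to a text neighbor
                union(y * w + x, ny * w + nx)

    # collect pixels per root and fit a minimum-area rectangle around each CC
    comps = {}
    for y, x in zip(ys, xs):
        comps.setdefault(find(y * w + x), []).append((x, y))
    boxes = []
    for pts in comps.values():
        rect = cv2.minAreaRect(np.array(pts, dtype=np.float32))
        boxes.append(cv2.boxPoints(rect))       # 4 corner coordinates = positioning result
    return boxes
```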
6. An electronic device, the electronic device comprising: the system comprises a memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line;
the at least one processor invokes the instructions in the memory to cause the electronic device to perform the method according to any one of claims 1-4.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-4.
CN201910884634.0A 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation Active CN110807454B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910884634.0A CN110807454B (en) 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation
PCT/CN2019/117036 WO2021051527A1 (en) 2019-09-19 2019-11-11 Image segmentation-based text positioning method, apparatus and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910884634.0A CN110807454B (en) 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation

Publications (2)

Publication Number Publication Date
CN110807454A CN110807454A (en) 2020-02-18
CN110807454B true CN110807454B (en) 2024-05-14

Family

ID=69487698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910884634.0A Active CN110807454B (en) 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation

Country Status (2)

Country Link
CN (1) CN110807454B (en)
WO (1) WO2021051527A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768345B (en) * 2020-05-12 2023-07-14 北京奇艺世纪科技有限公司 Correction method, device, equipment and storage medium for identity card back image
CN113111880B (en) * 2021-05-12 2023-10-17 中国平安人寿保险股份有限公司 Certificate image correction method, device, electronic equipment and storage medium
CN113687823B (en) * 2021-07-30 2023-08-01 稿定(厦门)科技有限公司 Quadrilateral block nonlinear transformation method and system based on HTML
CN113963339A (en) * 2021-09-02 2022-01-21 泰康保险集团股份有限公司 Information extraction method and device
CN114565915B (en) * 2022-04-24 2023-02-10 深圳思谋信息科技有限公司 Sample text image acquisition method, text recognition model training method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458770A (en) * 2008-12-24 2009-06-17 北京文通科技有限公司 Character recognition method and system
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN109993160A (en) * 2019-02-18 2019-07-09 北京联合大学 A kind of image flame detection and text and location recognition method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070171288A1 (en) * 2004-03-25 2007-07-26 Yasuaki Inoue Image correction apparatus and method, image correction database creating method, information data provision apparatus, image processing apparatus, information terminal, and information database apparatus
JP4902568B2 (en) * 2008-02-19 2012-03-21 キヤノン株式会社 Electronic document generation apparatus, electronic document generation method, computer program, and storage medium
US9576348B2 (en) * 2014-11-14 2017-02-21 Adobe Systems Incorporated Facilitating text identification and editing in images
CN108885699B (en) * 2018-07-11 2020-06-26 深圳前海达闼云端智能科技有限公司 Character recognition method, device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110807454A (en) 2020-02-18
WO2021051527A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN110807454B (en) Text positioning method, device, equipment and storage medium based on image segmentation
CN110569832B (en) Text real-time positioning and identifying method based on deep learning attention mechanism
US9916499B2 (en) Method and system for linking printed objects with electronic content
CN111223065B (en) Image correction method, irregular text recognition device, storage medium and apparatus
JP2010140478A (en) Method and system for classifying document image
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
Khare et al. Arbitrarily-oriented multi-lingual text detection in video
CN110909743B (en) Book checking method and book checking system
CN110738238B (en) Classification positioning method and device for certificate information
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
JP2014164622A (en) Information system and server
Kaundilya et al. Automated text extraction from images using OCR system
US20130050765A1 (en) Method and apparatus for document authentication using image comparison on a block-by-block basis
CN116050379A (en) Document comparison method and storage medium
CN115731550A (en) Deep learning-based automatic drug specification identification method and system and storage medium
CN116030472A (en) Text coordinate determining method and device
CN114758340A (en) Intelligent identification method, device and equipment for logistics address and storage medium
CN113628113A (en) Image splicing method and related equipment thereof
Prabaharan et al. Text extraction from natural scene images and conversion to audio in smart phone applications
CN112287763A (en) Image processing method, apparatus, device and medium
CN114202761B (en) Information batch extraction method based on picture information clustering
Chakraborty et al. Frame selection for OCR from video stream of book flipping
EP4379678A1 (en) Image processing system, image processing method, and program
CN113887484B (en) Card type file image identification method and device
Nair et al. A Smarter Way to Collect and Store Data: AI and OCR Solutions for Industry 4.0 Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant