WO2021051527A1 - Text positioning method, apparatus and device based on image segmentation, and storage medium - Google Patents
Text positioning method, apparatus and device based on image segmentation, and storage medium
- Publication number
- WO2021051527A1 (PCT/CN2019/117036)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- text
- segmentation
- distorted
- distortion
- Prior art date
Links
- 238000003709 image segmentation Methods 0.000 title claims abstract description 115
- 238000000034 method Methods 0.000 title claims abstract description 35
- 230000009466 transformation Effects 0.000 claims abstract description 51
- 230000011218 segmentation Effects 0.000 claims description 31
- 238000012937 correction Methods 0.000 claims description 24
- 239000011159 matrix material Substances 0.000 claims description 13
- 238000005457 optimization Methods 0.000 claims description 4
- 238000004590 computer program Methods 0.000 claims 6
- 238000012545 processing Methods 0.000 abstract description 20
- 238000013473 artificial intelligence Methods 0.000 abstract 1
- 238000010586 diagram Methods 0.000 description 6
- 238000001514 detection method Methods 0.000 description 4
- 230000001788 irregular Effects 0.000 description 4
- 238000012015 optical character recognition Methods 0.000 description 4
- 238000012546 transfer Methods 0.000 description 3
- 238000010008 shearing Methods 0.000 description 2
- 238000013519 translation Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000002085 persistent effect Effects 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/02—Affine transformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This application relates to the field of computer technology, and in particular to text positioning methods, devices, equipment and storage media based on image segmentation.
- OCR (optical character recognition) is the process by which electronic devices such as scanners or digital cameras examine characters printed on paper and use character recognition methods to translate their shapes into computer text, that is, to scan text data; the image file is then analyzed and processed to obtain text and layout information.
- OCR includes text positioning and text recognition. The text positioning is the precise positioning of the text position in the image, mainly based on the extraction of relevant text features.
- the main purpose of this application is to solve the technical problem of low accuracy of text positioning from images with complex text backgrounds.
- the first aspect of the present application provides a text positioning method based on image segmentation, including: acquiring an original image, the original image being a bill image or a certificate image collected against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and performing text positioning on the distortion-corrected image to obtain a positioning result.
- a second aspect of the present application provides a text positioning device based on image segmentation, including: an acquisition unit for acquiring an original image, the original image being a bill image or a certificate image collected against a text background; a segmentation unit for performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; a transformation unit for performing affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and a positioning unit for performing text positioning on the distortion-corrected image to obtain a positioning result.
- a third aspect of the present application provides a text positioning device based on image segmentation, including a memory and at least one processor, where the memory stores instructions and the memory and the at least one processor are interconnected by wires; the at least one processor calls the instructions in the memory so that the text positioning device based on image segmentation executes the method described in the first aspect.
- the fourth aspect of the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the method described in the first aspect.
- an original image is acquired, the original image being a bill image or a certificate image collected against a text background; the original image is segmented through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; affine transformation is performed on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and text positioning is performed on the distortion-corrected image to obtain a positioning result.
- by performing image segmentation network processing on an image with a complex background, an accurate image foreground is obtained; text positioning is then performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
- FIG. 1 is a schematic diagram of an embodiment of a text positioning method based on image segmentation in an embodiment of this application;
- FIG. 2 is a schematic diagram of another embodiment of a text positioning method based on image segmentation in an embodiment of this application;
- FIG. 3 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of this application;
- FIG. 4 is a schematic diagram of another embodiment of a text positioning device based on image segmentation in an embodiment of this application;
- FIG. 5 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of this application.
- the embodiments of the present application provide a text positioning method, device, equipment, and storage medium based on image segmentation, which obtain an accurate image foreground by performing image segmentation network processing on images with complex backgrounds, and perform text positioning on the image foreground according to a preset template to obtain a positioning result, improving the accuracy of image text positioning and enhancing robustness to complex backgrounds.
- An embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:
- the server obtains the original image, and the original image is the bill image or the certificate image collected against a text background.
- a text background with strong interference means that text targets, especially handwritten numbers and printed text, are present in the background of the original image, which makes it difficult to position the text directly in the original image. Specifically, the server receives the bill image or certificate image collected against the text background and sets it as the original image; the server stores the original image in a preset path according to a preset format and records the storage path of the original image in a data table.
- the server stores the original image in the preset path according to the preset format, and obtains the storage path of the original image and the name of the original image.
- the preset format includes preset naming rules and picture formats.
- the picture format is jpg, png, or another type of picture format, which is not specifically limited here.
- the server performs image segmentation on the original image through a preset image segmentation network model to obtain a distorted image.
- the distorted image is a bill image or a certificate image.
- the server performs image segmentation on the original image according to a preset image segmentation network model to obtain a segmentation label image; the server determines a mask image according to the segmentation label image and processes the original image according to the mask image to obtain a distorted image, where the distorted image is the partial image obtained after the server separates out the complex background of the original image. The shape of the partial image is an irregular quadrilateral, and the partial image includes the bill image or the certificate image.
- the server trains the image segmentation network model according to preset samples and determines the parameters in the image segmentation network model to obtain the preset image segmentation network model, which is used to perform image segmentation on the original image.
- the server performs affine transformation on the distorted image to obtain a distortion-corrected image
- the text in the distortion-corrected image is a forward text.
- the forward text refers to text that takes the horizontal direction as its reference and is not upside down; that is, a distorted image deviating from the horizontal reference by 90, 180, or 270 degrees is corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text.
- the server determines the affine transformation rule corresponding to the distorted image; the server performs affine transformation on the distorted image according to the transformation rule and a preset size to obtain a distortion-corrected image. It is understandable that the distorted image is an irregular quadrilateral image.
- the server performs distortion correction on the distorted image by affine transformation to obtain the distortion-corrected image, and the text in the distortion-corrected image is forward text. The size of the corrected image is a preset fixed value, consistent with the size of the template corresponding to the image.
- the affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v); that is, a point on the original image is mapped to a corresponding point on the target image. It includes rotation, translation, scaling, and shearing of the original image.
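- As an illustrative sketch (not part of the application itself), the affine mapping described above can be written directly in terms of six parameters a, b, c, d, e, f; rotation, translation, scaling, and shearing are all special cases of this one formula:

```python
def affine_point(x, y, a, b, c, d, e, f):
    """Map a point (x, y) to (u, v) via a 2D affine transformation:

        u = a*x + b*y + c
        v = d*x + e*y + f

    Rotation, translation, scaling and shearing are special cases.
    """
    return a * x + b * y + c, d * x + e * y + f

# Pure translation by (10, 5): a=1, b=0, c=10, d=0, e=1, f=5
print(affine_point(2, 3, 1, 0, 10, 0, 1, 5))  # (12, 8)
```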
- the server performs text positioning on the distortion-corrected image to obtain the positioning result. Specifically, the server performs text positioning on the distortion-corrected image according to a preset algorithm and template to obtain the positioning result.
- the template includes at least one rectangular box, and each rectangular box indicates the location area of the forward text according to preset coordinate values. The positioning result is the text positioning coordinate information selected from the distortion-corrected image, and the amount of text positioning coordinate information is equal to the number of rectangular boxes. For example, for a rural commercial bank transfer check in the distortion-corrected bill image, the server matches the corresponding template, which contains two rectangular boxes used to indicate the rural commercial bank and the transfer check according to their preset coordinate values; the positioning result includes the rural commercial bank, the transfer check, and the preset coordinate values of the two rectangular boxes.
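- The template-based positioning step above can be sketched as follows. This is a minimal illustration, not the application's implementation: the field names and box coordinates in `template` are hypothetical, and a real template would use preset pixel coordinates for each field.

```python
# Hypothetical template: each entry names a field and gives its preset
# rectangular box (x1, y1, x2, y2) in the distortion-corrected image.
template = {
    "bank_name": (0, 0, 3, 1),
    "check_type": (0, 2, 3, 3),
}

def locate_fields(image, template):
    """Return, per template field, the preset box and the cropped region."""
    result = {}
    for name, (x1, y1, x2, y2) in template.items():
        region = [row[x1:x2 + 1] for row in image[y1:y2 + 1]]
        result[name] = {"box": (x1, y1, x2, y2), "region": region}
    return result

image = [[p + 10 * r for p in range(4)] for r in range(4)]  # toy 4x4 "image"
located = locate_fields(image, template)
print(located["bank_name"]["box"])  # (0, 0, 3, 1)
```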
- by performing image segmentation network processing on an image with a complex background, an accurate image foreground is obtained; text positioning is then performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
- another embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:
- the server obtains the original image, and the original image is the bill image or the certificate image collected against a text background. Specifically, the server receives the bill image or certificate image collected against the text background and sets it as the original image; the server sets the name of the original image according to a preset format and stores the original image in a preset path to obtain the storage path of the original image. The preset path is a preset file directory, and the preset format includes preset naming rules and a picture format; the picture format is jpg, png, or another type of picture format, which is not specifically limited here. The server then writes the storage path of the original image and the name of the original image into a target data table.
- for example, the server receives the bank note image and sets it as the original image, naming it bank1.jpg; the server then stores bank1.jpg in the directory /var/www/html/bankimage and writes the storage path of the original image and the name of the original image into the target data table.
- the name of the original image is bank1.jpg
- the storage path of the original image is /var/www/html/bankimage/bank1.jpg.
- further, the server generates a structured query language (SQL) insert statement from the storage path of the original image and the name of the original image, and writes them into the target data table according to the SQL insert statement.
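- A minimal sketch of this step using Python's standard-library sqlite3 module (the table and column names here are hypothetical; the application only states that the name and storage path are written to a target data table via an SQL INSERT):

```python
import sqlite3

# Hypothetical table and columns for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_table (name TEXT, storage_path TEXT)")

name = "bank1.jpg"
path = "/var/www/html/bankimage/bank1.jpg"
# A parameterized INSERT keeps file names from being interpreted as SQL.
conn.execute("INSERT INTO image_table (name, storage_path) VALUES (?, ?)",
             (name, path))
conn.commit()

row = conn.execute("SELECT * FROM image_table").fetchone()
print(row)  # ('bank1.jpg', '/var/www/html/bankimage/bank1.jpg')
```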
- a strongly noisy text background means that text targets, especially handwritten numbers and printed text, exist in the background of the original image; if text positioning is performed directly on the original image, the text is difficult to locate.
- the server inputs the original image into the preset image segmentation network model, and performs image semantic segmentation on the original image through the preset image segmentation network model to obtain the segmentation label image and the image type. Further, the server uses a preset deeplabv3+ model to perform image semantic segmentation on the original image. It can be understood that the preset deeplabv3+ model is a preset image segmentation network model.
- the main purpose of the server to perform semantic image segmentation on the original image through the preset deeplabv3+ model is to specify a semantic label for each pixel of the original image, that is, the value of each pixel in the segmented label image represents the type of the pixel.
- Deeplabv3+ is a state-of-the-art deep learning model for image semantic segmentation. Its goal is to assign a semantic label to each pixel of the input image. Deeplabv3+ includes a simple and efficient decoder module that improves the segmentation results.
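- The segmentation label image described above assigns each pixel a class index. DeepLabv3+ itself is a large model and is not reproduced here; the following is only a sketch (under the assumption of per-pixel class scores as input) of how such scores collapse into a label image:

```python
def label_image(scores):
    """Collapse per-pixel class scores (H x W x C nested lists) into a
    label image (H x W): each pixel's label is the index of its
    highest-scoring class, e.g. 0 = background, 1 = bill/certificate."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]

scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.3, 0.7], [0.6, 0.4]],
]
print(label_image(scores))  # [[0, 1], [1, 0]]
```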
- the original image is segmented according to the segmented label image to obtain a distorted image, where the distorted image is a bill image or a certificate image;
- the server segments the original image according to the segmentation label image to obtain a distorted image, and the distorted image is a bill image or a certificate image. Specifically, the server determines the area to be segmented according to the segmentation label image, sets the pixel values inside the area to 1 and the pixel values outside the area to 0 to obtain the mask image; the server then multiplies the original image by the mask image to obtain the distorted image.
- the distorted image is used to indicate the bill image or certificate image separated out of the text background of the original image.
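- The masking step above can be sketched as follows (a toy single-channel illustration; real images would be multi-channel arrays):

```python
def apply_mask(image, label_img, target_class):
    """Build a 0/1 mask from the segmentation label image and multiply it
    with the original image, keeping only the region to be segmented."""
    return [[px if lbl == target_class else 0
             for px, lbl in zip(img_row, lbl_row)]
            for img_row, lbl_row in zip(image, label_img)]

image = [[5, 6], [7, 8]]
labels = [[1, 0], [0, 1]]            # 1 marks the bill/certificate region
print(apply_mask(image, labels, 1))  # [[5, 0], [0, 8]]
```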
- the server compares the original image with the segmentation label image to obtain a comparison result and determines the area to be segmented according to the comparison result; the server segments the area to obtain a distorted image, and the distorted image is a bill image or a certificate image; the server then stores the distorted image.
- the final saved file is the foreground four-point coordinate file with the same name as the original image. For example, the server performs image segmentation on the certificate image named image1.png to obtain the eight coordinate points of each of the two certificate foreground images, and the server saves the two certificate foreground images numerically.
- the content of the file is as follows:
- the server performs affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text.
- the forward text refers to text that takes the horizontal direction as its positive reference and is not upside down; that is, a distorted image deviating from the horizontal reference by 90, 180, or 270 degrees is corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text.
- the server determines the standard image corresponding to the distorted image according to the image type and determines the coordinates of three pixel reference points from the standard image; the server determines the corresponding pixel coordinates from the distorted image according to the three reference point coordinates; the server calculates the affine transformation matrix from the reference point coordinates and the corresponding pixel coordinates; the server then performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, and the text in the distortion-corrected image is forward text.
- for example, the server determines from the standard image of the ID card that the coordinates of the three pixel reference points are D(x1, y1), E(x2, y2), and F(x3, y3); according to the reference point coordinates D, E, and F, the server determines the corresponding pixel coordinates D'(x'1, y'1), E'(x'2, y'2), and F'(x'3, y'3) from the distorted image. The server calculates according to the homogeneous coordinate formula, which is as follows:

u = a·x + b·y + c, v = d·x + e·y + f

- (x, y) corresponds to the pixel coordinates of the distorted image, and (u, v) corresponds to the three pixel reference point coordinates of the standard image of the ID card. The server substitutes D'(x'1, y'1), E'(x'2, y'2), F'(x'3, y'3) and D(x1, y1), E(x2, y2), F(x3, y3) into the homogeneous coordinate formula in turn to obtain the affine transformation matrix, that is, the server determines the values of the affine transformation matrix variables a, b, c, d, e, and f. The server performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected ID card image, whose corresponding size is 85.6 mm × 54 mm. It is understandable that the affine transformation performed on the distorted image is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v).
- the distorted image is an irregular quadrilateral image, and the affine transformation maps a point on the original image to the corresponding point on the target image, including rotation, translation, scaling, and shearing of the original image; finally, the distorted image is transformed from an irregular quadrilateral into a rectangle.
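- Recovering the six affine parameters from three point correspondences, as described above, amounts to solving two 3×3 linear systems. A self-contained sketch (an illustration of the math, not the application's code):

```python
def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Gauss-Jordan elimination."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

def affine_from_points(src, dst):
    """Recover (a, b, c, d, e, f) with u = a*x + b*y + c, v = d*x + e*y + f
    from three point correspondences src[i] -> dst[i]."""
    A = [[x, y, 1] for x, y in src]
    abc = solve3(A, [u for u, _ in dst])   # solves for a, b, c
    def_ = solve3(A, [v for _, v in dst])  # solves for d, e, f
    return abc + def_

# Pure translation by (10, 5) as a sanity check:
params = affine_from_points([(0, 0), (1, 0), (0, 1)],
                            [(10, 5), (11, 5), (10, 6)])
print([round(p, 6) for p in params])  # [1.0, 0.0, 10.0, 0.0, 1.0, 5.0]
```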
- 205. Determine a template corresponding to the distortion-corrected image according to the image type, where the template includes at least one rectangular box, and the rectangular box indicates the location area of the forward text according to preset coordinate values;
- the server determines a template corresponding to the distortion-corrected image according to the image type, the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values.
- the rectangular box is a rectangular area composed of 4 point coordinates.
- for example, the template corresponding to the front horizontal forward image of an ID card includes six rectangular boxes: name, gender, ethnicity, date of birth, address, and citizen ID number; the template corresponding to the front horizontal forward image of a bank card includes a rectangular box for the bank card number.
- the template corresponding to the distortion-corrected image is consistent with the size of the distortion-corrected image.
- the template includes a rectangular frame indicating the location area where the forward text is located according to the preset coordinate values.
- the server matches the template for the distortion-corrected image according to the image type; further, the server determines the text of the distortion-corrected image according to the rectangular boxes in the template.
- the server performs text positioning on the distortion-corrected image according to the preset algorithm and template to obtain the positioning result. Specifically, the server determines the position information of each strip-shaped area to be segmented in the distortion-corrected image according to the preset algorithm and template; the position information of a strip-shaped area includes the coordinates of the upper-left and lower-right points of the corresponding area and the corresponding text. The text positioning rule follows the order from the upper-left coordinate to the lower-right coordinate: the distortion-corrected image is scanned line by line, and information of the same category on the same line is positioned at the same time; the server sets the upper-left and lower-right point coordinates and the corresponding text as the positioning result.
- the server performs text positioning on the name area of the ID card, and the obtained text positioning results include the coordinates of the upper left point (13, 14), the coordinates of the lower right point (744, 49), and the name.
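- The upper-left-to-lower-right scanning order above can be sketched as a reading-order sort of positioning boxes (a simplified illustration; the `line_tol` grouping threshold is an assumption, not part of the application):

```python
def reading_order(boxes, line_tol=10):
    """Sort positioning boxes (x1, y1, x2, y2) top-to-bottom, then
    left-to-right, treating boxes whose top edges fall within the same
    line_tol-pixel band as being on the same line."""
    return sorted(boxes, key=lambda b: (b[1] // line_tol, b[0]))

boxes = [(744, 49, 800, 80), (13, 14, 744, 49), (13, 60, 100, 90)]
print(reading_order(boxes))
# [(13, 14, 744, 49), (744, 49, 800, 80), (13, 60, 100, 90)]
```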
- the server uses the PixelLink algorithm to frame the text area of the image after distortion correction.
- PixelLink uses instance segmentation to realize text detection: a deep neural network (DNN) performs two types of pixel prediction, namely text/non-text prediction and link prediction.
- according to the PixelLink algorithm, the server marks the text pixels in the distortion-corrected image as positive and the non-text pixels as negative; the server determines whether a given pixel and an adjacent pixel are located in the same instance; if they are located in the same instance, the server marks the link between them as positive, and if not, the server marks the link between them as negative. Each pixel has eight neighbors.
- the predicted positive pixels are joined into connected components (CCs) through the predicted positive links, and each CC represents a detected text.
- the server finally obtains the bounding box of each connected component as the final detection result and sets the coordinate information of the final detection result as the positioning result.
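- The grouping of positive pixels into components with bounding boxes can be sketched as follows. This is a simplified stand-in for PixelLink's link-based grouping: here connectivity comes from plain 8-neighborhood adjacency of positive pixels rather than from predicted links.

```python
from collections import deque

def text_boxes(pixel_map):
    """Group positive text pixels into 8-connected components and return
    one bounding box (x1, y1, x2, y2) per component."""
    h, w = len(pixel_map), len(pixel_map[0])
    seen, boxes = set(), []
    for sy in range(h):
        for sx in range(w):
            if pixel_map[sy][sx] and (sx, sy) not in seen:
                queue, comp = deque([(sx, sy)]), []
                seen.add((sx, sy))
                while queue:              # BFS over one component
                    x, y = queue.popleft()
                    comp.append((x, y))
                    for dx in (-1, 0, 1):  # each pixel has 8 neighbors
                        for dy in (-1, 0, 1):
                            nx, ny = x + dx, y + dy
                            if (0 <= nx < w and 0 <= ny < h
                                    and pixel_map[ny][nx]
                                    and (nx, ny) not in seen):
                                seen.add((nx, ny))
                                queue.append((nx, ny))
                xs = [x for x, _ in comp]
                ys = [y for _, y in comp]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

page = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
print(text_boxes(page))  # [(0, 0, 1, 1), (4, 0, 4, 1)]
```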
- the server writes the positioning result into a preset file. Specifically, the server positions the distortion-corrected image to obtain multiple positioning rectangular areas, records the coordinates of the upper-left and lower-right points of each positioning rectangular area, and saves the multiple positioning results in txt format. For example, the server performs text positioning for a rural commercial bank; the positioning result includes six rectangular boxes and the text information obtained by positioning with the rectangular boxes, and the server saves it in the sds_0.txt file.
- the content of the file is as follows:
- the positioning result in the sds_0.txt file can be further used for text recognition.
- the positioning result includes a preset mark, which is used to prompt the text recognition to discard the line.
- for example, XXXX, where XXXX is a preset mark used to instruct the server not to perform text recognition.
- the positioning result can also be marked with other types of preset marks, which are not specifically limited here.
- the server determines a newly-added type of bill image or certificate image; the server sets the newly-added type of bill image or certificate image as the sample image to be trained; the server then iteratively optimizes the preset image segmentation network according to the sample image to be trained.
- for example, the current bill types include categories 1 to 10; when an 11th type of bill image is added, the newly-added bill image is set as the sample image to be trained, and the image segmentation network is iteratively optimized based on the 11th type of bill image. It is understandable that before the iterative optimization of the preset image segmentation network, the parameters in the preset image segmentation network are frozen, and then the iterative optimization is performed.
- by performing image segmentation network processing on an image with a complex background, an accurate image foreground is obtained; text positioning is then performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
- the text positioning device based on image segmentation in the embodiment of this application includes:
- the acquiring unit 301 is configured to acquire an original image, and the original image is a bill image or a certificate image collected in a text background;
- the segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
- the transformation unit 303 is configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;
- the positioning unit 304 is configured to perform text positioning on the image after the distortion correction to obtain a positioning result.
- by performing image segmentation network processing on an image with a complex background, an accurate image foreground is obtained; text positioning is then performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
- another embodiment of the text positioning device based on image segmentation in the embodiment of the present application includes:
- the acquiring unit 301 is configured to acquire an original image, and the original image is a bill image or a certificate image collected in a text background;
- the segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
- the transformation unit 303 is configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;
- the positioning unit 304 is configured to perform text positioning on the image after the distortion correction to obtain a positioning result.
- the dividing unit 302 may further include:
- the input subunit 3021 is used to input the original image into the preset image segmentation network model
- the first segmentation subunit 3022 is configured to perform image semantic segmentation on the original image through a preset image segmentation network model to obtain segmentation label images and image types;
- the second segmentation subunit 3023 is configured to segment the original image according to the segmented label image to obtain a distorted image, and the distorted image is a bill image or a certificate image.
- the second segmentation subunit 3023 may also be specifically used for:
- multiplying the original image by the mask image to obtain the distorted image;
- the distorted image indicates the bill image or the certificate image separated from the text background of the original image.
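The mask-multiplication step can be sketched with NumPy. This is a minimal illustration, not the patent's implementation: the array shapes, the binary 0/1 mask, and the toy image are assumptions.

```python
import numpy as np

def apply_segmentation_mask(original: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply the original image by a binary segmentation mask.

    Pixels where the mask is 1 (foreground: the bill or certificate)
    are kept; pixels where the mask is 0 (text background) become 0.
    """
    if mask.ndim == 2 and original.ndim == 3:
        mask = mask[:, :, np.newaxis]  # broadcast the mask over colour channels
    return original * mask

# Toy 4x4 grayscale image with a 2x2 "document" region in the centre.
image = np.arange(16, dtype=np.uint8).reshape(4, 4)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1

foreground = apply_segmentation_mask(image, mask)
```

After the multiplication, only the pixels inside the mask survive; everything belonging to the text background is zeroed out, which is what isolates the distorted bill or certificate image.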
- the transformation unit 303 may also be specifically configured to:
- the distorted image is subjected to affine transformation to obtain the image after distortion correction.
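The distortion-correction step can be sketched as follows, under the assumption that the 2x3 affine matrix is estimated from three point correspondences (for example, three corners of the tilted document mapped to their upright targets). The coordinates below are hypothetical; a production system would typically warp the full image with a library routine such as OpenCV's warpAffine.

```python
import numpy as np

def estimate_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve the 2x3 affine matrix A mapping src points to dst points.

    src, dst: (3, 2) arrays of (x, y) coordinates. Three non-collinear
    correspondences determine the six affine parameters exactly.
    """
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])               # each row: [x, y, 1]
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_T.T                             # (2, 3) affine matrix

def apply_affine(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply the 2x3 affine matrix to an (N, 2) array of points."""
    ones = np.ones((pts.shape[0], 1))
    return np.hstack([pts, ones]) @ A.T

# Hypothetical tilted document corners -> upright target positions.
src = np.array([[10.0, 20.0], [110.0, 40.0], [30.0, 120.0]])
dst = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
A = estimate_affine(src, dst)
corrected = apply_affine(A, src)
```

Because an affine transformation composes translation, rotation, scaling, and shearing, mapping the detected document corners onto upright targets straightens the text into the forward orientation.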
- the positioning unit 304 may also be specifically configured to:
- the template includes at least one rectangular frame, and the rectangular frame indicates, according to preset coordinate values, the location area where the forward text is located;
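Template-based positioning can be sketched as cropping preset rectangles out of the distortion-corrected image. The field names and coordinate values here are hypothetical placeholders, not ones given in the patent:

```python
import numpy as np

# Hypothetical template: field name -> (x0, y0, x1, y1) in template coordinates.
TEMPLATE = {
    "name":   (10, 5, 60, 15),
    "number": (10, 20, 90, 30),
}

def locate_text_regions(image: np.ndarray, template: dict) -> dict:
    """Crop each preset rectangular frame out of the corrected image.

    Because the affine correction has mapped the document into the
    template's coordinate system, fixed coordinates suffice to locate
    each text field.
    """
    regions = {}
    for field, (x0, y0, x1, y1) in template.items():
        regions[field] = image[y0:y1, x0:x1]  # rows index y, columns index x
    return regions

corrected = np.zeros((40, 100), dtype=np.uint8)
regions = locate_text_regions(corrected, TEMPLATE)
```

Each cropped region would then be passed to a recognizer (e.g. OCR) as the positioning result for that field.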
- the obtaining unit 301 may also be specifically configured to:
- the text positioning device based on image segmentation may further include:
- the determining unit 305 is used to determine a newly-added type of bill image or certificate image;
- the setting unit 306 is configured to set the newly-added type of bill image or certificate image as a sample image to be trained;
- the iterative unit 307 is configured to iteratively optimize the preset image segmentation network model according to the sample image to be trained.
- FIG. 5 is a schematic structural diagram of a text positioning device based on image segmentation provided by an embodiment of the present application.
- the text positioning device 500 based on image segmentation may vary considerably with configuration and performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (for example, one or more mass-storage devices) storing application programs 507 or data 506.
- the memory 509 and the storage medium 508 may be short-term storage or persistent storage.
- the program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for text positioning based on image segmentation.
- the processor 501 may be configured to communicate with the storage medium 508, and execute a series of instruction operations in the storage medium 508 on the text positioning device 500 based on image segmentation.
- the text positioning device 500 based on image segmentation may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, for example, Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
- FIG. 5 does not constitute a limitation on the text positioning device based on image segmentation, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different arrangement of components.
- the present application also provides a computer-readable storage medium.
- the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
- the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:
- acquiring an original image, where the original image is a bill image or a certificate image collected against a text background;
- performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;
- performing affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text;
- performing text positioning on the distortion-corrected image to obtain the positioning result.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Input (AREA)
- Image Analysis (AREA)
Abstract
Disclosed are a text positioning method, apparatus and device based on image segmentation, and a storage medium, relating to the technical field of artificial intelligence. The method comprises: acquiring an original image, the original image being a bill image or a certificate image collected against a text background (101); performing image segmentation on the original image by means of a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image (102); performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text (103); and performing text positioning on the distortion-corrected image to obtain a positioning result (104). Image segmentation network processing is performed on an image against a complex text background to obtain an accurate image foreground, and text positioning is performed on the image foreground to obtain the positioning result, so that the accuracy of image text positioning is improved and robustness to complex backgrounds is enhanced.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910884634.0 | 2019-09-19 | ||
CN201910884634.0A CN110807454B (zh) | 2019-09-19 | 2019-09-19 | 基于图像分割的文字定位方法、装置、设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021051527A1 (fr) | 2021-03-25 |
Family
ID=69487698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/117036 WO2021051527A1 (fr) | 2019-09-19 | 2019-11-11 | Procédé, appareil et dispositif de positionnement de texte basé sur la segmentation d'image et support de stockage |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110807454B (fr) |
WO (1) | WO2021051527A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111880A (zh) * | 2021-05-12 | 2021-07-13 | 中国平安人寿保险股份有限公司 | 证件图像校正方法、装置、电子设备及存储介质 |
CN113687823A (zh) * | 2021-07-30 | 2021-11-23 | 稿定(厦门)科技有限公司 | 基于html的四边形区块非线性变换方法及其系统 |
CN114565915A (zh) * | 2022-04-24 | 2022-05-31 | 深圳思谋信息科技有限公司 | 样本文本图像获取方法、文本识别模型训练方法和装置 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768345B (zh) * | 2020-05-12 | 2023-07-14 | 北京奇艺世纪科技有限公司 | 身份证背面图像的校正方法、装置、设备及存储介质 |
CN113963339A (zh) * | 2021-09-02 | 2022-01-21 | 泰康保险集团股份有限公司 | 一种信息提取方法和装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101458770A (zh) * | 2008-12-24 | 2009-06-17 | 北京文通科技有限公司 | 一种文字识别的方法和系统 |
CN101515984A (zh) * | 2008-02-19 | 2009-08-26 | 佳能株式会社 | 电子文档生成设备及电子文档生成方法 |
US20170124417A1 (en) * | 2014-11-14 | 2017-05-04 | Adobe Systems Incorporated | Facilitating Text Identification and Editing in Images |
CN108885699A (zh) * | 2018-07-11 | 2018-11-23 | 深圳前海达闼云端智能科技有限公司 | 字符识别方法、装置、存储介质及电子设备 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4201812B2 (ja) * | 2004-03-25 | 2008-12-24 | 三洋電機株式会社 | 情報データ提供装置、および画像処理装置 |
CN105574513B (zh) * | 2015-12-22 | 2017-11-24 | 北京旷视科技有限公司 | 文字检测方法和装置 |
CN109993160B (zh) * | 2019-02-18 | 2022-02-25 | 北京联合大学 | 一种图像矫正及文本与位置识别方法及系统 |
- 2019-09-19 CN CN201910884634.0A patent/CN110807454B/zh active Active
- 2019-11-11 WO PCT/CN2019/117036 patent/WO2021051527A1/fr active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101515984A (zh) * | 2008-02-19 | 2009-08-26 | 佳能株式会社 | 电子文档生成设备及电子文档生成方法 |
CN101458770A (zh) * | 2008-12-24 | 2009-06-17 | 北京文通科技有限公司 | 一种文字识别的方法和系统 |
US20170124417A1 (en) * | 2014-11-14 | 2017-05-04 | Adobe Systems Incorporated | Facilitating Text Identification and Editing in Images |
CN108885699A (zh) * | 2018-07-11 | 2018-11-23 | 深圳前海达闼云端智能科技有限公司 | 字符识别方法、装置、存储介质及电子设备 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111880A (zh) * | 2021-05-12 | 2021-07-13 | 中国平安人寿保险股份有限公司 | 证件图像校正方法、装置、电子设备及存储介质 |
CN113111880B (zh) * | 2021-05-12 | 2023-10-17 | 中国平安人寿保险股份有限公司 | 证件图像校正方法、装置、电子设备及存储介质 |
CN113687823A (zh) * | 2021-07-30 | 2021-11-23 | 稿定(厦门)科技有限公司 | 基于html的四边形区块非线性变换方法及其系统 |
CN113687823B (zh) * | 2021-07-30 | 2023-08-01 | 稿定(厦门)科技有限公司 | 基于html的四边形区块非线性变换方法及其系统 |
CN114565915A (zh) * | 2022-04-24 | 2022-05-31 | 深圳思谋信息科技有限公司 | 样本文本图像获取方法、文本识别模型训练方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN110807454A (zh) | 2020-02-18 |
CN110807454B (zh) | 2024-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021051527A1 (fr) | Procédé, appareil et dispositif de positionnement de texte basé sur la segmentation d'image et support de stockage | |
US11645826B2 (en) | Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks | |
CN109492643B (zh) | 基于ocr的证件识别方法、装置、计算机设备及存储介质 | |
CN110569832B (zh) | 基于深度学习注意力机制的文本实时定位识别方法 | |
US20190304066A1 (en) | Synthesis method of chinese printed character images and device thereof | |
WO2018233055A1 (fr) | Procédé et appareil d'entrée d'informations de police, dispositif informatique et support d'informations | |
WO2018233038A1 (fr) | Procédé basé sur un apprentissage profond, appareil et dispositif de reconnaissance de plaque d'immatriculation, et support d'informations | |
CN110874618B (zh) | 基于小样本的ocr模板学习方法、装置、电子设备及介质 | |
CN109255300B (zh) | 票据信息提取方法、装置、计算机设备及存储介质 | |
US11341605B1 (en) | Document rectification via homography recovery using machine learning | |
Khare et al. | Arbitrarily-oriented multi-lingual text detection in video | |
CN112396047B (zh) | 训练样本生成方法、装置、计算机设备和存储介质 | |
US11881043B2 (en) | Image processing system, image processing method, and program | |
CN112926469A (zh) | 基于深度学习ocr与版面结构的证件识别方法 | |
CN113158895A (zh) | 票据识别方法、装置、电子设备及存储介质 | |
Zhang et al. | Marior: Margin removal and iterative content rectification for document dewarping in the wild | |
CN111145124A (zh) | 一种图像倾斜的校正方法及装置 | |
US20210209393A1 (en) | Image processing system, image processing method, and program | |
CN108090728B (zh) | 一种基于智能终端的快递信息录入方法及录入系统 | |
CN115457585A (zh) | 作业批改的处理方法、装置、计算机设备及可读存储介质 | |
WO2019071476A1 (fr) | Procédé et système d'entrée d'informations express basés sur un terminal intelligent | |
JPH07168910A (ja) | 文書レイアウト解析装置及び文書フォ−マット識別装置 | |
Konya et al. | Adaptive methods for robust document image understanding | |
WO2021098861A1 (fr) | Procédé de reconnaissance de texte, appareil, dispositif de reconnaissance et support de stockage | |
CN117935271A (zh) | 一种用于版面还原过程的字体大小归一化方法及系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19945752 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19945752 Country of ref document: EP Kind code of ref document: A1 |