WO2021051527A1 - Method, apparatus and device for text positioning based on image segmentation, and storage medium - Google Patents

Method, apparatus and device for text positioning based on image segmentation, and storage medium

Info

Publication number
WO2021051527A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
text
segmentation
distorted
distortion
Prior art date
Application number
PCT/CN2019/117036
Other languages
English (en)
Chinese (zh)
Inventor
孙强
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021051527A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • This application relates to the field of computer technology, and in particular to text positioning methods, devices, equipment and storage media based on image segmentation.
  • Optical character recognition (OCR) refers to the process in which electronic devices such as scanners or digital cameras examine the characters printed on paper and use character recognition methods to translate their shapes into computer text, that is, to scan text data; the image file is then analyzed and processed to obtain the text and layout information.
  • OCR includes text positioning and text recognition. Text positioning refers to precisely locating the position of text in an image, mainly by extracting relevant text features.
  • the main purpose of this application is to solve the technical problem of low accuracy of text positioning from images with complex text backgrounds.
  • The first aspect of the present application provides a text positioning method based on image segmentation, including: acquiring an original image, the original image being a bill image or a certificate image collected against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and performing text positioning on the distortion-corrected image to obtain a positioning result.
  • A second aspect of the present application provides a text positioning device based on image segmentation, including: an acquisition unit for acquiring an original image, the original image being a bill image or a certificate image collected against a text background; a segmentation unit for performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; a transformation unit for performing affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and a positioning unit for performing text positioning on the distortion-corrected image to obtain a positioning result.
  • A third aspect of the present application provides a text positioning device based on image segmentation, including a memory and at least one processor, where the memory stores instructions and the memory and the at least one processor are interconnected by wires; the at least one processor calls the instructions in the memory so that the text positioning device based on image segmentation executes the method described in the first aspect.
  • The fourth aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the method described in the first aspect.
  • An original image is acquired, the original image being a bill image or a certificate image collected against a text background; the original image is segmented through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; affine transformation is performed on the distorted image to obtain a distortion-corrected image in which the text is forward text; and text positioning is performed on the distortion-corrected image to obtain a positioning result.
  • An accurate image foreground is obtained by performing image segmentation network processing on an image with a complex background, and text positioning is performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • FIG. 1 is a schematic diagram of an embodiment of a text positioning method based on image segmentation in an embodiment of the application
  • FIG. 2 is a schematic diagram of another embodiment of a text positioning method based on image segmentation in an embodiment of this application;
  • FIG. 3 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of the application
  • FIG. 4 is a schematic diagram of another embodiment of a text positioning device based on image segmentation in an embodiment of the application;
  • Fig. 5 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of the application.
  • The embodiments of the present application provide a text positioning method, device, equipment, and storage medium based on image segmentation, which obtain an accurate image foreground by performing image segmentation network processing on images with complex backgrounds and then perform text positioning on the image foreground according to preset templates to obtain positioning results, improving the accuracy of image text positioning and enhancing robustness to complex backgrounds.
  • An embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:
  • The server obtains the original image, where the original image is a bill image or a certificate image collected against a text background.
  • A text background with strong interference means that text targets, especially handwritten numbers and printed text, are present in the background of the original image, which adds to the difficulty of directly positioning the text in the original image. Specifically, the server receives the bill image or certificate image collected against the text background and sets the bill image or certificate image as the original image; the server stores the original image in the preset path according to the preset format and records the storage path of the original image in the data table.
  • the server stores the original image in the preset path according to the preset format, and obtains the storage path of the original image and the name of the original image.
  • the preset format includes preset naming rules and picture formats.
  • the picture format is jpg, png or other types of picture formats, which are not specifically limited here.
  • the server performs image segmentation on the original image through a preset image segmentation network model to obtain a distorted image.
  • the distorted image is a bill image or a certificate image.
  • Specifically, the server performs image segmentation on the original image according to a preset image segmentation network model to obtain a segmentation label image; the server determines a mask image according to the segmentation label image and processes the original image according to the mask image to obtain a distorted image, where the distorted image is the partial image obtained after the server separates the complex background from the original image.
  • the shape of the partial image is an irregular quadrilateral, and the partial image includes a bill image or a certificate image.
  • the server trains the image segmentation network model according to the preset samples, determines the parameters in the image segmentation network model, and obtains the preset image segmentation network model, which is used to perform image segmentation on the original image .
  • the server performs affine transformation on the distorted image to obtain a distortion-corrected image
  • the text in the distortion-corrected image is a forward text.
  • Forward text refers to text that is oriented along the horizontal reference and is not upside down; that is, distorted images deviating from the horizontal reference by 90 degrees, 180 degrees or 270 degrees are corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text.
  • Specifically, the server determines the affine transformation rule corresponding to the distorted image, and performs affine transformation on the distorted image according to this rule and a preset size to obtain the distortion-corrected image. It is understandable that the distorted image is an irregular quadrilateral image; the server corrects its distortion by affine transformation to obtain the distortion-corrected image, in which the text is forward text and whose size is a preset fixed value consistent with the size of the template corresponding to the distorted image.
  • An affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v); that is, a point on the original image is mapped to a corresponding point on the target image, covering rotation, translation, scaling and shearing of the original image.
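As a brief illustration of the transformation types listed above, the following sketch composes rotation, scaling, shearing and translation into a single 2x3 affine matrix; the angle, scale factor, shear factor and offsets are arbitrary placeholder values rather than parameters from the application.

```python
import numpy as np

# Build a 2x3 affine matrix M such that [u, v] = M @ [x, y, 1].
# The rotation angle, scale, shear and translation below are placeholders.
theta, scale, shear, tx, ty = np.deg2rad(5.0), 1.2, 0.1, 30.0, -12.0

rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shearing = np.array([[1.0, shear],
                     [0.0, 1.0]])
linear = scale * rotation @ shearing            # 2x2 linear part: rotation, scale, shear
M = np.hstack([linear, [[tx], [ty]]])           # append translation column -> 2x3 matrix

point = np.array([100.0, 50.0, 1.0])            # (x, y) written in homogeneous form
u, v = M @ point                                # mapped coordinates (u, v) on the target image
```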
  • the server performs text positioning on the image after the distortion correction to obtain the positioning result. Specifically, the server performs text positioning processing on the distortion-corrected image according to the preset algorithm and template to obtain the positioning result.
  • The template includes at least one rectangular frame, and the rectangular frame is used to indicate, according to preset coordinate values, the location area where the forward text is located; the positioning result is the text positioning coordinate information selected from the distortion-corrected image, and the amount of text positioning coordinate information is equal to the number of rectangular frames. For example, for the texts "rural commercial bank" and "transfer check" in a distortion-corrected bill image, the server matches the corresponding template, which contains two rectangular frames used to indicate "rural commercial bank" and "transfer check"; the preset coordinate values of the two rectangular frames determine the positioning result, which includes "rural commercial bank", "transfer check" and the preset coordinate values of the two rectangular frames.
  • An accurate image foreground is obtained by performing image segmentation network processing on an image with a complex background, and text positioning is performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • another embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:
  • The server obtains the original image, where the original image is a bill image or a certificate image collected against a text background. Specifically, the server receives the bill image or certificate image collected against the text background and sets it as the original image; the server sets the name of the original image according to the preset format and stores the original image in the preset path to obtain the storage path of the original image, where the preset path is a preset file directory and the preset format includes preset naming rules and a picture format (jpg, png or other picture formats, which are not specifically limited here); the server then writes the storage path of the original image and the name of the original image into the target data table.
  • For example, the server receives a bank bill image and sets the bank bill image as the original image, naming the original image bank1.jpg; the server then stores bank1.jpg in the directory /var/www/html/bankimage, and writes the storage path of the original image and the name of the original image into the target data table.
  • the name of the original image is bank1.jpg
  • the storage path of the original image is /var/www/html/bankimage/bank1.jpg.
  • Further, the server generates a structured query language (SQL) insert statement from the storage path of the image and the name of the original image, and writes them into the target data table according to the SQL insert statement.
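A minimal sketch of generating and executing such an insert statement, assuming a SQLite database and a hypothetical table original_image(name, storage_path); the application does not specify the actual database engine, table or column names.

```python
import sqlite3

def record_original_image(db_path: str, image_name: str, storage_path: str) -> None:
    """Write the image name and storage path into the (hypothetical) target data table."""
    conn = sqlite3.connect(db_path)
    try:
        # Parameterized form of the SQL insert statement described above.
        conn.execute(
            "INSERT INTO original_image (name, storage_path) VALUES (?, ?)",
            (image_name, storage_path),
        )
        conn.commit()
    finally:
        conn.close()

# Example usage with the values from the text:
# record_original_image("images.db", "bank1.jpg", "/var/www/html/bankimage/bank1.jpg")
```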
  • A strongly interfering text background means that text targets, especially handwritten numbers and printed text, exist in the background of the original image, which makes it difficult to locate the text directly in the original image.
  • the server inputs the original image into the preset image segmentation network model, and performs image semantic segmentation on the original image through the preset image segmentation network model to obtain the segmentation label image and the image type. Further, the server uses a preset deeplabv3+ model to perform image semantic segmentation on the original image. It can be understood that the preset deeplabv3+ model is a preset image segmentation network model.
  • the main purpose of the server to perform semantic image segmentation on the original image through the preset deeplabv3+ model is to specify a semantic label for each pixel of the original image, that is, the value of each pixel in the segmented label image represents the type of the pixel.
  • Deeplabv3+ is a state-of-the-art deep learning model for image semantic segmentation. Its goal is to assign a semantic label to each pixel of the input image. Deeplabv3+ includes a simple and efficient decoder module that improves the segmentation results.
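A minimal sketch of the per-pixel labelling step, using torchvision's DeepLabv3 as a stand-in for the preset image segmentation network model (torchvision does not ship the "+" decoder variant); the number of classes and the preprocessing are assumptions, not values from the application.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in segmentation network; e.g. classes 0 = background, 1 = bill, 2 = certificate (assumed).
model = deeplabv3_resnet50(weights=None, num_classes=3)
model.eval()

def segment(image_tensor: torch.Tensor) -> torch.Tensor:
    """image_tensor: float tensor of shape (3, H, W), already normalized.
    Returns an (H, W) label image with one class id per pixel."""
    with torch.no_grad():
        out = model(image_tensor.unsqueeze(0))["out"]   # (1, num_classes, H, W)
    return out.argmax(dim=1).squeeze(0)                 # (H, W)
```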
  • the original image is segmented according to the segmented label image to obtain a distorted image, where the distorted image is a bill image or a certificate image;
  • the server divides the original image according to the segmented label image to obtain a distorted image.
  • The distorted image is a bill image or a certificate image. Specifically, the server determines the area to be segmented according to the segmentation label image, sets the pixel values inside the area to be segmented to 1 and the pixel values outside it to 0 to obtain the mask image; the server then multiplies the original image by the mask image to obtain the distorted image.
  • the distorted image is used to indicate the bill image or the document image separated from the text background from the original image.
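A minimal sketch of the mask step described above, assuming the segmentation label image assigns one class id per pixel and that class id 1 (an assumption) marks the bill or certificate region.

```python
import numpy as np

def apply_mask(original_bgr: np.ndarray, label_image: np.ndarray, target_class: int = 1) -> np.ndarray:
    """Set pixels of the target class to 1 and all others to 0, then multiply the original image."""
    mask = (label_image == target_class).astype(original_bgr.dtype)   # (H, W) of 0/1
    return original_bgr * mask[:, :, None]                            # broadcast over the channel axis
```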
  • The server compares the original image with the segmentation label image to obtain a comparison result and determines the area to be segmented according to the comparison result; the server segments the area to be segmented to obtain the distorted image, where the distorted image is a bill image or a certificate image; the server then stores the distorted image.
  • The finally saved file is the foreground four-point coordinate file with the same name as the original image. For example, the server performs image segmentation processing on the certificate image named image1.png to obtain the eight coordinate values (four corner points) of each of the two certificate foreground images, and the server saves the two certificate foreground images under numbered names.
  • the content of the file is as follows:
  • the server performs affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text.
  • Forward text refers to text that takes the horizontal reference as the positive direction and is not upside down; that is, distorted images deviating from the horizontal reference by 90 degrees, 180 degrees or 270 degrees are corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text.
  • Specifically, the server determines the standard image corresponding to the distorted image according to the image type and determines three pixel reference point coordinates from the standard image; the server determines the corresponding pixel coordinates from the distorted image according to the three pixel reference point coordinates; the server calculates the affine transformation matrix from the coordinates of each pixel reference point and the corresponding pixel coordinates; and the server performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the text is forward text.
  • The server determines from the standard image of the ID card that the coordinates of the three pixel reference points are D(x1, y1), E(x2, y2), and F(x3, y3); according to the reference point coordinates D, E and F, the server determines the corresponding pixel coordinates D'(x'1, y'1), E'(x'2, y'2) and F'(x'3, y'3) from the distorted image, and the server then calculates according to the homogeneous coordinate formula, which is as follows:
  • (x, y) corresponds to the pixel coordinates of the distorted image
  • (u, v) corresponds to the three pixel reference point coordinates of the standard image of the ID card
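The homogeneous coordinate formula referred to above is not reproduced in this text; assuming the usual convention, with the affine parameters a to f mentioned below mapping the distorted-image coordinates (x, y) to the standard-image coordinates (u, v), it would read:

```latex
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
=
\begin{pmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad \text{i.e.} \qquad
u = a x + b y + c, \quad v = d x + e y + f .
```

Substituting the three point pairs gives six linear equations, exactly enough to determine the six unknowns a, b, c, d, e and f.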
  • The server successively substitutes D'(x'1, y'1), E'(x'2, y'2), F'(x'3, y'3) and D(x1, y1), E(x2, y2), F(x3, y3) into the homogeneous coordinate formula to obtain the affine transformation matrix, that is, the server determines the values of the affine transformation matrix variables a, b, c, d, e and f; the server then performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected ID card image, whose corresponding size is 85.6 mm by 54 mm. It is understandable that when performing affine transformation on the distorted image, the affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v), and the distorted image is an irregular quadrilateral image; the affine transformation maps a point on the original image to the corresponding point on the target image, covering rotation, translation, scaling and shearing of the original image, and finally transforms the distorted image from an irregular quadrilateral into a rectangle.
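A minimal sketch of this three-point correction using OpenCV; cv2.getAffineTransform solves for the same 2x3 matrix [a b c; d e f] instead of substituting into the homogeneous equations by hand, and the 856 x 540 pixel output size (for the 85.6 mm x 54 mm ID card) is an assumed resolution.

```python
import cv2
import numpy as np

def correct_distortion(distorted: np.ndarray,
                       src_pts: np.ndarray,          # D', E', F' in the distorted image, shape (3, 2)
                       dst_pts: np.ndarray,          # D, E, F in the standard image, shape (3, 2)
                       out_size=(856, 540)) -> np.ndarray:
    """Estimate the affine matrix from three point pairs and warp the distorted image."""
    M = cv2.getAffineTransform(src_pts.astype(np.float32), dst_pts.astype(np.float32))
    return cv2.warpAffine(distorted, M, out_size)     # distortion-corrected image
```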
  • 205 Determine a template corresponding to the distortion-corrected image according to the image type, where the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values;
  • the server determines a template corresponding to the distortion-corrected image according to the image type, the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values.
  • the rectangular box is a rectangular area composed of 4 point coordinates.
  • For example, the template corresponding to the horizontal forward image of the front of an ID card includes six rectangular frames for name, gender, ethnicity, date of birth, address, and citizen ID number; the template corresponding to the horizontal forward image of the front of a bank card includes a rectangular frame for the bank card number.
  • The size of the template corresponding to the distortion-corrected image is consistent with the size of the distortion-corrected image.
  • the template includes a rectangular frame indicating the location area where the forward text is located according to the preset coordinate values.
  • The server matches a template for the distortion-corrected image according to the image type; further, the server determines the text of the distortion-corrected image according to the rectangular frames in the template.
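A minimal sketch of template-based lookup and cropping: each field of a hypothetical ID-card template is a rectangular frame given by preset upper-left and lower-right coordinates. The name-area coordinates reuse the example further below (upper left (13, 14), lower right (744, 49)); the other values are placeholders.

```python
import numpy as np

# Hypothetical template: field name -> (x1, y1, x2, y2) preset coordinates.
ID_CARD_TEMPLATE = {
    "name":       (13, 14, 744, 49),
    "citizen_id": (120, 400, 700, 450),   # placeholder coordinates
}

def position_text(corrected: np.ndarray, template: dict) -> dict:
    """Return, for each field, its preset coordinates and the cropped image region."""
    return {
        field: {"coords": (x1, y1, x2, y2), "crop": corrected[y1:y2, x1:x2]}
        for field, (x1, y1, x2, y2) in template.items()
    }
```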
  • The server performs text positioning on the distortion-corrected image according to the preset algorithm and template to obtain the positioning result. Specifically, the server determines, according to the preset algorithm and template, the position information of the strip-shaped objects to be segmented in the distortion-corrected image; the position information of a strip-shaped object includes the coordinates of the upper left point and the lower right point of the corresponding area and the corresponding text. The text positioning rule follows the order from the upper left coordinate to the lower right coordinate: the distortion-corrected image is scanned line by line, and information of the same category on the same line is located at the same time; the server sets the coordinates of the upper left point and the lower right point and the corresponding text as the positioning result.
  • the server performs text positioning on the name area of the ID card, and the obtained text positioning results include the coordinates of the upper left point (13, 14), the coordinates of the lower right point (744, 49), and the name.
  • the server uses the PixelLink algorithm to frame the text area of the image after distortion correction.
  • PixelLink proposes instance segmentation to realize text detection.
  • Through a deep neural network (DNN) algorithm, two types of pixel-level prediction are performed, namely text/non-text prediction and link prediction.
  • Specifically, according to the PixelLink algorithm, the server marks the text pixels in the distortion-corrected image as positive and the non-text pixels of the distortion-corrected image as negative; the server determines whether a given pixel and an adjacent pixel of that pixel are located in the same instance; if they are located in the same instance, the server marks the link between them as positive, and if they are not, the server marks the link between them as negative, where each pixel has 8 neighbors.
  • The predicted positive pixels are joined into connected components (CC) through the predicted positive links, and each CC represents a detected text.
  • The server finally obtains the bounding box of each connected component as the final detection result, and the server sets the coordinate information of the final detection result as the positioning result.
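A minimal sketch of the link-based grouping described above, assuming the network outputs have already been thresholded into a boolean text/non-text map and eight boolean link maps (one per neighbor direction); this is a simplified illustration, not the reference PixelLink implementation.

```python
import numpy as np

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def boxes_from_pixellink(text_pos: np.ndarray, link_pos: np.ndarray):
    """text_pos: (H, W) bool map of positive text pixels.
    link_pos: (8, H, W) bool maps, one per direction in NEIGHBORS order.
    Returns a list of [x1, y1, x2, y2] bounding boxes, one per connected component."""
    h, w = text_pos.shape
    parent = {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    def union(p, q):
        parent[find(p)] = find(q)

    # Union every positive pixel with its positively linked positive neighbors.
    for y in range(h):
        for x in range(w):
            if not text_pos[y, x]:
                continue
            parent.setdefault((y, x), (y, x))
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and text_pos[ny, nx] and link_pos[k, y, x]:
                    parent.setdefault((ny, nx), (ny, nx))
                    union((y, x), (ny, nx))

    # Bounding box of each connected component.
    boxes = {}
    for (y, x) in parent:
        r = find((y, x))
        b = boxes.setdefault(r, [x, y, x, y])
        b[0], b[1] = min(b[0], x), min(b[1], y)
        b[2], b[3] = max(b[2], x), max(b[3], y)
    return list(boxes.values())
```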
  • The server writes the positioning result into a preset file. Specifically, the server performs positioning on the distortion-corrected image to obtain multiple positioning rectangular areas, records the coordinates of the upper left point and the lower right point of each positioning rectangular area, and saves the multiple positioning results in txt format. For example, the server performs text positioning for a rural commercial bank bill, and the positioning result includes 6 rectangular boxes and the text information obtained from those boxes; the server saves it in the sds_0.txt file.
  • the content of the file is as follows:
  • the positioning result in the sds_0.txt file can be further used for text recognition.
  • the positioning result includes a preset mark, which is used to prompt the text recognition to discard the line.
  • For example, a line in the file may carry the mark XXXX, where XXXX is a preset mark used to instruct the server not to perform text recognition on that line.
  • the positioning result can also be marked with other types of preset marks, which are not specifically limited here.
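A minimal sketch of saving positioning results in txt format; the line layout "x1,y1,x2,y2,text" and the use of XXXX as the discard mark are assumptions about the file format, which the application does not spell out.

```python
def save_positioning_results(results, path="sds_0.txt"):
    """results: iterable of (x1, y1, x2, y2, text); text may be 'XXXX' to mark a line to discard."""
    with open(path, "w", encoding="utf-8") as f:
        for x1, y1, x2, y2, text in results:
            f.write(f"{x1},{y1},{x2},{y2},{text}\n")

# Example usage with the name-area coordinates from the text:
# save_positioning_results([(13, 14, 744, 49, "name"), (0, 0, 0, 0, "XXXX")])
```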
  • The server determines a newly-added type of bill image or certificate image; the server sets the newly-added type of bill image or certificate image as the sample images to be trained; and the server iteratively optimizes the preset image segmentation network according to the sample images to be trained.
  • For example, the current bill types include categories 1 to 10; when an 11th type of bill image is added, the newly-added bill images are set as the sample images to be trained, and the image segmentation network is iteratively optimized based on the 11th type of bill image. It is understandable that before the iterative optimization of the preset image segmentation network, the parameters in the preset image segmentation network are frozen, and then the iterative optimization is performed.
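A minimal sketch of the freeze-then-fine-tune step, again using torchvision's DeepLabv3 as a stand-in; freezing the backbone and fine-tuning only the classifier head, as well as the class count, are assumptions, since the application only states that the existing parameters are frozen before iterative optimization.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights=None, num_classes=12)   # hypothetical: 11 bill types + background

for p in model.backbone.parameters():                      # freeze the existing feature extractor
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """images: (N, 3, H, W) float; labels: (N, H, W) long class ids for the new bill samples."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images)["out"], labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```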
  • An accurate image foreground is obtained by performing image segmentation network processing on an image with a complex background, and text positioning is performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • the text positioning device based on image segmentation in the embodiment of this application includes:
  • the acquiring unit 301 is configured to acquire an original image, and the original image is a bill image or a certificate image collected in a text background;
  • the segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
  • the transformation unit 303 is configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;
  • the positioning unit 304 is configured to perform text positioning on the image after the distortion correction to obtain a positioning result.
  • An accurate image foreground is obtained by performing image segmentation network processing on an image with a complex background, and text positioning is performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • another embodiment of the text positioning device based on image segmentation in the embodiment of the present application includes:
  • the acquiring unit 301 is configured to acquire an original image, and the original image is a bill image or a certificate image collected in a text background;
  • the segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
  • the transformation unit 303 is configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;
  • the positioning unit 304 is configured to perform text positioning on the image after the distortion correction to obtain a positioning result.
  • The segmentation unit 302 may further include:
  • the input subunit 3021 is used to input the original image into the preset image segmentation network model
  • the first segmentation subunit 3022 is configured to perform image semantic segmentation on the original image through a preset image segmentation network model to obtain segmentation label images and image types;
  • the second segmentation subunit 3023 is configured to segment the original image according to the segmented label image to obtain a distorted image, and the distorted image is a bill image or a certificate image.
  • The second segmentation subunit 3023 may also be specifically used for:
  • the original image and the mask image are multiplied to obtain a distorted image.
  • the distorted image is used to indicate the bill image or the document image separated from the text background from the original image.
  • the transformation unit 303 may also be specifically configured to:
  • the distorted image is subjected to affine transformation to obtain the image after distortion correction.
  • the positioning unit 304 may also be specifically configured to:
  • the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values;
  • the obtaining unit 301 may also be specifically configured to:
  • the text positioning device based on image segmentation may further include:
  • the determining unit 305 is used to determine the newly-added type of bill image or certificate image
  • the setting unit 306 is configured to set the newly-added type of bill image or certificate image as the sample image to be trained
  • the iterative unit 307 is configured to iteratively optimize the preset image segmentation network model according to the sample image to be trained.
  • An accurate image foreground is obtained by performing image segmentation network processing on an image with a complex background, and text positioning is performed on the image foreground according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • FIG. 5 is a schematic structural diagram of a text positioning device based on image segmentation provided by an embodiment of the present application.
  • The text positioning device 500 based on image segmentation may differ considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 501 (for example, one or more processors), a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506.
  • the memory 509 and the storage medium 508 may be short-term storage or persistent storage.
  • the program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for character positioning based on image segmentation.
  • the processor 501 may be configured to communicate with the storage medium 508, and execute a series of instruction operations in the storage medium 508 on the text positioning device 500 based on image segmentation.
  • The text positioning device 500 based on image segmentation may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input and output interfaces 504, and/or one or more operating systems 505, for example Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
  • The structure shown in FIG. 5 does not constitute a limitation on the text positioning device based on image segmentation, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:
  • acquiring an original image, the original image being a bill image or a certificate image collected against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and performing text positioning on the distortion-corrected image to obtain the positioning result.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

Method, apparatus and device for text positioning based on image segmentation, and storage medium, relating to the technical field of artificial intelligence. The method comprises: acquiring an original image, the original image being a bill image or a certificate image collected against a text background (101); performing image segmentation on the original image by means of a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image (102); performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text (103); and performing text positioning on the distortion-corrected image to obtain a positioning result (104). Image segmentation network processing is performed on an image with a complex text background to obtain an accurate image foreground, and text positioning is performed on the image foreground to obtain a positioning result, so that the accuracy of image text positioning is improved and robustness to the complex background is enhanced.
PCT/CN2019/117036 2019-09-19 2019-11-11 Method, apparatus and device for text positioning based on image segmentation, and storage medium WO2021051527A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884634.0 2019-09-19
CN201910884634.0A CN110807454B (zh) 2019-09-19 2019-09-19 Text positioning method, apparatus, device and storage medium based on image segmentation

Publications (1)

Publication Number Publication Date
WO2021051527A1 true WO2021051527A1 (fr) 2021-03-25

Family

ID=69487698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117036 WO2021051527A1 (fr) 2019-09-19 2019-11-11 Method, apparatus and device for text positioning based on image segmentation, and storage medium

Country Status (2)

Country Link
CN (1) CN110807454B (fr)
WO (1) WO2021051527A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111880A (zh) * 2021-05-12 2021-07-13 中国平安人寿保险股份有限公司 证件图像校正方法、装置、电子设备及存储介质
CN113687823A (zh) * 2021-07-30 2021-11-23 稿定(厦门)科技有限公司 基于html的四边形区块非线性变换方法及其系统
CN114565915A (zh) * 2022-04-24 2022-05-31 深圳思谋信息科技有限公司 样本文本图像获取方法、文本识别模型训练方法和装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768345B (zh) * 2020-05-12 2023-07-14 北京奇艺世纪科技有限公司 身份证背面图像的校正方法、装置、设备及存储介质
CN113963339A (zh) * 2021-09-02 2022-01-21 泰康保险集团股份有限公司 一种信息提取方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458770A (zh) * 2008-12-24 2009-06-17 北京文通科技有限公司 一种文字识别的方法和系统
CN101515984A (zh) * 2008-02-19 2009-08-26 佳能株式会社 电子文档生成设备及电子文档生成方法
US20170124417A1 (en) * 2014-11-14 2017-05-04 Adobe Systems Incorporated Facilitating Text Identification and Editing in Images
CN108885699A (zh) * 2018-07-11 2018-11-23 深圳前海达闼云端智能科技有限公司 字符识别方法、装置、存储介质及电子设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4201812B2 (ja) * 2004-03-25 2008-12-24 三洋電機株式会社 情報データ提供装置、および画像処理装置
CN105574513B (zh) * 2015-12-22 2017-11-24 北京旷视科技有限公司 文字检测方法和装置
CN109993160B (zh) * 2019-02-18 2022-02-25 北京联合大学 一种图像矫正及文本与位置识别方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515984A (zh) * 2008-02-19 2009-08-26 佳能株式会社 电子文档生成设备及电子文档生成方法
CN101458770A (zh) * 2008-12-24 2009-06-17 北京文通科技有限公司 一种文字识别的方法和系统
US20170124417A1 (en) * 2014-11-14 2017-05-04 Adobe Systems Incorporated Facilitating Text Identification and Editing in Images
CN108885699A (zh) * 2018-07-11 2018-11-23 深圳前海达闼云端智能科技有限公司 字符识别方法、装置、存储介质及电子设备

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111880A (zh) * 2021-05-12 2021-07-13 中国平安人寿保险股份有限公司 证件图像校正方法、装置、电子设备及存储介质
CN113111880B (zh) * 2021-05-12 2023-10-17 中国平安人寿保险股份有限公司 证件图像校正方法、装置、电子设备及存储介质
CN113687823A (zh) * 2021-07-30 2021-11-23 稿定(厦门)科技有限公司 基于html的四边形区块非线性变换方法及其系统
CN113687823B (zh) * 2021-07-30 2023-08-01 稿定(厦门)科技有限公司 基于html的四边形区块非线性变换方法及其系统
CN114565915A (zh) * 2022-04-24 2022-05-31 深圳思谋信息科技有限公司 样本文本图像获取方法、文本识别模型训练方法和装置

Also Published As

Publication number Publication date
CN110807454A (zh) 2020-02-18
CN110807454B (zh) 2024-05-14

Similar Documents

Publication Publication Date Title
WO2021051527A1 (fr) Procédé, appareil et dispositif de positionnement de texte basé sur la segmentation d'image et support de stockage
US11645826B2 (en) Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks
CN109492643B (zh) 基于ocr的证件识别方法、装置、计算机设备及存储介质
CN110569832B (zh) 基于深度学习注意力机制的文本实时定位识别方法
US20190304066A1 (en) Synthesis method of chinese printed character images and device thereof
WO2018233055A1 (fr) Procédé et appareil d'entrée d'informations de police, dispositif informatique et support d'informations
WO2018233038A1 (fr) Procédé basé sur un apprentissage profond, appareil et dispositif de reconnaissance de plaque d'immatriculation, et support d'informations
CN110874618B (zh) 基于小样本的ocr模板学习方法、装置、电子设备及介质
CN109255300B (zh) 票据信息提取方法、装置、计算机设备及存储介质
US11341605B1 (en) Document rectification via homography recovery using machine learning
Khare et al. Arbitrarily-oriented multi-lingual text detection in video
CN112396047B (zh) 训练样本生成方法、装置、计算机设备和存储介质
US11881043B2 (en) Image processing system, image processing method, and program
CN112926469A (zh) 基于深度学习ocr与版面结构的证件识别方法
CN113158895A (zh) 票据识别方法、装置、电子设备及存储介质
Zhang et al. Marior: Margin removal and iterative content rectification for document dewarping in the wild
CN111145124A (zh) 一种图像倾斜的校正方法及装置
US20210209393A1 (en) Image processing system, image processing method, and program
CN108090728B (zh) 一种基于智能终端的快递信息录入方法及录入系统
CN115457585A (zh) 作业批改的处理方法、装置、计算机设备及可读存储介质
WO2019071476A1 (fr) Procédé et système d'entrée d'informations express basés sur un terminal intelligent
JPH07168910A (ja) 文書レイアウト解析装置及び文書フォ−マット識別装置
Konya et al. Adaptive methods for robust document image understanding
WO2021098861A1 (fr) Procédé de reconnaissance de texte, appareil, dispositif de reconnaissance et support de stockage
CN117935271A (zh) 一种用于版面还原过程的字体大小归一化方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945752

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19945752

Country of ref document: EP

Kind code of ref document: A1