CN114495146A - Image text detection method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN114495146A
CN114495146A
Authority
CN
China
Prior art keywords
text
image
sensitive
processed
distribution
Prior art date
Legal status
Pending
Application number
CN202210147007.0A
Other languages
Chinese (zh)
Inventor
尹嘉峻
Current Assignee
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd filed Critical Ping An Puhui Enterprise Management Co Ltd
Priority to CN202210147007.0A
Publication of CN114495146A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application is applicable to the technical field of artificial intelligence, and provides an image text detection method and device, computer equipment and a storage medium. The method comprises the following steps: acquiring an image to be processed containing text information and a structured region segmentation map corresponding to the image to be processed, wherein the structured region segmentation map contains distribution information of text regions corresponding to sensitive text information; performing corner detection on the image to be processed to obtain corner distribution positions; extracting a text candidate region corresponding to the sensitive text information from the image to be processed based on the corner distribution positions and the structured region segmentation map; and performing text recognition based on the text candidate region to obtain a text detection result. The scheme can realize multi-direction and multi-scale text detection on images, improves the accuracy of bill authenticity identification and data extraction, and reduces cost.

Description

Image text detection method and device, computer equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an image text detection method and device, computer equipment and a storage medium.
Background
The automatic processing of image content identification is an important means for intelligent operation.
Taking bill content recognition as an example, current automated bill recognition generally adopts OCR (Optical Character Recognition) technology: the characters of a paper document are optically captured and converted into a black-and-white dot-matrix image file, and the dot matrix in the image is then recognized and converted into text format, thereby identifying and extracting the data content of the bill.
Owing to factors such as the flexibility of bills (for example invoices), their printing quality, and the shooting environment, image files acquired during automatic recognition often suffer from deformation, unclear text, poor imaging quality and the like.
Disclosure of Invention
The embodiment of the application provides an image text detection method and device, computer equipment and a storage medium, and aims to solve the problems of low recognition accuracy and low recognition efficiency when text recognition is performed on image files with the characteristics of deformation, unclear text, poor imaging quality and the like in the prior art.
A first aspect of an embodiment of the present application provides an image text detection method, including:
acquiring an image to be processed containing text information and a structured region segmentation map corresponding to the image to be processed, wherein the structured region segmentation map contains distribution information of text regions corresponding to sensitive text information;
carrying out corner detection on the image to be processed to obtain corner distribution positions;
extracting a text candidate area corresponding to the sensitive text information from the image to be processed based on the corner distribution position and the structured area segmentation map;
and performing text recognition based on the text candidate area to obtain a text detection result.
Optionally, the extracting, based on the corner distribution position and the structured region segmentation map, a text candidate region corresponding to the sensitive text information from the image to be processed includes:
based on the structured region segmentation graph, performing text region segmentation on the image to be processed to obtain a sensitive text reference distribution region;
fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information;
and extracting the image area outlined by the text detection box from the image to be processed as the text candidate area.
Optionally, the fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information includes:
carrying out position matching on the sensitive text reference distribution area and the corner point distribution positions to obtain the relative distance between each corner point distribution position and the sensitive text reference distribution area;
selecting a target corner distribution position with the relative distance within a first set deviation range from the corner distribution positions;
and performing coordinate fitting based on the distribution positions of the target corner points to obtain a text detection box corresponding to the sensitive text information.
Optionally, the fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information includes:
performing coordinate fitting based on the corner distribution positions to obtain a candidate bounding box;
performing position matching on the sensitive text reference distribution area and the candidate bounding boxes to obtain the position deviation between each candidate bounding box and the sensitive text reference distribution area;
and selecting the boundary box with the position deviation within a second set deviation range from the candidate boundary boxes as the text detection box corresponding to the sensitive text information.
Optionally, the performing text region segmentation on the image to be processed based on the structured region segmentation map to obtain a sensitive text reference distribution region includes:
determining the positive direction orientation of the image to be processed based on the corner distribution positions;
when the direction of the positive direction is inconsistent with the reference positive direction, correcting the direction of the image to be processed;
and based on the structured region segmentation map, performing text region segmentation on the image to be processed after the orientation correction to obtain a sensitive text reference distribution region.
Optionally, the performing text recognition based on the text candidate region to obtain a text detection result includes:
performing text recognition on the text candidate area to obtain sensitive text content;
and filling the sensitive text content into the text display template according to the display position corresponding to the sensitive text information preset in the text display template to obtain the text detection result.
Optionally, the acquiring the to-be-processed image including the text information includes:
and carrying out image shooting or image scanning, by using image acquisition equipment, on the paper document placed in the image acquisition area to obtain the image to be processed containing the text information.
A second aspect of the embodiments of the present application provides an image text detection apparatus, including:
the acquisition module is used for acquiring an image to be processed containing text information and a structured region segmentation map corresponding to the image to be processed, wherein the structured region segmentation map contains distribution information of text regions corresponding to the sensitive text information;
the detection module is used for carrying out corner detection on the image to be processed to obtain a corner distribution position;
the extraction module is used for extracting a text candidate region corresponding to the sensitive text information from the image to be processed based on the corner distribution positions and the structured region segmentation map;
and the recognition module is used for performing text recognition based on the text candidate area to obtain a text detection result.
A third aspect of embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product, which, when run on a terminal, causes the terminal to perform the steps of the method of the first aspect described above.
As can be seen from the above, in the embodiment of the present application, the corner distribution positions and the text region segmentation map corresponding to the image to be processed are obtained, and text boundary verification is performed using the corner distribution positions together with the distribution information of the sensitive-information text regions in the segmentation map. This determines the correct text candidate region and thereby enables text recognition. The approach strengthens the robustness of the network in detecting multi-direction and multi-scale text, alleviates the poor recognition caused by unclear text, incomplete ticket faces and poor imaging quality in current automatic bill recognition, improves the accuracy of bill authenticity identification and data extraction, facilitates automated and intelligent audit and reimbursement processing, and reduces both time consumption and cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required by the embodiments or the description of the prior art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a first flowchart of a text detection method based on an image according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a distribution of structured areas of an image to be processed in an embodiment of the present application;
FIG. 3 is a flowchart II of a method for detecting text based on images according to an embodiment of the present application;
fig. 4 is a structural diagram of an apparatus for detecting a text based on an image according to an embodiment of the present application;
fig. 5 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the terminals described in embodiments of the present application include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers having touch sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the device is not a portable communication device, but is a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).
In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that, the sequence numbers of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation to the implementation process of the embodiment of the present application.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a first flowchart of an image text detection method provided in an embodiment of the present application. As shown in fig. 1, an image text detection method includes the following steps:
step 101, obtaining an image to be processed containing text information and a structured region segmentation graph corresponding to the image to be processed.
The structured region segmentation map includes distribution information of text regions corresponding to the sensitive text information.
The different text regions in the structured region segmentation map are arranged in a relatively fixed structured form. Different text areas are used for realizing content filling of the corresponding sensitive text information in the text areas.
The sensitive text information is a text object which needs content identification and extraction.
For example, when the image to be processed is an invoice image, the sensitive text information specifically includes the taxpayer identification number, bank of deposit and account number corresponding to the buyer, the tax-inclusive amount, the taxpayer identification number, bank of deposit and account number corresponding to the seller, and the like.
When the image to be processed containing the text information is acquired, the method specifically includes: carrying out image shooting or image scanning, by using image acquisition equipment, on the paper document placed in the image acquisition area to obtain the image to be processed containing the text information.
The paper document is, for example, an invoice, a work order, or a financial statement; correspondingly, the image to be processed is an image with structured information content, such as an invoice image, a work-order image, or a financial-statement image.
Wherein, different images to be processed correspond to corresponding bill types, and different bill types have corresponding structured region segmentation maps.
When the structured region segmentation map corresponding to the image to be processed is obtained, the method specifically includes: and matching the segmentation graph of the structured area corresponding to the image to be processed based on the bill type of the image to be processed.
That is, the structured region segmentation map is obtained by selecting, from the database, the text region segmentation map corresponding to the bill type to which the image to be processed belongs.
For example, please refer to fig. 2, which is a schematic diagram of the structured region segmentation map corresponding to an invoice in a specific example. When the type of the image to be processed is determined to be an invoice image, the structured region segmentation map corresponding to invoices is selected from the database. The structured region segmentation map contains the distribution information of the text regions corresponding to the sensitive text information in the invoice, specifically including information fields such as the taxpayer identification number, bank of deposit and account number corresponding to the buyer, a tax-inclusive amount field, and the taxpayer identification number, bank of deposit and account number corresponding to the seller. These fields are arranged in a relatively fixed structured form.
The structured region segmentation map is divided into a plurality of text structure regions, each text structure region representing a corresponding text distribution position.
And the distribution information of the text region in the segmentation map based on the structured region assists in identifying the position of the text in the image to be processed subsequently, and effective text positioning information is provided.
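A structured region segmentation map of the kind described can be represented, in a minimal sketch, as a mapping from sensitive fields to fixed layout regions. The field names and coordinates below are hypothetical, chosen only to illustrate the data structure:

```python
# Hypothetical field names and coordinates; each sensitive field maps
# to a normalized (x0, y0, x1, y1) box fixed by the bill's layout.
INVOICE_TEMPLATE = {
    "buyer_taxpayer_id":  (0.05, 0.10, 0.45, 0.16),
    "buyer_bank_account": (0.05, 0.17, 0.45, 0.23),
    "total_with_tax":     (0.55, 0.60, 0.95, 0.68),
    "seller_taxpayer_id": (0.05, 0.75, 0.45, 0.81),
}

def reference_region(field, width, height):
    """Scale a template region to pixel coordinates for a given image."""
    x0, y0, x1, y1 = INVOICE_TEMPLATE[field]
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

print(reference_region("total_with_tax", 1000, 500))  # (550, 300, 950, 340)
```

Normalized coordinates let one template serve images of any resolution for the same bill type.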
And 102, carrying out corner detection on the image to be processed to obtain a corner distribution position.
After the image to be processed is obtained, text detection is required, and specifically, the correct distribution position of the text needs to be detected first.
Here, a corner-based text detection algorithm is specifically used to output quadrilateral corners that closely surround the text candidate region to produce a multi-directional text candidate region.
Corner Detection (Corner Detection) is a method used in a computer vision system to obtain image features, and is widely applied in the fields of motion Detection, image matching, video tracking, three-dimensional modeling, target identification and the like, and is also called feature point Detection.
The corner points are extreme points, i.e., points with particularly prominent attributes in some aspect. The corner point may be an intersection of two lines, or a point located on two adjacent objects with different main directions, or an isolated point with maximum or minimum intensity on some property, an end point of a line segment, or a point with maximum local curvature on a curve. More strictly speaking, a local neighborhood of a corner point should have differently oriented borders of two different regions. In practical applications, the corner detection method detects image points with specific features, not just "corners". These feature points have specific coordinates in the image and have certain mathematical features such as local maximum or minimum gray levels, certain gradient features, etc.
In this step, the text detection based on the corner detection can detect the corner distribution positions of the text in four directions, namely, the upper left direction, the upper right direction, the lower right direction and the lower left direction of a suspected text object, without considering the size, the length-width ratio or the direction of the text, so that the text with multiple directions, large length-width ratio variation range and multiple sizes can be detected, and the validity and the robustness of the text detection system for detecting the distorted text, the multiple directions, the multiple sizes and the text with any length are ensured.
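The four-direction corner output described above can be illustrated with a deliberately simplified, axis-aligned stand-in; a real detector is a learned network producing quadrilateral (not necessarily axis-aligned) corners, so the sketch below is only a toy:

```python
import numpy as np

def text_corners(mask):
    """Return the four corner points (top-left, top-right, bottom-right,
    bottom-left) tightly bounding a binary text mask -- a simplified,
    axis-aligned stand-in for a learned corner-detection head."""
    ys, xs = np.nonzero(mask)
    x0, x1 = int(xs.min()), int(xs.max())
    y0, y1 = int(ys.min()), int(ys.max())
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

mask = np.zeros((10, 20), dtype=np.uint8)
mask[3:6, 4:15] = 1  # a synthetic text stroke
print(text_corners(mask))  # [(4, 3), (14, 3), (14, 5), (4, 5)]
```

The point of the corner representation is exactly what the sketch cannot show: because each corner is predicted independently, rotated or distorted text still yields four usable vertices.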
And 103, extracting a text candidate area corresponding to the sensitive text information from the image to be processed based on the corner distribution position and the structured area segmentation map.
After the angular point distribution position and the structured area segmentation image are obtained, mutual matching verification can be carried out through position information between the angular point distribution position and the structured area segmentation image, so that the distribution position of effective information is determined from the distorted, deformed and unsharp imaged image to be processed, and a text detection frame corresponding to the distribution position is found.
After the segmentation map of the structured area and the distribution positions of the angular points are combined, effective quadrilateral angular points which tightly surround the text candidate area can be obtained, invalid quadrilateral angular points are eliminated, and the text candidate area is accurately extracted from the distorted and deformed image.
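The mutual verification between corner positions and the segmentation map can be sketched as a simple containment test (tolerance and function names are assumptions for illustration):

```python
def corner_in_region(point, region, tol=2):
    """True if a corner lies inside (or within tol pixels of) a region."""
    x, y = point
    x0, y0, x1, y1 = region
    return x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol

def filter_corners(corners, region, tol=2):
    """Keep only corner positions consistent with the reference text
    region from the segmentation map; the rest are treated as invalid."""
    return [p for p in corners if corner_in_region(p, region, tol)]

detected = [(4, 3), (14, 5), (40, 40)]           # last point is an outlier
print(filter_corners(detected, (0, 0, 20, 10)))  # [(4, 3), (14, 5)]
```

Corners far from any template region are discarded as invalid, leaving only quadrilateral corners that can tightly surround a genuine text candidate region.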
And step 104, performing text recognition based on the text candidate area to obtain a text detection result.
After the text candidate area is obtained, since the text candidate area only contains text boundary information but cannot really determine whether the text is a real target text, text recognition needs to be further performed based on the text candidate area.
Specifically, the text recognition may be performed on the candidate text region in combination with image feature information extracted from the image to be processed, so as to ensure the accuracy of the text recognition.
The image to be processed can be input into a feature extraction network to extract image feature information, and a text prediction operation is performed based on the image features in the text candidate region to obtain the final result. The feature extraction network may be implemented with a deep convolutional neural network such as VGG, a deep residual network such as ResNet, and the like.
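As a toy illustration of what one layer of such a VGG- or ResNet-style feature extractor computes (this is a plain convolution sketch, not the application's network):

```python
import numpy as np

def conv2d(img, kernel):
    """One 'valid' 2-D convolution pass -- a toy stand-in for a single
    layer of a VGG- or ResNet-style feature extraction network."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to vertical strokes
feature_map = conv2d(np.eye(5), vertical_edge)
print(feature_map.shape)  # (3, 3)
```

A real extractor stacks many such layers with learned kernels, nonlinearities, and (for ResNet) skip connections.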
Further, to improve the presentation of text detection results, in a specific embodiment, the performing text recognition based on the text candidate region to obtain the text detection result may include:
performing text recognition on the text candidate area to obtain sensitive text content;
and filling the sensitive text content into the text display template according to the display position corresponding to the preset sensitive text information in the text display template to obtain a text detection result.
The process can effectively improve the ordered display effect of the text recognition result after the text recognition, and improve the intelligence and the user friendliness of the image text recognition.
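The template-filling step above amounts to routing each recognized value into its preset display slot; a minimal sketch, with hypothetical field names:

```python
def fill_template(template_fields, recognized):
    """Place recognized sensitive text into the preset display slots of
    a text display template (field names here are hypothetical)."""
    return {field: recognized.get(field, "") for field in template_fields}

fields = ["buyer_taxpayer_id", "total_with_tax"]
print(fill_template(fields, {"total_with_tax": "1,130.00"}))
```

Fields with no recognized content are left blank rather than omitted, so the displayed result keeps the template's fixed structure.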
In the embodiment of the application, the corner distribution position and the text region segmentation map corresponding to the image to be processed are obtained, text boundary verification is carried out by utilizing the distribution information of the corner distribution position and the text region of sensitive information in the text region segmentation map, the determination of a correct text candidate region is realized, and then text recognition is realized, the robustness of a network for detecting multi-direction and multi-scale texts is enhanced, the problem that the recognition effect is poor due to the problems of unclear text, incomplete ticket face and poor imaging quality of current ticket automatic recognition is solved, the ticket authenticity identification and data extraction accuracy is improved, the automation and intellectualization of the verification and reimbursement process are facilitated, the time consumption is reduced, and the cost is reduced.
The embodiment of the application also provides different implementation modes of the image text detection method.
Referring to fig. 3, fig. 3 is a second flowchart of an image text detection method provided in the embodiment of the present application. As shown in fig. 3, an image text detection method includes the following steps:
step 301, obtaining an image to be processed containing text information and a structured region segmentation map corresponding to the image to be processed.
The structured region segmentation map includes distribution information of text regions corresponding to the sensitive text information.
The implementation process of this step is the same as that of step 101 in the foregoing embodiment, and is not described here again.
Step 302, performing corner detection on the image to be processed to obtain a corner distribution position.
The implementation process of this step is the same as that of step 102 in the foregoing embodiment, and is not described here again.
And 303, performing text region segmentation on the image to be processed based on the structured region segmentation map to obtain a sensitive text reference distribution region.
In the process, the image to be processed needs to be segmented based on the distribution information of the text region corresponding to the sensitive text information included in the structured region segmentation map, so as to realize the preliminary positioning of the approximate distribution position of each text part in the image to be processed.
In order to ensure the accuracy of text region segmentation, in an optional embodiment, the segmenting the text region of the image to be processed based on the structured region segmentation map to obtain the sensitive text reference distribution region includes:
determining the positive direction orientation of the image to be processed based on the distribution positions of the angular points;
when the positive direction is inconsistent with the reference positive direction, correcting the direction of the image to be processed;
and based on the structured region segmentation graph, performing text region segmentation on the image to be processed after the orientation correction to obtain a sensitive text reference distribution region.
The determination of the positive direction orientation of the image to be processed is specifically realized based on the corner point distribution positions identified in the image to be processed.
Specifically, the image to be processed is an image with text content, the identified corner distribution positions can indicate the outline range of the text content in the image to be processed, the approximate distribution positions and the arrangement direction of the text are determined based on the text outline range corresponding to the corner distribution positions, and then the positive direction orientation of the image to be processed can be determined.
When the forward direction orientation deviates, the accuracy of text region segmentation of the image to be processed based on the structured region segmentation map is reduced, so that on the basis, the orientation of the image to be processed needs to be corrected based on the reference forward direction, the corrected forward direction orientation of the image to be processed is consistent with the reference forward direction, and the effectiveness and the accuracy of text region segmentation of the image to be processed are further ensured.
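The orientation check and correction described above can be sketched with a crude aspect-ratio heuristic; this is an assumption-laden simplification (a real system would exploit the ordered corner labels, and handle 180° flips):

```python
import numpy as np

def estimate_rotation(corners):
    """Guess whether the text runs horizontally or vertically from the
    aspect ratio of the corner hull (simplified heuristic)."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return 0 if width >= height else 90  # degrees needed for upright text

def correct_orientation(img, corners):
    """Rotate the image so its positive direction matches the reference."""
    k = estimate_rotation(corners) // 90
    return np.rot90(img, k=-k) if k else img

img = np.zeros((3, 2))
print(correct_orientation(img, [(0, 0), (1, 0), (1, 4), (0, 4)]).shape)  # (2, 3)
```

Only after this correction is the template applied, since the structured region segmentation map assumes the reference positive direction.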
Step 304, fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information;
when fitting the corner distribution positions, the invalid corner distribution positions need to be removed based on the segmented sensitive text reference distribution areas, so that the corner information is screened. I.C. A
When fitting the corner distribution positions, the corner distribution positions may be divided into a plurality of groups according to their coordinates, with reference to the sensitive text reference distribution areas; the corner distribution positions included in each group are taken to correspond to one piece of sensitive text information, and the coordinates of all the corner distribution positions in each group are connected to obtain the corresponding detection box.
In a specific embodiment, fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information includes:
performing position matching on the sensitive text reference distribution area and the corner point distribution positions to obtain the relative distance between each corner point distribution position and the sensitive text reference distribution area;
selecting a target corner distribution position with a relative distance within a first set deviation range from the corner distribution positions;
and performing coordinate fitting based on the distribution positions of the target corner points to obtain a text detection box corresponding to the sensitive text information.
When the sensitive text reference distribution area is used to screen the corner distribution positions, each corner distribution position is screened one by one according to its relative distance to the sensitive text reference distribution area to obtain the target corner distribution positions; coordinate fitting is then performed on the target corner distribution positions to obtain the text detection box corresponding to the sensitive text information.
That is, in this process the corner distribution positions are screened first, and fitting is then performed on the screened target corner distribution positions that meet the set deviation requirement to obtain the text detection box.
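This screen-then-fit procedure can be sketched as follows, assuming, as a simplification not fixed by the publication, that the sensitive text reference distribution area is an axis-aligned rectangle `(x_min, y_min, x_max, y_max)` and that the fitted detection box is likewise axis-aligned:

```python
def point_region_distance(point, region):
    """Distance from a corner point to an axis-aligned reference region
    (x_min, y_min, x_max, y_max); zero when the point lies inside."""
    x, y = point
    x_min, y_min, x_max, y_max = region
    dx = max(x_min - x, 0.0, x - x_max)
    dy = max(y_min - y, 0.0, y - y_max)
    return (dx * dx + dy * dy) ** 0.5

def fit_detection_box(corners, region, max_distance):
    """Screen-then-fit: keep the corners whose relative distance to the
    sensitive-text reference region is within the first set deviation
    range, then fit a bounding box over the surviving (target) corners."""
    targets = [p for p in corners
               if point_region_distance(p, region) <= max_distance]
    if not targets:
        return None                      # no corner survived the screening
    xs = [p[0] for p in targets]
    ys = [p[1] for p in targets]
    return (min(xs), min(ys), max(xs), max(ys))
```

The function names and the axis-aligned min/max fit are illustrative; rotated text would call for a rotated-rectangle fit instead.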
Differently, in another specific embodiment, fitting the corner distribution positions based on the sensitive text reference distribution region to obtain a text detection box corresponding to the sensitive text information includes:
performing coordinate fitting based on the angular point distribution position to obtain a candidate bounding box;
performing position matching on the sensitive text reference distribution area and the candidate bounding boxes to obtain the position deviation between each candidate bounding box and the sensitive text reference distribution area;
and selecting the boundary box with the position deviation within a second set deviation range from the candidate boundary boxes as a text detection box corresponding to the sensitive text information.
When the sensitive text reference distribution area is used to screen the corner distribution positions in this way, coordinate fitting is first performed on the corner distribution positions to obtain candidate bounding boxes. Screening is then carried out on the basis of the candidate bounding boxes in combination with the sensitive text reference distribution area: the position deviation between the sensitive text reference distribution area and each candidate bounding box is obtained, and the boxes meeting the set deviation requirement are screened out as the text detection boxes corresponding to the sensitive text information.
That is, in this process the corner distribution positions are fitted first, and the text detection boxes within the deviation range are then screened out, in combination with the sensitive text reference distribution area, from the candidate bounding boxes obtained by fitting.
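The fit-then-filter variant can be sketched similarly. Here the position deviation is measured as 1 minus the IoU between the candidate box and the reference area, which is one plausible choice; the publication does not specify the deviation metric:

```python
def box_deviation(box, region):
    """Position deviation between a candidate bounding box and the
    reference region, measured here (an assumption) as 1 - IoU."""
    ax1, ay1, ax2, ay2 = box
    bx1, by1, bx2, by2 = region
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return 1.0 - inter / union if union > 0 else 1.0

def select_detection_boxes(corner_groups, region, max_deviation):
    """Fit-then-filter: fit one candidate box per corner group, then keep
    the candidates whose deviation from the sensitive-text reference
    region is within the second set deviation range."""
    selected = []
    for group in corner_groups:
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        candidate = (min(xs), min(ys), max(xs), max(ys))
        if box_deviation(candidate, region) <= max_deviation:
            selected.append(candidate)
    return selected
```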
This process predicts the distribution region of sensitive text information by detecting the corner points of the text and combining them with the segmented sensitive text reference distribution region, while integrating corner sampling and combination into the generation of the text bounding box. Position-sensitive errors in the text region are thereby eliminated, the false alarm rate is reduced, and the accuracy of image text recognition is improved.
Step 305, extracting the image area framed by the text detection box from the image to be processed as a text candidate area.
After the text detection box is obtained, the image area framed by the text detection box is directly taken as the text candidate area on which the text recognition operation is performed, ensuring the validity of text recognition.
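For an axis-aligned detection box, extracting the framed image area reduces to an array slice; the bound clamping below is a defensive assumption on my part, not part of the publication:

```python
import numpy as np

def crop_text_candidate(image: np.ndarray, box) -> np.ndarray:
    """Extract the image area framed by the text detection box as the
    text candidate region (box = x_min, y_min, x_max, y_max in pixels)."""
    x_min, y_min, x_max, y_max = box
    h, w = image.shape[:2]
    # clamp to image bounds so a box that overshoots the edge stays valid
    x_min, x_max = max(0, int(x_min)), min(w, int(x_max))
    y_min, y_max = max(0, int(y_min)), min(h, int(y_max))
    return image[y_min:y_max, x_min:x_max]
```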
And step 306, performing text recognition based on the text candidate area to obtain a text detection result.
The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and is not described here again.
The embodiments of the application may acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
In the embodiment of the application, the corner distribution positions and the text region segmentation map corresponding to the image to be processed are acquired, and text boundary verification is performed using the corner distribution positions together with the distribution information of the text region of the sensitive information in the text region segmentation map. The corner distribution positions are fitted based on the reference distribution region of the sensitive text segmented from the image to be processed to obtain the text detection box corresponding to the sensitive text information, the correct text candidate region is determined, and text recognition is then carried out. This enhances the robustness of the network in detecting multi-direction and multi-scale text, solves the poor recognition caused by unclear text, incomplete ticket surfaces and poor imaging quality in current automatic ticket recognition, improves the accuracy of ticket authenticity identification and data extraction, and helps make the review and reporting of the marketing process automated and intelligent while reducing time consumption and cost.
Referring to fig. 4, fig. 4 is a structural diagram of an image text detection apparatus according to an embodiment of the present application, and only a part related to the embodiment of the present application is shown for convenience of description.
The image text detection apparatus 400 includes:
an obtaining module 401, configured to obtain an image to be processed including text information and a structured region segmentation map corresponding to the image to be processed, where the structured region segmentation map includes distribution information of a text region corresponding to sensitive text information;
a detection module 402, configured to perform corner detection on the image to be processed to obtain a corner distribution position;
an extracting module 403, configured to extract a text candidate region corresponding to the sensitive text information from the image to be processed based on the corner distribution position and the structured region segmentation map;
and an identifying module 404, configured to perform text identification based on the text candidate region to obtain a text detection result.
The extracting module 403 is specifically configured to:
based on the structured region segmentation graph, performing text region segmentation on the image to be processed to obtain a sensitive text reference distribution region;
fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information;
and extracting the image area outlined by the text detection box from the image to be processed as the text candidate area.
Wherein the extracting module 403 is further configured to:
carrying out position matching on the sensitive text reference distribution area and the corner point distribution positions to obtain the relative distance between each corner point distribution position and the sensitive text reference distribution area;
selecting a target corner distribution position with the relative distance within a first set deviation range from the corner distribution positions;
and performing coordinate fitting based on the distribution positions of the target corner points to obtain a text detection box corresponding to the sensitive text information.
Wherein the extracting module 403 is further configured to:
performing coordinate fitting based on the corner distribution positions to obtain a candidate bounding box;
performing position matching on the sensitive text reference distribution area and the candidate bounding boxes to obtain the position deviation between each candidate bounding box and the sensitive text reference distribution area;
and selecting the boundary box with the position deviation within a second set deviation range from the candidate boundary boxes as the text detection box corresponding to the sensitive text information.
Wherein the extracting module 403 is further configured to:
determining the positive direction orientation of the image to be processed based on the corner distribution positions;
when the positive direction is inconsistent with the reference positive direction, correcting the direction of the image to be processed;
and based on the structured region segmentation map, performing text region segmentation on the image to be processed after the orientation correction to obtain a sensitive text reference distribution region.
The identifying module 404 is specifically configured to:
performing text recognition on the text candidate area to obtain sensitive text content;
and filling the sensitive text content into the text display template according to the display position corresponding to the sensitive text information preset in the text display template to obtain the text detection result.
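A minimal sketch of this template-filling step, assuming (the publication does not fix the format) that the display positions are marked in the text display template with `{field}` placeholders keyed by the sensitive field names:

```python
def fill_display_template(template: str, recognized: dict) -> str:
    """Fill recognized sensitive text content into the display positions
    preset in a text display template; `{field}` markers are an assumed
    placeholder convention for those positions."""
    out = template
    for field, content in recognized.items():
        out = out.replace("{" + field + "}", content)
    return out
```

In practice the template could equally be a structured form (e.g. JSON with coordinates); the string form is only the simplest illustration.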
The obtaining module 401 is specifically configured to:
and carrying out image shooting or image scanning on the paper book placed in the image acquisition area by using image acquisition equipment to obtain the image to be processed containing the text information.
The image text detection device provided by the embodiment of the application can realize each process of the embodiment of the image text detection method, can achieve the same technical effect, and is not repeated here to avoid repetition.
Fig. 5 is a block diagram of a computer device according to an embodiment of the present application. As shown in the figure, the computer device 5 of the embodiment includes: at least one processor 50 (only one shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the steps of any of the various method embodiments described above being implemented when the computer program 52 is executed by the processor 50.
The computer device 5 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. The computer device 5 may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of the computer device 5 and does not constitute a limitation on it; the computer device may include more or fewer components than shown, combine certain components, or use different components. For example, it may also include input/output devices, network access devices, buses, and the like.
The Processor 50 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or a memory of the computer device 5. The memory 51 may also be an external storage device of the computer device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the computer device 5. The memory 51 is used for storing the computer program and other programs and data required by the computer device. The memory 51 may also be used to temporarily store data that has been output or is to be output.
The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
All or part of the processes in the methods of the above embodiments may also be implemented by a computer program product: when the computer program product runs on a terminal, the terminal, by executing it, implements the steps in the above method embodiments.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image text detection method, comprising:
acquiring an image to be processed containing text information and a structured regional division map corresponding to the image to be processed, wherein the structured regional division map contains distribution information of a text region corresponding to sensitive text information;
carrying out corner detection on the image to be processed to obtain a corner distribution position;
extracting a text candidate area corresponding to the sensitive text information from the image to be processed based on the corner distribution position and the structured area segmentation map;
and performing text recognition based on the text candidate area to obtain a text detection result.
2. The method according to claim 1, wherein the extracting a text candidate region corresponding to the sensitive text information from the image to be processed based on the corner distribution position and the structured region segmentation map comprises:
based on the structured region segmentation graph, performing text region segmentation on the image to be processed to obtain a sensitive text reference distribution region;
fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information;
and extracting an image area framed by the text detection box from the image to be processed as the text candidate area.
3. The method according to claim 2, wherein the fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information includes:
carrying out position matching on the sensitive text reference distribution area and the corner point distribution positions to obtain the relative distance between each corner point distribution position and the sensitive text reference distribution area;
selecting a target corner distribution position with the relative distance within a first set deviation range from the corner distribution positions;
and performing coordinate fitting based on the distribution positions of the target corner points to obtain a text detection box corresponding to the sensitive text information.
4. The method according to claim 2, wherein the fitting the corner distribution positions based on the sensitive text reference distribution area to obtain a text detection box corresponding to the sensitive text information comprises:
performing coordinate fitting based on the corner distribution positions to obtain a candidate bounding box;
performing position matching on the sensitive text reference distribution area and the candidate bounding boxes to obtain the position deviation between each candidate bounding box and the sensitive text reference distribution area;
and selecting the boundary box with the position deviation within a second set deviation range from the candidate boundary boxes as the text detection box corresponding to the sensitive text information.
5. The method according to claim 2, wherein the performing text region segmentation on the image to be processed based on the structured region segmentation map to obtain a sensitive text reference distribution region comprises:
determining the positive direction orientation of the image to be processed based on the corner distribution positions;
when the positive direction is inconsistent with the reference positive direction, correcting the direction of the image to be processed;
and based on the structured region segmentation map, performing text region segmentation on the image to be processed after the orientation correction to obtain a sensitive text reference distribution region.
6. The method of claim 1, wherein performing text recognition based on the text candidate region to obtain a text detection result comprises:
performing text recognition on the text candidate area to obtain sensitive text content;
and filling the sensitive text content into the text display template according to the display position corresponding to the sensitive text information preset in the text display template to obtain the text detection result.
7. The method according to claim 1, wherein the obtaining of the image to be processed containing the text information comprises:
and carrying out image shooting or image scanning on the paper document placed in the image acquisition area by using an image acquisition device to obtain the image to be processed containing the text information.
8. An image text detection apparatus, comprising:
the acquisition module is used for acquiring an image to be processed containing text information and a structured regional division map corresponding to the image to be processed, wherein the structured regional division map contains distribution information of a text region corresponding to the sensitive text information;
the detection module is used for carrying out corner detection on the image to be processed to obtain a corner distribution position;
the extraction module is used for extracting a text candidate region corresponding to the sensitive text information from the image to be processed based on the angular point distribution position and the structured region segmentation map;
and the recognition module is used for performing text recognition based on the text candidate area to obtain a text detection result.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210147007.0A 2022-02-17 2022-02-17 Image text detection method and device, computer equipment and storage medium Pending CN114495146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210147007.0A CN114495146A (en) 2022-02-17 2022-02-17 Image text detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210147007.0A CN114495146A (en) 2022-02-17 2022-02-17 Image text detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114495146A true CN114495146A (en) 2022-05-13

Family

ID=81481857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210147007.0A Pending CN114495146A (en) 2022-02-17 2022-02-17 Image text detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114495146A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840477A (en) * 2022-06-30 2022-08-02 深圳乐播科技有限公司 File sensitivity index determining method based on cloud conference and related product
CN114840477B (en) * 2022-06-30 2022-09-27 深圳乐播科技有限公司 File sensitivity index determining method based on cloud conference and related product

Similar Documents

Publication Publication Date Title
CN110866495B (en) Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
US10977513B2 (en) Method, system and computer readable storage medium for identifying information carried on sheet
US10140511B2 (en) Building classification and extraction models based on electronic forms
US20190385054A1 (en) Text field detection using neural networks
US20190294921A1 (en) Field identification in an image using artificial intelligence
US20170147552A1 (en) Aligning a data table with a reference table
US11361570B2 (en) Receipt identification method, apparatus, device and storage medium
CN111209827B (en) Method and system for OCR (optical character recognition) bill problem based on feature detection
CA3052248C (en) Detecting orientation of textual documents on a live camera feed
CN113627428A (en) Document image correction method and device, storage medium and intelligent terminal device
US11727701B2 (en) Techniques to determine document recognition errors
CN111507354A (en) Information extraction method, device, equipment and storage medium
CN112668580A (en) Text recognition method, text recognition device and terminal equipment
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
Arslan End to end invoice processing application based on key fields extraction
CN114495146A (en) Image text detection method and device, computer equipment and storage medium
CN111462388A (en) Bill inspection method and device, terminal equipment and storage medium
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN111079771B (en) Method, system, terminal equipment and storage medium for extracting characteristics of click-to-read image
CN113128496B (en) Method, device and equipment for extracting structured data from image
CN112101356A (en) Method and device for positioning specific text in picture and storage medium
CN112084364A (en) Object analysis method, local image search method, device, and storage medium
CN111291758A (en) Method and device for identifying characters of seal
CN113474786A (en) Electronic purchase order identification method and device and terminal equipment
CN113435331B (en) Image character recognition method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination