CN109977949B - Frame fine adjustment text positioning method and device, computer equipment and storage medium - Google Patents

Frame fine adjustment text positioning method and device, computer equipment and storage medium

Info

Publication number
CN109977949B
CN109977949B CN201910214068.2A
Authority
CN
China
Prior art keywords
text
identity card
frame
position information
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910214068.2A
Other languages
Chinese (zh)
Other versions
CN109977949A (en)
Inventor
张欢
李爱林
周先得
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huafu Technology Co ltd
Original Assignee
Shenzhen Huafu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huafu Technology Co ltd filed Critical Shenzhen Huafu Technology Co ltd
Priority to CN201910214068.2A
Publication of CN109977949A
Application granted
Publication of CN109977949B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a frame fine adjustment text positioning method and device, a computer device and a storage medium. The method comprises: acquiring an identity card image to be positioned; performing preliminary positioning of the text regions on the identity card image to obtain candidate detection frames; predicting fine-tuning parameters with a fine-tuning model; and adjusting the candidate detection frames according to the fine-tuning parameters to obtain text position information. According to the invention, after the outer frame of the acquired identity card image is determined, candidate detection frames for the text regions are derived from it, and their positions are fine-tuned with the fine-tuning model to obtain accurate text frames. Because text region positioning makes full use of prior information about the distribution of identity card text regions, the fine-tuning model is compact and fast, its network training converges easily, and positioning is fast and accurate.

Description

Frame fine adjustment text positioning method and device, computer equipment and storage medium
Technical Field
The invention relates to an identification card recognition method, in particular to a text positioning method, a device, computer equipment and a storage medium for frame fine adjustment.
Background
An identity card is a certificate proving the identity of its bearer and is issued to citizens by the governments of various countries or regions. It serves as proof of each person's unique citizen identity and carries text that states the identity information of the corresponding person. Text positioning is a key part of an identity card recognition algorithm: the accuracy of text positioning directly affects the quality of subsequent character recognition.
Existing identity card text positioning methods fall into two groups. The first uses traditional image processing — denoising, graying, binarization, contour extraction, morphological transformation and the like — to determine text positions; its accuracy is low and it is unsuitable for commercial use. The second uses deep learning, in two variants. End-to-end text positioning outputs the identity card text line positions in the picture directly from a network; in re-photographing (copying) scenes its network training does not converge easily, and text lines outside the identity card region are easily misrecognized. The other variant first detects the identity card region — typically by locating it with a commonly used object detection network such as Faster R-CNN, YOLO or SSD and then correcting the angle — and then positions the text within that region. For the in-region positioning there are again two methods: one assumes the current region is limited to the identity card and uses traditional image processing, which struggles to achieve good results under stains, uneven illumination, occlusion and similar conditions; the other continues to use an object detection network, or even a dedicated text line detection network, for text positioning.
Therefore, it is necessary to design a new method whose network training converges easily and which positions text quickly and with high accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a text positioning method, a device, computer equipment and a storage medium for frame fine adjustment.
In order to achieve the above purpose, the present invention adopts the following technical scheme: the text positioning method for frame fine adjustment comprises the following steps:
acquiring an identity card image to be positioned;
preliminary positioning of a text area is determined on the identity card image to be positioned so as to obtain a candidate detection frame;
predicting fine tuning parameters by adopting a fine tuning model;
and adjusting the candidate detection frame according to the fine tuning parameters to obtain text position information.
The further technical scheme is as follows: the preliminary positioning of the text region is determined for the identity card image to be positioned to obtain a candidate detection frame, which comprises the following steps:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
The further technical scheme is as follows: the obtaining the relative position information of the outer frame of the identity card and the text area of the image of the identity card to be positioned to obtain text parameters comprises the following steps:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set consists of a plurality of identity card pictures annotated with the position information of the text regions and the position information of the identity card outer frame.
The further technical scheme is as follows: before the fine tuning model is adopted to predict the fine tuning parameters, the method comprises the following steps:
and carrying out graying treatment on the candidate detection frames.
The further technical scheme is as follows: the fine-tuning model is obtained by deforming text boxes labelled with text region positions and training a convolutional neural network on the resulting crops.
The further technical scheme is as follows: after the candidate detection frame is adjusted according to the fine tuning parameter to obtain the text position information, the method further comprises the following steps:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
The invention also provides a text positioning device for frame fine adjustment, which comprises:
the image acquisition unit is used for acquiring an identity card image to be positioned;
the candidate frame forming unit is used for carrying out preliminary positioning on the text area determined by the identity card image to be positioned so as to obtain a candidate detection frame;
the parameter prediction unit is used for predicting fine tuning parameters by adopting a fine tuning model;
and the adjusting unit is used for adjusting the candidate detection frame according to the fine adjustment parameters so as to obtain text position information.
The further technical scheme is as follows: the candidate frame forming unit includes:
the outer frame determining subunit is used for determining an identity card outer frame of an identity card image to be positioned;
the parameter forming subunit is used for acquiring the relative position information of the outer frame of the identity card and the text area of the identity card image to be positioned so as to obtain text parameters;
a position information determining subunit, configured to determine position information of the text region according to the text parameter, so as to form a candidate text box;
and the expansion subunit is used for expanding the width and the height of the candidate text boxes to form candidate detection boxes.
The invention also provides a computer device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, performs the above-described method.
Compared with the prior art, the invention has the following beneficial effects: after the outer frame of the acquired identity card image to be positioned is determined, candidate detection frames for the text regions are derived from it, and their positions are fine-tuned with the fine-tuning model to obtain accurate text frames. Because text region positioning makes full use of prior information about the distribution of identity card text regions, the fine-tuning model is compact and fast, its network training converges easily, and positioning is fast and accurate.
The invention is further described below with reference to the drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is an application scenario schematic diagram of a text positioning method for frame fine adjustment provided by an embodiment of the present invention;
fig. 2 is a flow chart of a text positioning method for frame fine adjustment according to an embodiment of the present invention;
fig. 3 is a schematic sub-flowchart of a text positioning method for frame fine adjustment according to an embodiment of the present invention;
FIG. 4 is a schematic diagram I of a candidate detection frame according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of a candidate detection frame according to an embodiment of the present invention;
FIG. 6 is a third schematic diagram of a candidate detection frame according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an expanded candidate text box according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of text parameters provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of training of a fine tuning model according to the present invention;
fig. 10 is a flowchart of a text positioning method for frame fine adjustment according to another embodiment of the present invention;
FIG. 11 is a schematic block diagram of a text positioning device for frame fine adjustment provided by an embodiment of the present invention;
FIG. 12 is a schematic block diagram of a text positioning device for frame fine adjustment according to another embodiment of the present invention;
fig. 13 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a text positioning method for frame fine adjustment according to an embodiment of the present invention, and fig. 2 is a schematic flowchart of the method. The frame fine adjustment text positioning method is applied to a server. The server interacts with a terminal: it obtains an identity card image to be positioned from the terminal, performs preliminary positioning of the text regions on the image, and fine-tunes the candidate detection frames that delimit the text regions to obtain identity card text with high accuracy.
Fig. 2 is a flowchart illustrating a text positioning method for frame fine adjustment according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S150.
S110, acquiring an identity card image to be positioned.
In this embodiment, the identity card image to be positioned refers to an image containing the identity card together with background, usually captured by a terminal with a camera function.
S120, preliminary positioning of a text area is determined on the identity card image to be positioned, so that a candidate detection frame is obtained.
In this embodiment, the candidate detection box refers to a circumscribed polygonal box having an area larger than the text region and internally enclosing the text region, as shown in fig. 4.
In one embodiment, referring to fig. 3, the step S120 may include steps S121 to S124.
S121, determining the outer frame of the identity card image to be positioned.
In this embodiment, the identity card outer frame is generally obtained by locating the card region with a commonly used object detection network such as Faster R-CNN, YOLO or SSD, and then performing angle correction.
S122, acquiring relative position information of an outer frame of the identity card and a text area of the identity card image to be positioned so as to obtain text parameters.
In this embodiment, the text parameter refers to the relative position of the outer frame of the identification card and the text region.
Specifically, acquiring relative position information of an identity card outer frame and a text region of an identity card image to be positioned through a training set to obtain text parameters;
the training set is obtained by training a plurality of identity card pictures with the position information of the marked text region and the position information of the outer frame of the identity card.
A batch of identity card sample pictures labelled with text region positions is fed to the object detection network. Combining the outer frame position returned by the network with the labelled text region positions yields the relative position of each text region with respect to the outer frame in each sample picture. The identity card outer frame is a horizontal rectangular frame, and the frame of each text region is also a horizontal rectangular frame. The rectangular frame of each text region is described by four parameters: start point x coordinate, start point y coordinate, end point x coordinate and end point y coordinate.
Taking the rectangular frame of the name text region as an example, let the outer frame have width `width`, height `height` and start point coordinates (x_zero, y_zero), and let the start point of the name frame be (x_nameStart, y_nameStart). The relative position of the name frame's start point is then ((x_nameStart − x_zero)/width, (y_nameStart − y_zero)/height). The relative position of the end point is obtained in the same way, and the rectangular frames of the other text regions are handled similarly. Finally, the text parameters are obtained by averaging the relative position data over all identity card samples.
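The relative-position computation above can be sketched as follows; the helper names and the (x_start, y_start, x_end, y_end) box representation are illustrative assumptions, not from the patent text.

```python
import numpy as np

def relative_box(outer, text_box):
    """Express a labelled text box relative to the identity card outer frame.

    Boxes are (x_start, y_start, x_end, y_end) in pixels; the outer frame's
    start point plays the role of (x_zero, y_zero) in the description.
    """
    x0, y0, x1, y1 = outer
    width, height = x1 - x0, y1 - y0
    tx0, ty0, tx1, ty1 = text_box
    return ((tx0 - x0) / width, (ty0 - y0) / height,
            (tx1 - x0) / width, (ty1 - y0) / height)

def text_parameters(samples):
    """Average the relative positions over all labelled samples
    (each sample is a pair: outer frame, text box)."""
    rel = [relative_box(outer, box) for outer, box in samples]
    return tuple(np.mean(rel, axis=0))
```

Because the coordinates are normalized by the outer frame's width and height, the averaged parameters transfer to cards photographed at any scale.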
S123, determining the position information of the text area according to the text parameters to form a candidate text box.
In this embodiment, the candidate text box refers to a rectangular box with text information.
And obtaining the position information of the candidate text box according to the text parameters and the identity card outer box, as shown in fig. 6 and 7.
S124, expanding the width and the height of the candidate text boxes to form candidate detection boxes.
As can be seen from fig. 6 and fig. 7, the candidate text boxes carry a certain error, and under some conditions part of the text information falls outside the candidate text box. The center of each candidate text box is therefore kept fixed while its width and height are expanded by a certain proportion, which guarantees that the text information to be positioned lies inside the resulting candidate detection box and improves positioning accuracy. Because the candidate detection boxes are arranged in a fixed order, the specific attribute of each box is known from its position in that order, so the attributes of the candidate detection boxes do not need to be classified.
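A minimal sketch of the center-preserving expansion; the expansion ratios below are illustrative assumptions, since the patent only specifies "a certain proportion".

```python
def expand_box(box, scale_w=1.4, scale_h=1.4):
    """Expand a candidate text box about its fixed center.

    box: (x_start, y_start, x_end, y_end); the scale factors are assumed.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0   # center stays fixed
    half_w = (x1 - x0) * scale_w / 2.0
    half_h = (y1 - y0) * scale_h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```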
S130, performing gray scale processing on the candidate detection frames.
In this embodiment, in the RGB model, if R=G=B the color is a shade of gray, and the common value is called the gray value; a grayscale image therefore needs only one byte per pixel to store the gray value (also called intensity value or luminance value), and the gray range is 0 to 255. A color image can be grayed with one of four methods — the component method, the maximum value method, the average value method and the weighted average method. Graying makes the gradient features of the image easier to identify and improves the accuracy of the overall text positioning.
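The weighted-average variant can be sketched as follows; the ITU-R BT.601 luma weights shown are a common choice, not mandated by the patent.

```python
import numpy as np

def to_gray_weighted(rgb):
    """Gray an H×W×3 uint8 image by a weighted average of R, G, B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b   # BT.601 weights (assumed)
    return np.rint(gray).astype(np.uint8)      # one byte per pixel, 0-255
```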
S140, predicting the fine tuning parameters by adopting a fine tuning model.
In the present embodiment, the fine adjustment model functions to correct the position of the candidate detection frame. The fine tuning model is obtained by inputting the deformed text box marked with the text region position into a convolutional neural network for training.
Since the candidate detection frame has been expanded, it needs to be scaled back precisely, and scaling a rectangular frame position requires four parameters. As shown in fig. 8, the model therefore outputs the offsets of the x and y coordinates of the start point and end point of the candidate detection frame relative to the standard (labelled) frame of the text region, with the length and width of the candidate detection frame normalized to 1.
The network input is the expanded and grayed candidate detection frame; the output is four floating point values representing the offsets of the start point and end point coordinates of the located text region rectangle relative to the start point of the candidate detection frame.
The parameters of the trained convolutional neural network are as follows:
Input layer: 150×50×1 (grayed input picture);
Convolution layer 1: 150×50×1×64 (5×5 convolution);
Pooling layer 1: 75×25×1×64 (2×2, stride 2);
Convolution layer 2: 75×25×1×128 (5×5 convolution);
Pooling layer 2: 38×13×1×128 (2×2, stride 2);
Convolution layer 3: 38×13×1×256 (3×3 convolution);
Pooling layer 3: 19×7×1×256 (2×2, stride 2);
Convolution layer 4: 19×7×1×512 (3×3 convolution);
Pooling layer 4: 10×4×1×512 (2×2, stride 2);
Fully connected layer: 4000;
Output layer: 4.
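The spatial sizes in the layer list are consistent with "same"-padded convolutions and 2×2 pooling that rounds odd sizes up; a quick sanity check of that reading:

```python
import math

def pool(w, h):
    # 2×2 pooling with stride 2, rounding odd sizes up (ceil) —
    # the rounding policy is inferred from the listed sizes, not stated
    return math.ceil(w / 2), math.ceil(h / 2)

sizes = [(150, 50)]          # input width × height
for _ in range(4):           # four conv+pool stages; 'same' conv keeps size
    sizes.append(pool(*sizes[-1]))

# matches the pooling-layer sizes listed above
assert sizes[1:] == [(75, 25), (38, 13), (19, 7), (10, 4)]
```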
in order to train the convolutional neural network, the positions of rectangular frames of marked text areas are used as references, the rectangular frames of the marked text areas are randomly rocked up and down and left and right and the scales are transformed, as shown in fig. 9, intercepting areas at different positions are obtained, the intercepting areas are sent into the convolutional network to train, so that output offset values are obtained, the parameter values of the convolutional neural network are adjusted by utilizing the loss values and the positions of the rectangular frames of the actually marked text areas, so that the output offset values are close to 0, and the convolutional neural network is the fine tuning model. In particular, in practice, there are fewer than three rows in the address field, so that some pictures can be intercepted in the address field at the blank position and sent to training, and the labeling values corresponding to the four parameters should be 0. The network has simple structure, small input size, centralized input information, easy network convergence and good output. The model has small volume and higher operation speed than the general model.
And S150, adjusting the candidate detection frames according to the fine adjustment parameters to obtain text position information.
In the present embodiment, the text position information refers to four vertex coordinate information of a rectangular frame of the text region.
Specifically, the four vertices of the candidate detection frame are moved in turn by the offsets given by the fine-tuning parameters output by the fine-tuning model.
In actual prediction of text position information, the candidate text boxes of each text region are obtained from the located identity card outer frame, expanded in height and width, and fed in turn into the fine-tuning model. Each candidate detection box is adjusted according to the model's output values, yielding the accurate text box region; if the model's output values are close to 0, the text line does not exist.
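Applying the predicted offsets to a candidate detection box is then a direct inversion of the training target; a minimal sketch under the same assumed box representation:

```python
def apply_offsets(box, offsets):
    """Move the candidate box's start and end points by the model output.

    offsets are normalized by the box's own width and height, matching the
    'length and width normalized to 1' convention in the description.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    dx0, dy0, dx1, dy1 = offsets
    return (x0 + dx0 * w, y0 + dy0 * h, x1 + dx1 * w, y1 + dy1 * h)
```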
According to the frame fine adjustment text positioning method, after the outer frame of the acquired identity card image to be positioned is determined, candidate detection frames for the text regions are derived from it, and their positions are fine-tuned with the fine-tuning model to obtain accurate text frames. Because text region positioning makes full use of prior information about the distribution of identity card text regions, the fine-tuning model is compact and fast, its network training converges easily, and positioning is fast and accurate.
Fig. 10 is a flowchart of a text positioning method for frame fine adjustment according to another embodiment of the present invention. As shown in fig. 10, the method in this embodiment includes steps S210 to S270. Steps S210 to S250 are similar to steps S110 to S150 in the above embodiment and are not described again here. Steps S260 to S270 added in this embodiment are described in detail below.
S260, intercepting the identity card image to be positioned according to the text position information to form a text image.
The identity card image to be positioned is cropped according to the text position information to obtain an image containing only the identity card text.
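Cropping per text region can be sketched with plain array slicing; the rounding to integer pixel indices is an assumption.

```python
import numpy as np

def crop_text_regions(image, boxes):
    """Cut each located text box out of the identity card image.

    image: numpy array indexed [row, column(, channel)];
    boxes: iterable of (x_start, y_start, x_end, y_end).
    """
    crops = []
    for x0, y0, x1, y1 in boxes:
        crops.append(image[int(round(y0)):int(round(y1)),
                           int(round(x0)):int(round(x1))])
    return crops
```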
S270, identifying the text image to obtain the identity card text.
In this embodiment, an optical character recognition technology may be used to recognize the text image to obtain an id card text, and the id card text is output to a terminal for display.
Fig. 11 is a schematic block diagram of a text positioning device 300 for frame fine adjustment according to an embodiment of the present invention. As shown in fig. 11, corresponding to the above text positioning method for frame fine adjustment, the present invention further provides a text positioning device 300 for frame fine adjustment. The device 300 includes units for performing the above method and may be configured in a server.
Specifically, referring to fig. 11, the text positioning device 300 with fine-tuned frame includes:
an image acquisition unit 301, configured to acquire an image of an identification card to be positioned;
the candidate frame forming unit 302 is configured to perform preliminary positioning of the determined text region on the identification card image to be positioned, so as to obtain a candidate detection frame;
a parameter prediction unit 304, configured to predict a trimming parameter by using a trimming model;
and the adjusting unit 305 is configured to adjust the candidate detection frame according to the fine tuning parameter to obtain text position information.
In one embodiment, as shown in fig. 6, the candidate frame forming unit 302 includes:
the outer frame determining subunit is used for determining an identity card outer frame of an identity card image to be positioned;
the parameter forming subunit is used for acquiring the relative position information of the outer frame of the identity card and the text area of the identity card image to be positioned so as to obtain text parameters;
a position information determining subunit, configured to determine position information of the text region according to the text parameter, so as to form a candidate text box;
and the expansion subunit is used for expanding the width and the height of the candidate text boxes to form candidate detection boxes.
In an embodiment, the apparatus further includes:
and a graying processing unit 303, configured to perform graying processing on the candidate detection frame.
Fig. 12 is a schematic block diagram of a text positioning device 300 for frame fine adjustment according to another embodiment of the present invention. As shown in fig. 12, the text positioning device 300 of this embodiment adds a clipping unit 306 and a recognition unit 307 to the above embodiment.
The clipping unit 306 is used for clipping the identity card image to be positioned according to the text position information to form a text image;
the recognition unit 307 is configured to recognize the text image to obtain an identity card text.
It should be noted that, as will be clear to those skilled in the art, for the specific implementation of the frame fine adjustment text positioning device 300 and of each unit, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity, details are not repeated here.
The above-described frame fine-tuning text positioning apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 13.
Referring to fig. 13, fig. 13 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server.
With reference to FIG. 13, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform the frame fine-tuning text positioning method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides a runtime environment for the computer program 5032 stored in the non-volatile storage medium 503; when executed by the processor 502, the computer program 5032 causes the processor 502 to perform the frame fine-tuning text positioning method.
The network interface 505 is used for network communication with other devices. It will be appreciated by those skilled in the art that the structure shown in fig. 13 is merely a block diagram of some of the structures associated with the present application and does not constitute a limitation of the computer device 500 to which the present application is applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to implement the steps of:
acquiring an identity card image to be positioned;
performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
predicting fine-tuning parameters using a fine-tuning model;
and adjusting the candidate detection frame according to the fine-tuning parameters to obtain text position information.
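As an illustration only (not part of the claimed implementation), the four steps above can be sketched in Python. Here `preliminary_locate` and `trim_model` are hypothetical placeholders for the preliminary positioning step and the trained fine-tuning CNN; the adjustment simply applies the four predicted offsets to the candidate detection box:

```python
def adjust_box(candidate_box, offsets):
    """Apply the four predicted fine-tuning offsets (dx1, dy1, dx2, dy2)
    to a candidate detection box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = candidate_box
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1, y1 + dy1, x2 + dx2, y2 + dy2)


def locate_text(image, preliminary_locate, trim_model):
    """Pipeline sketch: preliminary positioning -> offset prediction -> adjustment."""
    candidate = preliminary_locate(image)   # coarse candidate detection frame
    offsets = trim_model(image, candidate)  # four offsets predicted by the CNN
    return adjust_box(candidate, offsets)   # final text position information
```

A perfectly positioned candidate yields offsets of zero, leaving the box unchanged, which matches the training objective described in the claims.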
The fine-tuning model is obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions.
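The deformation used to build training samples (random up/down/left/right jitter plus a scale change around the annotated rectangle, per the claims) can be sketched as follows; the function name, jitter magnitude, and scale range are illustrative assumptions, not values from the patent:

```python
import random


def jitter_box(box, img_w, img_h, max_shift=10, scale_range=(0.9, 1.1)):
    """Randomly shake an annotated text box up/down/left/right and rescale it,
    producing a deformed crop region clamped to the image bounds. The training
    target is the offset that maps the deformed box back onto the annotated one,
    so a crop taken exactly at the annotation gets a target of all zeros."""
    x1, y1, x2, y2 = box
    s = random.uniform(*scale_range)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * s, (y2 - y1) * s
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    jx1 = max(0, int(cx - w / 2 + dx))
    jy1 = max(0, int(cy - h / 2 + dy))
    jx2 = min(img_w, int(cx + w / 2 + dx))
    jy2 = min(img_h, int(cy + h / 2 + dy))
    jittered = (jx1, jy1, jx2, jy2)
    # Offsets the network should learn to output for this deformed crop:
    target = (x1 - jx1, y1 - jy1, x2 - jx2, y2 - jy2)
    return jittered, target
```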
In an embodiment, when the processor 502 performs the step of preliminary positioning of the text region on the identity card image to be positioned to obtain the candidate detection frame, the following steps are specifically implemented:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
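The four sub-steps above can be illustrated with a hypothetical sketch. The relative-position text parameters are assumed here to be fractions of the outer-frame size, and the expansion ratios are illustrative defaults; the patent does not fix either convention:

```python
def candidate_box(card_frame, text_params, expand_w=0.1, expand_h=0.2):
    """Place a candidate text box inside the detected identity-card outer frame
    (x1, y1, x2, y2) using relative-position text parameters, then expand its
    width and height to form the candidate detection box.

    text_params = (rx1, ry1, rx2, ry2), fractions of the outer-frame size."""
    fx1, fy1, fx2, fy2 = card_frame
    fw, fh = fx2 - fx1, fy2 - fy1
    rx1, ry1, rx2, ry2 = text_params
    # Candidate text box from the relative position information.
    x1, y1 = fx1 + rx1 * fw, fy1 + ry1 * fh
    x2, y2 = fx1 + rx2 * fw, fy1 + ry2 * fh
    # Expand width and height to form the candidate detection box.
    mw, mh = (x2 - x1) * expand_w, (y2 - y1) * expand_h
    return (x1 - mw, y1 - mh, x2 + mw, y2 + mh)
```

Expanding the box gives the fine-tuning model slack to shift the border in either direction.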
In an embodiment, when the processor 502 implements the step of acquiring the relative position information of the identity card outer frame and the text region of the identity card image to be positioned to obtain the text parameters, the following steps are specifically implemented:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set is obtained by training on a plurality of identity card pictures annotated with text region position information and identity card outer frame position information.
In one embodiment, before implementing the step of predicting the trim parameters using the trim model, the processor 502 further implements the steps of:
and carrying out graying treatment on the candidate detection frames.
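A minimal sketch of the graying step, which reduces the crop to the single channel expected by the network's 150×50×1 input; the patent does not specify a conversion formula, so the BT.601 luminance weights below are an assumption:

```python
import numpy as np


def to_gray(crop_rgb):
    """Convert an H x W x 3 uint8 crop to a single-channel grayscale image
    using the BT.601 luminance weights (assumed; the patent only says
    'graying processing')."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = crop_rgb.astype(np.float64) @ weights  # contracts the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```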
In one embodiment, after implementing the step of adjusting the candidate detection box according to the trimming parameter to obtain the text position information, the processor 502 further implements the following steps:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
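The interception step is plain array slicing once the text position information is known. `crop_text_image` below is a hypothetical helper; the recognition engine that would consume its output is not specified by the patent and is outside this sketch:

```python
import numpy as np


def crop_text_image(card_image, text_box):
    """Cut the located text region (x1, y1, x2, y2) out of the identity card
    image to form the text image passed on to recognition. Coordinates are
    rounded and clamped to the image bounds."""
    x1, y1, x2, y2 = (int(round(v)) for v in text_box)
    h, w = card_image.shape[:2]
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    return card_image[y1:y2, x1:x2]
```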
It should be appreciated that, in embodiments of the present application, the processor 502 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those skilled in the art will appreciate that all or part of the flow of the methods in the above embodiments may be accomplished by a computer program instructing the relevant hardware. The computer program comprises program instructions and may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the steps of the method embodiments described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring an identity card image to be positioned;
performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
predicting fine-tuning parameters using a fine-tuning model;
and adjusting the candidate detection frame according to the fine-tuning parameters to obtain text position information.
The fine-tuning model is obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions.
In an embodiment, when the processor executes the computer program to implement the step of preliminary positioning of the text region on the identity card image to be positioned to obtain the candidate detection frame, the following steps are specifically implemented:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
In an embodiment, when the processor executes the computer program to implement the step of acquiring the relative position information of the identity card outer frame and the text region of the identity card image to be positioned to obtain the text parameters, the following steps are specifically implemented:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set is obtained by training a plurality of identity card pictures with the position information of the marked text region and the position information of the outer frame of the identity card.
In an embodiment, before executing the computer program to implement the step of predicting trim parameters using a trim model, the processor further implements the steps of:
and carrying out graying treatment on the candidate detection frames.
In one embodiment, after executing the computer program to implement the step of adjusting the candidate detection box according to the fine tuning parameter to obtain text position information, the processor further implements the steps of:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The integrated unit may be stored in a storage medium if implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to certain preferred embodiments, the invention is not limited thereto. Various equivalent changes and substitutions will readily occur to those skilled in the art without departing from the scope of the invention, and such equivalents are intended to fall within it. Therefore, the protection scope of the invention is defined by the claims.

Claims (9)

1. A frame fine-tuning text positioning method, characterized by comprising the following steps:
acquiring an identity card image to be positioned;
performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
predicting fine-tuning parameters using a fine-tuning model;
adjusting the candidate detection frame according to the fine-tuning parameters to obtain text position information;
the fine-tuning model being a model obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions;
the parameters of the trained convolutional neural network are as follows:
input layer: 150×50×1;
convolution layer 1: 150×50×1×64;
pooling layer 1: 75×25×1×64;
convolution layer 2: 75×25×1×128;
pooling layer 2: 38×13×1×128;
convolution layer 3: 38×13×1×256;
pooling layer 3: 19×7×1×256;
convolution layer 4: 19×7×1×512;
pooling layer 4: 10×4×1×512;
fully connected layer: 4000;
output layer: 4;
taking the position of the annotated text region rectangular frame as a reference, randomly jittering the rectangular frame up, down, left, and right and varying its scale to obtain cropped regions at different positions; feeding the cropped regions into the convolutional network for training to obtain output offset values; and adjusting the parameter values of the convolutional neural network by using the loss value and the position of the actually annotated text region rectangular frame so that the output offset values approach 0, the convolutional neural network so trained being the fine-tuning model;
during training, because the address field may contain fewer than three lines, some pictures are also cropped from the blank positions of the address field and fed into training, with the annotation values corresponding to the four parameters, namely the start point x coordinate, the start point y coordinate, the end point x coordinate, and the end point y coordinate, set to 0.
2. The frame fine-tuning text positioning method according to claim 1, wherein the performing preliminary positioning of the text region on the identity card image to be positioned to obtain the candidate detection frame comprises:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
3. The frame fine-tuning text positioning method according to claim 2, wherein the acquiring the relative position information of the identity card outer frame and the text region of the identity card image to be positioned to obtain the text parameters comprises:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set is obtained by training a plurality of identity card pictures with the position information of the marked text region and the position information of the outer frame of the identity card.
4. The frame fine-tuning text positioning method according to claim 1, wherein, before the fine-tuning model is used to predict the fine-tuning parameters, the method comprises:
and carrying out graying treatment on the candidate detection frames.
5. The frame fine-tuning text positioning method according to any one of claims 1 to 4, further comprising, after the candidate detection frame is adjusted according to the fine-tuning parameters to obtain the text position information:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
6. A frame fine-tuning text positioning apparatus, characterized by comprising:
the image acquisition unit is used for acquiring an identity card image to be positioned;
the candidate frame forming unit is used for performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
the parameter prediction unit is used for predicting fine tuning parameters by adopting a fine tuning model;
the adjustment unit is used for adjusting the candidate detection frames according to the fine adjustment parameters so as to obtain text position information;
the fine-tuning model is a model obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions;
the parameters of the trained convolutional neural network are as follows:
input layer: 150×50×1;
convolution layer 1: 150×50×1×64;
pooling layer 1: 75×25×1×64;
convolution layer 2: 75×25×1×128;
pooling layer 2: 38×13×1×128;
convolution layer 3: 38×13×1×256;
pooling layer 3: 19×7×1×256;
convolution layer 4: 19×7×1×512;
pooling layer 4: 10×4×1×512;
fully connected layer: 4000;
output layer: 4;
taking the position of the annotated text region rectangular frame as a reference, randomly jittering the rectangular frame up, down, left, and right and varying its scale to obtain cropped regions at different positions; feeding the cropped regions into the convolutional network for training to obtain output offset values; and adjusting the parameter values of the convolutional neural network by using the loss value and the position of the actually annotated text region rectangular frame so that the output offset values approach 0, the convolutional neural network so trained being the fine-tuning model;
during training, because the address field may contain fewer than three lines, some pictures are also cropped from the blank positions of the address field and fed into training, with the annotation values corresponding to the four parameters, namely the start point x coordinate, the start point y coordinate, the end point x coordinate, and the end point y coordinate, set to 0.
7. The frame fine-tuning text positioning apparatus according to claim 6, wherein the candidate frame forming unit includes:
the outer frame determining subunit is used for determining an identity card outer frame of an identity card image to be positioned;
the parameter forming subunit is used for acquiring the relative position information of the outer frame of the identity card and the text area of the identity card image to be positioned so as to obtain text parameters;
a position information determining subunit, configured to determine position information of the text region according to the text parameter, so as to form a candidate text box;
and the expansion subunit is used for expanding the width and the height of the candidate text boxes to form candidate detection boxes.
8. A computer device, characterized in that it comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the method according to any of claims 1-5.
9. A storage medium storing a computer program which, when executed by a processor, performs the method of any one of claims 1 to 5.
CN201910214068.2A 2019-03-20 2019-03-20 Frame fine adjustment text positioning method and device, computer equipment and storage medium Active CN109977949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910214068.2A CN109977949B (en) 2019-03-20 2019-03-20 Frame fine adjustment text positioning method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109977949A CN109977949A (en) 2019-07-05
CN109977949B true CN109977949B (en) 2024-01-26

Family

ID=67079720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910214068.2A Active CN109977949B (en) 2019-03-20 2019-03-20 Frame fine adjustment text positioning method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109977949B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516541B (en) * 2019-07-19 2022-06-10 金蝶软件(中国)有限公司 Text positioning method and device, computer readable storage medium and computer equipment
CN110738238B (en) * 2019-09-18 2023-05-26 平安科技(深圳)有限公司 Classification positioning method and device for certificate information
CN112749529A (en) * 2019-10-29 2021-05-04 西安诺瓦星云科技股份有限公司 Method and device for character self-adaption special-shaped edit box
CN111178346B (en) * 2019-11-22 2023-12-08 京东科技控股股份有限公司 Text region positioning method, text region positioning device, text region positioning equipment and storage medium
CN112836696A (en) * 2019-11-22 2021-05-25 搜狗(杭州)智能科技有限公司 Text data detection method and device and electronic equipment
SG10202001222VA (en) * 2020-02-11 2021-04-29 Alipay Labs Singapore Pte Ltd A system suitable for detecting an identification card, and an apparatus and a processing method in association thereto
CN113469161A (en) * 2020-03-31 2021-10-01 顺丰科技有限公司 Method, device and storage medium for processing logistics list
CN111598091A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
CN112232336A (en) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate identification method, device, equipment and storage medium
CN112987994A (en) * 2021-03-31 2021-06-18 维沃移动通信有限公司 Frame selection annotation method, frame selection annotation device, electronic equipment and storage medium
CN113111839A (en) * 2021-04-25 2021-07-13 上海商汤智能科技有限公司 Behavior recognition method and device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679442A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of document Data Enter
CN108229397A (en) * 2018-01-04 2018-06-29 华南理工大学 Method for text detection in image based on Faster R-CNN
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network
CN109308476A (en) * 2018-09-06 2019-02-05 邬国锐 Billing information processing method, system and computer readable storage medium
CN109448007A (en) * 2018-11-02 2019-03-08 北京迈格威科技有限公司 Image processing method, image processing apparatus and storage medium
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Huafu Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant