CN109977949B - Frame fine adjustment text positioning method and device, computer equipment and storage medium - Google Patents

Frame fine adjustment text positioning method and device, computer equipment and storage medium

Info

Publication number
CN109977949B
CN109977949B CN201910214068.2A
Authority
CN
China
Prior art keywords
text
identity card
frame
position information
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910214068.2A
Other languages
Chinese (zh)
Other versions
CN109977949A (en)
Inventor
张欢
李爱林
周先得
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huafu Technology Co ltd
Original Assignee
Shenzhen Huafu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huafu Technology Co ltd filed Critical Shenzhen Huafu Technology Co ltd
Priority to CN201910214068.2A
Publication of CN109977949A
Application granted
Publication of CN109977949B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a frame fine adjustment text positioning method and device, a computer device and a storage medium. The method comprises: acquiring an identity card image to be positioned; performing preliminary positioning of the text regions on the identity card image to obtain candidate detection frames; predicting fine-tuning parameters with a fine-tuning model; and adjusting the candidate detection frames according to the fine-tuning parameters to obtain text position information. According to the invention, after the outer frame of the acquired identity card image is determined, candidate detection frames for the text regions are derived from it, and their positions are fine-tuned with the fine-tuning model to obtain accurate text frames. Because text region positioning makes full use of prior information about the distribution of identity card text regions, the fine-tuning model is compact and fast, its network training converges easily, and positioning is fast and accurate.

Description

Frame fine adjustment text positioning method and device, computer equipment and storage medium
Technical Field
The invention relates to an identification card recognition method, in particular to a text positioning method, a device, computer equipment and a storage medium for frame fine adjustment.
Background
An identity card is a certificate proving the identity of its bearer and is issued to citizens by the governments of various countries or regions. It serves as proof of each person's unique citizen identity and carries text that states the identity information of the corresponding person. Text positioning is a key part of an identity card recognition algorithm: the accuracy of text positioning directly affects the quality of subsequent character recognition.
Existing identity card text positioning methods fall into two groups. The first uses traditional image processing — denoising, graying, binarization, contour extraction, morphological transformation and the like — to determine text positions; its accuracy is low and it is unsuitable for commercial use. The second uses deep learning, in two variants. End-to-end text positioning outputs the identity card text line positions in the picture directly from a network; in re-photographing (copying) scenes its network training does not converge easily, and text lines outside the identity card region are easily misrecognized. The other variant first detects the identity card region — typically by locating it with a commonly used object detection network such as Faster R-CNN, YOLO or SSD and then correcting the angle — and then positions the text within that region. For the in-region positioning there are again two methods: one assumes the current region is limited to the identity card and uses traditional image processing, which struggles to achieve good results under stains, uneven illumination, occlusion and similar conditions; the other continues to use an object detection network, or even a dedicated text line detection network, for text positioning.
Therefore, it is necessary to design a new method whose network training converges easily and which positions text quickly and with high accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a text positioning method, a device, computer equipment and a storage medium for frame fine adjustment.
In order to achieve the above purpose, the present invention adopts the following technical scheme: the text positioning method for frame fine adjustment comprises the following steps:
acquiring an identity card image to be positioned;
preliminary positioning of a text area is determined on the identity card image to be positioned so as to obtain a candidate detection frame;
predicting fine tuning parameters by adopting a fine tuning model;
and adjusting the candidate detection frame according to the fine tuning parameters to obtain text position information.
The further technical scheme is as follows: the preliminary positioning of the text region is determined for the identity card image to be positioned to obtain a candidate detection frame, which comprises the following steps:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
The further technical scheme is as follows: the obtaining the relative position information of the outer frame of the identity card and the text area of the image of the identity card to be positioned to obtain text parameters comprises the following steps:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set consists of a plurality of identity card pictures annotated with the position information of the text regions and the position information of the identity card outer frame.
The further technical scheme is as follows: before the fine tuning model is adopted to predict the fine tuning parameters, the method comprises the following steps:
and carrying out graying treatment on the candidate detection frames.
The further technical scheme is as follows: the fine-tuning model is obtained by deforming text boxes labelled with text region positions and training a convolutional neural network on the resulting crops.
The further technical scheme is as follows: after the candidate detection frame is adjusted according to the fine tuning parameter to obtain the text position information, the method further comprises the following steps:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
The invention also provides a text positioning device for frame fine adjustment, which comprises:
the image acquisition unit is used for acquiring an identity card image to be positioned;
the candidate frame forming unit is used for carrying out preliminary positioning on the text area determined by the identity card image to be positioned so as to obtain a candidate detection frame;
the parameter prediction unit is used for predicting fine tuning parameters by adopting a fine tuning model;
and the adjusting unit is used for adjusting the candidate detection frame according to the fine adjustment parameters so as to obtain text position information.
The further technical scheme is as follows: the candidate frame forming unit includes:
the outer frame determining subunit is used for determining an identity card outer frame of an identity card image to be positioned;
the parameter forming subunit is used for acquiring the relative position information of the outer frame of the identity card and the text area of the identity card image to be positioned so as to obtain text parameters;
a position information determining subunit, configured to determine position information of the text region according to the text parameter, so as to form a candidate text box;
and the expansion subunit is used for expanding the width and the height of the candidate text boxes to form candidate detection boxes.
The invention also provides a computer device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, performs the above-described method.
Compared with the prior art, the invention has the following beneficial effects: after the outer frame of the acquired identity card image to be positioned is determined, candidate detection frames for the text regions are derived from it, and their positions are fine-tuned with the fine-tuning model to obtain accurate text frames. Because text region positioning makes full use of prior information about the distribution of identity card text regions, the fine-tuning model is compact and fast, its network training converges easily, and positioning is fast and accurate.
The invention is further described below with reference to the drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is an application scenario schematic diagram of a text positioning method for frame fine adjustment provided by an embodiment of the present invention;
fig. 2 is a flow chart of a text positioning method for frame fine adjustment according to an embodiment of the present invention;
fig. 3 is a schematic sub-flowchart of a text positioning method for frame fine adjustment according to an embodiment of the present invention;
FIG. 4 is a schematic diagram I of a candidate detection frame according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of a candidate detection frame according to an embodiment of the present invention;
FIG. 6 is a third schematic diagram of a candidate detection frame according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an expanded candidate text box according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of text parameters provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of training of a fine tuning model according to the present invention;
fig. 10 is a flowchart of a text positioning method for frame fine adjustment according to another embodiment of the present invention;
FIG. 11 is a schematic block diagram of a text positioning device for frame fine adjustment provided by an embodiment of the present invention;
FIG. 12 is a schematic block diagram of a text positioning device for frame fine adjustment according to another embodiment of the present invention;
fig. 13 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a text positioning method for frame fine adjustment according to an embodiment of the present invention, and fig. 2 is a schematic flowchart of the method. The frame fine adjustment text positioning method is applied to a server. The server interacts with a terminal: it obtains an identity card image to be positioned from the terminal, performs preliminary positioning of the text regions on the image, and fine-tunes the candidate detection frames that delimit the text regions to obtain identity card text with high accuracy.
Fig. 2 is a flowchart illustrating a text positioning method for frame fine adjustment according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S150.
S110, acquiring an identity card image to be positioned.
In this embodiment, the identity card image to be positioned refers to an image containing the identity card together with background, usually captured by a terminal with a camera function.
S120, preliminary positioning of a text area is determined on the identity card image to be positioned, so that a candidate detection frame is obtained.
In this embodiment, the candidate detection box refers to a circumscribed polygonal box having an area larger than the text region and internally enclosing the text region, as shown in fig. 4.
In one embodiment, referring to fig. 3, the step S120 may include steps S121 to S124.
S121, determining the outer frame of the identity card image to be positioned.
In this embodiment, the identity card outer frame is generally obtained by locating the card region with a commonly used object detection network such as Faster R-CNN, YOLO or SSD, and then performing angle correction.
S122, acquiring relative position information of an outer frame of the identity card and a text area of the identity card image to be positioned so as to obtain text parameters.
In this embodiment, the text parameter refers to the relative position of the outer frame of the identification card and the text region.
Specifically, acquiring relative position information of an identity card outer frame and a text region of an identity card image to be positioned through a training set to obtain text parameters;
the training set is obtained by training a plurality of identity card pictures with the position information of the marked text region and the position information of the outer frame of the identity card.
A batch of identity card sample pictures labelled with text region positions is fed to the object detection network. Combining the outer frame position returned by the network with the labelled text region positions yields the relative position of each text region with respect to the outer frame in each sample picture. The identity card outer frame is a horizontal rectangular frame, and the frame of each text region is also a horizontal rectangular frame. The rectangular frame of each text region is described by four parameters: start point x coordinate, start point y coordinate, end point x coordinate and end point y coordinate.
Taking the rectangular frame of the name text region as an example, let the outer frame have width `width`, height `height` and start point coordinates (x_zero, y_zero), and let the start point of the name frame be (x_nameStart, y_nameStart). The relative position of the name frame's start point is then ((x_nameStart − x_zero)/width, (y_nameStart − y_zero)/height). The relative position of the end point is obtained in the same way, and the rectangular frames of the other text regions are handled similarly. Finally, the text parameters are obtained by averaging the relative position data over all identity card samples.
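The relative-position computation above can be sketched as follows; the helper names and the (x_start, y_start, x_end, y_end) box representation are illustrative assumptions, not from the patent text.

```python
import numpy as np

def relative_box(outer, text_box):
    """Express a labelled text box relative to the identity card outer frame.

    Boxes are (x_start, y_start, x_end, y_end) in pixels; the outer frame's
    start point plays the role of (x_zero, y_zero) in the description.
    """
    x0, y0, x1, y1 = outer
    width, height = x1 - x0, y1 - y0
    tx0, ty0, tx1, ty1 = text_box
    return ((tx0 - x0) / width, (ty0 - y0) / height,
            (tx1 - x0) / width, (ty1 - y0) / height)

def text_parameters(samples):
    """Average the relative positions over all labelled samples
    (each sample is a pair: outer frame, text box)."""
    rel = [relative_box(outer, box) for outer, box in samples]
    return tuple(np.mean(rel, axis=0))
```

Because the coordinates are normalized by the outer frame's width and height, the averaged parameters transfer to cards photographed at any scale.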
S123, determining the position information of the text area according to the text parameters to form a candidate text box.
In this embodiment, the candidate text box refers to a rectangular box with text information.
And obtaining the position information of the candidate text box according to the text parameters and the identity card outer box, as shown in fig. 6 and 7.
S124, expanding the width and the height of the candidate text boxes to form candidate detection boxes.
As can be seen from fig. 6 and fig. 7, the candidate text boxes carry a certain error, and under some conditions part of the text information falls outside the candidate text box. The center of each candidate text box is therefore kept fixed while its width and height are expanded by a certain proportion, which guarantees that the text information to be positioned lies inside the resulting candidate detection box and improves positioning accuracy. Because the candidate detection boxes are arranged in a fixed order, the specific attribute of each box is known from its position in that order, so the attributes of the candidate detection boxes do not need to be classified.
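A minimal sketch of the center-preserving expansion; the expansion ratios below are illustrative assumptions, since the patent only specifies "a certain proportion".

```python
def expand_box(box, scale_w=1.4, scale_h=1.4):
    """Expand a candidate text box about its fixed center.

    box: (x_start, y_start, x_end, y_end); the scale factors are assumed.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0   # center stays fixed
    half_w = (x1 - x0) * scale_w / 2.0
    half_h = (y1 - y0) * scale_h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```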
S130, performing gray scale processing on the candidate detection frames.
In this embodiment, in the RGB model, if R=G=B the color is a shade of gray, and the common value is called the gray value; a grayscale image therefore needs only one byte per pixel to store the gray value (also called intensity value or luminance value), and the gray range is 0 to 255. A color image can be grayed with one of four methods — the component method, the maximum value method, the average value method and the weighted average method. Graying makes the gradient features of the image easier to identify and improves the accuracy of the overall text positioning.
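The weighted-average variant can be sketched as follows; the ITU-R BT.601 luma weights shown are a common choice, not mandated by the patent.

```python
import numpy as np

def to_gray_weighted(rgb):
    """Gray an H×W×3 uint8 image by a weighted average of R, G, B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b   # BT.601 weights (assumed)
    return np.rint(gray).astype(np.uint8)      # one byte per pixel, 0-255
```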
S140, predicting the fine tuning parameters by adopting a fine tuning model.
In the present embodiment, the fine adjustment model functions to correct the position of the candidate detection frame. The fine tuning model is obtained by inputting the deformed text box marked with the text region position into a convolutional neural network for training.
Since the candidate detection frame has been expanded, it needs to be scaled back precisely, and scaling a rectangular frame position requires four parameters. As shown in fig. 8, the model therefore outputs the offsets of the x and y coordinates of the start point and end point of the candidate detection frame relative to the standard (labelled) frame of the text region, with the length and width of the candidate detection frame normalized to 1.
The network input is the expanded and grayed candidate detection frame; the output is four floating point values representing the offsets of the start point and end point coordinates of the located text region rectangle relative to the start point of the candidate detection frame.
The parameters of the trained convolutional neural network are as follows:
Input layer: 150×50×1 (grayed input picture);
Convolution layer 1: 150×50×1×64 (5×5 convolution);
Pooling layer 1: 75×25×1×64 (2×2, stride 2);
Convolution layer 2: 75×25×1×128 (5×5 convolution);
Pooling layer 2: 38×13×1×128 (2×2, stride 2);
Convolution layer 3: 38×13×1×256 (3×3 convolution);
Pooling layer 3: 19×7×1×256 (2×2, stride 2);
Convolution layer 4: 19×7×1×512 (3×3 convolution);
Pooling layer 4: 10×4×1×512 (2×2, stride 2);
Fully connected layer: 4000;
Output layer: 4.
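The spatial sizes in the layer list are consistent with "same"-padded convolutions and 2×2 pooling that rounds odd sizes up; a quick sanity check of that reading:

```python
import math

def pool(w, h):
    # 2×2 pooling with stride 2, rounding odd sizes up (ceil) —
    # the rounding policy is inferred from the listed sizes, not stated
    return math.ceil(w / 2), math.ceil(h / 2)

sizes = [(150, 50)]          # input width × height
for _ in range(4):           # four conv+pool stages; 'same' conv keeps size
    sizes.append(pool(*sizes[-1]))

# matches the pooling-layer sizes listed above
assert sizes[1:] == [(75, 25), (38, 13), (19, 7), (10, 4)]
```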
in order to train the convolutional neural network, the positions of rectangular frames of marked text areas are used as references, the rectangular frames of the marked text areas are randomly rocked up and down and left and right and the scales are transformed, as shown in fig. 9, intercepting areas at different positions are obtained, the intercepting areas are sent into the convolutional network to train, so that output offset values are obtained, the parameter values of the convolutional neural network are adjusted by utilizing the loss values and the positions of the rectangular frames of the actually marked text areas, so that the output offset values are close to 0, and the convolutional neural network is the fine tuning model. In particular, in practice, there are fewer than three rows in the address field, so that some pictures can be intercepted in the address field at the blank position and sent to training, and the labeling values corresponding to the four parameters should be 0. The network has simple structure, small input size, centralized input information, easy network convergence and good output. The model has small volume and higher operation speed than the general model.
And S150, adjusting the candidate detection frames according to the fine adjustment parameters to obtain text position information.
In the present embodiment, the text position information refers to four vertex coordinate information of a rectangular frame of the text region.
Specifically, the four vertices of the candidate detection frame are moved in turn by the offsets given by the fine-tuning parameters output by the fine-tuning model.
In actual prediction of text position information, the candidate text boxes of each text region are obtained from the located identity card outer frame, expanded in height and width, and fed in turn into the fine-tuning model. Each candidate detection box is adjusted according to the model's output values, yielding the accurate text box region; if the model's output values are close to 0, the text line does not exist.
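Applying the predicted offsets to a candidate detection box is then a direct inversion of the training target; a minimal sketch under the same assumed box representation:

```python
def apply_offsets(box, offsets):
    """Move the candidate box's start and end points by the model output.

    offsets are normalized by the box's own width and height, matching the
    'length and width normalized to 1' convention in the description.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    dx0, dy0, dx1, dy1 = offsets
    return (x0 + dx0 * w, y0 + dy0 * h, x1 + dx1 * w, y1 + dy1 * h)
```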
According to the frame fine adjustment text positioning method, after the outer frame of the acquired identity card image to be positioned is determined, candidate detection frames for the text regions are derived from it, and their positions are fine-tuned with the fine-tuning model to obtain accurate text frames. Because text region positioning makes full use of prior information about the distribution of identity card text regions, the fine-tuning model is compact and fast, its network training converges easily, and positioning is fast and accurate.
Fig. 10 is a flowchart of a text positioning method for frame fine adjustment according to another embodiment of the present invention. As shown in fig. 10, the method in this embodiment includes steps S210 to S270. Steps S210 to S250 are similar to steps S110 to S150 in the above embodiment and are not described again here. Steps S260 to S270 added in this embodiment are described in detail below.
S260, intercepting the identity card image to be positioned according to the text position information to form a text image.
The identity card image to be positioned is cropped according to the text position information to obtain an image containing only the identity card text.
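Cropping per text region can be sketched with plain array slicing; the rounding to integer pixel indices is an assumption.

```python
import numpy as np

def crop_text_regions(image, boxes):
    """Cut each located text box out of the identity card image.

    image: numpy array indexed [row, column(, channel)];
    boxes: iterable of (x_start, y_start, x_end, y_end).
    """
    crops = []
    for x0, y0, x1, y1 in boxes:
        crops.append(image[int(round(y0)):int(round(y1)),
                           int(round(x0)):int(round(x1))])
    return crops
```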
S270, identifying the text image to obtain the identity card text.
In this embodiment, an optical character recognition technology may be used to recognize the text image to obtain an id card text, and the id card text is output to a terminal for display.
Fig. 11 is a schematic block diagram of a text positioning device 300 for frame fine adjustment according to an embodiment of the present invention. As shown in fig. 11, corresponding to the above text positioning method for frame fine adjustment, the present invention further provides a text positioning device 300 for frame fine adjustment. The device 300 includes units for performing the above method and may be configured in a server.
Specifically, referring to fig. 11, the text positioning device 300 with fine-tuned frame includes:
an image acquisition unit 301, configured to acquire an image of an identification card to be positioned;
the candidate frame forming unit 302 is configured to perform preliminary positioning of the determined text region on the identification card image to be positioned, so as to obtain a candidate detection frame;
a parameter prediction unit 304, configured to predict a trimming parameter by using a trimming model;
and the adjusting unit 305 is configured to adjust the candidate detection frame according to the fine tuning parameter to obtain text position information.
In one embodiment, as shown in fig. 6, the candidate frame forming unit 302 includes:
the outer frame determining subunit is used for determining an identity card outer frame of an identity card image to be positioned;
the parameter forming subunit is used for acquiring the relative position information of the outer frame of the identity card and the text area of the identity card image to be positioned so as to obtain text parameters;
a position information determining subunit, configured to determine position information of the text region according to the text parameter, so as to form a candidate text box;
and the expansion subunit is used for expanding the width and the height of the candidate text boxes to form candidate detection boxes.
In an embodiment, the apparatus further includes:
and a graying processing unit 303, configured to perform graying processing on the candidate detection frame.
Fig. 12 is a schematic block diagram of a text positioning device 300 for frame fine adjustment according to another embodiment of the present invention. As shown in fig. 12, the text positioning device 300 of this embodiment adds a clipping unit 306 and a recognition unit 307 to the above embodiment.
The clipping unit 306 is used for clipping the identity card image to be positioned according to the text position information to form a text image;
the recognition unit 307 is configured to recognize the text image to obtain an identity card text.
It should be noted that, as will be clear to those skilled in the art, for the specific implementation of the frame fine adjustment text positioning device 300 and of each unit, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity, details are not repeated here.
The above-described frame fine-tuning text positioning apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 13.
Referring to fig. 13, fig. 13 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server.
With reference to FIG. 13, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform the frame fine-tuning text positioning method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides a runtime environment for the computer program 5032 stored in the non-volatile storage medium 503; when executed by the processor 502, the computer program 5032 causes the processor 502 to perform the frame fine-tuning text positioning method.
The network interface 505 is used for network communication with other devices. It will be appreciated by those skilled in the art that the structure shown in fig. 13 is merely a block diagram of some of the structures associated with the present application and does not constitute a limitation of the computer device 500 to which the present application is applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to implement the steps of:
acquiring an identity card image to be positioned;
performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
predicting fine-tuning parameters using a fine-tuning model;
and adjusting the candidate detection frame according to the fine-tuning parameters to obtain text position information.
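As an illustration only (not part of the claimed implementation), the four steps above can be sketched in Python. Here `preliminary_locate` and `trim_model` are hypothetical placeholders for the preliminary positioning step and the trained fine-tuning CNN; the adjustment simply applies the four predicted offsets to the candidate detection box:

```python
def adjust_box(candidate_box, offsets):
    """Apply the four predicted fine-tuning offsets (dx1, dy1, dx2, dy2)
    to a candidate detection box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = candidate_box
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1, y1 + dy1, x2 + dx2, y2 + dy2)


def locate_text(image, preliminary_locate, trim_model):
    """Pipeline sketch: preliminary positioning -> offset prediction -> adjustment."""
    candidate = preliminary_locate(image)   # coarse candidate detection frame
    offsets = trim_model(image, candidate)  # four offsets predicted by the CNN
    return adjust_box(candidate, offsets)   # final text position information
```

A perfectly positioned candidate yields offsets of zero, leaving the box unchanged, which matches the training objective described in the claims.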
The fine-tuning model is obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions.
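The deformation used to build training samples (random up/down/left/right jitter plus a scale change around the annotated rectangle, per the claims) can be sketched as follows; the function name, jitter magnitude, and scale range are illustrative assumptions, not values from the patent:

```python
import random


def jitter_box(box, img_w, img_h, max_shift=10, scale_range=(0.9, 1.1)):
    """Randomly shake an annotated text box up/down/left/right and rescale it,
    producing a deformed crop region clamped to the image bounds. The training
    target is the offset that maps the deformed box back onto the annotated one,
    so a crop taken exactly at the annotation gets a target of all zeros."""
    x1, y1, x2, y2 = box
    s = random.uniform(*scale_range)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * s, (y2 - y1) * s
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    jx1 = max(0, int(cx - w / 2 + dx))
    jy1 = max(0, int(cy - h / 2 + dy))
    jx2 = min(img_w, int(cx + w / 2 + dx))
    jy2 = min(img_h, int(cy + h / 2 + dy))
    jittered = (jx1, jy1, jx2, jy2)
    # Offsets the network should learn to output for this deformed crop:
    target = (x1 - jx1, y1 - jy1, x2 - jx2, y2 - jy2)
    return jittered, target
```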
In an embodiment, when the processor 502 performs the step of preliminary positioning of the text region on the identity card image to be positioned to obtain the candidate detection frame, the following steps are specifically implemented:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
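The four sub-steps above can be illustrated with a hypothetical sketch. The relative-position text parameters are assumed here to be fractions of the outer-frame size, and the expansion ratios are illustrative defaults; the patent does not fix either convention:

```python
def candidate_box(card_frame, text_params, expand_w=0.1, expand_h=0.2):
    """Place a candidate text box inside the detected identity-card outer frame
    (x1, y1, x2, y2) using relative-position text parameters, then expand its
    width and height to form the candidate detection box.

    text_params = (rx1, ry1, rx2, ry2), fractions of the outer-frame size."""
    fx1, fy1, fx2, fy2 = card_frame
    fw, fh = fx2 - fx1, fy2 - fy1
    rx1, ry1, rx2, ry2 = text_params
    # Candidate text box from the relative position information.
    x1, y1 = fx1 + rx1 * fw, fy1 + ry1 * fh
    x2, y2 = fx1 + rx2 * fw, fy1 + ry2 * fh
    # Expand width and height to form the candidate detection box.
    mw, mh = (x2 - x1) * expand_w, (y2 - y1) * expand_h
    return (x1 - mw, y1 - mh, x2 + mw, y2 + mh)
```

Expanding the box gives the fine-tuning model slack to shift the border in either direction.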
In an embodiment, when the processor 502 implements the step of acquiring the relative position information of the identity card outer frame and the text region of the identity card image to be positioned to obtain the text parameters, the following steps are specifically implemented:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set is obtained by training on a plurality of identity card pictures annotated with text region position information and identity card outer frame position information.
In one embodiment, before implementing the step of predicting the trim parameters using the trim model, the processor 502 further implements the steps of:
and carrying out graying treatment on the candidate detection frames.
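A minimal sketch of the graying step, which reduces the crop to the single channel expected by the network's 150×50×1 input; the patent does not specify a conversion formula, so the BT.601 luminance weights below are an assumption:

```python
import numpy as np


def to_gray(crop_rgb):
    """Convert an H x W x 3 uint8 crop to a single-channel grayscale image
    using the BT.601 luminance weights (assumed; the patent only says
    'graying processing')."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = crop_rgb.astype(np.float64) @ weights  # contracts the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```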
In one embodiment, after implementing the step of adjusting the candidate detection box according to the trimming parameter to obtain the text position information, the processor 502 further implements the following steps:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
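The interception step is plain array slicing once the text position information is known. `crop_text_image` below is a hypothetical helper; the recognition engine that would consume its output is not specified by the patent and is outside this sketch:

```python
import numpy as np


def crop_text_image(card_image, text_box):
    """Cut the located text region (x1, y1, x2, y2) out of the identity card
    image to form the text image passed on to recognition. Coordinates are
    rounded and clamped to the image bounds."""
    x1, y1, x2, y2 = (int(round(v)) for v in text_box)
    h, w = card_image.shape[:2]
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    return card_image[y1:y2, x1:x2]
```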
It should be appreciated that, in embodiments of the present application, the processor 502 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those skilled in the art will appreciate that all or part of the flow of the methods in the above embodiments may be accomplished by a computer program instructing the relevant hardware. The computer program comprises program instructions and may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the steps of the method embodiments described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring an identity card image to be positioned;
performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
predicting fine-tuning parameters using a fine-tuning model;
and adjusting the candidate detection frame according to the fine-tuning parameters to obtain text position information.
The fine-tuning model is obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions.
In an embodiment, when the processor executes the computer program to implement the step of preliminary positioning of the text region on the identity card image to be positioned to obtain the candidate detection frame, the following steps are specifically implemented:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
In an embodiment, when the processor executes the computer program to implement the step of acquiring the relative position information of the identity card outer frame and the text region of the identity card image to be positioned to obtain the text parameters, the following steps are specifically implemented:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set is obtained by training a plurality of identity card pictures with the position information of the marked text region and the position information of the outer frame of the identity card.
In an embodiment, before executing the computer program to implement the step of predicting trim parameters using a trim model, the processor further implements the steps of:
and carrying out graying treatment on the candidate detection frames.
In one embodiment, after executing the computer program to implement the step of adjusting the candidate detection box according to the fine tuning parameter to obtain text position information, the processor further implements the steps of:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The integrated unit may be stored in a storage medium if implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to certain preferred embodiments, the invention is not limited thereto. Various equivalent changes and substitutions will readily occur to those skilled in the art without departing from the scope of the invention, and such equivalents are intended to fall within it. Therefore, the protection scope of the invention is defined by the claims.

Claims (9)

1. A frame fine-tuning text positioning method, characterized by comprising the following steps:
acquiring an identity card image to be positioned;
performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
predicting fine-tuning parameters using a fine-tuning model;
adjusting the candidate detection frame according to the fine-tuning parameters to obtain text position information;
the fine-tuning model being a model obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions;
the parameters of the trained convolutional neural network are as follows:
input layer: 150×50×1;
convolution layer 1: 150×50×1×64;
pooling layer 1: 75×25×1×64;
convolution layer 2: 75×25×1×128;
pooling layer 2: 38×13×1×128;
convolution layer 3: 38×13×1×256;
pooling layer 3: 19×7×1×256;
convolution layer 4: 19×7×1×512;
pooling layer 4: 10×4×1×512;
fully connected layer: 4000;
output layer: 4;
taking the position of the annotated text region rectangular frame as a reference, randomly jittering the rectangular frame up, down, left, and right and varying its scale to obtain cropped regions at different positions; feeding the cropped regions into the convolutional network for training to obtain output offset values; and adjusting the parameter values of the convolutional neural network by using the loss value and the position of the actually annotated text region rectangular frame so that the output offset values approach 0, the convolutional neural network so trained being the fine-tuning model;
during training, because the address field may contain fewer than three lines, some pictures are also cropped from the blank positions of the address field and fed into training, with the annotation values corresponding to the four parameters, namely the start point x coordinate, the start point y coordinate, the end point x coordinate, and the end point y coordinate, set to 0.
2. The frame fine-tuning text positioning method according to claim 1, wherein the performing preliminary positioning of the text region on the identity card image to be positioned to obtain the candidate detection frame comprises:
determining an identity card outer frame of an identity card image to be positioned;
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned so as to obtain text parameters;
determining position information of a text region according to the text parameters to form candidate text boxes;
and expanding the width and the height of the candidate text boxes to form candidate detection boxes.
3. The frame fine-tuning text positioning method according to claim 2, wherein the acquiring the relative position information of the identity card outer frame and the text region of the identity card image to be positioned to obtain the text parameters comprises:
acquiring relative position information of an identity card outer frame and a text area of an identity card image to be positioned through a training set so as to obtain text parameters;
the training set is obtained by training a plurality of identity card pictures with the position information of the marked text region and the position information of the outer frame of the identity card.
4. The frame fine-tuning text positioning method according to claim 1, wherein, before the fine-tuning model is used to predict the fine-tuning parameters, the method comprises:
and carrying out graying treatment on the candidate detection frames.
5. The frame fine-tuning text positioning method according to any one of claims 1 to 4, further comprising, after the candidate detection frame is adjusted according to the fine-tuning parameters to obtain the text position information:
intercepting an identity card image to be positioned according to the text position information to form a text image;
and identifying the text image to obtain the identity card text.
6. A frame fine-tuning text positioning apparatus, characterized by comprising:
the image acquisition unit is used for acquiring an identity card image to be positioned;
the candidate frame forming unit is used for performing preliminary positioning of a text region on the identity card image to be positioned to obtain a candidate detection frame;
the parameter prediction unit is used for predicting fine tuning parameters by adopting a fine tuning model;
the adjustment unit is used for adjusting the candidate detection frames according to the fine adjustment parameters so as to obtain text position information;
the fine-tuning model is a model obtained by training a convolutional neural network on deformed versions of text boxes annotated with text region positions;
the parameters of the trained convolutional neural network are as follows:
input layer: 150×50×1;
convolution layer 1: 150×50×1×64;
pooling layer 1: 75×25×1×64;
convolution layer 2: 75×25×1×128;
pooling layer 2: 38×13×1×128;
convolution layer 3: 38×13×1×256;
pooling layer 3: 19×7×1×256;
convolution layer 4: 19×7×1×512;
pooling layer 4: 10×4×1×512;
fully connected layer: 4000;
output layer: 4;
taking the position of the annotated text region rectangular frame as a reference, randomly jittering the rectangular frame up, down, left, and right and varying its scale to obtain cropped regions at different positions; feeding the cropped regions into the convolutional network for training to obtain output offset values; and adjusting the parameter values of the convolutional neural network by using the loss value and the position of the actually annotated text region rectangular frame so that the output offset values approach 0, the convolutional neural network so trained being the fine-tuning model;
during training, because the address field may contain fewer than three lines, some pictures are also cropped from the blank positions of the address field and fed into training, with the annotation values corresponding to the four parameters, namely the start point x coordinate, the start point y coordinate, the end point x coordinate, and the end point y coordinate, set to 0.
7. The frame fine-tuning text positioning apparatus according to claim 6, wherein the candidate frame forming unit includes:
the outer frame determining subunit is used for determining an identity card outer frame of an identity card image to be positioned;
the parameter forming subunit is used for acquiring the relative position information of the outer frame of the identity card and the text area of the identity card image to be positioned so as to obtain text parameters;
a position information determining subunit, configured to determine position information of the text region according to the text parameter, so as to form a candidate text box;
and the expansion subunit is used for expanding the width and the height of the candidate text boxes to form candidate detection boxes.
8. A computer device, characterized in that it comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the method according to any of claims 1-5.
9. A storage medium storing a computer program which, when executed by a processor, performs the method of any one of claims 1 to 5.
CN201910214068.2A 2019-03-20 2019-03-20 Frame fine adjustment text positioning method and device, computer equipment and storage medium Active CN109977949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910214068.2A CN109977949B (en) 2019-03-20 2019-03-20 Frame fine adjustment text positioning method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109977949A CN109977949A (en) 2019-07-05
CN109977949B true CN109977949B (en) 2024-01-26

Family

ID=67079720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910214068.2A Active CN109977949B (en) 2019-03-20 2019-03-20 Frame fine adjustment text positioning method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109977949B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516541B (en) * 2019-07-19 2022-06-10 金蝶软件(中国)有限公司 Text positioning method and device, computer readable storage medium and computer equipment
CN110738238B (en) * 2019-09-18 2023-05-26 平安科技(深圳)有限公司 Classification positioning method and device for certificate information
CN112749529A (en) * 2019-10-29 2021-05-04 西安诺瓦星云科技股份有限公司 Method and device for character self-adaption special-shaped edit box
CN111178346B (en) * 2019-11-22 2023-12-08 京东科技控股股份有限公司 Text region positioning method, text region positioning device, text region positioning equipment and storage medium
CN112836696A (en) * 2019-11-22 2021-05-25 搜狗(杭州)智能科技有限公司 Text data detection method and device and electronic equipment
SG10202001222VA (en) * 2020-02-11 2021-04-29 Alipay Labs Singapore Pte Ltd A system suitable for detecting an identification card, and an apparatus and a processing method in association thereto
CN113469161A (en) * 2020-03-31 2021-10-01 顺丰科技有限公司 Method, device and storage medium for processing logistics list
CN111598091A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
CN112232336A (en) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate identification method, device, equipment and storage medium
CN112987994A (en) * 2021-03-31 2021-06-18 维沃移动通信有限公司 Frame selection annotation method, frame selection annotation device, electronic equipment and storage medium
CN113111839A (en) * 2021-04-25 2021-07-13 上海商汤智能科技有限公司 Behavior recognition method and device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679442A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of document Data Enter
CN108229397A (en) * 2018-01-04 2018-06-29 华南理工大学 Method for text detection in image based on Faster R-CNN
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network
CN109308476A (en) * 2018-09-06 2019-02-05 邬国锐 Billing information processing method, system and computer readable storage medium
CN109448007A (en) * 2018-11-02 2019-03-08 北京迈格威科技有限公司 Image processing method, image processing apparatus and storage medium
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Huafu Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant