WO2022121039A1 - Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal - Google Patents


Info

Publication number
WO2022121039A1
WO2022121039A1 · PCT/CN2020/141443 · CN2020141443W
Authority
WO
WIPO (PCT)
Prior art keywords
area
document
image
mask
vertices
Prior art date
Application number
PCT/CN2020/141443
Other languages
French (fr)
Chinese (zh)
Inventor
王晓亮
陈建良
田丰
王丹丹
吴昌宇
Original Assignee
广州广电运通金融电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州广电运通金融电子股份有限公司 filed Critical 广州广电运通金融电子股份有限公司
Publication of WO2022121039A1 publication Critical patent/WO2022121039A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/16 Image preprocessing
    • G06V 30/162 Quantising the image signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/18 Extraction of features or characteristics of the image
    • G06V 30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V 30/18067 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to the technical field of information detection and intelligent vision, and in particular to a bank card tilt correction detection method, apparatus, readable storage medium and terminal.
  • Image recognition technology is gradually being applied in the security, military, medical and intelligent-transportation fields, and technologies such as face recognition and fingerprint recognition are increasingly used in security-sensitive domains such as public security, finance and aerospace.
  • In the military field, image recognition is mainly applied to the reconnaissance and identification of targets: automated image recognition is used to identify and strike enemy targets. In the medical field, image recognition enables many kinds of medical image analysis and diagnosis, which on the one hand greatly reduces the cost of care and on the other hand helps improve its quality and efficiency. In transportation, it enables not only license plate recognition but also the cutting-edge field of autonomous driving, giving a clear view of roads, vehicles and pedestrians.
  • Deep learning methods use a large amount of labeled data in the model training stage to train a deep network, fitting the network parameters and modeling the OCR (Optical Character Recognition) detection algorithm.
  • OCR: Optical Character Recognition
  • The image is used as the input of the network, and character region detection is realized through the network's forward inference.
  • This is currently a popular character detection method, but for the card number detection task it has the following defects: (1) the non-document area of the image also participates in network inference, which wastes computing resources, and false character detections in that area require additional processing logic to be eliminated; (2) the scheme consumes more computing resources, and its training and inference times are longer than those of the present proposal; (3) due to the inexplicability of neural networks, the bounding box of the located character area cannot accurately fit the smallest enclosing rectangle of the characters, and may even cut off part of the character area. Moreover, traditional optical character recognition (OCR) of document images is mainly designed for high-definition scanned images: it requires the recognized image to have a clean background, standard print and high resolution. In natural scenes, however, there is heavy background noise around the text, irregular text distribution and interference from natural light sources, so the detection rate of OCR in real natural scenes is not ideal, and document identification puts pressure on the character recognition performed in subsequent steps.
  • OCR: optical character recognition
  • The purpose of the present invention is to provide a bank card tilt correction detection method, apparatus, readable storage medium and terminal that can solve the above problems.
  • BTC: Bankcard Tilt Correction
  • a method for detecting bank card tilt correction under complex background comprises the following steps:
  • First step, model training: label the original data to generate annotations, collect document-size statistics from the generated annotation files, and use the original data and annotation files to train the segmentation model;
  • Second step, initial document detection: for a picture input through the image acquisition unit, use the deep learning model to find the corresponding potential document area and obtain a preliminary, rough document-area mask;
  • Third step, standardization: refine the rough mask obtained in the second step to obtain a high-quality document-area mask, use the mask to extract the document area from the original image, apply an affine correction transformation to the extracted document photo, convert it to the preset ID-photo size, and output the corrected photo.
  • the first step of model training includes the following steps:
  • S11 determines the document area: the document area in the original-data pictures is found through manual annotation.
  • S14 trains the segmentation model, using the original data and the generated annotation files.
  • In step S14 the input picture and the corresponding annotation file have the same size; before training, the json file is converted into a corresponding 0-1 binary mask, in which areas with pixel value 1 represent the document area and areas with pixel value 0 represent the background.
  • the initial certificate inspection in the second step includes the following steps:
  • S21 extracts features: after a picture is input, it is scaled to a size suitable for the segmentation network's input, and the Unet network model then extracts depth features from the input data to obtain a feature map;
  • S22 calculates probabilities: a two-class judgment is made on the feature at each position of the feature map to obtain the probability that each position belongs to the document area, yielding a document-area probability distribution map;
  • S24 produces the rough segmentation mask: the 0-1 mask image is upsampled to the same size as the original input image, giving a preliminary rough document segmentation mask;
  • S25 screens legal areas: count the area a of each isolated document region in the rough segmentation mask; if a < μ - 3σ, the region is considered illegal and is removed from the rough segmentation mask, so that legal-area screening filters out erroneous regions.
  • Fine-grained mask correction is performed on the legal regions in the mask image screened in the second step, including the following steps:
  • The contour feature is extracted from the binary mask image; the contour as a whole is a closed irregular curve, and the binary mask does not change the rectangular convex-set property of the document photo;
  • S32 obtains the convex hull of the contour: the minimum convex hull is computed on the basis of the original contour, filling in areas missed by the segmentation and smoothing the contour edge at the same time;
  • The sorting of the four vertices is determined through the following steps: 1) obtain the coordinates of the centre point from the four vertex coordinates; 2) establish a polar coordinate system at the centre point and construct the vector from the centre to each vertex, recording its polar angle; 3) sort the four vertices by the size of the included angle; 4) find the upper-left corner of the document area: the vertex with the minimum coordinate sum is the upper-left vertex, and the coordinate order is rearranged with it as the starting point, in the order "upper left - upper right - lower right - lower left";
  • In step S33, the minimum detected line length for the Hough-transform straight-line fitting of the convex hull is set to 100, and the maximum gap between line segments is set to 20.
  • step S36 the specific algorithm of K-means is:
  • The invention also provides a document detection device comprising an acquisition input unit, an image processing unit, an information extraction unit and an information output unit connected by telecommunication; the acquisition input unit acquires, through a camera assembly, the picture of the document to be detected and the standard registration picture.
  • The image processing unit processes the input picture through the deep learning algorithm and the image processing algorithm in the processor, sequentially obtaining the preliminary rough document-area mask, the refined document-area mask, the document area extracted from the original image, and the corrected image after affine transformation.
  • The information extraction unit obtains the category and information of the corrected image through the information extraction algorithm in the processor; for the information output unit, the processor displays the category and information extracted from the input picture on the display and stores them in memory.
  • The present invention also provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of the aforementioned method are performed.
  • The present invention also provides a terminal including a memory and a processor; the memory stores a registered picture and computer instructions runnable on the processor, and when the processor runs the computer instructions it performs the steps of the aforementioned method.
  • The beneficial effect of the present invention is that, by combining deep learning technology and traditional image processing methods in the bank card tilt correction (BTC) technology of the present application, the advantages of the two are fully integrated.
  • BTC: Bankcard Tilt Correction
  • Fig. 1 is the flow chart of the bank card tilt correction detection method under the complex background of the present invention
  • Figure 2 is a schematic diagram of model training
  • FIG. 3 is a simplified flow chart of the BTC testing phase
  • Fig. 4 is the method flow chart of the initial inspection of the certificate
  • Figure 5 is a flow chart of document image standardization.
  • As shown in Fig. 1 - Fig. 5, a method for detecting bank card tilt correction under a complex background includes the following steps.
  • First step, model training: label the original data to generate annotations, collect document-size statistics from the generated annotation files, and use the original data and annotation files to train the segmentation model.
  • the second step is the initial inspection of the document.
  • the deep learning model is used to find the corresponding potential document area, and a preliminary and rough document area mask is obtained.
  • Third step, standardization: refine the rough mask obtained in the second step to obtain a high-quality document-area mask, use the mask to extract the document area from the original image, apply an affine correction transformation to the extracted document photo, convert it to the preset ID-photo size, and output the corrected photo.
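The correction step above maps the four detected vertices onto the preset ID-photo rectangle. With four point pairs the general warp is a projective (perspective) transform, of which the affine transform is a special case; the sketch below is a minimal stdlib-only illustration (the helper names and the numeric points in the test are made up, not from the patent):

```python
def solve_homography(src, dst):
    """Solve the 3x3 projective transform (bottom-right entry fixed to 1)
    mapping four source vertices to four destination corners; an affine
    correction is the special case whose bottom row is [0, 0, 1]."""
    # Build the 8x8 linear system A * h = b from the 4 point correspondences.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    # Gaussian elimination with partial pivoting.
    n = 8
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp_point(H, x, y):
    """Apply the transform to one point using homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

In practice the resulting transform would be applied to every pixel of the extracted document area (e.g. via an image-warping routine) to produce the corrected, preset-size card photo.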
  • BTC relies on the powerful feature extraction ability of deep learning, so it needs to train related models before it is officially used.
  • For a batch of raw data to be trained on, first find the areas of documents such as bank cards in each picture by manual annotation. Specifically, for each document in the picture, mark its four vertices and save the vertex coordinates as a json file. Next, from the generated annotation files, collect the area size s of each document region; this statistic serves the subsequent detection phase. It is verified experimentally that the document-photo area sizes in the raw data follow a Gaussian distribution, namely s ~ N(μ, σ²).
  • The mean μ and standard deviation σ of the Gaussian distribution are then calculated.
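The area statistic can be sketched in a few lines; this is a stdlib-only illustration assuming each annotation stores the four labeled vertices in order (the helper names `quad_area` and `area_stats` are hypothetical, not from the patent):

```python
import statistics

def quad_area(pts):
    """Area of a quadrilateral from its 4 (x, y) vertices in order (shoelace formula)."""
    s = 0.0
    for i in range(4):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % 4]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def area_stats(all_quads):
    """Mean mu and standard deviation sigma of the annotated document areas,
    matching the assumption s ~ N(mu, sigma^2)."""
    areas = [quad_area(q) for q in all_quads]
    return statistics.mean(areas), statistics.pstdev(areas)
```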
  • The segmentation model is trained using the raw data and the generated annotation files. It is worth noting that during training the input image and the corresponding annotation must have the same size. Therefore, the marked json file is also converted into a corresponding 0-1 binary mask map, in which areas with pixel value 1 represent the document area and areas with pixel value 0 represent the background.
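The json-to-mask conversion can be sketched as follows. A production pipeline would normally rasterize the polygon with an image library (e.g. OpenCV's `fillPoly`); the stdlib-only version below uses a point-in-polygon test, and the `"vertices"` key is an assumed annotation layout, not the patent's actual schema:

```python
import json

def point_in_poly(x, y, poly):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_hit = (x2 - x1) * (y - y1) / (y2 - y1) + x1
            if x < x_hit:
                inside = not inside
    return inside

def label_to_mask(label_json, width, height):
    """Convert an annotation like {"vertices": [[x, y], ...]} into a 0-1 mask:
    pixel value 1 = document area, pixel value 0 = background."""
    poly = json.loads(label_json)["vertices"]
    return [[1 if point_in_poly(x + 0.5, y + 0.5, poly) else 0
             for x in range(width)] for y in range(height)]
```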
  • model training steps of the first step are as follows.
  • S11 determines the certificate area, and finds the certificate area in the picture of the original data through manual annotation.
  • S12 Vertex labeling generates labels, labels the four vertices of the document in the document area, and saves the coordinate positions of the vertices in the form of json files to generate labels.
  • JSON: JavaScript Object Notation
  • JSON is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition (December 1999). JSON text is a sequence of tokens built from six structural characters, strings, numbers and three literal names, which makes it a good match for the coordinate annotations used in this scheme.
  • S14 trains the segmentation model, and uses the original data and the generated annotation files to train the segmentation model.
  • BTC is a two-stage, coarse-to-fine segmentation optimization model.
  • the deep learning model is used to find the corresponding potential document area for the input picture, and a preliminary and relatively rough document area mask is obtained;
  • the second stage uses traditional image processing technology:
  • the rough mask from the first stage is refined and corrected to obtain a high-quality document-area mask; the mask is used to extract the document photo from the original image, which is then affine-corrected and converted to the preset ID-photo size.
  • the first stage is the initial inspection of documents.
  • The goal of finding the document area is mainly achieved by the sub-operations of feature extraction, probability calculation and threshold truncation, finally producing a preliminary rough segmentation mask.
  • After the user inputs the picture, it is scaled to the input size expected by the segmentation network, and the classical Unet network model then extracts depth features from the input data;
  • The two-class judgment yields the probability that the feature at each position belongs to the document area, producing a document-area probability distribution map; the probability distribution map is then binarized according to the preset threshold.
  • S21 extracts features: after a picture is input, it is scaled to a size suitable for the segmentation network's input, and the Unet network model then extracts depth features from the input data to obtain a feature map.
  • S22 calculates probabilities: a binary classification judgment is made on the feature at each position of the feature map to obtain the probability that each position belongs to the document area, yielding a document-area probability distribution map.
  • the 0-1 mask image is upsampled to the same size as the original input image, and a preliminary document rough segmentation mask image is obtained.
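The threshold truncation and upsampling steps are simple enough to sketch directly; this stdlib-only illustration uses nested lists in place of real image arrays, with nearest-neighbour upsampling as one plausible choice (the patent does not name the interpolation method):

```python
def binarize(prob_map, thresh=0.5):
    """Cut the per-pixel document-area probabilities at a preset threshold:
    probability >= thresh -> 1 (document), otherwise 0 (background)."""
    return [[1 if p >= thresh else 0 for p in row] for row in prob_map]

def upsample_nearest(mask, out_h, out_w):
    """Nearest-neighbour upsampling of the 0-1 mask back to the original image size."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]
```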
  • S25 screens legal areas: count the area a of each isolated document region in the rough segmentation mask; if a < μ - 3σ, the region is considered illegal and is removed from the rough segmentation mask, so that legal-area screening filters out erroneous regions.
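The screening rule itself is a one-line filter against the Gaussian statistics gathered during training; a minimal sketch (function name assumed for illustration):

```python
def screen_legal_areas(region_areas, mu, sigma):
    """Legal-area screening: a connected region whose area a < mu - 3*sigma is
    treated as illegal (segmentation noise) and dropped from the rough mask."""
    return [a for a in region_areas if a >= mu - 3.0 * sigma]
```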
  • the Unet network model belongs to the segmentation network.
  • Unet draws on the FCN network; its structure comprises two symmetrical parts. The first part is like an ordinary convolutional network, using 3x3 convolutions and pooling downsampling to capture the context information in the image (i.e., the relationships between pixels); the second part is essentially symmetrical to the first, using 3x3 convolutions and upsampling to produce the segmentation output.
  • Feature fusion is also used in the network: features from the downsampling part are fused with features from the upsampling part to obtain more accurate context information and a better segmentation result.
  • Unet uses a weighted softmax loss function in which every pixel has its own weight, making the network pay more attention to learning edge pixels. This makes the model well suited to the slight, uneven variations of a document edge that is not perfectly straight.
  • The second stage is standardization. On the basis of the first stage, refined mask correction is performed: as shown in Figure 5, every legal region in the mask map obtained in the first stage is corrected one by one. That is, for each legal document area in the screened mask image, refined mask correction is carried out through the following steps (see FIG. 5).
  • the contour feature is a binary mask image
  • the whole is a closed irregular curve
  • the binary mask image does not change the properties of the rectangular convex set of the ID photo.
  • Convex sets are still convex sets after affine transformation.
  • One of the good properties of an ID photo is that it is a regular rectangle, i.e. a standard convex set; no matter what affine transformation occurs during acquisition, the convex-set property cannot be changed.
  • S32 obtains the convex hull of the contour, obtains the minimum convex hull of the contour on the basis of the original contour, fills in the missing area of the partial segmentation, and smoothes the edge of the contour at the same time.
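The minimum convex hull described in S32 is a standard computation (OpenCV exposes it as `convexHull`); a self-contained sketch using Andrew's monotone chain, which replaces the contour with its smallest enclosing convex polygon and thereby fills segmentation gaps and smooths the edge:

```python
def convex_hull(points):
    """Minimum convex hull of a set of contour points (Andrew's monotone chain).
    Returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise or straight turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```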
  • Step S33, line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the convex hull's line segments, so as to describe the convex hull.
  • the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
  • The Hough transform is a feature extraction technique widely used in image analysis, computer vision and digital image processing to extract features, such as lines, from objects. This scheme uses it to accurately parse the document edge lines.
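The patent's parameters (minimum line length 100, maximum gap 20) correspond to the probabilistic Hough variant found in libraries such as OpenCV's `HoughLinesP`; as a simpler stdlib-only illustration, the classic voting form below accumulates votes in (rho, theta) space and keeps peaks with at least `min_votes` supporting points, which plays the role of the minimum detected line length:

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0, min_votes=100):
    """Minimal Hough voting: each edge point votes for every line
    rho = x*cos(theta) + y*sin(theta) passing through it; accumulator peaks
    with at least min_votes supporters are returned as (rho, theta) lines."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round((x * math.cos(theta) + y * math.sin(theta)) / rho_step)
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return [(rho * rho_step, math.pi * t / n_theta)
            for (rho, t), votes in acc.items() if votes >= min_votes]
```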
  • S34 finds the vertices: all legal straight lines from the line fitting are intersected pairwise, so as to find the distribution range of the four vertices of the document photo.
  • Each legal straight line detected in S33 can be written as an analytic expression. All legal lines are then intersected pairwise; this step finds the distribution range of the four vertices of the ID photo. The case of two parallel lines is not considered when finding the vertices.
  • a filter condition is set to check the legitimacy of the vertex.
  • A tolerance value tol is set in the filter condition: abscissae in [0 - tol, width + tol] and ordinates in [0 - tol, height + tol] are defined as legal vertex coordinates, where width and height are the width and height of the original image.
  • the tolerance value tol is set to 50.
  • min(x_crosspoint, width) caps x_crosspoint at the width of the original image, and max(min(x_crosspoint, width), 0) keeps it from falling below 0;
  • min(y_crosspoint, height) caps y_crosspoint at the height of the original image, and max(min(y_crosspoint, height), 0) keeps it from falling below 0.
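The legality filter and the clamping above translate directly into code; a minimal sketch with the patent's example tolerance of 50 pixels (function names assumed):

```python
def clamp_vertex(x, y, width, height):
    """Clamp an intersection point into the image: max(min(x, width), 0)
    keeps x within [0, width], and likewise for y within [0, height]."""
    return max(min(x, width), 0), max(min(y, height), 0)

def is_legal_vertex(x, y, width, height, tol=50):
    """A vertex is legal if x lies in [0 - tol, width + tol] and
    y lies in [0 - tol, height + tol]."""
    return -tol <= x <= width + tol and -tol <= y <= height + tol
```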
  • S36 vertex clustering: a standard bank card has four vertices. Given all the legal vertices obtained so far, the unsupervised clustering algorithm K-means groups them into four classes; the centroid of each class gives the coordinates of one vertex, yielding four vertex coordinates in total.
  • K-means is the most commonly used Euclidean-distance-based clustering algorithm; it is numerical, unsupervised, non-deterministic and iterative, and it aims to minimize an objective function, the squared-error function (the sum of distances between every observation and its cluster centre): the closer two targets are, the greater their similarity. Owing to its excellent speed and good scalability, K-means may be regarded as the best-known clustering method.
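The clustering step can be sketched with plain Lloyd's iterations; this stdlib-only version takes the initial centroids as an argument (the patent does not specify the initialization scheme), and with k = 4 the final centroids estimate the four card vertices:

```python
import math

def kmeans(points, centroids, iters=20):
    """Lloyd's K-means on 2-D points: assign each point to its nearest
    centroid (Euclidean distance), then move each centroid to the mean
    of its cluster; repeat for a fixed number of iterations."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids
```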
  • In step 4) of step S37, the coordinate sum of the upper-left point is the smallest; the vertex with the smallest coordinate sum is taken as the upper-left vertex, and with it as the starting point the coordinates are rearranged to determine the order of the four vertices.
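The vertex ordering of S37 (centre point, polar angles, angular sort, then rotation so the smallest x + y vertex leads) can be sketched as follows; note that in image coordinates, where y grows downward, ascending polar angle traverses the quadrilateral clockwise as seen on screen:

```python
import math

def order_vertices(verts):
    """Order four vertices as upper-left, upper-right, lower-right, lower-left:
    1) centre = mean of the four vertices; 2) polar angle of each vertex about
    the centre; 3) sort by angle; 4) rotate so the vertex with the smallest
    coordinate sum x + y (the upper-left corner) comes first."""
    cx = sum(x for x, _ in verts) / 4.0
    cy = sum(y for _, y in verts) / 4.0
    ring = sorted(verts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ring[i][0] + ring[i][1])
    return ring[start:] + ring[:start]
```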
  • the invention also provides a certificate detection device, which includes an acquisition input unit, an image processing unit, an information extraction unit, and an information output unit connected by telecommunication.
  • The acquisition input unit obtains the picture of the document to be detected and the standard registration picture through the camera component; the acquisition hardware includes, but is not limited to, mobile phones, iPads, ordinary cameras, CCD industrial cameras, scanners and the like, used to image the front of the document.
  • The collected image should completely include the four borders of the document, its inclination should not exceed plus or minus 20°, and the document number and edge lines should be distinguishable by the human eye.
  • The image processing unit processes the input image through the deep learning algorithm and the image processing algorithm in the processor, sequentially obtaining the preliminary rough document-area mask, the refined document-area mask, the document area extracted from the original image, and the corrected image after affine transformation.
  • The collected image is an image captured by a camera; it can be a static image (captured separately) or an image in a video (selected from the captured video according to preset criteria or at random). Any such image can serve as the source image of the document, and the embodiments of the present invention place no restrictions on attributes of the image such as its source, nature or size.
  • The information extraction unit obtains the category and information of the corrected image through the information extraction algorithm in the processor.
  • the processor displays the category and information result extracted from the input picture on the display and stores it in the memory.
  • The display includes, but is not limited to, the display screen of a tablet computer, computer or mobile phone, on which the documents extracted by the processor are compared and classified.
  • Embodiments of the present disclosure may also use, for example but not limited to, image-processing-based document detection algorithms (e.g., edge detection, mathematical morphology, texture-analysis-based localization, line detection, statistical methods, genetic algorithms, Hough transform and contour methods, wavelet-transform-based methods, etc.) to perform document detection on the collected image.
  • When edge detection is performed on the collected image through a neural network, the network can be trained in advance with sample images, so that the trained network effectively detects the straight edge lines in the image.
  • the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed.
  • a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • The present invention also provides a terminal including a memory and a processor; the memory stores a registered picture and computer instructions runnable on the processor, and when the processor runs the computer instructions it performs the steps of the aforementioned method.
  • a terminal including a memory and a processor
  • the memory stores a registered picture and a computer instruction that can be run on the processor
  • the processor performs the steps of the aforementioned method when it runs the computer instructions.
  • the embodiments of the present application may be provided as methods, apparatuses, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • computer-usable storage media including, but not limited to, disk storage, CD-ROM, optical storage, etc.

Abstract

A bankcard tilt correction (BTC)-based detection method and apparatus, a readable storage medium, and a terminal. By using BTC technology in combination with deep learning technology and a conventional image processing method, the advantages of the two are fully integrated, and for a wide variety of user input images having complex scenes, high-accuracy and high-robustness certificate segmentation and correction results can be obtained, thereby providing a foundation for subsequent certificate detection, classification, and information extraction, and improving the application range of certificate recognition; the present invention can be widely applied in the fields of security, finance, and the like.

Description

Bank card tilt correction detection method, device, readable storage medium, and terminal

Technical Field

The present invention relates to the technical field of information detection and intelligent vision, and in particular to a bank card tilt correction detection method, device, readable storage medium, and terminal.

Background Art

For document image recognition, identity information must be identified quickly and efficiently in the security, finance, and enterprise information management fields. In the early days, most ID card information had to be entered manually, which was highly inefficient, and the prolonged recognition process also strained the operator's eyes; manual entry is therefore no longer suited to the rapid development of modern computing.
With the rise of artificial intelligence, image recognition technology has gradually been applied in security, military, medical, intelligent transportation, and other fields, and technologies such as face recognition and fingerprint recognition are increasingly used in public security, finance, aerospace, and other safety-critical domains. In the military field, image recognition is mainly applied to target reconnaissance and identification, using automated image recognition to identify and strike enemy targets. In the medical field, image recognition enables the analysis and diagnosis of various medical images, which both greatly reduces the cost of medical care and helps improve its quality and efficiency. In the transportation field, it supports not only license plate recognition but also the cutting-edge field of autonomous driving, providing clear identification of roads, vehicles, and pedestrians, improving everyday convenience and lowering travel costs. Although technologies for automatically recognizing or extracting document information have emerged, in complex scenes, such as a document misaligned in the field of view, uneven illumination, external light-field interference, or partial occlusion by clutter, the boundary between the document outline and the image background becomes blurred. This hinders accurate extraction of the document boundary, so that document number detection becomes less efficient or fails outright. Several solutions have emerged to address this, as follows.
Traditional method: an edge detection algorithm is used. An edge detection operator locates the document edges; straight lines are fitted to the edge points, and the intersections of the fitted edge lines determine the document deflection angle. The document is then rotated, and image processing methods detect the position of the document number. Accurate detection of the document edge points is the core step of this method, yet the edge detection operator places high demands on the image background: if the gradient between the background and foreground regions varies little, or the background contains a large amount of edge information, edge point detection fails and the document number cannot be detected.
Deep learning method: in the model training stage, this method trains a deep network on a large amount of labeled data to fit the network parameters and model an OCR (Optical Character Recognition) detection algorithm; in the prediction stage, the entire image is fed to the network, and character regions are detected by forward inference. This is currently a popular character detection approach, but for the document number detection task it has the following drawbacks: (1) images of non-document regions also take part in network inference, which wastes computing resources, and spurious character detections in those regions require additional processing logic to remove; (2) the scheme consumes substantial computing resources, with longer training and inference times than the present proposal; (3) owing to the lack of interpretability of neural networks, the character-region bounding boxes it produces cannot precisely locate the minimum enclosing rectangle of the characters and may even cut off part of the character region. In other words, traditional optical character recognition (OCR) of document images is mainly oriented to high-definition scanned images and requires a clean background, standard print, and high resolution. In natural scenes, however, there is heavy text background noise, irregular text layout, and interference from natural light sources; the detection rate of OCR in real natural scenes is therefore unsatisfactory, and document recognition under such conditions places pressure on the subsequent character recognition step.
In addition, although AI technology has been applied across many industries, and techniques for capturing bank cards and other documents with intelligent terminal devices are mature and widespread enough to meet some practical application scenarios, in bank card detection and recognition scenarios in the financial field, improper handling during photographing frequently deforms the card image, reducing both recognition accuracy and efficiency.
Given the above, the intelligent detection of bank cards (and likewise ID cards, work permits, and the like) cannot respond quickly, accurately, and efficiently to the variability and complexity of actual application scenarios; that is, the diversification and growing complexity of practical scenarios place higher requirements on the detection and recognition of modern documents such as bank cards.
Summary of the Invention

To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a bank card tilt correction detection method, device, readable storage medium, and terminal that can solve the above problems.

Design principle: a Bankcard Tilt Correction (BTC) technique is proposed. BTC combines deep learning with traditional image processing methods, fully fusing the advantages of both, so that for the wide variety of user input images with complex scenes, document segmentation and correction results with high accuracy and high robustness can be obtained.
A method for detecting bank card tilt correction against a complex background, the method comprising the following steps:

First step, model training: annotate the original data and generate labels, compute document size statistics from the generated annotation files, and train the segmentation model using the original data and the annotation files;

Second step, initial document detection: for a picture input through the image acquisition unit, use the deep learning model to find potential document regions, obtaining a preliminary, rough document region mask;

Third step, standardization: refine the rough mask obtained in the preceding step into a high-quality document region mask, use this mask to extract the document region from the original image, apply an affine correction transformation to the extracted document photo to bring it to the preset document photo size, and output the corrected document picture.
Further, the model training of the first step includes the following steps:

S11, determine the document region: find the document region in each picture of the original data by manual annotation;

S12, annotate vertices and generate labels: annotate the four vertices of the document within the document region, and save the vertex coordinates as a json file to generate the label;

S13, compute document size statistics: from the generated annotation files, compute the area s of each document region, to serve the subsequent testing stage;

S14, train the segmentation model: train the segmentation model using the original data and the generated annotation files.
Further, in step S14, the input picture and the corresponding annotation file have the same size; before training, the json file is converted into a corresponding 0-1 binary mask image, in which regions with pixel value 1 represent the document region and regions with pixel value 0 represent the background.
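As a hedged sketch of this json-to-mask conversion (the function name is illustrative, and a convex quadrilateral with vertices ordered top-left, top-right, bottom-right, bottom-left is assumed; a production pipeline would normally rasterize the polygon with an image library such as OpenCV):

```python
import numpy as np

def quad_to_mask(vertices, height, width):
    """Rasterize a convex quadrilateral annotation into a 0-1 mask:
    pixels inside the document region become 1, background stays 0."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.ones((height, width), dtype=np.uint8)
    pts = [(float(x), float(y)) for x, y in vertices]
    for i in range(4):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % 4]
        # half-plane test: keep only pixels on the inner side of each edge
        cross = (x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0)
        mask &= (cross >= 0).astype(np.uint8)
    return mask
```

With vertices listed in the on-screen clockwise order assumed above, every interior pixel lies on the non-negative side of all four directed edges, which is what the half-plane test exploits.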
Further, the initial document detection of the second step includes the following steps:

S21, extract features: after a picture is input, scale it to the input size of the segmentation network, then use a Unet network model to extract deep features from the input data, obtaining a feature map;

S22, compute probabilities: perform a binary classification on the feature at each position of the feature map to obtain the probability that the feature at that position belongs to the document region, yielding a probability distribution map of the document region;

S23, threshold truncation: binarize the probability distribution map according to a preset threshold, setting probabilities greater than the threshold to 1 and probabilities less than the threshold to 0, to obtain a 0-1 mask image;

S24, rough segmentation mask: upsample the 0-1 mask image to the same size as the original input picture, obtaining a preliminary rough document segmentation mask;

S25, legal region screening: compute the area a of each isolated document region in the rough segmentation mask; if a ≤ μ-3σ, the region a is deemed illegal and is removed from the rough segmentation mask, so that some erroneous regions are filtered out by the legal region screening.
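The μ-3σ screening of step S25 reduces to a one-line filter once the per-region areas are known; a minimal sketch (the function name is an assumption, and measuring the isolated-region areas themselves would use a connected-component pass not shown here):

```python
def filter_illegal_regions(region_areas, mu, sigma):
    """Step S25: a candidate region whose area a satisfies a <= mu - 3*sigma
    is deemed illegal and dropped; mu and sigma are the Gaussian statistics
    of the annotated card areas collected during training."""
    threshold = mu - 3.0 * sigma
    return [a for a in region_areas if a > threshold]
```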
Further, in the third step, standardization, refined mask correction is performed on the legal regions of the screened mask image, including the following steps:

S31, extract the region contour feature: the contour feature is a binary mask image whose whole forms a closed irregular curve; the binary mask does not change the property of the document photo being a rectangular convex set;

S32, obtain the contour convex hull: on the basis of the original contour, compute its minimum convex hull, filling regions missed by the segmentation while smoothing the contour edge;

S33, line fitting: use the Hough transform to fit straight lines to the irregular convex polygon formed by the segments of the convex hull, thereby describing the convex hull;

S34, find the vertices: take all pairs of the legal fitted lines and compute their intersections, thereby locating the distribution range of the four vertices of the document photo; in this process, the case of two parallel lines is not considered;

S35, legal vertex screening: set screening conditions to check vertex validity. A tolerance value tol is set; abscissae in [0-tol, width+tol] and ordinates in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height are the width and height of the original image. If a vertex coordinate exceeds the original image size by no more than tol, the vertex coordinate (x_crosspoint, y_crosspoint) is corrected to the original image edge, that is:
x_crosspoint = min(max(x_crosspoint, 0), width), y_crosspoint = min(max(y_crosspoint, 0), height)
S36, vertex clustering: a standard bank card has four vertices; from all legal vertices obtained, cluster the vertices into four classes with the unsupervised K-means clustering algorithm, the centroid of each class being the coordinate of one vertex, yielding four vertex coordinates;

S37, vertex ordering: to facilitate subsequent operations, the order of the four vertices is determined by the following steps: 1) compute the center point from the four vertex coordinates; 2) establish a polar coordinate system at the center point, construct the vector from the center to each vertex, and compute the angle between each vector and the polar axis in turn; 3) sort the four vertices by the size of this angle in descending order; 4) find the upper-left corner of the document region, taking the vertex with the smallest sum of coordinates as the upper-left vertex, and rearrange the coordinate order starting from it, in the order upper-left, upper-right, lower-right, lower-left;

S38, region filling: after the vertex coordinates are found and ordered, binary-fill the quadrilateral formed by the four vertices to form a binary mask;

S39, affine transformation and output of the corrected picture: for the document region whose four vertices have been re-determined, apply an affine transformation according to the preset target document photo size, I_output = W·I_input, where W is the affine transformation matrix between the document region and the target document size. In this way, the corresponding correction is performed on each document region, and each corrected document picture is output as the corrected image and saved to the specified file path.
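The vertex ordering of step S37 can be sketched in a few lines of pure Python (the function name is illustrative). One detail worth noting: in image coordinates the y axis grows downward, so sorting by increasing atan2 angle already walks the quadrilateral clockwise on screen; the cycle is then rotated so the vertex with the smallest coordinate sum, the upper-left corner, comes first:

```python
import math

def order_vertices(pts):
    """Order four corner points as top-left, top-right, bottom-right,
    bottom-left: sort around the centroid by polar angle, then rotate the
    cycle so the point with the smallest x + y comes first (step S37)."""
    cx = sum(p[0] for p in pts) / 4.0
    cy = sum(p[1] for p in pts) / 4.0
    # ascending atan2 in image coordinates (y down) = clockwise on screen
    ordered = sorted(pts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]
```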
Further, in step S33, the minimum detected line length for the Hough-transform line fitting of the convex hull is set to 100, and the maximum gap between lines is set to 20.
Further, in step S36, the specific K-means algorithm is:

1) randomly select 4 cluster centroids μ_0, μ_1, μ_2, μ_3;

2) for each vertex coordinate (x_i, y_i), compute the Euclidean distance to each cluster centroid, take the centroid at minimum distance as its corresponding centroid, and label the vertex with the corresponding class j: argmin_j ||(x_i, y_i) - μ_j||^2, j = 0, 1, 2, 3; here ||(x_i, y_i) - μ_j||^2 is the squared Euclidean norm between vertex (x_i, y_i) and centroid j, and the assignment minimizes the sum of these norms over the four centroids;

3) recompute the coordinates of the 4 centroids;

4) repeat steps 2) and 3) until convergence.
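The steps above can be sketched as plain Lloyd iterations. Note one deliberate deviation: the patent initializes the centroids randomly, while the sketch below uses a deterministic farthest-point initialization so the example is reproducible (the function name is also an assumption):

```python
import numpy as np

def kmeans_vertices(points, k=4, iters=100):
    """Cluster candidate line intersections into k=4 corner estimates
    (step S36) with Lloyd-style K-means."""
    pts = np.asarray(points, dtype=float)
    # deterministic farthest-point initialization (patent uses random init)
    centroids = [pts[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centroids], axis=0)
        centroids.append(pts[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        dists = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to nearest centroid
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return centroids
```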
The present invention also provides a document detection device, comprising an acquisition input unit, an image processing unit, an information extraction unit, and an information output unit, all in telecommunication connection. The acquisition input unit acquires, through a camera assembly, the detection picture of the document to be detected and a standard registered picture; the image processing unit processes the input picture through the deep learning and image processing algorithms in the processor, successively obtaining the preliminary rough document region mask, the refined document region mask, the extracted original-image region, and the corrected image after the affine correction transformation; the information extraction unit obtains the category and information of the corrected image through the information extraction algorithm in the processor; and in the information output unit, the processor displays the category and information extracted from the input picture on a display and stores them in a memory.
The present invention also provides a computer-readable storage medium on which computer instructions are stored, the computer instructions performing the steps of the aforementioned method when run.
The present invention also provides a terminal, comprising a memory and a processor, the memory storing a registered picture and computer instructions executable on the processor, the processor performing the steps of the aforementioned method when running the computer instructions.
Compared with the prior art, the beneficial effect of the present invention is that the Bankcard Tilt Correction (BTC) technique of the present application combines deep learning with traditional image processing methods, fully fusing the advantages of both. For the wide variety of user input images with complex scenes, document segmentation and correction results with high accuracy and high robustness can be obtained, providing a basis for subsequent document detection, classification, and information extraction, broadening the application scope of document recognition, and enabling wide application in security, finance, and other fields.
Brief Description of the Drawings

Fig. 1 is a flowchart of the bank card tilt correction detection method against a complex background of the present invention;

Fig. 2 is a schematic diagram of model training;

Fig. 3 is a simplified flowchart of the BTC testing stage;

Fig. 4 is a flowchart of the initial document detection method;

Fig. 5 is a flowchart of document image standardization.
Detailed Description of the Embodiments

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.

First Embodiment
A method for detecting bank card tilt correction against a complex background, referring to Figs. 1 to 5, includes the following steps.

First step, model training: annotate the original data and generate labels, compute document size statistics from the generated annotation files, and train the segmentation model using the original data and the annotation files.

Second step, initial document detection: for a picture input through the image acquisition unit, use the deep learning model to find potential document regions, obtaining a preliminary, rough document region mask.

Third step, standardization: refine the rough mask into a high-quality document region mask, use this mask to extract the document region from the original image, apply an affine correction transformation to the extracted document photo to bring it to the preset document photo size, and output the corrected document picture.
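The affine correction I_output = W·I_input of the third step can be sketched by estimating W from the four ordered corners. This is a hedged illustration (the helper name and target corner layout are assumptions; with four arbitrary corners the exact mapping is projective, so the least-squares affine fit below is only exact when the card is an affine-distorted rectangle):

```python
import numpy as np

def estimate_affine(src_pts, dst_w, dst_h):
    """Least-squares 2x3 affine matrix W mapping the four ordered card
    corners (TL, TR, BR, BL) onto a dst_w x dst_h target rectangle."""
    dst = np.array([(0, 0), (dst_w - 1, 0), (dst_w - 1, dst_h - 1),
                    (0, dst_h - 1)], dtype=float)
    src = np.asarray(src_pts, dtype=float)
    A = np.hstack([src, np.ones((4, 1))])  # one row [x, y, 1] per corner
    W, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return W.T
```

Each output coordinate is then W @ [x, y, 1]; a warping routine would apply the inverse mapping per target pixel.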
Model training

BTC relies on the powerful feature extraction ability of deep learning, so the relevant model must be trained before formal use. Referring to Fig. 2, for a batch of original data to be trained, the regions of bank cards or other documents in each picture are first found by manual annotation. Specifically, for every document in a picture, its four vertices are annotated, and the vertex coordinates are saved as a json file. Next, from the generated annotation files, the area s of each document region is computed; this serves the subsequent testing stage. Experiments verify that the document photo areas in the original data follow a Gaussian distribution, namely s ~ N(μ, σ^2).

By computing the area of every document region, the mean μ and standard deviation σ of the Gaussian distribution are obtained.
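A minimal sketch of this statistic (the function name is illustrative; the population standard deviation is assumed):

```python
import numpy as np

def area_statistics(areas):
    """Mean mu and standard deviation sigma of the annotated document
    areas, under the assumption s ~ N(mu, sigma^2); mu - 3*sigma later
    serves as the lower bound for legal region screening."""
    a = np.asarray(areas, dtype=float)
    return float(a.mean()), float(a.std())
```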
Finally, the segmentation model is trained using the original data and the generated annotation files. Notably, during training the input picture and the corresponding annotation must have the same size. Therefore, the annotated json file must also be converted into a corresponding 0-1 binary mask image, in which regions with pixel value 1 represent the document region and regions with pixel value 0 represent the background.
Specifically, the model training of the first step proceeds as follows.

S11, determine the document region: find the document region in each picture of the original data by manual annotation.

S12, annotate vertices and generate labels: annotate the four vertices of the document within the document region, and save the vertex coordinates as a json file to generate the label.

JSON (JavaScript Object Notation) is a lightweight data interchange format. It is easy for humans to read and write, and also easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition (December 1999). JSON is a sequence of tokens comprising six structural characters, strings, numbers, and three literal names. For this reason, it is well suited to the coordinate annotation used in this scheme.

S13, compute document size statistics: from the generated annotation files, compute the area s of each document region, to serve the subsequent testing stage.

S14, train the segmentation model: train the segmentation model using the original data and the generated annotation files.

At this point, the BTC training procedure is complete.
Detection stage

The detection stage is divided into initial document detection and standardization. BTC is a two-stage, coarse-to-fine refinement segmentation model. As shown in Fig. 3, in the first stage the deep learning model finds potential document regions in the input picture, yielding a preliminary, relatively rough document region mask; in the second stage, traditional image processing techniques refine the rough mask of the first stage into a high-quality document region mask, which is used to extract the document photo from the original image. Finally, an affine correction transformation is applied to the extracted document photo to bring it to the preset document photo size.

First stage: initial document detection. In the first stage, the goal of finding the document region is achieved mainly by the sub-operations of feature extraction, probability computation, and threshold truncation, finally yielding a preliminary rough segmentation mask. As shown in Fig. 4, after the user inputs a picture, it is scaled to the input size of the segmentation network, and the classic Unet network model extracts deep features from the input data. A binary classification is then performed on the feature at each position of the feature map to obtain the probability that the feature belongs to the document region, yielding a probability distribution map of the document region. This map is then binarized according to a preset threshold, with probabilities above the threshold set to 1 and those below set to 0, and the 0-1 mask is upsampled to the same size as the original input. The first stage is thus complete, producing a preliminary document segmentation mask. The specific steps of the initial detection are as follows.
S21, extract features: after a picture is input, scale it to the input size of the segmentation network, then use a Unet network model to extract deep features from the input data, obtaining a feature map.

S22, compute probabilities: perform a binary classification on the feature at each position of the feature map to obtain the probability that the feature at that position belongs to the document region, yielding a probability distribution map of the document region.

S23, threshold truncation: binarize the probability distribution map according to a preset threshold, setting probabilities greater than the threshold to 1 and probabilities less than the threshold to 0, to obtain a 0-1 mask image.

S24, rough segmentation mask: upsample the 0-1 mask image to the same size as the original input picture, obtaining a preliminary rough document segmentation mask.

S25, legal region screening: compute the area a of each isolated document region in the rough segmentation mask; if a ≤ μ-3σ, the region a is deemed illegal and is removed from the rough segmentation mask, so that some erroneous regions are filtered out by the legal region screening.
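Steps S23 and S24 above can be sketched together. This is a minimal version assuming the original size is an integer multiple of the network output, with nearest-neighbour block replication standing in for whatever interpolation an implementation would actually use:

```python
import numpy as np

def prob_to_mask(prob, thresh, out_h, out_w):
    """Binarize the per-position document probabilities at the preset
    threshold (S23), then upsample the 0-1 mask to the original input
    size by nearest-neighbour block replication (S24)."""
    mask = (prob > thresh).astype(np.uint8)
    fy, fx = out_h // mask.shape[0], out_w // mask.shape[1]
    # kron replicates each mask cell into an fy x fx block
    return np.kron(mask, np.ones((fy, fx), dtype=np.uint8))
```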
Here, the Unet network model is a segmentation network. Unet draws on the FCN network; its architecture comprises two symmetric parts: the first part is the same as an ordinary convolutional network, using 3x3 convolutions and pooled downsampling to capture the contextual information in the image (that is, the relationships between pixels); the second part is essentially symmetric to the first, using 3x3 convolutions and upsampling to produce the output segmentation. In addition, the network uses feature fusion, merging features from the downsampling path with those of the upsampling path to obtain more accurate contextual information and a better segmentation result. Moreover, Unet uses a weighted softmax loss function in which every pixel has its own weight, making the network pay more attention to learning edge pixels. This model is well suited to the small, non-linear irregularities of document edges.
第二阶段,标准化。在第一阶段的基础上,进行第二阶段的精细化掩膜修正(refinement)。如图5所示,对于第一阶段得到的掩膜图中的所有合法区域,都要逐一进行修正处理。在第二步标准化中,对于每一个合法证件区域,即对第一步经筛选后的掩膜图中的合法区域进行精细化掩膜修正,参见图5,包括以下步骤。The second stage is standardization. On the basis of the first stage, a refined mask correction (refinement) is performed. As shown in Figure 5, every legal region in the mask obtained in the first stage is corrected one by one. In this second, standardization step, refined mask correction is applied to each legal document region, i.e. to the legal regions in the screened mask from the first step; see Figure 5, which includes the following steps.
S31提取区域轮廓特征,轮廓特征是一张二值掩膜图,整体是一条闭合的不规则曲线,二值掩膜图不改变证件照矩形凸集的性质。S31 Extract the region contour feature: the contour feature is a binary mask whose outline is a closed irregular curve; the binary mask does not change the rectangular convex-set property of the document photo.
在进行接下来的操作时,首先引入一条性质以保证以下操作的合法性。When performing the following operations, first introduce a property to ensure the legality of the following operations.
性质定义:凸集经过仿射变换作用后仍为凸集。证件照的良好性质之一在于其为规则矩形形状,是一种标准的凸集集合,无论该凸集在采集阶段经过怎样的仿射变换,均不能改变其凸集的性质。Property definition: a convex set remains a convex set under an affine transformation. One good property of a document photo is that it is a regular rectangular shape, a standard convex set; whatever affine transformation the set undergoes during acquisition, its convexity cannot be changed.
S32求取轮廓凸包,在原始轮廓的基础上求取该轮廓的最小凸包,将部分分割缺失的区域进行填补,同时使轮廓边缘平滑。S32 Compute the contour's convex hull: the minimum convex hull of the contour is computed on the basis of the original contour, filling in regions that the segmentation missed while smoothing the contour edge.
由于上一步的轮廓提取完全依赖于分割模型的结果,在某些不平滑的边缘处凹凸不平,这与证件照的性质不吻合。故在原始轮廓的基础上求取该轮廓的最小凸包,将部分分割缺失的区域进行填补,同时使轮廓边缘更加平滑。Since the contour extraction in the previous step depends entirely on the segmentation model's output, some edges are jagged where they should be smooth, which is inconsistent with the nature of a document photo. Therefore the minimum convex hull of the contour is computed on the basis of the original contour, filling in regions that the segmentation missed and making the contour edge smoother.
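The minimum convex hull of a contour can be computed with a standard algorithm; the patent does not name one, and in practice a library routine (e.g., OpenCV's convexHull) would typically be used. The following pure-Python monotone-chain sketch is an illustrative assumption:

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices of a 2-D point set.

    points: list of (x, y) contour points from the binary mask.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a non-left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate the two chains, dropping the duplicated endpoints
    return lower[:-1] + upper[:-1]

# A jagged contour around a square collapses to its 4 corner points,
# filling in the concavities, as S32 describes.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 1), (1, 2)])
```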
S33直线拟合,使用霍夫变换对凸包的多个线段组成的不规则凸多边形进行直线拟合,以对凸包进行描述。具体实施例中,在步骤S33中,通过霍夫变换对凸包进行直线拟合的最小检测直线长度设置为100,直线之间最大间隔设置为20。S33 Line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the multiple line segments of the convex hull, so as to describe the convex hull. In a specific embodiment, in step S33, the minimum detected line length for Hough line fitting on the convex hull is set to 100, and the maximum gap between lines is set to 20.
其中,霍夫变换是一种特征检测(feature extraction),被广泛应用在图像分析(image analysis)、计算机视觉(computer vision)以及数位影像处理(digital image processing),霍夫变换是用来辨别找出物件中的特征,例如:线条。本方案即用其来精确地解析定义的证件边缘直线。The Hough transform is a feature extraction technique widely used in image analysis, computer vision and digital image processing; it is used to identify features in objects, such as lines. This scheme uses it to precisely resolve the defined document edge lines.
S34求取顶点,对直线拟合中的所有合法直线两两求取交点,以此寻找证件照四个顶点的分布范围,具体的,S33中所有检测得到的合法直线,均可以得到直线的解析式表达。针对所有的合法直线,两两求取交点,这一步操作旨在寻找证件照四个顶点的分布范围。并且在求取顶点的过程中,对于两条直线平行的情况不做考虑。S34 Compute the vertices: all legal lines from the line fitting are taken pairwise and their intersections computed, so as to find the distribution range of the four vertices of the document photo. Specifically, every legal line detected in S33 yields an analytic line expression. For all legal lines, intersections are computed pairwise; this step aims to find the distribution range of the four vertices of the document photo. In computing the vertices, the case of two parallel lines is not considered.
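Pairwise intersection of the fitted lines follows directly from their analytic expressions. A minimal sketch (names illustrative), with each line represented by the coefficients (a, b, c) of ax + by = c and parallel pairs skipped as S34 specifies:

```python
from itertools import combinations

def intersect(l1, l2, eps=1e-9):
    """Intersection of lines a1*x + b1*y = c1 and a2*x + b2*y = c2.

    Returns (x, y), or None when the lines are (nearly) parallel,
    which the method explicitly does not consider.
    """
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None  # parallel: no candidate vertex
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x, y)

def candidate_vertices(lines):
    """All pairwise intersections of the fitted edge lines."""
    pts = []
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is not None:
            pts.append(p)
    return pts

# Two horizontal and two vertical edges of an axis-aligned card:
# y = 0, y = 50, x = 0, x = 80
lines = [(0, 1, 0), (0, 1, 50), (1, 0, 0), (1, 0, 80)]
verts = candidate_vertices(lines)
```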
S35顶点合法筛选,在所有得到的顶点中,并非所有顶点都是合法的,因此,设置了筛选条件对于顶点进行合法性检查,为后续步骤提高了准确率和处理速度。具体的,设置筛选条件对于顶点进行合法性检查,筛选条件中设置了容忍值tol,横坐标[0-tol,width+tol],纵坐标[0-tol,height+tol]定义为合法顶点坐标,其中width、height代表原始图像的宽度和高度,具体实施例中,容忍值tol设为50。且,若某顶点的坐标超出了原始图像尺寸而没有超过tol,则将该顶点坐标(x crosspoint,y crosspoint)纠正到原始图像边缘处,即:
x crosspoint = max(min(x crosspoint , width), 0)
y crosspoint = max(min(y crosspoint , height), 0)
S35 Vertex legality screening: among all the vertices obtained, not all are legal, so a screening condition is set to check vertex legality, which improves accuracy and processing speed for the subsequent steps. Specifically, the screening condition uses a tolerance value tol: abscissas in [0-tol, width+tol] and ordinates in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height represent the width and height of the original image; in a specific embodiment, tol is set to 50. If a vertex coordinate exceeds the original image size but not by more than tol, the vertex coordinate (x crosspoint , y crosspoint ) is corrected to the original image edge, that is:
x crosspoint = max(min(x crosspoint , width), 0)
y crosspoint = max(min(y crosspoint , height), 0)
其中,min(x crosspoint,width)将x crosspoint的最大值限制为不超过原始图片width,max(min(x crosspoint,width),0)将其最小值限制为不小于0; Here, min(x crosspoint , width) caps x crosspoint so that it cannot exceed the original image width, and max(min(x crosspoint , width), 0) ensures it cannot fall below 0;

同理,min(y crosspoint,height)将y crosspoint的最大值限制为不超过原始图片height,max(min(y crosspoint,height),0)将其最小值限制为不小于0。 Likewise, min(y crosspoint , height) caps y crosspoint so that it cannot exceed the original image height, and max(min(y crosspoint , height), 0) ensures it cannot fall below 0.
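The legality check and edge clamping of S35 can be sketched as follows (function name and inputs are illustrative assumptions):

```python
def screen_and_clamp(vertices, width, height, tol=50):
    """Keep vertices inside [0-tol, width+tol] x [0-tol, height+tol],
    clamping any coordinate that overshoots the image by at most tol
    back to the image edge, as the formula in S35 prescribes."""
    legal = []
    for x, y in vertices:
        if -tol <= x <= width + tol and -tol <= y <= height + tol:
            x = max(min(x, width), 0)
            y = max(min(y, height), 0)
            legal.append((x, y))
    return legal

# (-10, 20) overshoots by less than tol and is clamped to the left edge;
# (900, 20) lies far outside the tolerance band and is dropped.
pts = screen_and_clamp([(-10, 20), (900, 20), (100, 50)], width=640, height=480)
```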
S36顶点聚类,对比标准银行卡存在四个顶点,根据已求得的所有合法顶点,通过无监督聚类算法K-means将所有顶点聚为四类,其中每一类的质心即为某一个顶点的坐标,共得到四个顶点坐标。S36 Vertex clustering: a standard bank card has four vertices, so based on all the legal vertices obtained, the unsupervised clustering algorithm K-means groups all vertices into four classes; the centroid of each class is taken as the coordinate of one vertex, giving four vertex coordinates in total.
其中,K-means的具体算法为:Among them, the specific algorithm of K-means is:
1)随机选取4个聚类质心点μ 01231) Randomly select 4 cluster centroid points μ 0 , μ 1 , μ 2 , μ 3 ;
2)对于每一个顶点坐标(x i,y i),通过计算与每个聚类质心的欧氏距离,找到最小距离的质心点作为其对应的质心点并标注为对应类别j:argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3; 2) For each vertex coordinate (x i , y i ), by calculating the Euclidean distance with each cluster centroid, find the centroid point with the smallest distance as its corresponding centroid point and mark it as the corresponding category j: argmin j | |(x i ,y i )-μ j || 2 ,j=0,1,2,3;
其中,||(x i,y i)-μ j|| 2为顶点(x i,y i)与质心μ j之间欧氏距离的平方;argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3即选取距离该顶点最近的质心所对应的类别j,从而使各顶点到其所属质心的距离之和最小。 Here, ||(x i ,y i )-μ j || 2 is the squared Euclidean distance between vertex (x i , y i ) and centroid μ j ; argmin j ||(x i ,y i )-μ j || 2 , j=0,1,2,3 selects the class j whose centroid is nearest to the vertex, so that the sum of distances from the vertices to their assigned centroids is minimized.
3)重新计算4个质心的坐标;3) Recalculate the coordinates of the 4 centroids;
4)重复2)和3)过程直到收敛。4) Repeat 2) and 3) process until convergence.
其中,K-means是最常用的基于欧式距离的聚类算法,它是数值的、非监督的、非确定的、迭代的,该算法旨在最小化一个目标函数——误差平方函数(所有的观测点与其中心点的距离之和),其认为两个目标的距离越近,相似度越大,由于具有出色的速度和良好的可扩展性,K-means聚类算法算得上是最著名的聚类方法。K-means is the most commonly used Euclidean-distance-based clustering algorithm; it is numerical, unsupervised, non-deterministic and iterative. The algorithm minimizes an objective function, the sum-of-squared-errors (the sum of the distances of all observation points to their cluster centers), on the principle that the closer two objects are, the more similar they are. Owing to its excellent speed and good scalability, K-means ranks among the best-known clustering methods.
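A minimal NumPy sketch of the four-class K-means of S36 (illustrative only; in practice a library implementation such as scikit-learn's KMeans would normally be used):

```python
import numpy as np

def kmeans4(points, n_iter=20, seed=0):
    """Cluster 2-D points into 4 classes; return the 4 centroids.

    points: (N, 2) array of candidate vertex coordinates.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # 1) randomly pick 4 distinct points as initial centroids
    centroids = pts[rng.choice(len(pts), size=4, replace=False)]
    for _ in range(n_iter):
        # 2) assign each point to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 3) recompute each centroid as the mean of its assigned points
        for j in range(4):
            if np.any(labels == j):
                centroids[j] = pts[labels == j].mean(axis=0)
    # 4) fixed iteration count stands in for the convergence test
    return centroids

# Candidate intersections scattered around the four corners of a card
corners = np.array([[0, 0], [100, 0], [100, 60], [0, 60]], float)
noisy = np.concatenate([corners + [1, 0], corners - [1, 0], corners])
cents = kmeans4(noisy)
```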
S37顶点排序,为方便后续操作,通过以下步骤确定四个顶点的排序:S37 Vertex sorting, in order to facilitate subsequent operations, the sorting of the four vertices is determined by the following steps:
1)根据四个顶点坐标求取中心点坐标;1) Obtain the coordinates of the center point according to the coordinates of the four vertices;
2)以中心点建立极坐标系,并构造从中心点指向各顶点的向量,依次求出各向量与极轴的夹角;2) Establish a polar coordinate system with the center point, and construct a vector pointing from the center point to each vertex, and obtain the angle between each vector and the polar axis in turn;
3)按照夹角的大小由大到小的顺序对四个顶点进行排序;3) Sort the four vertices according to the size of the included angle from large to small;
4)寻找证件区域的左上角点,并从左上角点开始,按照“左上-右上-右下-左下”的顺序进行排列。4) Find the upper-left corner point of the document area and, starting from it, arrange the vertices in the order "upper left - upper right - lower right - lower left".
其中,在步骤S37的步骤4)中,左上顶点的坐标值之和最小,故以坐标值之和最小的顶点为左上顶点,并以其为起点重新排列坐标顺序,以确定四个顶点的顺序。In step 4) of step S37, the upper-left vertex has the smallest sum of coordinate values, so the vertex with the smallest coordinate sum is taken as the upper-left vertex, and the coordinate order is rearranged starting from it to determine the order of the four vertices.
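The polar-angle ordering of S37 can be sketched as follows. One assumption is made explicit in the comments: the patent sorts angles from large to small in a conventional (y-up) frame, while image coordinates have y pointing down, where ascending atan2 order yields the same upper-left, upper-right, lower-right, lower-left traversal. Names are illustrative:

```python
import math

def order_vertices(vertices):
    """Order 4 vertices as upper-left, upper-right, lower-right, lower-left.

    1) centre = mean of the 4 vertices; 2) angle of each centre-to-vertex
    vector; 3) sort by angle (ascending atan2 in y-down image coordinates
    gives the clockwise-on-screen traversal); 4) rotate so the vertex with
    the smallest coordinate sum (the upper-left corner) comes first.
    """
    cx = sum(x for x, _ in vertices) / 4.0
    cy = sum(y for _, y in vertices) / 4.0
    ordered = sorted(vertices,
                     key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]

# The corners of a 100x60 card, given in arbitrary order
quad = order_vertices([(100, 60), (0, 0), (0, 60), (100, 0)])
```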
S38区域填充,在找到并按顺序排列顶点坐标之后,将四个顶点构成的四边形区域进行二值填充,形成一个二进制掩膜。S38 area filling, after finding and arranging the vertex coordinates in order, the quadrilateral area formed by the four vertices is filled with binary values to form a binary mask.
S39仿射变换输出矫正图片,对重新确定四个顶点的证件区域,根据预先设定的目标证件照大小对证件区域进行仿射变换,I output=WI input,其中,W为证件区域与目标证件大小之间的仿射变换矩阵;以此,对每一个证件区域都进行相应的修正操作,并将修正后得到的证件图片作为矫正图片输出并保存到指定的文件路径处。 S39 Affine transformation outputs the corrected picture: for the document region whose four vertices have been re-determined, an affine transformation maps the region to the preset target document-photo size, I output = W·I input , where W is the affine transformation matrix between the document region and the target document size. In this way, each document region receives the corresponding correction, and the corrected document picture is output as the rectified picture and saved to the specified file path.
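Solving for the transformation matrix W of S39 can be sketched as below. Note one assumption: an affine W is fully determined by three point correspondences (four arbitrary corner pairs would in general require a perspective transform), so the sketch solves for W from three of the ordered corners; the point values are illustrative:

```python
import numpy as np

def affine_matrix(src, dst):
    """Solve for the 2x3 affine matrix W mapping src -> dst.

    src, dst: three (x, y) point pairs, e.g. three ordered card corners
    and the matching corners of the target document-photo size.
    Sets up the 6x6 linear system [x, y, 1 | 0] and [0 | x, y, 1] for the
    six unknown entries of W and solves it directly.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0])
        A.append([0, 0, 0, x, y, 1])
        b.extend([xp, yp])
    w = np.linalg.solve(np.array(A, float), np.array(b, float))
    return w.reshape(2, 3)

def apply_affine(W, pt):
    """I_output = W . I_input applied to a single point."""
    x, y = pt
    v = W @ np.array([x, y, 1.0])
    return (v[0], v[1])

# Map three corners of a tilted card onto a 400x250 target photo
src = [(10, 20), (210, 40), (190, 160)]
dst = [(0, 0), (400, 0), (400, 250)]
W = affine_matrix(src, dst)
```

With W in hand, a library warp routine would resample every pixel of the document region into the target size.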
至此,对于每一个证件区域都可以进行相应的修正操作,并将修正后得到的证件图片保存到指定的文件路径处,至此,银行卡倾斜矫正的全部流程处理完毕。So far, the corresponding correction operation can be performed for each certificate area, and the certificate picture obtained after correction can be saved to the specified file path. At this point, the whole process of bank card tilt correction is completed.
第二实施例Second Embodiment
本发明还提供了一种证件检测装置,所述装置包括电讯连接的获取输入单元、图像处理单元、信息提取单元、和信息输出单元。The present invention also provides a document detection apparatus, which comprises an acquisition input unit, an image processing unit, an information extraction unit and an information output unit that are communicatively connected.
获取输入单元,通过摄像组件获取待检测证件的检测图片及标准的注册图片;获取单元利用硬件设备,包括但不限于手机,IPAD,普通摄像头,CCD工业相机、扫描仪等,对证件正面进行图像信息采集,注意采集到的图像应完全的包含证件的四条边界,并且倾斜不超过正负20°,且人眼能分辨证件号码和边缘直线。The acquisition input unit obtains, via a camera assembly, the detection picture of the document to be detected and the standard registration picture; the acquisition unit uses hardware devices, including but not limited to mobile phones, iPads, ordinary cameras, CCD industrial cameras, scanners, etc., to capture image information from the front of the document. Note that the captured image should fully contain all four borders of the document, be tilted by no more than plus or minus 20°, and keep the document number and edge lines distinguishable to the human eye.
图像处理单元,通过处理器中的深度学习算法和图像处理算法对输入图片进行处理,依次获得初步的粗糙的证件区域掩膜、证件区域精修的掩膜、扣取的原图区域和仿射变换矫正后的矫正图像。The image processing unit processes the input picture with the deep learning and image processing algorithms in the processor, obtaining in turn a preliminary rough document-region mask, a refined document-region mask, the cropped original-image region, and the rectified image after affine transformation.
其中的采集的图像,是通过摄像头采集的图像,可以是一张静态图像(即:单独采集的图像),也可以是一张视频中图像(即从采集的视频中按照预设标准或随机选取的一张图像),均可用于本发明证件的图像源,本发明实施例对于图像的来源、性质、大小等等所有属性均无限制。The captured image is an image collected by a camera; it may be a static image (i.e., a separately captured image) or an image from a video (i.e., one frame selected from the captured video according to a preset criterion or at random). Either may serve as the image source for the document of the present invention; the embodiments of the present invention place no restriction on the source, nature, size or any other attribute of the image.
信息提取单元,通过处理器中的信息提取算法提取矫正图像的类别和信息。The information extraction unit extracts the category and information of the rectified image through the information extraction algorithm in the processor.
信息输出单元,处理器将输入图片提取的类别和信息结果在显示器上显示并存储至存储器。其中,显示器包括但不限于平板电脑、计算机、手机等的显示屏,将处理器提取的证件对比分类显示。In the information output unit, the processor displays the category and information results extracted from the input picture on the display and stores them in the memory. The display includes but is not limited to the screen of a tablet, computer or mobile phone, on which the documents extracted by the processor are displayed by comparison and classification.
本领域技术人员基于本公开实施例的记载可以知悉,除了神经网络外,在本公开实施例还可以利用例如但不限于:基于图像处理的字符检测算法(例如,基于直方图粗分割和奇异值特征的字符/号码检测算法,基于二进小波变换的字符/号码检测算法,等等),对采集图像进行字符检测。另外,除了神经网络外,在本公开实施例也可以利用例如但不限于:基于图像处理的证件检测算法(例如,边缘检测法,数学形态学法,基于纹理分析的定位方法,行检测和边缘统计法,遗传算法,霍夫(Hough)变换和轮廓线法,基于小波变换的方法,等等),对采集图像进行证件检测。Based on the description of the embodiments of the present disclosure, those skilled in the art will appreciate that, besides neural networks, the embodiments may also use, for example but not limited to, image-processing-based character detection algorithms (e.g., character/number detection based on rough histogram segmentation and singular-value features, character/number detection based on the dyadic wavelet transform, etc.) to perform character detection on the captured image. In addition, besides neural networks, the embodiments may also use, for example but not limited to, image-processing-based document detection algorithms (e.g., edge detection, mathematical morphology, texture-analysis-based localization, line detection and edge statistics, genetic algorithms, the Hough transform and contour methods, wavelet-transform-based methods, etc.) to perform document detection on the captured image.
本公开实施例中,通过神经网络对采集图像进行边缘检测时,可以预先利用样本图像对神经网络进行训练,使得训练好的神经网络能够实现对 图像中边缘直线的有效检测。In the embodiment of the present disclosure, when edge detection is performed on the collected image through the neural network, the neural network can be trained by using the sample image in advance, so that the trained neural network can effectively detect the edge straight lines in the image.
第三实施例Third Embodiment
本发明还提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时执行前述方法的步骤。其中,所述方法请参见前述部分的详细介绍,此处不再赘述。The present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed. Wherein, for the method, please refer to the detailed introduction in the foregoing part, and details are not repeated here.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于计算机可读存储介质中,计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program, and the program may be stored in a computer-readable storage medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. Information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
第四实施例Fourth Embodiment
本发明还提供了一种终端,包括存储器和处理器,所述存储器上储存有注册图片和能够在所述处理器上运行的计算机指令,所述处理器运行所述计算机指令时执行前述方法的步骤。其中,所述方法请参见前述部分的 详细介绍,此处不再赘述。The present invention also provides a terminal, including a memory and a processor, the memory stores a registered picture and a computer instruction that can be run on the processor, and the processor executes the method of the foregoing method when the processor runs the computer instruction. step. Wherein, for the method, please refer to the detailed introduction in the foregoing section, and details are not repeated here.
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprising", "comprising" or any other variation thereof are intended to encompass a non-exclusive inclusion such that a process, method, article or device comprising a series of elements includes not only those elements, but also Other elements not expressly listed, or inherent to such a process, method, article of manufacture or apparatus are also included. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article of manufacture or device that includes the element.
本领域技术人员应明白,本申请的实施例可提供为方法、装置、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。It will be appreciated by those skilled in the art that the embodiments of the present application may be provided as methods, apparatuses, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. 复杂背景下银行卡倾斜矫正检测方法,其特征在于,方法包括以下步骤:The bank card tilt correction detection method under complex background is characterized in that, the method comprises the following steps:
    第一步,模型训练:对原始数据进行标注数据并生成标签,根据生成的标注文件统计证件大小,利用原始数据和标注文件对分割模型进行训练;The first step, model training: label the original data and generate labels, count the document size according to the generated label files, and use the original data and label files to train the segmentation model;
    第二步,证件初检,对于通过图像采集单元输入的图片利用深度学习模型寻找相应的潜在证件区域,得到一个初步且粗糙的证件区域掩膜;The second step is the initial inspection of the document. For the picture input through the image acquisition unit, the deep learning model is used to find the corresponding potential document area, and a preliminary and rough document area mask is obtained;
    第三步,标准化,对第一步获得的粗糙掩膜进行精细化修正,得到高质量的证件区域掩膜,利用该掩膜在原图中提取证件区域,对于得到的证件照进行仿射矫正变换,将其变换为预设定的证件照尺寸,输出矫正证件图片。The third step, standardization, fine-tune the rough mask obtained in the first step to obtain a high-quality document area mask, use the mask to extract the document area in the original image, and perform affine correction transformation on the obtained document photo , convert it to the preset ID photo size, and output the corrected ID photo.
  2. 根据权利要求1所述的方法,其特征在于,第一步的模型训练包括以下步骤:The method according to claim 1, wherein the model training of the first step comprises the following steps:
    S11确定证件区域,通过人工标注寻找原始数据的图片中证件区域;S11 Determine the certificate area, and find the certificate area in the picture of the original data through manual annotation;
    S12顶点标注生成标签,对证件区域内的证件四个顶点进行标注,并将顶点的坐标位置以json文件的方式进行保存生成标签;S12 Vertex labeling and generating labels, labeling the four vertices of the document in the document area, and saving the coordinate positions of the vertices in the form of json files to generate labels;
    S13统计证件大小,根据生成的标注文件,统计每个证件区域的面积大小s,以为后续测试阶段服务;S13 Count the size of the certificate, according to the generated annotation file, count the area size s of each certificate area, so as to serve the subsequent testing stage;
    S14训练分割模型,利用原始数据和生成的标注文件对分割模型进行训练。S14 trains the segmentation model, and uses the original data and the generated annotation files to train the segmentation model.
  3. 根据权利要求2所述的方法,其特征在于:在步骤S14中,输入图片和相应的标注文件具有相同的尺寸;且在训练前将json文件转换为对应的0-1二值掩膜图,其中像素为1的区域代表证件区域,像素为0的区域代表背景区域。The method according to claim 2, wherein in step S14 the input picture and the corresponding annotation file have the same size, and the json file is converted into a corresponding 0-1 binary mask image before training, in which the area with pixel value 1 represents the document area and the area with pixel value 0 represents the background area.
  4. 根据权利要求1所述的方法,其特征在于,第二步的证件初检包括以下步骤:The method according to claim 1, wherein the initial inspection of the certificate in the second step comprises the following steps:
    S21提取特征,输入图片后,将图片缩放为适合分割网络的输入图片大小,再用Unet网络模型对于输入数据提取深度特征,得到特征图;S21 extracts features, after inputting a picture, scales the picture to a size suitable for the input picture of the segmentation network, and then uses the Unet network model to extract depth features from the input data to obtain a feature map;
    S22计算概率,对于特征图中的每个位置的特征进行二分类判断,求得每个位置的特征属于证件区域的概率值,得到属于证件区域的概率分布图;S22 calculates the probability, carries out two-class judgment on the feature of each position in the feature map, obtains the probability value that the feature of each position belongs to the document area, and obtains the probability distribution map belonging to the document area;
    S23阈值截断,根据预先设定的阈值将概率分布图进行二值化,将大于阈值的概率设置为1,小于阈值的概率设置为0,获得0-1掩膜图;S23 Threshold truncation, binarize the probability distribution map according to the preset threshold, set the probability greater than the threshold to 1, and set the probability less than the threshold to 0 to obtain a 0-1 mask map;
    S24粗分割掩膜,将0-1掩膜图上采样至与原始输入图片同样大小的尺寸,得到一张初步的证件粗分割掩膜图;S24 rough segmentation mask, the 0-1 mask image is upsampled to the same size as the original input image, and a preliminary document rough segmentation mask image is obtained;
    S25合法区域筛选,在训练阶段对银行卡的面积进行统计,计算训练集中的分布函数,得到平均值μ和标准差σ,统计粗分割掩膜图中每个孤立的证件区域面积a,如果a≤μ-3σ,则认为该区域a为非法区域,从粗分割掩膜中剔除,以此通过合法区域筛选将部分错误区域进行过滤。S25 legal area screening, count the area of the bank card in the training phase, calculate the distribution function in the training set, get the mean μ and standard deviation σ, and count the area a of each isolated document area in the rough segmentation mask map, if a ≤μ-3σ, then the area a is considered to be an illegal area, and it is removed from the rough segmentation mask, so as to filter some error areas through legal area screening.
  5. 根据权利要求1所述的方法,其特征在于,在第三步标准化中,对第一步经筛选后的掩膜图中的合法区域进行精细化掩膜修正,包括以下步骤:The method according to claim 1, characterized in that, in the third step of standardization, refining the mask correction is performed on the legal area in the mask map after the first step of screening, comprising the following steps:
    S31提取区域轮廓特征,轮廓特征是一张二值掩膜图,整体是一条闭合的不规则曲线,二值掩膜图不改变证件照矩形凸集的性质;S31 extracts the regional contour feature, the contour feature is a binary mask image, the whole is a closed irregular curve, and the binary mask image does not change the properties of the rectangular convex set of the document photo;
    S32求取轮廓凸包,在原始轮廓的基础上求取该轮廓的最小凸包,将 部分分割缺失的区域进行填补,同时使轮廓边缘平滑;S32 obtains the contour convex hull, obtains the minimum convex hull of this contour on the basis of the original contour, and fills the missing area of the partial segmentation, and makes the contour edge smooth simultaneously;
    S33直线拟合,使用霍夫变换对凸包的多个线段组成的不规则凸多边形进行直线拟合,以对凸包进行描述;S33 line fitting, using Hough transform to perform straight line fitting on an irregular convex polygon composed of multiple line segments of the convex hull to describe the convex hull;
    S34求取顶点,对直线拟合中的所有合法直线两两求取交点,以此寻找证件照四个顶点的分布范围,并且在求取顶点的过程中,对于两条直线平行的情况不做考虑;S34 Compute the vertices: all legal lines from the line fitting are taken pairwise and their intersections computed, so as to find the distribution range of the four vertices of the document photo; in computing the vertices, the case where two lines are parallel is not considered;
    S35顶点合法筛选,设置筛选条件对于顶点进行合法性检查,筛选条件中设置了容忍值tol,横坐标[0-tol,width+tol]及纵坐标[0-tol,height+tol]定义为合法顶点坐标,其中width,height代表原始图像的宽度和高度,若某顶点的坐标(x crosspoint,y crosspoint)超出了原始图像尺寸而没有超过tol,则将该顶点坐标纠正到原始图像边缘处,即:S35 Vertex legality screening: a screening condition is set to check the legality of the vertices; the condition uses a tolerance value tol, and abscissas in [0-tol, width+tol] and ordinates in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height represent the width and height of the original image; if a vertex coordinate (x crosspoint , y crosspoint ) exceeds the original image size but not by more than tol, the vertex coordinate is corrected to the original image edge, that is:
    x crosspoint = max(min(x crosspoint , width), 0)
    y crosspoint = max(min(y crosspoint , height), 0)
    其中,
    where,
    min(x crosspoint,width)将x crosspoint的最大值限制为不超过原始图片width,max(min(x crosspoint,width),0)将其最小值限制为不小于0; min(x crosspoint , width) caps x crosspoint so that it cannot exceed the original image width, and max(min(x crosspoint , width), 0) ensures it cannot fall below 0;

    同理,min(y crosspoint,height)将y crosspoint的最大值限制为不超过原始图片height,max(min(y crosspoint,height),0)将其最小值限制为不小于0。 Likewise, min(y crosspoint , height) caps y crosspoint so that it cannot exceed the original image height, and max(min(y crosspoint , height), 0) ensures it cannot fall below 0.
    S36顶点聚类,对比标准银行卡存在四个顶点,根据已求得的所有合法顶点,通过无监督聚类算法K-means将所有顶点聚为四类,其中每一类的质心即为某一个顶点的坐标,共得到四个顶点坐标;S36 Vertex clustering: a standard bank card has four vertices, so based on all the legal vertices obtained, the unsupervised clustering algorithm K-means groups all vertices into four classes; the centroid of each class is taken as the coordinate of one vertex, giving four vertex coordinates in total;
    S37顶点排序,为方便后续操作,通过以下步骤确定四个顶点的排序:1)根据四个顶点坐标求取中心点坐标;2)以中心点建立极坐标系,并构造从中心点指向各顶点的向量,依次求出各向量与极轴的夹角;3)按照夹角的大小由大到小的顺序对四个顶点进行排序;4)寻找证件区域的左上角点,以最小坐标值之和的顶点为左上顶点,并以左上顶点为起点重新排列坐标顺序,按照“左上-右上-右下-左下”的顺序进行排列;S37 Vertex ordering: to facilitate subsequent operations, the order of the four vertices is determined as follows: 1) compute the centre-point coordinates from the four vertex coordinates; 2) establish a polar coordinate system at the centre point, construct a vector from the centre point to each vertex, and compute the angle between each vector and the polar axis; 3) sort the four vertices by angle from large to small; 4) find the upper-left corner point of the document area, taking the vertex with the smallest sum of coordinate values as the upper-left vertex, and rearrange the coordinates starting from it in the order "upper left - upper right - lower right - lower left";
    S38区域填充,在找到并按顺序排列顶点坐标之后,将四个顶点构成的四边形区域进行二值填充,形成一个二进制掩膜;S38 area filling, after finding and arranging the vertex coordinates in order, the quadrilateral area formed by the four vertices is filled with binary values to form a binary mask;
    S39仿射变换输出矫正图片,对重新确定四个顶点的证件区域,根据预先设定的目标证件照大小对证件区域进行仿射变换,I output=WI input,其中,W为证件区域与目标证件大小之间的仿射变换矩阵;以此,对每一个证件区域都进行相应的修正操作,并将修正后得到的证件图片作为矫正图片输出并保存到指定的文件路径处。S39 Affine transformation outputs the corrected picture: for the document region whose four vertices have been re-determined, an affine transformation maps the region to the preset target document-photo size, I output = W·I input , where W is the affine transformation matrix between the document region and the target document size; in this way, each document region receives the corresponding correction, and the corrected document picture is output as the rectified picture and saved to the specified file path.
  6. 根据权利要求5所述的方法,其特征在于:在步骤S33中,通过霍夫变换对凸包进行直线拟合的最小检测直线长度设置为100,直线之间最大间隔设置为20。The method according to claim 5, characterized in that: in step S33, the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
  7. 根据权利要求5所述的方法,其特征在于:在步骤S36中,K-means的具体算法为:The method according to claim 5, wherein: in step S36, the specific algorithm of K-means is:
    1)随机选取4个聚类质心点μ 01231) Randomly select 4 cluster centroid points μ 0 , μ 1 , μ 2 , μ 3 ;
    2)对于每一个顶点坐标(x i,y i),通过计算与每个聚类质心的欧氏距离,找到最小距离的质心点作为其对应的质心点并标注为对应类别j: 2) For each vertex coordinate (x i , y i ), by calculating the Euclidean distance from each cluster centroid, find the centroid point with the smallest distance as its corresponding centroid point and mark it as the corresponding category j:
    argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3; argmin j ||(x i ,y i )-μ j || 2 ,j=0,1,2,3;
    其中,||(x i,y i)-μ j|| 2为顶点(x i,y i)与质心μ j之间欧氏距离的平方;argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3即选取距离该顶点最近的质心所对应的类别j,从而使各顶点到其所属质心的距离之和最小; Here, ||(x i ,y i )-μ j || 2 is the squared Euclidean distance between vertex (x i , y i ) and centroid μ j ; argmin j ||(x i ,y i )-μ j || 2 , j=0,1,2,3 selects the class j whose centroid is nearest to the vertex, so that the sum of distances from the vertices to their assigned centroids is minimized;
    3)重新计算4个质心的坐标;3) Recalculate the coordinates of the 4 centroids;
    4)重复2)和3)过程直到收敛。4) Repeat 2) and 3) process until convergence.
  8. A certificate detection apparatus, characterized in that: the apparatus comprises an acquisition input unit, an image processing unit, an information extraction unit, and an information output unit, connected by telecommunication; wherein,
    the acquisition input unit obtains, through a camera assembly, the detection picture of the certificate to be detected and a standard registration picture;
    the image processing unit processes the input picture through the deep-learning and image-processing algorithms in the processor, obtaining in turn a preliminary rough document-area mask, a refined document-area mask, the cropped original-image area, and the corrected image after affine-transform rectification;
    the information extraction unit extracts the category and information of the corrected image through the information extraction algorithm in the processor;
    the information output unit: the processor displays the category and information extracted from the input picture on the display and stores them in the memory.
  9. A computer-readable storage medium on which computer instructions are stored, characterized in that: when the computer instructions are run, the steps of the method according to any one of claims 1-7 are executed.
  10. A terminal, comprising a memory and a processor, characterized in that: the memory stores a registration picture and computer instructions that can be run on the processor, and the processor, when running the computer instructions, executes the steps of the method according to any one of claims 1-7.
PCT/CN2020/141443 2020-12-10 2020-12-30 Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal WO2022121039A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011458177.8A CN112686812B (en) 2020-12-10 2020-12-10 Bank card inclination correction detection method and device, readable storage medium and terminal
CN202011458177.8 2020-12-10

Publications (1)

Publication Number Publication Date
WO2022121039A1 true WO2022121039A1 (en) 2022-06-16

Family

ID=75449185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/141443 WO2022121039A1 (en) 2020-12-10 2020-12-30 Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal

Country Status (2)

Country Link
CN (1) CN112686812B (en)
WO (1) WO2022121039A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882489A (en) * 2022-07-07 2022-08-09 浙江智慧视频安防创新中心有限公司 Method, device, equipment and medium for horizontally correcting rotary license plate
CN115272206A (en) * 2022-07-18 2022-11-01 深圳市医未医疗科技有限公司 Medical image processing method, medical image processing device, computer equipment and storage medium
CN115457559A (en) * 2022-08-19 2022-12-09 上海通办信息服务有限公司 Method, device and equipment for intelligently correcting text and license pictures
CN117095423A (en) * 2023-10-20 2023-11-21 上海银行股份有限公司 Bank bill character recognition method and device
CN117315664A (en) * 2023-09-18 2023-12-29 山东博昂信息科技有限公司 Scrap steel bucket number identification method based on image sequence
CN117409261A (en) * 2023-12-14 2024-01-16 成都数之联科技股份有限公司 Element angle classification method and system based on classification model

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033543B (en) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 Curve text recognition method, device, equipment and medium
CN113344000A (en) * 2021-06-29 2021-09-03 南京星云数字技术有限公司 Certificate copying and recognizing method and device, computer equipment and storage medium
CN113870262B (en) * 2021-12-02 2022-04-19 武汉飞恩微电子有限公司 Printed circuit board classification method and device based on image processing and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537219A (en) * 2018-03-20 2018-09-14 上海眼控科技股份有限公司 A kind of intelligent detecting method and device for financial statement outline border
CN108682015A (en) * 2018-05-28 2018-10-19 科大讯飞股份有限公司 Lesion segmentation method, apparatus, equipment and storage medium in a kind of biometric image
JP2018199473A (en) * 2017-05-30 2018-12-20 株式会社Soken Steering-angle determining device and automatic driving vehicle
CN110458161A (en) * 2019-07-15 2019-11-15 天津大学 A kind of mobile robot doorplate location method of combination deep learning
CN111027564A (en) * 2019-12-20 2020-04-17 长沙千视通智能科技有限公司 Low-illumination imaging license plate recognition method and device based on deep learning integration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110715A (en) * 2019-04-30 2019-08-09 北京金山云网络技术有限公司 Text detection model training method, text filed, content determine method and apparatus
CN110866871A (en) * 2019-11-15 2020-03-06 深圳市华云中盛科技股份有限公司 Text image correction method and device, computer equipment and storage medium



Also Published As

Publication number Publication date
CN112686812A (en) 2021-04-20
CN112686812B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
WO2022121039A1 (en) Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal
US20200364443A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
Gou et al. Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines
Zang et al. Vehicle license plate recognition using visual attention model and deep learning
Silva et al. A flexible approach for automatic license plate recognition in unconstrained scenarios
CN101142584B (en) Method for facial features detection
CN108334881B (en) License plate recognition method based on deep learning
CN111310662B (en) Flame detection and identification method and system based on integrated deep network
CN104751142A (en) Natural scene text detection algorithm based on stroke features
Zhang et al. Road recognition from remote sensing imagery using incremental learning
CN110298376A (en) A kind of bank money image classification method based on improvement B-CNN
Dehshibi et al. Persian vehicle license plate recognition using multiclass Adaboost
WO2022121025A1 (en) Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal
CN112101208A (en) Feature series fusion gesture recognition method and device for elderly people
CN105335760A (en) Image number character recognition method
Gawande et al. SIRA: Scale illumination rotation affine invariant mask R-CNN for pedestrian detection
CN110363196B (en) Method for accurately recognizing characters of inclined text
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
Mei et al. A novel framework for container code-character recognition based on deep learning and template matching
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN111062393B (en) Natural scene Chinese character segmentation method based on spectral clustering
CN109325487B (en) Full-category license plate recognition method based on target detection
CN107330436B (en) Scale criterion-based panoramic image SIFT optimization method
Ning Vehicle license plate detection and recognition
CN108171750A (en) The chest handling positioning identification system of view-based access control model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964968

Country of ref document: EP

Kind code of ref document: A1