WO2022121025A1 - Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal

Info

Publication number: WO2022121025A1
Application number: PCT/CN2020/140736
Authority: WO
Inventors: 吴昌宇; 黄跃珍; 王晓亮
Original assignee: 广州广电运通金融电子股份有限公司
Priority date: 2020-12-10
Filing date: 2020-12-29
Publication date: 2022-06-16
Also published as: CN112686248A; CN112686248B

Abstract

Provided are a certificate category increase and decrease detection method and apparatus, a readable storage medium, and a terminal. The method comprises: first, storing standard pictures of certificates of various categories in a memory as registered pictures; second, photographing a certificate to be detected to obtain an input picture, and performing image processing on the input picture; and finally, comparing the processed picture with the registered pictures and determining, by means of a similarity measure, the category to which the detected picture belongs, so that newly added certificates can be screened and categorized quickly and efficiently. The detection solution improves the accuracy and efficiency of determining the category of a newly added certificate in complex photographing scenarios, and can be widely applied in fields such as security and finance.

Description

Method, device, readable storage medium and terminal for detecting change of certificate type

Technical field

The present invention relates to the technical field of information detection and intelligent vision, and in particular to a method, device, readable storage medium and terminal for detecting certificates when certificate categories are added or removed.

Background

In security, finance, and enterprise information management, identity information on documents must be recognized quickly and efficiently. In the early days, most ID card information was entered manually, which was very inefficient, and prolonged manual checking also caused eye fatigue.

With the rise of artificial intelligence, image recognition technology is gradually being applied in security, military, medical, intelligent transportation and other fields, and technologies such as face recognition and fingerprint recognition are increasingly used in public security, finance, aerospace and other security-sensitive fields. In the military field, image recognition is mainly used for reconnaissance and target identification, where automatic image recognition identifies enemy targets to be struck. In the medical field, image recognition supports the analysis and diagnosis of various medical images, which on the one hand greatly reduces the cost of treatment and on the other hand helps to improve the quality and efficiency of care. In the field of transportation, it is used not only for license plate recognition but also in autonomous driving, where it identifies roads, vehicles and pedestrians, improving convenience and reducing travel costs. Although technologies for automatically identifying or extracting document information have emerged, in complex scenes (for example a misaligned document, uneven illumination, interference from external light, or debris covering the document) the boundary between the document outline and the image background is blurred. This hinders accurate extraction of the document boundary, so that document number detection becomes less efficient or fails. Some solutions to this problem have emerged, as follows.

Traditional method: an edge detection operator locates the edges of the document; straight lines are fitted to the edge points, and the intersections of the fitted edge lines are used to determine the deflection angle of the document; the document is rotated accordingly, and image processing is then used to locate the document number. Accurate detection of the document edge points is the core step of this method, and the edge detection operator is sensitive to the complexity of the image background. When the background is complex, edge point detection fails, and the certificate number cannot be detected.

Deep learning method: in the model training stage, this method uses a large amount of labeled data to train a deep network and fit its parameters, thereby modeling an OCR (Optical Character Recognition) detection algorithm. The image is used as the input of the network, and character regions are detected through forward inference. This is currently a popular character detection method, but for the task of detecting identification numbers it has the following defects: (1) the image area outside the document also takes part in network inference, which on the one hand wastes computing resources and on the other hand produces false character detections in that area, which require additional processing logic to eliminate; (2) this scheme consumes more computing resources, and its training and inference times are longer than those of the present proposal; (3) because neural networks are not interpretable, the boxes that this method places around character regions do not accurately match the minimum bounding rectangle of the characters and may even cut off part of a character region. In other words, traditional OCR technology for document images is mainly suited to high-definition scanned images: it requires a clean background, standard printed type and high resolution. In natural scenes, however, there are problems such as heavy background noise, irregular text distribution, and the influence of natural light sources. The detection rate of OCR technology in real natural scenes is therefore not ideal, which places a burden on character recognition in the subsequent steps.

In addition, although AI technology has been applied in many industries and can meet the needs of some practical application scenarios, the targets to be detected or identified, such as the customer detection targets in the banking industry, are added or removed from time to time. When a target is added, it is usually necessary to go through sample collection, labeling, model training, deployment and so on. This optimization process has a long cycle and low efficiency.

Given the above, existing intelligent detection of certificates (including ID cards, bank cards, work permits, etc.) and detection of newly added certificate categories cannot respond quickly to changes in actual application scenarios or to the addition and removal of detection targets. That is, the addition and removal of detection targets and the diversification of practical application scenarios place higher demands on modern document recognition.

SUMMARY OF THE INVENTION

In order to overcome the deficiencies of the prior art, the purpose of the present invention is to provide a method, device, readable storage medium and terminal for detecting the increase or decrease of a certificate, which can solve the above problems.

Design principle: first, standard pictures of the various types of documents are stored in the memory as registered pictures; second, the document to be tested is photographed to obtain an input picture, and the input picture undergoes image processing; finally, the processed picture is compared with the registered pictures, and the category of the detected picture is determined from the similarity, so that a newly added certificate can be screened and categorized quickly and efficiently.

Technical solution: The purpose of the present invention is achieved by the following technical solutions.

A method for detecting certificates when certificate categories are added or removed, the method comprising: a first step, initial document inspection, in which a deep learning model searches a picture input through an image acquisition unit for potential document areas to obtain a preliminary, rough document area mask; a second step, standardization, in which the rough mask obtained in the first step is refined to obtain a high-quality document area mask, the mask is used to extract the document area from the original image, and an affine correction transformation is applied to the extracted document photo to transform it to the preset ID photo size and output the corrected ID image; and a third step, image comparison, in which the corrected ID image output in the second step is compared with the registered images, and the category of the input image is determined and output.

Preferably, the initial inspection of the certificate in the first step includes the following steps:

S11 feature extraction: after a picture is input, scale the picture to a size suitable as input to the segmentation network, and then use the Unet network model to extract deep features from the input data to obtain a feature map;

S12 probability calculation: perform a two-class judgment on the feature at each position in the feature map to obtain the probability that the feature at each position belongs to the document area, yielding a probability distribution map of the document area;

S13 threshold truncation: binarize the probability distribution map according to a preset threshold, setting probabilities greater than the threshold to 1 and probabilities less than the threshold to 0, to obtain a 0-1 mask map;

S14 rough segmentation mask: upsample the 0-1 mask map to the same size as the original input to obtain a preliminary rough segmentation mask image of the document;

S15 legal area screening: compute the area a of each isolated document area in the rough segmentation mask image; if a≤μ-3σ, the area is considered illegal and is removed from the rough segmentation mask, so that some erroneous areas are filtered out. Here the area of a document region is assumed to follow a normal distribution, so the probability of a≤μ-3σ is less than 0.5%; μ is the expected value of the area distribution of the document area, and σ is the standard deviation of that distribution.

Preferably, in the second step of standardization, a refined mask correction is performed on the legal areas in the mask image screened in the first step, including the following steps:

S21 extract the regional contour feature: the contour is taken from the binary mask map and is, as a whole, a closed irregular curve; the binary mask map does not change the property that the document photo is a rectangular convex set;

S22 obtain the convex hull of the contour: obtain the minimum convex hull of the contour on the basis of the original contour, so as to fill in areas missed by the segmentation and at the same time smooth the contour edge;

S23 line fitting: the convex hull is an irregular convex polygon composed of multiple line segments; use the Hough transform to fit straight lines to it in order to describe the convex hull;

S24 find the vertices: take all the legal straight lines from the line fitting and compute their pairwise intersection points, so as to find the distribution range of the four vertices of the ID photo; pairs of parallel lines are not considered when finding the vertices;

S25 vertex validity screening: set screening conditions to check the validity of the vertices. A tolerance value tol is set in the screening conditions; vertices whose abscissa lies in [0-tol, width+tol] and whose ordinate lies in [0-tol, height+tol] are defined as legal, where width and height are the width and height of the original image. If the coordinates of a vertex exceed the original image size by no more than tol, the vertex coordinates are corrected to the edge of the original image, that is:

$x = \max(\min(x_{crosspoint}, width), 0)$, $y = \max(\min(y_{crosspoint}, height), 0)$

Here, $\min(x_{crosspoint}, width)$ ensures that $x_{crosspoint}$ does not exceed the width of the original image, and $\max(\min(x_{crosspoint}, width), 0)$ ensures that it is not less than 0; similarly, $\min(y_{crosspoint}, height)$ ensures that $y_{crosspoint}$ does not exceed the height of the original image, and $\max(\min(y_{crosspoint}, height), 0)$ ensures that it is not less than 0.

S26 vertex clustering: a standard bank card has four vertices. All the legal vertices obtained so far are clustered into four classes with the unsupervised clustering algorithm K-means; the centroid of each class gives the coordinates of one vertex, so that four vertex coordinates are obtained in total;

S27 vertex sorting: to facilitate subsequent operations, the order of the four vertices is determined by the following steps: 1) obtain the coordinates of the center point from the coordinates of the four vertices; 2) establish a polar coordinate system at the center point, construct the vector pointing from the center point to each vertex, and compute the angle between each vector and the polar axis in turn; 3) sort the four vertices by angle from large to small; 4) find the upper left corner of the document area and, starting from the upper left point, arrange the vertices in the order "upper left - upper right - lower right - lower left";

S28 area filling: after the vertex coordinates have been found and arranged in order, fill the quadrilateral formed by the four vertices with binary values to form a binary mask;

S29 affine transformation and output of the corrected picture: for the document area whose four vertices have been re-determined, apply an affine transformation to the document area according to the preset target document photo size, $I_{output} = W I_{input}$, where W is the affine transformation matrix between the document area and the target document size; in this way, the corresponding correction is applied to each document area, and the corrected document picture is output and saved to the specified file path.

Preferably, in step S23, the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.

Preferably, in step S25, the tolerance value tol is set to 50.

Preferably, in step S26, the specific algorithm of K-means is: 1) randomly select four cluster centroid points $\mu_0, \mu_1, \mu_2, \mu_3$; 2) for each vertex coordinate $(x_i, y_i)$, compute its Euclidean distance to each cluster centroid, take the centroid with the smallest distance as its corresponding centroid, and label the vertex with the corresponding category j: $\arg\min_j \|(x_i, y_i) - \mu_j\|_2,\ j = 0, 1, 2, 3$; 3) recalculate the coordinates of the four centroids; 4) repeat 2) and 3) until convergence.

$\|(x_i, y_i) - \mu_j\|_2,\ j = 0, 1, 2, 3$ is the Euclidean distance between vertex $(x_i, y_i)$ and centroid $\mu_j$; $\arg\min_j \|(x_i, y_i) - \mu_j\|_2$ selects the category j whose centroid is nearest, so that the sum of the distances from the vertices to their assigned centroids is as small as possible.

Preferably, in step 4) of step S27, the upper left vertex is the vertex whose coordinate values have the smallest sum; the coordinate order is rearranged with this vertex as the starting point to determine the order of the four vertices.

Preferably, the image comparison in the third step includes the following steps:

S31 image binarization: binarize the registered image A and the image to be classified B; the corresponding vectors are $(x_1, x_2, x_3, \ldots, x_n)$ and $(y_1, y_2, y_3, \ldots, y_n)$;

S32 compute the cosine of the angle between the vectors: the cosine of the angle between the vector of the image B to be classified and the vector of the registered image A is:

$\cos\theta = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$

S33 similarity determination: the smaller the cosine of the angle, the less related the two images are; when the cosine is close to 1, the two images are similar; when the cosine of the angle between the two image vectors equals 1, the two images are the same. The category of the registered picture A that is most similar or identical to the picture B to be classified is taken as the category of the input picture and is output.

A certificate detection device, the device comprising an acquisition input unit, an image processing unit, an image comparison and classification unit, and a certificate category output unit that are communicatively connected. The acquisition input unit obtains, through a camera assembly, the detection picture of the certificate to be detected and the standard registered pictures. The image processing unit processes the input image with the deep learning algorithm in the processor and obtains, in sequence, the preliminary rough document area mask, the refined corrected mask, and the corrected image after affine correction transformation. The image comparison and classification unit uses the comparison algorithm in the processor to compare the corrected image with the registered pictures stored in the memory and classify it. In the certificate category output unit, the processor shows on the display the category determined for the input picture after comparison and classification, and stores it in the memory.

A computer-readable storage medium having computer instructions stored thereon that, when executed, perform the steps of the aforementioned method.

A terminal includes a memory and a processor; the memory stores the registered pictures and computer instructions that can be run on the processor, and the processor performs the steps of the aforementioned method when it executes the computer instructions.

Compared with the prior art, the present invention has the following beneficial effects: storing standard pictures of the various certificates in advance provides accurate comparison objects and improves the accuracy of comparison and screening; the comparison algorithm adopted is simple, efficient and accurate, which improves the efficiency of comparison and screening; and the invention responds quickly to changes in the detection targets of an application scenario, broadens the scope of application of certificate identification, and can be widely used in security, finance and other fields.

Description of drawings

Fig. 1 is the flow chart of the detection method of certificate increase and decrease category of the present invention;

Fig. 2 is the method flow chart of the initial inspection of the certificate;

Fig. 3 is the flow chart of document image standardization;

FIG. 4 is an example diagram of similarity comparison of image comparison.

Detailed description

In order to make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.

First embodiment

A method for detecting an increase or decrease category of a certificate, see Figure 1, the method includes the following steps.

The first step is the initial inspection of the document. For the picture input through the image acquisition unit, the deep learning model is used to find the corresponding potential document area, and a preliminary and rough document area mask is obtained.

The second step, standardization, fine-tune the rough mask obtained in the first step to obtain a high-quality document area mask, use the mask to extract the document area in the original image, and perform affine correction transformation on the obtained document photo , convert it to the preset ID photo size, and output the corrected ID photo.

The third step, image comparison, compares the corrected certificate image output in the second step with the registered image, determines the category of the input image and outputs it.

Further, the detection process for new certificate categories is divided into three stages. The first two stages form a segmentation optimization model (two-stage, coarse-to-fine segmentation) for image segmentation. As shown in Figure 1, in the first stage a deep learning model finds the potential document areas in the input image and produces a preliminary, relatively rough document area mask. In the second stage, traditional image processing techniques refine and correct the rough mask from the first stage to obtain a high-quality document area mask; the mask is used to extract the document area from the original image, and an affine correction transformation converts the extracted document photo to the preset ID photo size. The third stage compares the ID photo with the registered images and outputs the category to which the input image belongs.

In the first stage of detection, that is, the initial document inspection of the first step, the document area is found mainly through the sub-operations of feature extraction, probability calculation, and threshold truncation, which finally yield a preliminary rough segmentation mask. As shown in Figure 2, after the user inputs a picture, it is scaled to an input size suitable for the segmentation network, and the classic Unet network model is then used to extract deep features from the input data. A two-class judgment is performed on the feature at each position to obtain the probability that it belongs to the certificate area, which yields a probability distribution map of the certificate area. The probability distribution map is then binarized according to a preset threshold: probabilities greater than the threshold are set to 1 and probabilities less than the threshold are set to 0. This 0-1 mask is then upsampled to the same size as the original input. At this point the first-stage operation is complete, and a preliminary document segmentation mask map has been obtained. The specific steps are as follows.

S11 extracts features, after inputting a picture, scales the picture to a size suitable for the input picture of the segmentation network, and then uses the Unet network model to extract depth features from the input data to obtain a feature map.

S12 calculates the probability, performs two-class judgment on the feature of each position in the feature map, obtains the probability value that the feature of each position belongs to the certificate area, and obtains a probability distribution map belonging to the certificate area.

S13 Threshold truncation, binarize the probability distribution map according to a preset threshold, set the probability greater than the threshold to 1, and set the probability less than the threshold to 0 to obtain a 0-1 mask map.

S14 rough segmentation mask, upsample the 0-1 mask image to the same size as the original input, and obtain a preliminary document rough segmentation mask image.

S15 Legal area screening: compute the area a of each isolated document area in the rough segmentation mask; if a≤μ-3σ, the area is considered illegal and is removed from the rough segmentation mask, so that some erroneous areas are filtered out.

Among them, the distribution of the area value of the document area obeys the normal distribution, and the probability of a≤μ-3σ is less than 0.5%. μ represents the expected value of the area distribution of the document area; σ represents the standard deviation of the area distribution of the document area.
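As an illustration only, steps S13 and S15 can be sketched in plain Python as below. The function names, the 4-connected flood fill used to find isolated areas, and the assumption that μ and σ have already been estimated from known document areas are all choices made for this sketch; the text does not specify them. The upsampling of S14 is omitted.

```python
from collections import deque

def binarize_probabilities(prob_map, threshold=0.5):
    """S13: probabilities above the threshold become 1, all others 0."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]

def connected_regions(mask):
    """Return the pixel lists of all 4-connected regions of 1s (assumed connectivity)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions

def screen_regions(mask, mu, sigma):
    """S15: erase every region whose area a satisfies a <= mu - 3 * sigma."""
    out = [row[:] for row in mask]
    for pixels in connected_regions(mask):
        if len(pixels) <= mu - 3 * sigma:
            for y, x in pixels:
                out[y][x] = 0
    return out
```

With μ = 6 and σ = 1, for example, a stray single-pixel region (area 1 ≤ 3) is erased while a six-pixel region is kept.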

The Unet network model is a segmentation network that draws on the FCN network. Its structure consists of two symmetrical parts. The first part is the same as an ordinary convolutional network and uses 3x3 convolutions and pooling for downsampling, which captures the context information in the image (that is, the relationships between pixels). The second part is essentially symmetrical to the first and uses 3x3 convolutions and upsampling to output the image segmentation. In addition, the network uses feature fusion: the features from the downsampling part are fused with the features from the upsampling part to obtain more accurate context information and a better segmentation result. Moreover, Unet uses a weighted softmax loss function in which each pixel has its own weight, which makes the network pay more attention to learning the edge pixels. This makes the model well suited to document edges that are slightly uneven rather than perfectly straight.

The refined mask correction of the second stage is performed on the basis of the first stage. As shown in Figure 3, all the legal regions in the mask map obtained in the first stage must be corrected one by one. In the second step of standardization, the refined mask correction applied to each legal region in the mask map screened in the first step includes the following steps.

S21 extracts the regional contour feature, the contour feature is a binary mask image, the whole is a closed irregular curve, and the binary mask image does not change the properties of the rectangular convex set of the ID photo.

When performing the following operations, first introduce a property to ensure the legality of the following operations.

Property definition: Convex sets are still convex sets after affine transformation. One of the good properties of ID photo is that it is a regular rectangular shape, which is a standard convex set. No matter what affine transformation the convex set undergoes in the collection stage, the properties of the convex set cannot be changed.

S22 Obtain the convex hull of the contour, obtain the minimum convex hull of the contour on the basis of the original contour, fill in the missing area of the partial segmentation, and make the contour edge smooth at the same time.

Since the contour extracted in the previous step relies entirely on the result of the segmentation model, some of its edges are uneven, which is inconsistent with the rectangular nature of the ID photo. Therefore, the minimum convex hull of the contour is obtained on the basis of the original contour, which fills in areas missed by the segmentation and at the same time makes the contour edge smoother.
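The convex hull of S22 can be illustrated with a short sketch. The text does not say which hull algorithm is used; the version below uses Andrew's monotone chain as one common choice, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices of a set of (x, y) contour points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make the chain turn the wrong way or go straight.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

# A rectangle whose top edge has a dent at (2, 2) and a stray interior point (2, 1).
contour = [(0, 0), (4, 0), (4, 3), (2, 2), (0, 3), (2, 1)]
hull = convex_hull(contour)
```

The dent and the interior point are discarded, which is the "fill and smooth" effect described for this step.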

S23 line fitting, using Hough transform to perform line fitting on an irregular convex polygon composed of multiple line segments of the convex hull to describe the convex hull. In a specific embodiment, in step S23, the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.

The Hough transform is a feature extraction technique that is widely used in image analysis, computer vision and digital image processing to extract features of objects, such as straight lines. This scheme uses it to obtain accurate analytic descriptions of the document edge lines.
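To make the voting idea behind the Hough transform concrete, the sketch below implements the classic accumulator form, in which every point votes for all lines ρ = x·cos θ + y·sin θ passing through it. This is not necessarily the variant used in the embodiment: the minimum line length and maximum interval parameters mentioned for S23 are not modeled, and the function name and one-degree angular resolution are assumptions.

```python
import math
from collections import Counter

def hough_strongest_line(points, theta_step_deg=1):
    """Return (theta in degrees, rho, votes) of the line supported by the most points."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    (theta_deg, rho), count = votes.most_common(1)[0]
    return theta_deg, rho, count

# Ten points on the diagonal y = x; its normal direction is 135 degrees and rho is 0.
diagonal = [(10 * i, 10 * i) for i in range(10)]
best = hough_strongest_line(diagonal)
```

Each returned (θ, ρ) pair is an analytic description of a line, which is the form the subsequent intersection step needs.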

S24 finds the vertices: take all the legal straight lines from the line fitting and compute their pairwise intersection points, so as to find the distribution range of the four vertices of the certificate photo. Specifically, an analytic expression can be obtained for every legal straight line detected in S23, and the intersection point of each pair of these lines is computed. In the process of finding the vertices, pairs of parallel lines are not considered.
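A minimal sketch of the pairwise intersection in S24 follows. It assumes each line is given by coefficients (a, b, c) of a·x + b·y = c; this representation, the function names, and the parallelism tolerance `eps` are illustrative choices rather than details from the text.

```python
from itertools import combinations

def intersect(line1, line2, eps=1e-9):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2, or None if parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:  # parallel lines are skipped, as in S24
        return None
    # Cramer's rule for the 2x2 system.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def pairwise_intersections(lines):
    """Candidate vertices: the intersections of every non-parallel pair of lines."""
    points = []
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is not None:
            points.append(p)
    return points

# Four edge lines of a 4 x 3 rectangle: x = 0, x = 4, y = 0, y = 3.
edges = [(1, 0, 0), (1, 0, 4), (0, 1, 0), (0, 1, 3)]
corners = pairwise_intersections(edges)
```

When several nearly collinear lines are detected per edge, this produces clusters of candidate points around each true corner, which is why later steps screen and cluster them.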

S25 Vertex validity screening: not all of the vertices obtained are legal, so screening conditions are set to check the validity of the vertices, which improves the accuracy and processing speed of the subsequent steps. Specifically, a tolerance value tol is set in the screening conditions; vertices whose abscissa lies in [0-tol, width+tol] and whose ordinate lies in [0-tol, height+tol] are defined as legal, where width and height are the width and height of the original image. In a specific embodiment, the tolerance value tol is set to 50. If the coordinates of a vertex exceed the original image size by no more than tol, the vertex coordinates are corrected to the edge of the original image, that is:

$x = \max(\min(x_{crosspoint}, width), 0)$, $y = \max(\min(y_{crosspoint}, height), 0)$

Here, $\min(x_{crosspoint}, width)$ ensures that $x_{crosspoint}$ does not exceed the width of the original image, and $\max(\min(x_{crosspoint}, width), 0)$ ensures that it is not less than 0.

In the same way, $\min(y_{crosspoint}, height)$ ensures that $y_{crosspoint}$ does not exceed the height of the original image, and $\max(\min(y_{crosspoint}, height), 0)$ ensures that it is not less than 0.
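The screening and correction rule of S25 can be written as a few lines of Python. The function name is hypothetical; the default tol of 50 follows the specific embodiment, and the clamping uses the max(min(·), 0) form described above.

```python
def screen_and_clamp(points, width, height, tol=50):
    """Keep vertices within the tolerance band and clamp them onto the image."""
    kept = []
    for x, y in points:
        # Legal range: [0 - tol, width + tol] x [0 - tol, height + tol].
        if -tol <= x <= width + tol and -tol <= y <= height + tol:
            kept.append((max(min(x, width), 0), max(min(y, height), 0)))
    return kept

candidates = [(-10, 20), (700, 100), (650, 500), (320, 240)]
legal = screen_and_clamp(candidates, width=640, height=480)
# (-10, 20) is clamped to (0, 20); (700, 100) lies more than tol outside and is dropped;
# (650, 500) is clamped to (640, 480); (320, 240) is unchanged.
```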

S26 vertex clustering: a standard bank card has four vertices. All the legal vertices obtained so far are clustered into four classes with the unsupervised clustering algorithm K-means; the centroid of each class gives the coordinates of one vertex, so that four vertex coordinates are obtained in total.

Among them, the specific algorithm of K-means is:

1) Randomly select 4 cluster centroid points $\mu_0, \mu_1, \mu_2, \mu_3$;

2) For each vertex coordinate $(x_i, y_i)$, compute its Euclidean distance to each cluster centroid, take the centroid with the smallest distance as its corresponding centroid, and label the vertex with the corresponding category j: $\arg\min_j \|(x_i, y_i) - \mu_j\|_2,\ j = 0, 1, 2, 3$;

Here $\|(x_i, y_i) - \mu_j\|_2,\ j = 0, 1, 2, 3$ is the Euclidean distance between vertex $(x_i, y_i)$ and centroid $\mu_j$, and $\arg\min_j \|(x_i, y_i) - \mu_j\|_2$ selects the category j whose centroid is nearest, so that the sum of the distances from the vertices to their assigned centroids is as small as possible.

3) Recalculate the coordinates of the 4 centroids;

4) Repeat 2) and 3) until convergence.

K-means is the most commonly used clustering algorithm based on Euclidean distance. It is numerical, unsupervised, non-deterministic and iterative, and it aims to minimize an objective function, the squared error function (the sum of the distances between all observation points and their center points). It assumes that the closer two targets are, the more similar they are. Owing to its speed and good scalability, K-means is among the best-known clustering algorithms.
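The four K-means steps above can be sketched as follows. The function name, the iteration cap, the seed, and the optional `init` argument (which replaces the random selection of step 1 so that the example is reproducible) are assumptions of this sketch and not part of the described method.

```python
import math
import random

def kmeans(points, k=4, init=None, iters=100, seed=0):
    """Cluster 2-D points into k groups and return the k centroids."""
    # Step 1: random initial centroids, unless explicit ones are supplied.
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    for _ in range(iters):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda idx: math.dist(p, centroids[idx]))
            clusters[j].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Two candidate vertices near each corner of a card-like quadrilateral.
candidates = [(1, 1), (3, 1), (99, 1), (101, 3), (100, 59), (102, 61), (-1, 60), (1, 62)]
corners = kmeans(candidates, init=[(1, 1), (99, 1), (100, 59), (-1, 60)])
```

With purely random initialization the result depends on the starting centroids, which is the non-deterministic behaviour noted above; two initial centroids falling in the same corner group can lead to a poor local optimum.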

S27 Vertex sorting: to facilitate subsequent operations, the order of the four vertices is determined by the following steps:

1) Obtain the coordinates of the center point from the coordinates of the four vertices;

2) Establish a polar coordinate system with the center point as the pole, construct a vector pointing from the center point to each vertex, and obtain the angle between each vector and the polar axis in turn;

3) Sort the four vertices by included angle from large to small;

4) Find the upper left corner of the document area and, starting from the upper left corner, arrange the vertices in the order "upper left - upper right - lower right - lower left".

In step 4) of step S27, the upper left vertex is the one whose coordinate values have the smallest sum; the vertex with the smallest coordinate sum is therefore taken as the upper left vertex, and the coordinate order is rearranged with this vertex as the starting point to determine the order of the four vertices.
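The sorting procedure of S27 may be sketched as follows. This is a minimal illustration under stated assumptions: the function name sort_vertices is hypothetical, the polar axis is taken to point along the positive x direction, and the y difference is negated because image rows grow downward, so that sorting the angles from large to small yields a clockwise order on screen.

```python
import numpy as np

def sort_vertices(vertices):
    """Order four vertices as upper left, upper right, lower right, lower left."""
    pts = np.asarray(vertices, dtype=float)
    # Step 1: center point of the four vertices.
    center = pts.mean(axis=0)
    # Step 2: angle of each center-to-vertex vector against the polar axis.
    # Assumption: dy is negated so angles follow the usual y-up convention.
    angles = np.arctan2(-(pts[:, 1] - center[1]), pts[:, 0] - center[0])
    # Step 3: sort by angle from large to small.
    ordered = pts[np.argsort(-angles)]
    # Step 4: the vertex with the smallest x + y is the upper left corner;
    # rotate the sequence so that it comes first.
    start = int(np.argmin(ordered.sum(axis=1)))
    return np.roll(ordered, -start, axis=0)
```

The rule "smallest coordinate sum is the upper left vertex" holds for moderately tilted documents; for strongly rotated inputs a different starting rule would be needed.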

S28 Area filling: after the vertex coordinates have been found and arranged in sequence, the quadrilateral area formed by the four vertices is filled with binary values to form a binary mask.
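One possible way to realize the filling of S28 is sketched below with NumPy only. The function name fill_quad_mask and the choice of 1 as the fill value are assumptions of this sketch; it further assumes that the quadrilateral is convex and that its vertices are already in a consistent order, as produced by S27. A library routine such as OpenCV's fillPoly could be used instead.

```python
import numpy as np

def fill_quad_mask(vertices, height, width):
    """Return a uint8 mask: 1 inside the convex quadrilateral, 0 elsewhere."""
    pts = np.asarray(vertices, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]  # pixel row and column grids
    crosses = []
    for i in range(4):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % 4]
        # Cross product of the edge vector with the vector from the edge
        # start to each pixel; its sign tells on which side the pixel lies.
        crosses.append((x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0))
    crosses = np.stack(crosses)
    # A pixel is inside when it lies on the same side of all four edges.
    inside = np.all(crosses >= 0, axis=0) | np.all(crosses <= 0, axis=0)
    return inside.astype(np.uint8)
```

Pixels lying exactly on an edge are counted as inside in this sketch.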

S29 Affine transformation and output of the corrected picture: for the document area whose four vertices have been re-determined, an affine transformation is performed on the document area according to the preset target document photo size, $I_{output} = W I_{input}$, where $W$ is the affine transformation matrix between the document area and the target document size. In this way, the corresponding correction operation is performed on each certificate area, and each certificate image obtained after correction is output as a corrected image and saved to the specified file path.
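The relation between the four sorted vertices and the matrix W can be illustrated by the following sketch, which estimates a 2x3 affine matrix by least squares and applies it to point coordinates. The function names estimate_affine and apply_affine and the 856 x 540 target size are hypothetical; resampling the pixel values of the image, which a complete implementation would also need, is omitted here.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 matrix W such that dst is approximately W @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates, one row [x, y, 1] per vertex.
    A = np.hstack([src, np.ones((len(src), 1))])
    # Solve A @ X = dst in the least-squares sense; W is the transpose of X.
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T

def apply_affine(W, pts):
    """Map (x, y) points through the affine matrix W."""
    pts = np.asarray(pts, dtype=float)
    return pts @ W[:, :2].T + W[:, 2]

# Hypothetical example: a parallelogram-shaped document area mapped to a
# preset target size of 856 x 540 pixels (upper left, upper right,
# lower right, lower left).
src = [(10, 20), (110, 40), (100, 100), (0, 80)]
dst = [(0, 0), (856, 0), (856, 540), (0, 540)]
W = estimate_affine(src, dst)
```

An affine matrix has six free parameters, so four vertex pairs overdetermine it; the least-squares fit is exact only when the document area is a parallelogram, and otherwise gives the closest affine approximation.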

Image comparison: the third step, image comparison, includes the following steps.

S33 Similarity determination: the smaller the cosine of the included angle, the less related the two pictures are. Referring to Figure 4, when the cosine of the included angle is close to 1, the two pictures are similar; when the cosine of the included angle between the two picture vectors is equal to 1, the two pictures are the same. The category of the most relevant or identical registered picture A is determined as the category of the picture to be classified B, that is, of the input picture, and is output.
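The comparison step may be sketched as follows: each picture is binarized and flattened into a vector, the cosine of the included angle is computed against every registered picture, and the category with the largest cosine is output. The function names, the threshold of 128, and the category labels used in the example are assumptions of this sketch and are not taken from the original disclosure.

```python
import numpy as np

def binarize(img, threshold=128):
    """Binarize a grayscale picture and flatten it into a 0/1 vector."""
    return (np.asarray(img) >= threshold).astype(float).ravel()

def cosine_similarity(x, y):
    """Cosine of the included angle between vectors x and y."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def classify(image_b, registered):
    """Return (category, cosine) of the registered picture most similar to B.

    `registered` maps a category label to its registered picture A; all
    pictures are assumed to have the same size after correction.
    """
    vec_b = binarize(image_b)
    scores = {cat: cosine_similarity(binarize(a), vec_b)
              for cat, a in registered.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical 2 x 2 pictures standing in for corrected document images.
registered = {
    "id_card": np.array([[255, 255], [0, 0]]),
    "bank_card": np.array([[255, 0], [255, 0]]),
}
category, score = classify(np.array([[250, 200], [10, 30]]), registered)
```

In a deployment, a minimum cosine threshold could additionally be applied so that a picture dissimilar to every registered picture is treated as a new category; that threshold is not specified here.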

The image collected in the present invention is an image collected by a camera. It can be a static image (that is, an image collected individually) or an image from a video (that is, an image selected from a collected video according to a preset standard or at random); either can be used as the image source of the document of the present invention. The embodiment of the present invention places no restrictions on attributes such as the source, nature, or size of the image.

Those skilled in the art will understand from the description of the embodiments of the present disclosure that, in addition to the neural network, the embodiments of the present disclosure may also use, for example but not limited to, character detection algorithms based on image processing (for example, a character/number detection algorithm based on histogram rough segmentation and singular value features, a character/number detection algorithm based on binary wavelet transform, etc.) to perform character detection on the collected image. Likewise, in addition to neural networks, embodiments of the present disclosure may also use, for example but not limited to, document detection algorithms based on image processing (for example, edge detection, mathematical morphology, localization based on texture analysis, line detection and edge statistics, genetic algorithms, Hough transform and contour methods, methods based on wavelet transform, etc.) to perform document detection on the captured image.

In the embodiment of the present disclosure, when edge detection is performed on the collected image by using the neural network, the neural network can be trained by using the sample image in advance, so that the trained neural network can effectively detect the edge straight lines in the image.

Second Embodiment

The present invention also provides a certificate detection device, which includes an acquisition input unit, an image processing unit, an image comparison and classification unit, and a certificate category output unit that are communicatively connected.

The acquisition input unit obtains the detection picture of the certificate to be detected and the standard registration picture through the camera component. The acquisition input unit uses hardware equipment, including but not limited to mobile phones, iPads, ordinary cameras, CCD industrial cameras, scanners, etc., to capture the front of the certificate. When collecting image information, the collected image should completely include the four borders of the document, the inclination should not exceed plus or minus 20°, and the document number and the straight edges should be distinguishable to the human eye.

The image processing unit processes the input picture through the deep learning algorithm in the processor, and sequentially obtains a preliminary rough document area mask, a refined corrected mask, and a corrected image after affine correction transformation. The image comparison and classification unit compares the corrected image with the registered pictures stored in the memory and classifies it through a comparison algorithm in the processor. Using the algorithms, programs, etc. stored in the memory, the processor performs the corresponding processing and data extraction on the obtained images.

In the certificate category output unit, the processor displays on the display the category of the input picture obtained after comparison and classification, and stores it in the memory. The display includes but is not limited to the display screen of a tablet computer, computer, mobile phone, etc., and shows the comparison and classification result for the certificates extracted by the processor.

Third Embodiment

The present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed. Wherein, for the certificate detection method, please refer to the detailed introduction in the foregoing section, and details are not repeated here.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information may be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Fourth Embodiment

The present invention also provides a terminal, comprising a memory and a processor, wherein the memory stores computer instructions that can be executed on the processor, and the processor performs the steps of the foregoing method when executing the computer instructions. For the certificate detection method, please refer to the detailed introduction in the foregoing part; details are not repeated here.

The above solution solves the problem that, against a complex background, the boundary between the outline of the certificate and the image background is blurred, which hinders the accurate classification of new categories or items of certificates.

It should also be noted that the terms "comprising", "including" or any other variation thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or device that includes the element.

It will be appreciated by those skilled in the art that the embodiments of the present application may be provided as methods, apparatuses, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

A method for detecting the increase or decrease of certificates, characterized in that the method comprises the following steps:

The first step is the initial inspection of the document, using the deep learning model to find the corresponding potential document area for the picture input through the image acquisition unit, and obtain a preliminary and rough document area mask;

The second step, standardization: fine-tune the rough mask obtained in the first step to obtain a high-quality document area mask, use the mask to extract the document area in the original image, perform affine correction transformation on the obtained document photo to convert it to the preset ID photo size, and output the corrected ID image;

The third step, image comparison, compares the corrected certificate image output in the second step with the registered image, determines the category of the input image and outputs it.
The method according to claim 1, wherein the initial inspection of the certificate in the first step comprises the following steps:

S11 extracts features, after inputting a picture, scales the picture to a size suitable for the input picture of the segmentation network, and then uses the Unet network model to extract depth features from the input data to obtain a feature map;

S12 calculates the probability, carries out two-class judgment on the feature of each position in the feature map, obtains the probability value that the feature of each position belongs to the certificate area, and obtains the probability distribution map belonging to the certificate area;

S13 Threshold truncation, binarize the probability distribution map according to the preset threshold, set the probability greater than the threshold to 1, and set the probability less than the threshold to 0 to obtain a 0-1 mask map;

S14 rough segmentation mask, the 0-1 mask image is upsampled to the same size as the original input, and a preliminary document rough segmentation mask image is obtained;

S15 Legal area screening, count the area a of each isolated document area in the rough segmentation mask; if a≤μ-3σ, consider the area an illegal area and remove it from the rough segmentation mask, so that some erroneous areas are filtered out by the legal area screening.

The area value distribution of the document area obeys the normal distribution, and the probability of a≤μ-3σ is less than 0.5%.

μ represents the expected value of the area distribution of the document area

σ represents the standard deviation of the area distribution of the document area.
The method according to claim 1, characterized in that, in the second step of standardization, refining the mask correction is performed on the legal area in the mask image after the first step of screening, comprising the following steps:

S21 extracts the regional contour feature, the contour feature is a binary mask image, the whole is a closed irregular curve, and the binary mask image does not change the properties of the rectangular convex set of the ID photo;

S22 to obtain the convex hull of the contour, obtain the minimum convex hull of the contour on the basis of the original contour, fill in the missing area of the partial segmentation, and make the contour edge smooth;

S23 line fitting, using Hough transform to perform straight line fitting on an irregular convex polygon composed of multiple line segments of the convex hull to describe the convex hull;

S24 Find the vertices, compute the intersection points of all the legal straight lines obtained in the straight line fitting, so as to find the distribution range of the four vertices of the ID photo; in the process of finding the vertices, the case where two straight lines are parallel is not considered;

S25 Vertex legal screening, set filtering conditions to check the legality of the vertices: a tolerance value tol is set in the filtering conditions, and coordinates with abscissa in [0-tol, width+tol] and ordinate in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height represent the width and height of the original image. If the coordinates of a vertex exceed the original image size by no more than tol, the vertex coordinates are corrected to the edge of the original image, that is:

In $\max(\min(x_{crosspoint}, width), 0)$, the inner $\min(x_{crosspoint}, width)$ ensures that $x_{crosspoint}$ does not exceed the original image width, and the outer $\max(\cdot, 0)$ ensures that it is not less than 0;

In $\max(\min(y_{crosspoint}, height), 0)$, the inner $\min(y_{crosspoint}, height)$ ensures that $y_{crosspoint}$ does not exceed the original image height, and the outer $\max(\cdot, 0)$ ensures that it is not less than 0;

S26 Vertex clustering, by analogy with a standard bank card, a certificate has four vertices; all the legal vertices that have been obtained are clustered into four categories through the unsupervised clustering algorithm K-means, and the centroid of each category is taken as the coordinates of one vertex, so that a total of four vertex coordinates are obtained;

S27 Vertex sorting, to facilitate subsequent operations, the order of the four vertices is determined through the following steps: 1) obtain the coordinates of the center point from the coordinates of the four vertices; 2) establish a polar coordinate system with the center point as the pole, construct a vector pointing from the center point to each vertex, and obtain the angle between each vector and the polar axis in turn; 3) sort the four vertices by included angle from large to small; 4) find the upper left corner of the document area and, starting from the upper left corner, arrange the vertices in the order "upper left - upper right - lower right - lower left";

S28 area filling, after finding and arranging the vertex coordinates in order, the quadrilateral area formed by the four vertices is filled with binary values to form a binary mask;

S29 Affine transformation and output of the corrected picture, for the document area whose four vertices have been re-determined, affine transformation is performed on the document area according to the preset target document photo size, $I_{output} = W I_{input}$, where $W$ is the affine transformation matrix between the document area and the target document size; in this way, the corresponding correction operation is performed on each certificate area, and each certificate image obtained after correction is output as a corrected image and saved to the specified file path.
The method according to claim 3, characterized in that: in step S23, the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
The method according to claim 3, characterized in that: in step S26, the specific algorithm of K-means is:

1) Randomly select 4 cluster centroid points $\mu_0, \mu_1, \mu_2, \mu_3$;

2) For each vertex coordinate $(x_i, y_i)$, calculate the Euclidean distance to each cluster centroid, take the centroid point with the smallest distance as its corresponding centroid point, and mark the vertex with the corresponding category $j$: $\arg\min_j \|(x_i, y_i) - \mu_j\|_2$, $j = 0, 1, 2, 3$; wherein $\|(x_i, y_i) - \mu_j\|_2$ is the Euclidean norm between the vertex $(x_i, y_i)$ and the centroid point $\mu_j$, and $\arg\min_j$ selects the index $j$ of the centroid point for which this norm is smallest, so that, together with step 3), the centroid points are adjusted and the sum of the distances between the vertices and their assigned centroid points becomes progressively smaller;

3) Recalculate the coordinates of the 4 centroids;

4) Repeat steps 2) and 3) until convergence.
The method according to claim 3, characterized in that: in step 4) of step S27, the upper left vertex is the one whose coordinate values have the smallest sum; the vertex with the smallest coordinate sum is taken as the upper left vertex, and the coordinate order is rearranged with this vertex as the starting point to determine the order of the four vertices.
The method according to claim 1, characterized in that: the image comparison of the third step comprises the following steps:

S31 Image binarization, binarize the registered image A and the image B to be classified, the corresponding vectors being $x_1, x_2, x_3, \ldots, x_n$ and $y_1, y_2, y_3, \ldots, y_n$ respectively;

S32 Calculate the cosine of the included angle between the vectors; the cosine of the included angle between the vector of the image B to be classified and the vector of the registered image A is: $\cos\theta = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$;

S33 Similarity determination, the smaller the cosine of the included angle, the less related the two images are: when the cosine of the included angle is close to 1, the two images are similar; when the cosine of the included angle between the two image vectors is equal to 1, the two images are the same; the category of the most relevant or identical registered picture A is determined as the category of the picture B to be classified, that is, the category to which the input picture belongs, and is output.
A certificate detection device, characterized in that: the device comprises an acquisition input unit, an image processing unit, an image comparison and classification unit, and a certificate category output unit that are communicatively connected; the acquisition input unit obtains the detection picture of the certificate to be detected and the standard registration picture through a camera assembly; the image processing unit processes the input picture through the deep learning algorithm in the processor, and sequentially obtains a preliminary rough document area mask, a refined corrected mask, and a corrected image after affine correction transformation; the image comparison and classification unit compares the corrected image with the registered picture stored in the memory and classifies it through the comparison algorithm in the processor; and in the certificate category output unit, the processor displays on the display the category of the input picture obtained after comparison and classification and stores it in the memory.
A computer-readable storage medium on which computer instructions are stored, characterized in that: when the computer instructions are executed, the steps of the foregoing method are executed.
A terminal, comprising a memory and a processor, characterized in that: the memory stores a registered image and computer instructions that can be run on the processor, and the processor executes the steps of the foregoing method when running the computer instructions.