WO2022121025A1 - Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal - Google Patents

Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal

Info

Publication number
WO2022121025A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
area
document
mask
vertices
Prior art date
Application number
PCT/CN2020/140736
Other languages
French (fr)
Chinese (zh)
Inventor
吴昌宇
黄跃珍
王晓亮
Original Assignee
广州广电运通金融电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州广电运通金融电子股份有限公司 filed Critical 广州广电运通金融电子股份有限公司
Publication of WO2022121025A1 publication Critical patent/WO2022121025A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the present invention relates to the technical field of information detection or intelligent vision, in particular to a method, device, readable storage medium and terminal for detecting an increase or decrease category of a certificate.
  • image recognition technology is gradually applied in security, military, medical, intelligent transportation and other fields, and technologies such as face recognition and fingerprint recognition are increasingly used in public security, finance, aerospace and other security fields.
  • In the military field, image recognition is mainly used for target reconnaissance and identification, using automatic image recognition to identify and strike enemy targets. In the medical field, image recognition supports the analysis and diagnosis of various medical images, which can greatly reduce the cost of medical treatment and also help improve the quality and efficiency of care. In the field of transportation, it can perform license plate recognition and can also be applied to autonomous driving to clearly recognize roads, vehicles and pedestrians.
  • Deep learning method: in the model training stage, this method trains a deep network on a large amount of labeled data and fits the network parameters to model an OCR (Optical Character Recognition) detection algorithm. In the model prediction stage, the whole image is used as the input of the network, and character regions are detected through forward inference of the network.
  • This is currently a popular character detection method, but for the certificate number detection task it has the following defects: (1) image regions outside the certificate also take part in network inference, which wastes computing resources, and characters falsely detected in those regions must be removed with additional processing logic; (2) the scheme consumes more computing resources, and its training and inference times are longer than those of this proposal; (3) because neural networks are not interpretable, the character-region boxes located by this method cannot accurately match the minimum bounding rectangle of the characters and may even cut off part of the character region. In other words, traditional optical character recognition (OCR) for certificate images is mainly aimed at high-definition scanned images and requires a clean background, standard print and high resolution. In natural scenes, however, text suffers from heavy background noise, irregular distribution and the influence of natural light sources, so the detection rate of OCR in real natural scenes is not ideal, and certificate recognition puts pressure on the character recognition in the subsequent steps.
  • the purpose of the present invention is to provide a method, device, readable storage medium and terminal for detecting the increase or decrease of a certificate, which can solve the above problems.
  • Design principle: first, standard pictures of the various types of documents are stored in the memory as registered pictures; second, the document to be tested is captured to obtain an input picture, and image processing is applied to the input picture; finally, the processed picture is compared with the registered pictures, and the category of the detected picture is determined from the similarity, so that a newly added certificate can be screened quickly and efficiently and its category determined.
  • a method for detecting the increase or decrease of document categories, comprising: the first step, initial document inspection, in which a deep learning model searches a picture input through an image acquisition unit for the potential document area and yields a preliminary, rough document area mask;
  • the second step, standardization: refine the rough mask obtained in the first step into a high-quality document area mask, use the mask to extract the document area from the original image, apply an affine correction transformation to the extracted document photo to bring it to the preset ID photo size, and output the corrected ID image;
  • the third step, image comparison: compare the corrected ID image output by the second step with the registered image, then determine and output the category of the input image.
  • the initial inspection of the certificate in the first step includes the following steps: S11, feature extraction: after a picture is input, scale it to the input size required by the segmentation network, then use the Unet network model to extract deep features from the input data and obtain a feature map; S12, probability calculation: perform a two-class judgment on the feature at each position of the feature map to obtain the probability that it belongs to the document area, yielding a probability distribution map of the document area; S13, threshold truncation: binarize the probability distribution map according to a preset threshold, setting probabilities greater than the threshold to 1 and probabilities less than the threshold to 0, to obtain a 0-1 mask map; S14, rough segmentation mask: upsample the 0-1 mask map to the same size as the original input to obtain a preliminary rough segmentation mask image of the document; S15, legal area screening: count the area a of each isolated document area in the rough segmentation mask image; if $a < \mu - 3\sigma$, the …
  • the area of the document area obeys a normal distribution, for which the probability of $a < \mu - 3\sigma$ is less than 0.5%.
  • $\mu$ represents the expected value of the area distribution of the document area;
  • $\sigma$ represents the standard deviation of the area distribution of the document area.
  • fine-grained mask correction is performed on the legal areas of the mask image that pass the first-step screening, including the following steps: S21, extract regional contour features: the contour feature is a binary mask map that forms a closed irregular curve as a whole, and the binary mask map does not change the rectangular convex-set property of the document photo; S22, convex hull: obtain the minimum convex hull of the contour, which fills areas missed by the partial segmentation and at the same time smooths the contour edge; S23, line fitting: use the Hough transform to fit straight lines to the irregular convex polygon formed by the line segments of the convex hull, so as to describe the convex hull; S24, find the vertices: intersect all legal straight lines from the line fitting in pairs to find the distribution range of the four vertices of the ID photo, skipping any pair of parallel lines.
  • in $\max(\min(x_{\text{crosspoint}}, \text{width}), 0)$, the inner $\min$ keeps $x_{\text{crosspoint}}$ from exceeding the width of the original image and the outer $\max$ keeps it from falling below 0; similarly, in $\max(\min(y_{\text{crosspoint}}, \text{height}), 0)$, the inner $\min$ keeps $y_{\text{crosspoint}}$ from exceeding the height of the original image and the outer $\max$ keeps it from falling below 0.
  • S26, vertex clustering: like a standard bank card, the certificate has four vertices. All legal vertices obtained so far are clustered into four classes with the unsupervised clustering algorithm K-means; the centroid of each class gives the coordinates of one vertex, so four vertex coordinates are obtained in total; S27, vertex sorting: to facilitate subsequent operations, the order of the four vertices is determined by the following steps: 1) compute the coordinates of the center point from the coordinates of the four vertices; 2) establish a polar coordinate system at the center point, construct the vector pointing from the center point to each vertex, and compute the angle between each vector and the polar axis in turn; 3) sort the four vertices by angle from large to small; 4) find the upper left corner of the certificate area and, starting from the upper left point, arrange the vertices in the order of "upper left - upper right - lower right -
  • the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
  • the tolerance value tol is set to 50.
  • the specific K-means algorithm is: 1) randomly select four cluster centroids $\mu_0, \mu_1, \mu_2, \mu_3$; 2) for each vertex coordinate $(x_i, y_i)$, compute the Euclidean distance to every cluster centroid, take the centroid with the smallest distance as its corresponding centroid, and mark the vertex with the corresponding category $j$: $\arg\min_j \lVert (x_i, y_i) - \mu_j \rVert^2,\ j = 0, 1, 2, 3$; 3) recalculate the coordinates of the 4 centroids; 4) repeat 2) and 3) until convergence.
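The four K-means steps listed above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function name `kmeans`, the `init`, `max_iter` and `seed` parameters, and the handling of an empty cluster are assumptions added here. The optional `init` argument only exists so that a run can be reproduced; by default the centroids are drawn at random as the text describes.

```python
import random


def kmeans(points, k=4, init=None, max_iter=100, seed=None):
    """Cluster 2-D points into k groups (Lloyd's algorithm).

    points: list of (x, y) tuples, e.g. candidate vertices from line intersections.
    init:   optional list of k starting centroids (assumption, for reproducibility).
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Step 1: pick k starting centroids (randomly unless given explicitly).
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to the centroid with the smallest
        # squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                cx = sum(p[0] for p in members) / len(members)
                cy = sum(p[1] for p in members) / len(members)
                new_centroids.append((cx, cy))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

With candidate vertices scattered around the four corners of a card, the returned centroids serve as the four estimated corner coordinates. Random initialization can converge to a poor local optimum, so a practical system would likely restart several times or seed the centroids near the image corners.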
  • in step 4) of step S27, the upper left vertex is the vertex whose coordinate values have the smallest sum; the coordinates are then reordered with this vertex as the starting point to fix the order of the four vertices.
  • the image comparison in the third step includes the following steps:
  • S32 calculates the cosine of the included angle between the vectors; for the vector $B$ of the image to be classified and the vector $A$ of the registered image, the cosine of the included angle is: $\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}$
  • a certificate detection device comprises an acquisition input unit, an image processing unit, an image comparison classification unit and a certificate category output unit connected by telecommunication;
  • the acquisition input unit obtains the detection picture of the certificate to be detected and the standard registration picture through a camera assembly;
  • the image processing unit processes the input image with the deep learning algorithm in the processor, and obtains in sequence the preliminary rough document area mask, the refined correction mask, and the corrected image after affine correction transformation;
  • the image comparison and classification unit uses the comparison algorithm in the processor to compare and classify the corrected image against the registered pictures stored in the memory; in the certificate category output unit, the processor shows the category of the input picture obtained by the comparison and classification on the display and stores it in the memory.
  • a computer-readable storage medium having computer instructions stored thereon that, when executed, perform the steps of the aforementioned method.
  • a terminal includes a memory and a processor, the memory stores a registered picture and computer instructions that can be executed on the processor, and the processor executes the steps of the aforementioned method when the processor executes the computer instructions.
  • the present invention has the following beneficial effects: storing standard pictures of the various certificates in advance provides accurate comparison objects and improves comparison and screening accuracy; the comparison algorithm adopted is simple, efficient and accurate, which improves comparison and screening efficiency; the invention can respond rapidly to changes of the detection target in application scenarios, broadens the scope of application of certificate recognition, and can be widely used in security, finance and other fields.
  • Fig. 1 is the flow chart of the detection method of certificate increase and decrease category of the present invention
  • Fig. 2 is the method flow chart of the initial inspection of the certificate
  • Fig. 3 is the flow chart of document image standardization
  • FIG. 4 is an example diagram of similarity comparison of image comparison.
  • a method for detecting an increase or decrease category of a certificate see Figure 1, the method includes the following steps.
  • the first step is the initial inspection of the document.
  • the deep learning model is used to find the corresponding potential document area, and a preliminary and rough document area mask is obtained.
  • the second step standardization, fine-tune the rough mask obtained in the first step to obtain a high-quality document area mask, use the mask to extract the document area in the original image, and perform affine correction transformation on the obtained document photo , convert it to the preset ID photo size, and output the corrected ID photo.
  • the third step image comparison, compares the corrected certificate image output in the second step with the registered image, determines the category of the input image and outputs it.
  • the new category detection process of the certificate is divided into three stages.
  • the first two stages form the segmentation optimization model (a two-stage, coarse-to-fine refinement segmentation) used for image segmentation.
  • in the first stage, a deep learning model finds the potential document area in the input image and yields a preliminary, relatively rough document area mask;
  • in the second stage, traditional image processing techniques refine and correct the rough mask from the first stage into a high-quality document area mask; the mask is used to extract the document area from the original image, and an affine correction transformation converts the extracted document photo to the preset ID photo size.
  • the third stage is to compare the ID photo and the registered image, and output the category to which the input image belongs.
  • the goal of finding the document area is mainly completed by several sub-operations of extracting features, calculating probability, and threshold truncation, and finally obtains a preliminary rough segmentation mask.
  • after the user inputs a picture, it is scaled to the input size required by the segmentation network, and the classic Unet network model extracts deep features from the input data; a two-class judgment is then made on the feature at each position to obtain the probability that it belongs to the certificate area, which yields a probability distribution map of the certificate area; the probability distribution map is then binarized according to the preset threshold.
  • S11 extracts features, after inputting a picture, scales the picture to a size suitable for the input picture of the segmentation network, and then uses the Unet network model to extract depth features from the input data to obtain a feature map.
  • S12 calculates the probability, performs two-class judgment on the feature of each position in the feature map, obtains the probability value that the feature of each position belongs to the certificate area, and obtains a probability distribution map belonging to the certificate area.
  • S14, rough segmentation mask: upsample the 0-1 mask image to the same size as the original input to obtain a preliminary rough segmentation mask image of the document.
  • S15, legal area screening: count the area a of each isolated document area in the rough segmentation mask; if $a < \mu - 3\sigma$, the area is considered an illegal area and is removed from the rough segmentation mask, so that legal area screening filters out some erroneous regions.
  • the area of the document area obeys a normal distribution, for which the probability of $a < \mu - 3\sigma$ is less than 0.5%.
  • $\mu$ represents the expected value of the area distribution of the document area;
  • $\sigma$ represents the standard deviation of the area distribution of the document area.
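Threshold truncation (S13) and legal area screening (S15) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the probability map is a nested list of floats, μ and σ have been estimated beforehand from known certificate areas (the source does not say how they are obtained), and an "isolated area" is taken to be a 4-connected region of 1-pixels. The function names and the default threshold of 0.5 are hypothetical.

```python
from collections import deque


def binarize(prob_map, threshold=0.5):
    """S13: probabilities above the threshold become 1, all others 0.

    The source leaves the equal-to-threshold case open; it maps to 0 here.
    """
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def connected_regions(mask):
    """Return each 4-connected region of 1-pixels as a list of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def screen_legal_regions(mask, mu, sigma):
    """S15: erase every region whose area a satisfies a < mu - 3 * sigma."""
    out = [row[:] for row in mask]
    for pixels in connected_regions(mask):
        if len(pixels) < mu - 3 * sigma:
            for y, x in pixels:
                out[y][x] = 0
    return out
```

In practice the same steps would more likely be written with an array library and a connected-component routine; the nested-list version is used here only to keep the example self-contained.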
  • the Unet network model belongs to the segmentation network.
  • Unet draws on the FCN network. Its structure consists of two symmetrical parts: the first part is the same as an ordinary convolutional network and uses 3x3 convolutions with pooling downsampling, which captures the context information in the image (that is, the relationship between pixels); the second part is basically symmetrical to the first and uses 3x3 convolutions with upsampling to output the image segmentation.
  • feature fusion is also used in the network, and the features of the previous part of the downsampling network are fused with the features of the latter part of the upsampling part to obtain more accurate context information and achieve a better segmentation effect.
  • Unet uses a weighted softmax loss function with a separate weight for each pixel, which makes the network pay more attention to learning edge pixels. This makes the model well suited to certificate edges that are slightly uneven rather than perfectly straight.
  • the mask refinement of the second stage is then performed. As shown in Figure 3, all legal regions in the mask map obtained in the first stage must be corrected one by one.
  • a refined mask correction is performed on the legal area in the mask map after the screening in the first step, see FIG. 3 , including the following steps.
  • the contour feature is a binary mask image that forms a closed irregular curve as a whole, and the binary mask image does not change the rectangular convex-set property of the ID photo.
  • Convex sets are still convex sets after affine transformation.
  • One of the good properties of ID photo is that it is a regular rectangular shape, which is a standard convex set. No matter what affine transformation the convex set undergoes in the collection stage, the properties of the convex set cannot be changed.
  • the minimum convex hull of the contour is obtained on the basis of the original contour, and the missing area of the partial segmentation is filled, and the contour edge is smoother at the same time.
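As an illustration of the convex hull step, the sketch below computes the convex hull of a set of contour points with Andrew's monotone chain algorithm. The source does not name a specific hull algorithm, so this choice and the function names `cross` and `convex_hull` are assumptions; any standard hull routine would serve the same purpose.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of 2-D points, collinear points dropped.

    points: iterable of (x, y) tuples, e.g. pixels on the mask contour.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    # Build the lower chain left to right, popping points that would
    # create a non-left turn (these lie inside or on the hull edge).
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper chain right to left in the same way.
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Dents in the contour, such as areas the segmentation missed, fall inside the hull and disappear from the outline, which is the filling and smoothing effect the text describes.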
  • step S23 line fitting, using Hough transform to perform line fitting on an irregular convex polygon composed of multiple line segments of the convex hull to describe the convex hull.
  • the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
  • the Hough transform is a feature extraction technique widely used in image analysis, computer vision and digital image processing to extract features of objects, such as lines. This scheme uses it to accurately recover the edge lines of the document.
  • S24 finds the vertices, reads all the legal straight lines in the straight line fitting to find the intersection points in pairs, so as to find the distribution range of the four vertices of the certificate photo.
  • every legal straight line detected in S23 can be written as the analytic expression of a straight line. All legal straight lines are intersected in pairs to find the distribution range of the four vertices of the ID photo; pairs of parallel lines are not considered when finding the vertices.
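The pairwise intersection in S24 can be sketched as below, assuming each fitted line is given by two points on it, `(x1, y1, x2, y2)`, as a segment-based Hough transform would typically return. That representation, the function names and the `eps` tolerance for detecting parallel lines are assumptions made for this example.

```python
from itertools import combinations


def line_intersection(l1, l2, eps=1e-9):
    """Intersect the infinite lines through two segments.

    Each line is (x1, y1, x2, y2). Returns (x, y), or None when the
    lines are parallel (determinant close to zero).
    """
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel lines are skipped, as in the text
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (px, py)


def pairwise_intersections(lines):
    """Collect the intersection points of every non-parallel pair of lines."""
    points = []
    for l1, l2 in combinations(lines, 2):
        p = line_intersection(l1, l2)
        if p is not None:
            points.append(p)
    return points
```

Because several Hough lines usually lie along each card edge, this produces a cloud of candidate points near each corner rather than exactly four points, which is why a legality check and a clustering step follow.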
  • a filter condition is set to check the legitimacy of the vertex.
  • the filter condition uses a tolerance value tol: a vertex is legal when its abscissa lies in [0-tol, width+tol] and its ordinate lies in [0-tol, height+tol], where width and height are the width and height of the original image.
  • the tolerance value tol is set to 50.
  • in $\max(\min(x_{\text{crosspoint}}, \text{width}), 0)$, the inner $\min(x_{\text{crosspoint}}, \text{width})$ keeps $x_{\text{crosspoint}}$ from exceeding the width of the original image, and the outer $\max$ keeps the result from falling below 0;
  • in $\max(\min(y_{\text{crosspoint}}, \text{height}), 0)$, the inner $\min(y_{\text{crosspoint}}, \text{height})$ keeps $y_{\text{crosspoint}}$ from exceeding the height of the original image, and the outer $\max$ keeps the result from falling below 0.
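The legality check and the clamping described above can be combined in a short sketch. The function names are hypothetical, and the order of operations (filter first, then clamp the survivors) is an assumption; the tolerance default of 50 is the value stated in the text.

```python
def is_legal_vertex(x, y, width, height, tol=50):
    """A vertex is legal if it lies within tol pixels of the image bounds."""
    return -tol <= x <= width + tol and -tol <= y <= height + tol


def clamp_vertex(x, y, width, height):
    """Apply max(min(v, limit), 0) to each coordinate."""
    return (max(min(x, width), 0), max(min(y, height), 0))


def filter_and_clamp(points, width, height, tol=50):
    """Drop illegal candidate vertices, then clamp the rest into the image."""
    return [
        clamp_vertex(x, y, width, height)
        for x, y in points
        if is_legal_vertex(x, y, width, height, tol)
    ]
```

For a 640 x 480 image, a candidate at (-10, 20) is kept and clamped to (0, 20), while a candidate at (700, 300) lies more than 50 pixels outside the image and is discarded.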
  • K-means is the most commonly used clustering algorithm based on Euclidean distance. It is numerical, unsupervised, non-deterministic and iterative, and it aims to minimize an objective function, the squared error function (the sum of the distances between all observation points and their center points); it treats two targets as more similar the closer they are. Thanks to its speed and good scalability, K-means can be regarded as the best-known clustering algorithm.
  • in step 4) of step S27, the upper left vertex is the vertex whose coordinate values have the smallest sum; the coordinates are then reordered with this vertex as the starting point to fix the order of the four vertices.
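The vertex sorting of S27 can be sketched as follows. One detail is an assumption of this example rather than something the source states: image coordinates have the y axis pointing down, so the angle is computed with the y difference flipped. With that convention, sorting by angle from large to small walks the corners clockwise as seen on screen, which gives the order upper left, upper right, lower right, lower left once the list is rotated to start at the vertex with the smallest coordinate sum. The function name is hypothetical.

```python
import math


def sort_vertices(vertices):
    """Order four (x, y) vertices clockwise on screen, starting upper left."""
    # 1) Center point of the four vertices.
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)

    # 2) Angle between the polar axis and the vector center -> vertex.
    #    The y difference is flipped because image rows grow downwards.
    def angle(v):
        return math.atan2(cy - v[1], v[0] - cx)

    # 3) Sort by angle from large to small.
    ordered = sorted(vertices, key=angle, reverse=True)

    # 4) The upper left vertex has the smallest x + y; rotate the list
    #    so that it comes first while keeping the cyclic order.
    start = min(range(len(ordered)), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]
```

The resulting order can be paired directly with the corners of the preset ID photo size when the correction transformation is computed.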
  • Image comparison the third step of image comparison includes the following steps.
  • S32 calculates the cosine of the included angle between the vectors; for the vector $B$ of the image to be classified and the vector $A$ of the registered image, the cosine of the included angle is: $\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}$
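A minimal sketch of the cosine comparison in S32, assuming each image has already been turned into a numeric vector of equal length (for example flattened pixel values; the source text for that earlier step is not reproduced here). The function names, the dictionary of registered vectors and the choice of the highest-scoring category are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def classify(vec_b, registered):
    """Return the registered category whose vector is most similar to vec_b.

    registered: dict mapping category name -> vector of registered image A.
    """
    return max(registered, key=lambda name: cosine_similarity(registered[name], vec_b))
```

A value close to 1 means the corrected image and a registered image point in nearly the same direction in feature space. A practical system would probably also apply a minimum-similarity threshold so that a certificate matching no registered picture can be flagged as a new category, but the threshold value is not given in this text.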
  • the image used in the present invention is captured by a camera; it may be a static image (that is, an image captured on its own) or an image from a video (that is, a frame selected from a captured video according to a preset criterion or at random). Either can serve as the image source of the certificate, and the embodiments of the present invention place no restriction on attributes such as the source, nature or size of the image.
  • embodiments of the present disclosure may also use, for example but not limited to, image-processing-based document detection algorithms (e.g. edge detection, mathematical morphology, localization based on texture analysis, line detection and edge statistics, genetic algorithms, Hough transform and contour methods, wavelet-transform-based methods, etc.) to perform document detection on the captured image.
  • the neural network when edge detection is performed on the collected image by using the neural network, the neural network can be trained by using the sample image in advance, so that the trained neural network can effectively detect the edge straight lines in the image.
  • the invention also provides a certificate detection device, which includes an acquisition input unit, an image processing unit, an image comparison and classification unit, and a certificate type output unit connected by telecommunication.
  • the acquisition input unit obtains the detection picture and standard registration picture of the certificate to be detected through the camera component; the acquisition unit uses hardware equipment, including but not limited to mobile phones, IPADs, ordinary cameras, CCD industrial cameras, scanners, etc., to detect the front of the certificate.
  • the collected image should completely include the four borders of the document, and the inclination should not exceed plus or minus 20°, and the human eye can distinguish the document number and the edge straight line.
  • the image processing unit processes the input image through the deep learning algorithm in the processor, and sequentially obtains a preliminary rough document area mask, a refined correction mask, and a corrected image after affine correction transformation. Specifically, the corrected image is compared and classified with the registered image stored in the memory through a comparison algorithm in the processor. Using algorithms, programs, etc. stored in the memory, corresponding processing and data extraction are performed on the obtained images by the processor.
  • in the certificate category output unit, the processor shows the category of the input picture obtained by the comparison and classification on the display and stores it in the memory.
  • the display includes but is not limited to the screen of a tablet computer, computer, mobile phone, etc., and presents the comparison and classification result produced by the processor.
  • the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • the present invention also provides a terminal, comprising a memory and a processor, wherein the memory stores computer instructions that can be executed on the processor, and the processor executes the steps of the foregoing method when the processor executes the computer instructions.
  • the embodiments of the present application may be provided as methods, apparatuses, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Abstract

Provided are a certificate category increase and decrease detection method and apparatus, a readable storage medium, and a terminal. The method comprises: firstly, storing standard pictures of certificates of various categories in a memory as registered pictures; secondly, detecting certificates to be detected, so as to obtain an input picture, and performing image processing on a picture to be input; and finally, comparing the processed picture with the registered pictures, determining, by means of a similarity, a category to which the detected picture belongs, so as to perform quick and efficient screening and category determination on newly-added certificates. By means of the detection solution, the accuracy and efficiency of determining a category of a newly-added certificate in a complex photographing scenario can be improved, and the detection solution can be widely applied in the fields of security, finance, etc.

Description

Method, device, readable storage medium and terminal for detecting increase and decrease of certificate categories
Technical field
The present invention relates to the technical field of information detection or intelligent vision, and in particular to a method, device, readable storage medium and terminal for detecting the increase and decrease of certificate categories.
Background art
For certificate image recognition, identity information needs to be recognized quickly and efficiently in the fields of security, finance, and enterprise and institutional information management. In the early days, most ID card information had to be entered manually, which was very inefficient, and a long recognition process also tired the eyes, so manual entry no longer suits the rapid development of computing and related fields.
With the rise of artificial intelligence, image recognition technology is gradually being applied in security, military, medical, intelligent transportation and other fields, and technologies such as face recognition and fingerprint recognition are increasingly used in public security, finance, aerospace and other security fields. In the military field, image recognition is mainly used for target reconnaissance and identification, using automatic image recognition to identify and strike enemy targets. In the medical field, image recognition supports the analysis and diagnosis of various medical images, which can greatly reduce the cost of medical treatment and also help improve the quality and efficiency of care. In the field of transportation, it can perform license plate recognition and can also be applied to the cutting-edge field of autonomous driving to clearly recognize roads, vehicles and pedestrians, making life more convenient and reducing travel costs. Although technologies for automatically recognizing or extracting certificate information have appeared, complex scenes, such as a certificate misaligned in the field of view, uneven illumination, interference from external light, or coverage by debris, blur the boundary between the certificate outline and the image background. This hinders accurate extraction of the certificate boundary and reduces the efficiency of certificate number detection or causes it to fail.
Some solutions have emerged for this, as follows.
Traditional method: an edge detection algorithm is used. An edge detection operator locates the document edges; straight lines are fitted to the edge points to determine the edge lines and their intersections, from which the deflection angle of the document is determined; the document is rotated; and image processing methods are then applied to detect the position of the document number. Accurately detecting the document edge points is the core step of this method, but edge detection operators place high demands on the image background: if the gradient between the background and foreground regions changes little, or if the background region contains a large amount of edge information, edge point detection fails and the document number cannot be detected.
Deep learning method: in the model training stage, a large amount of labeled data is used to train a deep network and fit its parameters, thereby modeling an OCR (Optical Character Recognition) detection algorithm; in the model prediction stage, the entire image is taken as the network input, and character regions are detected by forward inference through the network. This is currently a popular character detection method, but for the document number detection task it has the following defects: (1) image regions outside the document also take part in network inference, which wastes computing resources and, because characters present in non-document regions can be falsely detected, requires additional processing logic to remove them; (2) the scheme consumes considerable computing resources, and its training and inference times are longer than those of the present proposal; (3) because neural networks are not interpretable, the character region boxes located by this method cannot precisely match the minimum bounding rectangle of the characters and may even cut off part of the character region. That is, traditional optical character recognition (OCR) for document images is mainly aimed at high-definition scanned images, and it requires the image to have a clean background, standard printed fonts and a high resolution. In natural scenes, however, there are problems such as heavy background noise around the text, irregular text distribution and the influence of natural light sources, so the detection rate of OCR technology in real natural scenes is not ideal, which puts pressure on the character recognition performed in the subsequent steps of document recognition.
In addition, although AI technology has been applied in many industries and can meet some needs tied to practical application scenarios, the targets to be detected or recognized change over time; in the banking industry, for example, customers' detection targets are added or deleted at irregular intervals. When a detection target is added, sample collection, labeling, model training, deployment and other work usually has to be completed, so the optimization cycle is long and inefficient.
Given the above, in the intelligent detection of documents (including ID cards, bank cards, work permits, etc.) and in the detection of newly added document categories, existing approaches cannot respond quickly to changes in the actual application scenario or to the addition or removal of detection targets. In other words, changes in the set of detection targets and the diversity of practical application scenarios place higher demands on modern document recognition.
SUMMARY OF THE INVENTION
In order to overcome the deficiencies of the prior art, the purpose of the present invention is to provide a method, an apparatus, a readable storage medium and a terminal for detecting increases and decreases in document categories, which can solve the above problems.
Design principle: first, standard pictures of multiple categories of documents are stored in a memory as registered pictures; second, the document to be inspected is captured to obtain an input picture, and the input picture undergoes image processing; finally, the processed picture is compared with the registered pictures, and the category of the inspected picture is determined from their similarity, so that newly added documents can be screened and categorized quickly and efficiently.
Technical solution: the purpose of the present invention is achieved by the following technical solutions.
A method for detecting increases and decreases in document categories, the method comprising: a first step, initial document detection, in which a deep learning model is used to find the potential document area in a picture input through an image acquisition unit, yielding a preliminary, coarse document area mask; a second step, standardization, in which the coarse mask obtained in the first step is refined to obtain a high-quality document area mask, the mask is used to extract the document area from the original image, and an affine rectification transform is applied to the extracted document photo to convert it to a preset document photo size, outputting a rectified document picture; and a third step, image comparison, in which the rectified document picture output by the second step is compared with the registered pictures, and the category to which the input picture belongs is determined and output.
Preferably, the initial document detection of the first step comprises the following steps: S11, feature extraction: after a picture is input, it is scaled to the input size expected by the segmentation network, and a Unet network model then extracts deep features from the input data to obtain a feature map; S12, probability calculation: a binary classification is performed on the feature at each position of the feature map to obtain the probability that it belongs to the document area, yielding a probability map of the document area; S13, threshold truncation: the probability map is binarized according to a preset threshold, with probabilities greater than the threshold set to 1 and probabilities less than the threshold set to 0, yielding a 0-1 mask; S14, coarse segmentation mask: the 0-1 mask is upsampled to the same size as the original input, yielding a preliminary coarse segmentation mask of the document; S15, valid region screening: the area a of each isolated document region in the coarse segmentation mask is computed, and if a ≤ μ-3σ the region is regarded as invalid and removed from the coarse segmentation mask, so that some erroneous regions are filtered out. Here the area of a document region is taken to follow a normal distribution, under which a ≤ μ-3σ occurs with probability less than 0.5%, so an area satisfying a ≤ μ-3σ is judged to be an outlier. μ denotes the expected value of the document region area distribution, and σ denotes its standard deviation.
Preferably, in the standardization of the second step, refined mask correction is performed on the valid regions of the mask screened in the first step, comprising the following steps: S21, extracting the region contour: the contour is taken from the binary mask and is, as a whole, a closed irregular curve; the binary mask does not change the property that the rectangular document photo is a convex set; S22, computing the convex hull of the contour: the minimum convex hull of the original contour is computed, which fills in regions missed by the segmentation and smooths the contour edges; S23, line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the line segments of the convex hull, so as to describe the hull; S24, finding vertices: the intersections of all valid lines from the line fitting are computed pairwise to find the distribution of the four vertices of the document photo, and pairs of parallel lines are not considered; S25, vertex validity screening: a screening condition with a tolerance value tol is set to check the validity of the vertices; a vertex is valid if its abscissa lies in [0-tol, width+tol] and its ordinate lies in [0-tol, height+tol], where width and height are the width and height of the original image; if the coordinates of a vertex exceed the original image size by no more than tol, the vertex is corrected to the edge of the original image, that is (formula image PCTCN2020140736-appb-000001, reconstructed from the explanation that follows):
$$x = \max(\min(x_{crosspoint}, width), 0), \qquad y = \max(\min(y_{crosspoint}, height), 0)$$
Here $\min(x_{crosspoint}, width)$ ensures that the abscissa does not exceed the width of the original picture, and the outer $\max(\cdot, 0)$ ensures that it is not less than 0; similarly, $\min(y_{crosspoint}, height)$ ensures that the ordinate does not exceed the height of the original picture, and the outer $\max(\cdot, 0)$ ensures that it is not less than 0.
S26, vertex clustering: by analogy with a standard bank card, which has four vertices, all valid vertices obtained so far are grouped into four clusters by the unsupervised clustering algorithm K-means; the centroid of each cluster gives the coordinates of one vertex, so four vertex coordinates are obtained; S27, vertex ordering: to facilitate subsequent operations, the order of the four vertices is determined as follows: 1) compute the center point from the four vertex coordinates; 2) establish a polar coordinate system at the center point, construct the vector from the center point to each vertex, and compute the angle between each vector and the polar axis; 3) sort the four vertices by angle in descending order; 4) find the top-left corner of the document area and, starting from it, arrange the vertices in the order "top-left, top-right, bottom-right, bottom-left"; S28, region filling: after the vertex coordinates have been found and ordered, the quadrilateral formed by the four vertices is filled to form a binary mask; S29, affine transformation and output of the rectified picture: for the document area with its four re-determined vertices, an affine transformation is applied according to the preset target document photo size, $I_{output} = W I_{input}$, where W is the affine transformation matrix between the document area and the target document size; in this way, each document area is corrected accordingly, and the corrected document picture is output as the rectified picture and saved to the specified file path.
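The text does not say how the matrix W in S29 is obtained. As one possible illustration, the sketch below uses the fact that a planar affine map is fixed by three point correspondences and solves for W from the top-left, top-right and bottom-left corners; the choice of these three corners, the function names and the example sizes are assumptions made for illustration, not details from the source.

```python
def _det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m * u = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(_det3(replaced) / d)
    return solution


def affine_from_corners(src, dst):
    """Return the 2x3 affine matrix W mapping three src points onto three dst points."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = _solve3(m, [p[0] for p in dst])  # coefficients for x' = a*x + b*y + c
    row_y = _solve3(m, [p[1] for p in dst])  # coefficients for y' = d*x + e*y + f
    return [row_x, row_y]


def apply_affine(w, point):
    """Apply the 2x3 affine matrix w to a single (x, y) point."""
    x, y = point
    return (w[0][0] * x + w[0][1] * y + w[0][2],
            w[1][0] * x + w[1][1] * y + w[1][2])


# Hypothetical example: map a 100x60 document region onto a 200x120 target.
W = affine_from_corners(src=[(10, 20), (110, 20), (10, 80)],
                        dst=[(0, 0), (200, 0), (0, 120)])
bottom_right = apply_affine(W, (110, 80))
```

When the detected region is a parallelogram, the fourth corner lands on the fourth target corner; for a region with noticeable perspective distortion it will not, which is a limitation of any purely affine rectification.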
Preferably, in step S23, when the Hough transform is used to fit lines to the convex hull, the minimum detected line length is set to 100 and the maximum gap between line segments is set to 20.
Preferably, in step S25, the tolerance value tol is set to 50.
Preferably, in step S26, the K-means algorithm is as follows: 1) randomly select four cluster centroids $\mu_0, \mu_1, \mu_2, \mu_3$; 2) for each vertex coordinate $(x_i, y_i)$, compute its Euclidean distance to each cluster centroid and assign the vertex to the category j of the nearest centroid: $\arg\min_j \|(x_i, y_i) - \mu_j\|^2$, $j = 0, 1, 2, 3$; 3) recompute the coordinates of the four centroids; 4) repeat 2) and 3) until convergence.
Here $\|(x_i, y_i) - \mu_j\|^2$, $j = 0, 1, 2, 3$, is the squared Euclidean norm between centroid j and a vertex of category j; the $\arg\min_j$ step, together with the centroid update, adjusts the centroids so that the sum of these norms over the four clusters is minimized.
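The four K-means steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the patent's implementation: the `init`, `max_iter` and `seed` parameters are additions for reproducibility, and the example points are made up.

```python
import random


def kmeans(points, k=4, init=None, max_iter=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k starting centroids (random unless given explicitly).
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels


# Hypothetical candidate vertices scattered around the corners of a 100x60 card.
candidates = [(0, 0), (2, 0), (1, 3), (99, 1), (101, -1),
              (100, 59), (100, 61), (1, 60), (-1, 60)]
corners, labels = kmeans(candidates,
                         init=[(0, 0), (99, 1), (100, 59), (1, 60)])
```

With purely random starting centroids K-means can settle in a poor local optimum (for example two centroids sharing one corner), so a practical version might restart several times or seed one centroid per image quadrant.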
Preferably, in step 4) of step S27, the top-left vertex is the one whose coordinate values have the smallest sum; this vertex is taken as the top-left vertex, and the coordinates are reordered starting from it to determine the order of the four vertices.
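The ordering in S27 can be sketched as follows. One assumption is made explicit here: image coordinates have the y axis pointing down, so sorting by ascending `atan2` angle already yields the visual order top-left, top-right, bottom-right, bottom-left; the "descending" order in the text presumably corresponds to a y-up convention. The function name is illustrative.

```python
import math


def order_vertices(vertices):
    """Order four corner points as top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates (x to the right, y downward).
    """
    # 1) Center point of the four vertices.
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    # 2) and 3) Angle of each center-to-vertex vector, then sort by angle.
    by_angle = sorted(vertices, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # 4) The top-left corner has the smallest x + y; rotate the list to start there.
    start = min(range(len(by_angle)), key=lambda i: by_angle[i][0] + by_angle[i][1])
    return by_angle[start:] + by_angle[:start]


ordered = order_vertices([(10, 6), (0, 0), (0, 6), (10, 0)])
```

The smallest-sum rule can be ambiguous for a document rotated by about 45 degrees, where two corners may have nearly equal coordinate sums.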
Preferably, the image comparison of the third step comprises the following steps:
S31, picture binarization: the registered picture A and the picture to be classified B are binarized, and their corresponding vectors are $x_1 x_2 x_3 \ldots x_n$ and $y_1 y_2 y_3 \ldots y_n$;
S32, computing the cosine of the angle between the vectors: the cosine of the angle between the vector of the picture to be classified B and the vector of the registered picture A is (formula image PCTCN2020140736-appb-000002, reconstructed here from the standard definition of the cosine between two vectors):
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
S33, similarity determination: the smaller the cosine of the angle, the less related the two pictures are; when the cosine is close to 1, the two pictures are similar; when the cosine equals 1, the two pictures are identical. The category of the most related or identical registered picture A is determined to be the category of the picture to be classified B, i.e. the input picture, and is output.
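Steps S31 to S33 amount to a nearest-neighbour lookup under cosine similarity. The sketch below illustrates this on flattened grayscale pixel lists; the binarization threshold of 128, the function names and the tiny four-pixel "pictures" are assumptions made for the example.

```python
import math


def binarize(pixels, threshold=128):
    """S31: turn a flat list of grayscale values into a 0/1 vector."""
    return [1 if p >= threshold else 0 for p in pixels]


def cosine_similarity(x, y):
    """S32: cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an all-zero picture is treated as unrelated to everything
    return dot / (norm_x * norm_y)


def classify(input_pixels, registered):
    """S33: return the category whose registered picture is most similar."""
    v = binarize(input_pixels)
    return max(registered,
               key=lambda name: cosine_similarity(v, binarize(registered[name])))


# Hypothetical registry of two categories, each with a 4-pixel registered picture.
registered = {"id_card": [255, 255, 0, 0], "bank_card": [0, 0, 255, 255]}
category = classify([250, 200, 10, 0], registered)
```

Adding or removing a category then only means adding or removing an entry in `registered`, with no model retraining, which is the quick response to category changes that the method aims for.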
A document detection apparatus, comprising an acquisition input unit, an image processing unit, an image comparison and classification unit, and a document category output unit, which are communicatively connected; the acquisition input unit acquires, through a camera assembly, the picture of the document to be inspected and the standard registered pictures; the image processing unit processes the input picture with the deep learning algorithm in the processor, obtaining in turn the preliminary coarse document area mask, the refined mask, and the rectified image after the affine rectification transform; the image comparison and classification unit compares the rectified image with the registered pictures stored in the memory using the comparison algorithm in the processor and classifies it; and, in the document category output unit, the processor shows on the display the category determined for the input picture and stores it in the memory.
A computer-readable storage medium having computer instructions stored thereon, wherein the steps of the aforementioned method are performed when the computer instructions are run.
A terminal, comprising a memory and a processor, wherein the memory stores registered pictures and computer instructions executable on the processor, and the processor performs the steps of the aforementioned method when running the computer instructions.
Compared with the prior art, the beneficial effects of the present invention are as follows: standard pictures of multiple documents are stored in advance, which provides accurate references for comparison and improves the accuracy of comparison and screening; the comparison algorithm adopted is simple, efficient and accurate, which improves the efficiency of comparison and screening; and the invention responds quickly to changes in detection targets in an application scenario, which broadens the range of application of document recognition, so that it can be widely applied in security, finance and other fields.
DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flow chart of the method for detecting increases and decreases in document categories of the present invention;
Fig. 2 is a flow chart of the initial document detection;
Fig. 3 is a flow chart of document image standardization;
Fig. 4 is an example of similarity comparison in the image comparison.
DETAILED DESCRIPTION
In order to make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.
First embodiment
A method for detecting increases and decreases in document categories, see Fig. 1; the method comprises the following steps.
First step, initial document detection: a deep learning model is used to find the potential document area in the picture input through the image acquisition unit, yielding a preliminary, coarse document area mask.
Second step, standardization: the coarse mask obtained in the first step is refined to obtain a high-quality document area mask, the mask is used to extract the document area from the original image, and an affine rectification transform is applied to the extracted document photo to convert it to the preset document photo size, outputting a rectified document picture.
Third step, image comparison: the rectified document picture output by the second step is compared with the registered pictures, and the category to which the input picture belongs is determined and output.
Further, the detection process for newly added document categories is divided into three stages. The first two stages form a two-stage, coarse-to-fine refinement segmentation model for the picture. As shown in Fig. 1, in the first stage a deep learning model is used to find the potential document area in the input picture, yielding a preliminary, relatively coarse document area mask; in the second stage, traditional image processing techniques refine the coarse mask of the first stage into a high-quality document area mask, the mask is used to extract the document area from the original image, and an affine rectification transform converts the extracted document photo to the preset document photo size. In the third stage, the document photo is compared with the registered pictures, and the category to which the input picture belongs is output.
In the first-stage detection, i.e. the initial document detection of the first step, the goal of finding the document area is accomplished mainly by the sub-operations of feature extraction, probability calculation and threshold truncation, which finally yield a preliminary coarse segmentation mask. As shown in Fig. 2, after the user inputs a picture, it is scaled to the input size expected by the segmentation network, and the classic Unet network model then extracts deep features from the input data; next, a binary classification is performed on the feature at each position of the feature map to obtain the probability that it belongs to the document area, which gives a probability map of the document area; this probability map is then binarized according to a preset threshold, with probabilities greater than the threshold set to 1 and probabilities less than the threshold set to 0; finally, the 0-1 mask is upsampled to the same size as the original input. This completes the first stage and yields a preliminary document segmentation mask. The specific steps are as follows.
S11, feature extraction: after a picture is input, it is scaled to the input size expected by the segmentation network, and the Unet network model then extracts deep features from the input data to obtain a feature map.
S12, probability calculation: a binary classification is performed on the feature at each position of the feature map to obtain the probability that it belongs to the document area, yielding a probability map of the document area.
S13, threshold truncation: the probability map is binarized according to a preset threshold, with probabilities greater than the threshold set to 1 and probabilities less than the threshold set to 0, yielding a 0-1 mask.
S14, coarse segmentation mask: the 0-1 mask is upsampled to the same size as the original input, yielding a preliminary coarse segmentation mask of the document.
S15, valid region screening: the area a of each isolated document region in the coarse segmentation mask is computed; if a ≤ μ-3σ, the region is regarded as invalid and removed from the coarse segmentation mask, so that some erroneous regions are filtered out.
Here the area of a document region is taken to follow a normal distribution, under which a ≤ μ-3σ occurs with probability less than 0.5%, so an area satisfying a ≤ μ-3σ is judged to be an outlier. μ denotes the expected value of the document region area distribution, and σ denotes its standard deviation.
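Steps S13 and S15 can be sketched in plain Python as below; the upsampling of S14 is omitted for brevity. The 0.5 threshold, the 4-connectivity used to define an "isolated region", the function names and the example values of μ and σ are assumptions, and μ and σ are taken as given (for instance estimated beforehand from known document areas) because the text does not say how they are obtained.

```python
from collections import deque


def binarize(prob_map, threshold=0.5):
    """S13: probabilities above the threshold become 1, the rest 0."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def connected_regions(mask):
    """Collect 4-connected regions of 1-pixels as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1 and not seen[r][c]:
                queue, region = deque([(r, c)]), []
                seen[r][c] = True
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions


def filter_regions(mask, mu, sigma):
    """S15: erase every region whose area a satisfies a <= mu - 3*sigma."""
    cleaned = [row[:] for row in mask]
    for region in connected_regions(mask):
        if len(region) <= mu - 3 * sigma:
            for y, x in region:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical 4x6 probability map: one 3x3 document blob plus one stray pixel.
prob_map = [
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.8],
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
cleaned = filter_regions(binarize(prob_map), mu=9, sigma=2)
```

Note that the rule only removes regions that are unusually small; an implausibly large region passes this check unchanged.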
The Unet network model is a segmentation network that builds on the FCN network. Its structure comprises two symmetric parts: the first part is the same as an ordinary convolutional network, using 3x3 convolutions and pooling for downsampling, and captures the context information in the image (that is, the relationships between pixels); the second part is largely symmetric to the first, using 3x3 convolutions and upsampling to output the segmented image. The network also uses feature fusion: features from the downsampling part are fused with features from the upsampling part to obtain more accurate context information and a better segmentation result. In addition, Unet uses a weighted softmax loss function in which each pixel has its own weight, which makes the network pay more attention to learning edge pixels. This model is therefore better suited to the small, non-straight irregularities along document edges.
On the basis of the first stage, the second stage performs refined mask correction (refinement). As shown in Fig. 3, every valid region in the mask obtained in the first stage is corrected one by one. In the standardization of the second step, the refined mask correction applied to each valid document region of the mask screened in the first step comprises the following steps.
S21, extracting the region contour: the contour is taken from the binary mask and is, as a whole, a closed irregular curve; the binary mask does not change the property that the rectangular document photo is a convex set.
Before the following operations are carried out, a property is introduced to guarantee their validity.
Property: a convex set remains a convex set under an affine transformation. One of the useful properties of a document photo is that it is a regular rectangle, which is a standard convex set; whatever affine transformation it undergoes during capture, its convexity does not change.
S22, computing the convex hull of the contour: the minimum convex hull of the original contour is computed, which fills in regions missed by the segmentation and smooths the contour edges.
Because the contour extracted in the previous step depends entirely on the output of the segmentation model, it is jagged along some edges, which does not match the nature of a document photo. Therefore the minimum convex hull of the original contour is computed, which fills in regions missed by the segmentation and makes the contour edges smoother.
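The text does not name a convex hull algorithm. As an illustration, the sketch below uses Andrew's monotone chain, a standard choice, to show how a dent left by the segmentation disappears from the hull; the contour points are invented for the example.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices of a set of 2-D points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last point while it makes a non-left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical contour of a 10x6 card whose bottom edge has a dent at (5, 4).
contour = [(0, 0), (10, 0), (10, 6), (5, 4), (0, 6)]
hull = convex_hull(contour)
```

The dent point (5, 4) is dropped and only the four rectangle corners remain, which is the "filling in" effect described above.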
S23, line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the line segments of the convex hull, so as to describe the hull. In a specific embodiment, in step S23, the minimum detected line length for the Hough line fitting of the convex hull is set to 100, and the maximum gap between line segments is set to 20.
The Hough transform is a feature extraction technique widely used in image analysis, computer vision and digital image processing; it is used to identify features in an object, such as lines. This scheme uses it to obtain accurate analytic descriptions of the document edge lines.
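To make the idea of Hough voting concrete, here is a minimal sketch of the basic (rho, theta) accumulator for a set of edge points. It only finds the single strongest line and does not implement the minimum line length or maximum gap settings mentioned above; parameters of that kind belong to probabilistic Hough variants (OpenCV's `HoughLinesP`, for example, exposes `minLineLength` and `maxLineGap`), although the text does not name a library. The function name and example points are assumptions.

```python
import math
from collections import Counter


def hough_peak(points, theta_steps=180):
    """Return (rho, theta, votes) of the strongest line x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            # Each point votes for every line that passes through it.
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    (rho, t), count = votes.most_common(1)[0]
    return rho, math.pi * t / theta_steps, count


# Hypothetical edge points lying on the horizontal line y = 5.
edge_points = [(x, 5) for x in range(100)]
rho, theta, count = hough_peak(edge_points)
```

For these points all 100 votes fall into the cell with rho = 5 and theta = 90 degrees, which is the normal form of the line y = 5.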
S24, finding vertices: the intersections of all valid lines from the line fitting are computed pairwise to find the distribution of the four vertices of the document photo. Specifically, an analytic expression is available for every valid line detected in S23, and the intersection of each pair of valid lines is computed; the aim of this step is to find the distribution of the four vertices of the document photo. Pairs of parallel lines are not considered when computing the vertices.
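A minimal sketch of S24, assuming each fitted line is stored in the general form a·x + b·y = c (a Hough line with parameters rho and theta corresponds to a = cos(theta), b = sin(theta), c = rho). The representation, the tolerance `eps` and the function names are choices made for this example.

```python
from itertools import combinations


def intersect(l1, l2, eps=1e-9):
    """Intersection of two lines a*x + b*y = c, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None  # parallel (or identical) lines are skipped
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def pairwise_intersections(lines):
    """Intersections of every non-parallel pair of lines."""
    points = []
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is not None:
            points.append(p)
    return points


# Hypothetical edge lines of a 10x6 card: x = 0, x = 10, y = 0, y = 6.
edges = [(1, 0, 0), (1, 0, 10), (0, 1, 0), (0, 1, 6)]
candidates = pairwise_intersections(edges)
```

With noisy, nearly parallel lines the determinant is small but nonzero, so such pairs produce intersections far outside the image; these are the points that the validity screening of the next step removes.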
S25, vertex validity screening: not all of the vertices obtained are valid, so a screening condition is set to check their validity, which improves the accuracy and processing speed of the subsequent steps. Specifically, the screening condition uses a tolerance value tol: a vertex is valid if its abscissa lies in [0-tol, width+tol] and its ordinate lies in [0-tol, height+tol], where width and height are the width and height of the original image; in a specific embodiment, tol is set to 50. If the coordinates of a vertex exceed the original image size by no more than tol, the vertex is corrected to the edge of the original image, that is (formula image PCTCN2020140736-appb-000003, reconstructed from the explanation that follows):
$$x = \max(\min(x_{crosspoint}, width), 0), \qquad y = \max(\min(y_{crosspoint}, height), 0)$$
Here, $\min(x_{crosspoint}, width)$ ensures that $x_{crosspoint}$ does not exceed the width of the original image, and $\max(\min(x_{crosspoint}, width), 0)$ ensures that the result is not less than 0;
Similarly, $\min(y_{crosspoint}, height)$ ensures that $y_{crosspoint}$ does not exceed the height of the original image, and $\max(\min(y_{crosspoint}, height), 0)$ ensures that the result is not less than 0.
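The screening and correction of S25 can be sketched in a few lines. The function name `filter_and_clamp` is hypothetical; the default tol of 50 follows the specific embodiment described above.

```python
def filter_and_clamp(points, width, height, tol=50):
    """Keep vertices inside the tolerance band and clamp them to the image."""
    valid = []
    for x, y in points:
        # Validity check: abscissa in [0 - tol, width + tol],
        # ordinate in [0 - tol, height + tol].
        if -tol <= x <= width + tol and -tol <= y <= height + tol:
            # Correction: pull slightly out-of-range vertices back to the edge.
            x = max(min(x, width), 0)
            y = max(min(y, height), 0)
            valid.append((x, y))
    return valid
```

For a 640 x 480 image, a vertex at (-10, 20) is corrected to (0, 20), while a vertex at (700, 30) lies more than tol outside the image and is discarded.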
S26: Vertex clustering. By analogy with a standard bank card, which has four vertices, all valid vertices obtained so far are grouped into four classes by the unsupervised clustering algorithm K-means; the centroid of each class gives the coordinates of one vertex, so that four vertex coordinates are obtained in total.
The specific K-means algorithm is as follows:
1) Randomly select four cluster centroids $\mu_0$, $\mu_1$, $\mu_2$, $\mu_3$;
2) For each vertex coordinate $(x_i, y_i)$, compute the Euclidean distance to every cluster centroid, take the centroid at the smallest distance as its corresponding centroid, and label the vertex with the corresponding class $j$: $\arg\min_j \|(x_i, y_i) - \mu_j\|^2,\ j = 0, 1, 2, 3$;
Here $\|(x_i, y_i) - \mu_j\|^2$, $j = 0, 1, 2, 3$, is the squared Euclidean norm of the difference between a vertex and centroid $\mu_j$; the $\arg\min$ over $j$ selects the nearest centroid, and the centroids are adjusted so that the sum of these norms over the four classes is minimized.
3) Recompute the coordinates of the four centroids;
4) Repeat steps 2) and 3) until convergence.
K-means is the most commonly used clustering algorithm based on Euclidean distance. It is numerical, unsupervised, non-deterministic, and iterative. The algorithm aims to minimize an objective function, the squared-error function (the sum of the squared distances between every observation and the center of its cluster), and it treats two objects as more similar the closer they are. Owing to its speed and good scalability, K-means is among the best-known clustering methods.
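The four K-means steps listed above can be sketched in plain Python as follows. This is an illustrative sketch only: the function name `kmeans`, the optional `init` argument (added so that a run can be made reproducible), the fixed `seed`, and the rule of keeping the old centroid when a class is empty are all assumptions that do not appear in the original text.

```python
import random


def kmeans(points, k=4, init=None, max_iter=100, seed=0):
    """Cluster 2-D points into k classes; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k initial centroids (randomly unless `init` is supplied).
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each vertex to the nearest centroid
        # (smallest squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty class keeps its old centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels
```

Because the result depends on the initial centroids, a practical implementation might run the algorithm several times or seed it from the image corners; that choice is left open here.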
S27: Vertex ordering. To facilitate subsequent operations, the order of the four vertices is determined by the following steps:
1) Compute the coordinates of the center point from the coordinates of the four vertices;
2) Establish a polar coordinate system at the center point, construct the vector from the center point to each vertex, and compute the angle between each vector and the polar axis in turn;
3) Sort the four vertices by angle in descending order;
4) Find the upper-left corner of the certificate region and, starting from it, arrange the vertices in the order "upper left, upper right, lower right, lower left".
In step 4) of S27, the upper-left vertex is the one whose coordinate values have the smallest sum; this vertex is taken as the starting point, and the coordinate order is rearranged from it to determine the order of the four vertices.
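The ordering steps of S27 can be sketched as follows. The sketch assumes image coordinates with the y axis pointing downward and measures each angle counter-clockwise as seen on screen (hence the sign flip on the y difference); under that assumption, sorting by descending angle visits the vertices clockwise. The function name `order_vertices` is hypothetical.

```python
import math


def order_vertices(vertices):
    """Return the vertices as upper-left, upper-right, lower-right, lower-left."""
    # Step 1: center point of the vertices.
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)

    # Step 2: angle between the center-to-vertex vector and the polar axis.
    # The y difference is negated because image rows grow downward (assumption).
    def angle(p):
        return math.atan2(-(p[1] - cy), p[0] - cx)

    # Step 3: sort by angle, largest first.
    ordered = sorted(vertices, key=angle, reverse=True)

    # Step 4: the upper-left vertex has the smallest coordinate sum;
    # rotate the list so that it comes first.
    start = min(range(len(ordered)), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]
```

The rotation in step 4 keeps the cyclic order produced by step 3, so only the starting vertex changes.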
S28: Region filling. After the vertex coordinates have been found and arranged in order, the quadrilateral region formed by the four vertices is filled to form a binary mask.
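One simple way to realize the filling of S28 is a point-in-convex-polygon test on every pixel, sketched below. This is an illustration only, assuming the four vertices form a convex quadrilateral and are given in a consistent cyclic order; the function name `fill_quad_mask` is hypothetical, and a library polygon-fill routine would normally be preferred for speed.

```python
def fill_quad_mask(vertices, width, height):
    """Return a height x width list of lists; pixels inside the polygon are 1."""
    n = len(vertices)
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Cross product of each edge with the vector to the pixel.
            crosses = []
            for i in range(n):
                (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
                crosses.append((x2 - x1) * (row - y1) - (y2 - y1) * (col - x1))
            # Inside (or on the border) when all cross products share a sign.
            if all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses):
                mask[row][col] = 1
    return mask
```

The resulting 0/1 mask can then be used to extract the certificate region from the original image.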
S29: Affine transformation and output of the corrected image. For the certificate region whose four vertices have been re-determined, an affine transformation is applied according to the preset target certificate photo size: $I_{output} = W I_{input}$, where $W$ is the affine transformation matrix between the certificate region and the target certificate size. In this way, the corresponding correction is applied to every certificate region, and the corrected certificate image is output as the corrected image and saved to the specified file path.
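To make the role of the matrix W in S29 concrete, the sketch below solves for a 2x3 affine matrix from three point correspondences (for example the upper-left, upper-right, and lower-left vertices and their target positions) using Cramer's rule. Choosing three of the four vertices is an assumption made for illustration: an affine map is fully determined by three point pairs, and in general it cannot send all four vertices of an arbitrary quadrilateral exactly onto a rectangle. The function names are hypothetical.

```python
def affine_from_three_points(src, dst):
    """Solve dst_i = W @ [x_i, y_i, 1] for the 2x3 matrix W (list of two rows)."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")
    rows = []
    for k in range(2):  # k = 0 gives the x' row, k = 1 the y' row
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        a = (u0 * (y1 - y2) - y0 * (u1 - u2) + (u1 * y2 - u2 * y1)) / det
        b = (x0 * (u1 - u2) - u0 * (x1 - x2) + (x1 * u2 - x2 * u1)) / det
        c = (x0 * (y1 * u2 - y2 * u1) - y0 * (x1 * u2 - x2 * u1)
             + u0 * (x1 * y2 - x2 * y1)) / det
        rows.append((a, b, c))
    return rows


def apply_affine(W, point):
    """Map a single (x, y) point through the 2x3 affine matrix W."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in W)
```

Applying W to every pixel coordinate of the certificate region (with interpolation) yields the corrected image at the preset target size.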
Image comparison. The image comparison of the third step includes the following steps.
S31: Image binarization. The registered image A and the image to be classified B are binarized; their corresponding vectors are $x_1, x_2, x_3, \ldots, x_n$ and $y_1, y_2, y_3, \ldots, y_n$;
S32: Computing the cosine of the angle between the vectors. The cosine of the angle between the vector of the image to be classified B and the vector of the registered image A is:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$ (Figure PCTCN2020140736-appb-000004)
S33: Similarity determination. The smaller the cosine of the angle, the less related the two images are. Referring to FIG. 4, when the cosine is close to 1 the two images are similar; when the cosine equals 1 the two images are identical. The category of the most related, or identical, registered image A is determined to be the category of the image to be classified B, that is, of the input image, and is output.
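Steps S31 to S33 can be sketched together as follows, using the standard definition of the cosine of the angle between two vectors (the dot product divided by the product of the norms). The function names, the grey-level threshold of 128, and the dictionary of registered images are assumptions made for illustration; images are represented as flat lists of pixel values of equal length.

```python
import math


def binarize(pixels, threshold=128):
    """S31: turn grey levels into a 0/1 vector (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in pixels]


def cosine_similarity(x, y):
    """S32: cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def classify(query, registered):
    """S33: return the category whose registered vector is most similar."""
    return max(registered, key=lambda name: cosine_similarity(query, registered[name]))
```

With this structure, adding or removing a certificate category only requires adding or removing an entry in the `registered` dictionary; no retraining is involved in the comparison step.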
The image acquired in the present invention is captured by a camera. It may be a still image (that is, an image captured on its own) or a frame from a video (that is, an image selected from a captured video according to a preset criterion or at random); either may serve as the image source for the certificate of the present invention. The embodiments of the present invention place no restriction on the source, nature, size, or any other attribute of the image.
Based on the description of the embodiments of the present disclosure, those skilled in the art will appreciate that, besides neural networks, the embodiments may also perform character detection on the captured image using, for example but not limited to, character detection algorithms based on image processing (for example, a character/number detection algorithm based on coarse histogram segmentation and singular value features, or a character/number detection algorithm based on the dyadic wavelet transform). Likewise, besides neural networks, the embodiments may perform certificate detection on the captured image using, for example but not limited to, certificate detection algorithms based on image processing (for example, edge detection, mathematical morphology, localization based on texture analysis, line detection and edge statistics, genetic algorithms, the Hough transform and contour methods, and wavelet-based methods).
In the embodiments of the present disclosure, when edge detection is performed on the captured image by a neural network, the neural network may be trained in advance on sample images so that the trained network can effectively detect the straight edge lines in an image.
Second Embodiment
The present invention also provides a certificate detection apparatus, which includes an acquisition input unit, an image processing unit, an image comparison and classification unit, and a certificate category output unit that are in electrical communication with one another.
The acquisition input unit obtains, through a camera assembly, the detection image of the certificate to be detected and the standard registered image. The acquisition unit uses hardware devices, including but not limited to mobile phones, iPads, ordinary cameras, CCD industrial cameras, and scanners, to capture image information from the front of the certificate. The captured image should fully contain the four borders of the certificate, the tilt should not exceed ±20°, and the certificate number and the straight edges should be distinguishable by the human eye.
The image processing unit processes the input image through a deep learning algorithm in the processor and obtains, in sequence, a preliminary coarse certificate region mask, a refined corrected mask, and the corrected image after the affine rectification transformation. The image comparison and classification unit compares the corrected image with the registered images stored in the memory through a comparison algorithm in the processor and classifies it. Using the algorithms, programs, and the like stored in the memory, the processor performs the corresponding processing and data extraction on the acquired image.
The certificate category output unit: the processor displays on the display the category to which the input image is assigned after comparison and sorting, and stores it in the memory. The display includes, but is not limited to, the screen of a tablet computer, computer, or mobile phone, and shows the comparison and classification result produced by the processor.
Third Embodiment
The present invention also provides a computer-readable storage medium on which computer instructions are stored; when run, the computer instructions execute the steps of the foregoing method. For the certificate detection method, refer to the detailed description in the preceding sections; it is not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be carried out by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Fourth Embodiment
The present invention also provides a terminal comprising a memory and a processor; the memory stores computer instructions that can run on the processor, and the processor executes the steps of the foregoing method when running the computer instructions. For the certificate detection method, refer to the detailed description in the preceding sections; it is not repeated here.
The above scheme solves the problem that, against a complex background, the boundary between the certificate outline and the image background is blurred, which hinders the accurate classification of newly added certificate categories or items.
It should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, an apparatus, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A certificate category increase and decrease detection method, characterized in that the method comprises the following steps:
    Step 1, preliminary certificate detection: for an image input through an image acquisition unit, a deep learning model is used to find the corresponding potential certificate region, yielding a preliminary, coarse certificate region mask;
    Step 2, standardization: the coarse mask obtained in step 1 is refined to obtain a high-quality certificate region mask; the mask is used to extract the certificate region from the original image; an affine rectification transformation is applied to the resulting certificate photo to convert it to a preset certificate photo size; and the rectified certificate image is output;
    Step 3, image comparison: the rectified certificate image output by step 2 is compared with registered images, and the category to which the input image belongs is determined and output.
  2. The method according to claim 1, characterized in that the preliminary certificate detection of step 1 comprises the following steps:
    S11, feature extraction: after an image is input, it is scaled to the input size suited to the segmentation network, and a Unet network model is then used to extract deep features from the input data to obtain a feature map;
    S12, probability computation: a binary classification is performed on the feature at each position of the feature map to obtain the probability that the feature at that position belongs to the certificate region, yielding a probability distribution map for the certificate region;
    S13, threshold truncation: the probability distribution map is binarized according to a preset threshold, with probabilities greater than the threshold set to 1 and probabilities less than the threshold set to 0, yielding a 0-1 mask map;
    S14, coarse segmentation mask: the 0-1 mask map is upsampled to the same size as the original input to obtain a preliminary coarse certificate segmentation mask map;
    S15, valid region screening: the area a of each isolated certificate region in the coarse segmentation mask map is computed; if a ≤ μ-3σ, the region is regarded as invalid and is removed from the coarse segmentation mask, so that some erroneous regions are filtered out by the valid region screening.
    The area values of certificate regions follow a normal distribution, and the probability of a ≤ μ-3σ is less than 0.5%; when a ≤ μ-3σ occurs, the value a is judged to be an outlier.
    μ denotes the expected value of the area distribution of certificate regions;
    σ denotes the standard deviation of the area distribution of certificate regions.
  3. The method according to claim 1, characterized in that, in the standardization of step 2, a refined mask correction is applied to the valid regions in the mask map screened in step 1, comprising the following steps:
    S21, region contour extraction: the contour feature is a binary mask map whose whole is a closed irregular curve; the binary mask map does not change the property that the certificate photo is a rectangular convex set;
    S22, contour convex hull: the minimum convex hull of the contour is computed from the original contour, which fills in regions partly missing from the segmentation and smooths the contour edge;
    S23, line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the line segments of the convex hull, so as to describe the convex hull;
    S24, finding the vertices: the intersection point of every pair of valid straight lines obtained in the line fitting is computed in order to find the distribution range of the four vertices of the certificate photo; pairs of parallel lines are not considered when computing the vertices;
    S25, vertex validity screening: a screening condition is set to check the validity of each vertex; the screening condition uses a tolerance value tol, and a vertex is valid if its abscissa lies in [0-tol, width+tol] and its ordinate lies in [0-tol, height+tol], where width and height are the width and height of the original image; if the coordinates of a vertex fall outside the original image by no more than tol, the vertex is corrected to the edge of the original image, that is:
    $$x_{crosspoint} \leftarrow \max(\min(x_{crosspoint}, width), 0), \qquad y_{crosspoint} \leftarrow \max(\min(y_{crosspoint}, height), 0)$$ (Figure PCTCN2020140736-appb-100001)
    where $\min(x_{crosspoint}, width)$ ensures that $x_{crosspoint}$ does not exceed the width of the original image, and $\max(\min(x_{crosspoint}, width), 0)$ ensures that the result is not less than 0;
    and $\min(y_{crosspoint}, height)$ ensures that $y_{crosspoint}$ does not exceed the height of the original image, and $\max(\min(y_{crosspoint}, height), 0)$ ensures that the result is not less than 0;
    S26, vertex clustering: by analogy with a standard bank card, which has four vertices, all valid vertices obtained so far are grouped into four classes by the unsupervised clustering algorithm K-means; the centroid of each class gives the coordinates of one vertex, so that four vertex coordinates are obtained in total;
    S27, vertex ordering: to facilitate subsequent operations, the order of the four vertices is determined by the following steps: 1) compute the coordinates of the center point from the coordinates of the four vertices; 2) establish a polar coordinate system at the center point, construct the vector from the center point to each vertex, and compute the angle between each vector and the polar axis in turn; 3) sort the four vertices by angle in descending order; 4) find the upper-left corner of the certificate region and, starting from it, arrange the vertices in the order "upper left, upper right, lower right, lower left";
    S28, region filling: after the vertex coordinates have been found and arranged in order, the quadrilateral region formed by the four vertices is filled to form a binary mask;
    S29, affine transformation and output of the corrected image: for the certificate region whose four vertices have been re-determined, an affine transformation is applied according to the preset target certificate photo size, $I_{output} = W I_{input}$, where $W$ is the affine transformation matrix between the certificate region and the target certificate size; in this way, the corresponding correction is applied to every certificate region, and the corrected certificate image is output as the corrected image and saved to the specified file path.
  4. The method according to claim 3, characterized in that, in step S23, when the Hough transform is used to fit straight lines to the convex hull, the minimum detected line length is set to 100 and the maximum gap between lines is set to 20.
  5. The method according to claim 3, characterized in that, in step S26, the specific K-means algorithm is:
    1) randomly select four cluster centroids $\mu_0$, $\mu_1$, $\mu_2$, $\mu_3$;
    2) for each vertex coordinate $(x_i, y_i)$, compute the Euclidean distance to every cluster centroid, take the centroid at the smallest distance as its corresponding centroid, and label the vertex with the corresponding class $j$: $\arg\min_j \|(x_i, y_i) - \mu_j\|^2,\ j = 0, 1, 2, 3$; where $\|(x_i, y_i) - \mu_j\|^2$, $j = 0, 1, 2, 3$, is the squared Euclidean norm of the difference between a vertex and centroid $\mu_j$, the $\arg\min$ over $j$ selects the nearest centroid, and the centroids are adjusted so that the sum of these norms over the four classes is minimized;
    3) recompute the coordinates of the four centroids;
    4) repeat steps 2) and 3) until convergence.
  6. The method according to claim 3, characterized in that, in step 4) of step S27, the upper-left vertex is the one whose coordinate values have the smallest sum; this vertex is taken as the starting point, and the coordinate order is rearranged from it to determine the order of the four vertices.
  7. The method according to claim 1, characterized in that the image comparison of step 3 comprises the following steps:
    S31, image binarization: the registered image A and the image to be classified B are binarized, and their corresponding vectors are $x_1, x_2, x_3, \ldots, x_n$ and $y_1, y_2, y_3, \ldots, y_n$;
    S32, computing the cosine of the angle between the vectors: the cosine of the angle between the vector of the image to be classified B and the vector of the registered image A is:
    $$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$ (Figure PCTCN2020140736-appb-100002)
    S33, similarity determination: the smaller the cosine of the angle, the less related the two images are; when the cosine is close to 1 the two images are similar, and when the cosine equals 1 the two images are identical; the category of the most related, or identical, registered image A is determined to be the category of the image to be classified B, that is, of the input image, and is output.
  8. A certificate detection apparatus, characterized in that the apparatus comprises an acquisition input unit, an image processing unit, an image comparison and classification unit, and a certificate category output unit that are in electrical communication with one another; the acquisition input unit obtains, through a camera assembly, the detection image of the certificate to be detected and the standard registered image; the image processing unit processes the input image through a deep learning algorithm in the processor and obtains, in sequence, a preliminary coarse certificate region mask, a refined corrected mask, and the corrected image after the affine rectification transformation; the image comparison and classification unit compares the corrected image with the registered images stored in the memory through a comparison algorithm in the processor and classifies it; and for the certificate category output unit, the processor displays on the display the category to which the input image is assigned after comparison and sorting, and stores it in the memory.
  9. A computer-readable storage medium on which computer instructions are stored, characterized in that the computer instructions, when run, execute the steps of the foregoing method.
  10. A terminal comprising a memory and a processor, characterized in that the memory stores registered images and computer instructions that can run on the processor, and the processor executes the steps of the foregoing method when running the computer instructions.
PCT/CN2020/140736 2020-12-10 2020-12-29 Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal WO2022121025A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011455630.X 2020-12-10
CN202011455630.XA CN112686248B (en) 2020-12-10 2020-12-10 Certificate increase and decrease type detection method and device, readable storage medium and terminal

Publications (1)

Publication Number Publication Date
WO2022121025A1 true WO2022121025A1 (en) 2022-06-16

Family

ID=75449040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140736 WO2022121025A1 (en) 2020-12-10 2020-12-29 Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal

Country Status (2)

Country Link
CN (1) CN112686248B (en)
WO (1) WO2022121025A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588145A (en) * 2022-12-12 2023-01-10 深圳联和智慧科技有限公司 Unmanned aerial vehicle-based river channel garbage floating identification method and system
CN117274833A (en) * 2023-11-20 2023-12-22 浙江国遥地理信息技术有限公司 Building contour processing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542275A (en) * 2011-12-15 2012-07-04 广州商景网络科技有限公司 Automatic identification method for identification photo background and system thereof
US20120281077A1 (en) * 2009-11-10 2012-11-08 Icar Vision Systems S L Method and system for reading and validating identity documents
CN109815976A (en) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 A kind of certificate information recognition methods, device and equipment
CN111079571A (en) * 2019-11-29 2020-04-28 杭州数梦工场科技有限公司 Identification card information identification and edge detection model training method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176363B2 (en) * 2017-09-29 2021-11-16 AO Kaspersky Lab System and method of training a classifier for determining the category of a document
CN109359647A (en) * 2018-10-16 2019-02-19 翟红鹰 Identify the method, equipment and computer readable storage medium of a variety of certificates
CN111242124B (en) * 2020-01-13 2023-10-31 支付宝实验室(新加坡)有限公司 Certificate classification method, device and equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588145A (en) * 2022-12-12 2023-01-10 深圳联和智慧科技有限公司 Unmanned aerial vehicle-based river channel garbage floating identification method and system
CN115588145B (en) * 2022-12-12 2023-03-21 深圳联和智慧科技有限公司 Unmanned aerial vehicle-based river channel garbage floating identification method and system
CN117274833A (en) * 2023-11-20 2023-12-22 浙江国遥地理信息技术有限公司 Building contour processing method, device, equipment and storage medium
CN117274833B (en) * 2023-11-20 2024-02-27 浙江国遥地理信息技术有限公司 Building contour processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112686248A (en) 2021-04-20
CN112686248B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
WO2022121039A1 (en) Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal
US20200364443A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
CN109086714B (en) Form recognition method, recognition system and computer device
Gou et al. Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines
Silva et al. A flexible approach for automatic license plate recognition in unconstrained scenarios
Zang et al. Vehicle license plate recognition using visual attention model and deep learning
CN101142584B (en) Method for facial features detection
CN111310662B (en) Flame detection and identification method and system based on integrated deep network
Nandi et al. Traffic sign detection based on color segmentation of obscure image candidates: a comprehensive study
CN110569756A (en) face recognition model construction method, recognition method, device and storage medium
Zhang et al. Road recognition from remote sensing imagery using incremental learning
JP2014531097A (en) Text detection using multi-layer connected components with histograms
WO2022121025A1 (en) Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal
Wazalwar et al. A design flow for robust license plate localization and recognition in complex scenes
Sun et al. A visual attention based approach to text extraction
Gawande et al. SIRA: Scale illumination rotation affine invariant mask R-CNN for pedestrian detection
Mei et al. A novel framework for container code-character recognition based on deep learning and template matching
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
CN111695373A (en) Zebra crossing positioning method, system, medium and device
Dhar et al. Bangladeshi license plate recognition using adaboost classifier
Cai et al. Feature detection and matching with linear adjustment and adaptive thresholding
Viet et al. A robust end-to-end information extraction system for Vietnamese identity cards
Ning Vehicle license plate detection and recognition
Sathya et al. Vehicle license plate recognition (vlpr)
Ismail et al. Detection and recognition via adaptive binarization and fuzzy clustering

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20964954

Country of ref document: EP

Kind code of ref document: A1