WO2022121039A1 - Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal - Google Patents


Info

Publication number
WO2022121039A1
WO2022121039A1 · PCT/CN2020/141443 · CN2020141443W
Authority
WO
WIPO (PCT)
Prior art keywords
area
document
image
mask
vertices
Prior art date
Application number
PCT/CN2020/141443
Other languages
French (fr)
Chinese (zh)
Inventor
王晓亮
陈建良
田丰
王丹丹
吴昌宇
Original Assignee
广州广电运通金融电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州广电运通金融电子股份有限公司 filed Critical 广州广电运通金融电子股份有限公司
Publication of WO2022121039A1 publication Critical patent/WO2022121039A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/16 Image preprocessing
    • G06V 30/162 Quantising the image signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/18 Extraction of features or characteristics of the image
    • G06V 30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V 30/18067 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to the technical field of information detection and intelligent vision, and in particular to a bank card tilt correction detection method, apparatus, readable storage medium and terminal.
  • Image recognition technology is gradually being applied in the security, military, medical and intelligent-transportation fields, and technologies such as face recognition and fingerprint recognition are increasingly used in security-sensitive domains such as public security, finance and aerospace.
  • In the military field, image recognition is mainly applied to the reconnaissance and identification of targets: automated image recognition is used to identify and strike enemy targets. In the medical field, image recognition enables many kinds of medical image analysis and diagnosis, which on the one hand greatly reduces the cost of care and on the other hand helps improve its quality and efficiency. In transportation, it enables not only license plate recognition but also the cutting-edge field of autonomous driving, giving a clear view of roads, vehicles and pedestrians.
  • Deep learning methods use a large amount of labeled data in the model training stage to train a deep network, fitting the network parameters and modeling the OCR (Optical Character Recognition) detection algorithm.
  • OCR: Optical Character Recognition
  • The image is used as the input of the network, and character region detection is realized through the network's forward inference.
  • This is currently a popular character detection method, but for the card number detection task it has the following defects: (1) the non-document area of the image also participates in network inference, which wastes computing resources, and false character detections in that area require additional processing logic to be eliminated; (2) the scheme consumes more computing resources, and its training and inference times are longer than those of the present proposal; (3) due to the inexplicability of neural networks, the bounding box of the located character area cannot accurately fit the smallest enclosing rectangle of the characters, and may even cut off part of the character area. Moreover, traditional optical character recognition (OCR) of document images is mainly designed for high-definition scanned images: it requires the recognized image to have a clean background, standard print and high resolution. In natural scenes, however, there is heavy background noise around the text, irregular text distribution and interference from natural light sources, so the detection rate of OCR in real natural scenes is not ideal, and document identification puts pressure on the character recognition performed in subsequent steps.
  • OCR: optical character recognition
  • The purpose of the present invention is to provide a bank card tilt correction detection method, apparatus, readable storage medium and terminal that can solve the above problems.
  • BTC: Bankcard Tilt Correction
  • a method for detecting bank card tilt correction under complex background comprises the following steps:
  • First step, model training: label the original data to generate annotations, collect document-size statistics from the generated annotation files, and use the original data and annotation files to train the segmentation model;
  • Second step, initial document detection: for a picture input through the image acquisition unit, use the deep learning model to find the corresponding potential document area and obtain a preliminary, rough document-area mask;
  • Third step, standardization: refine the rough mask obtained in the second step to obtain a high-quality document-area mask, use the mask to extract the document area from the original image, apply an affine correction transformation to the extracted document photo, convert it to the preset ID-photo size, and output the corrected photo.
  • the first step of model training includes the following steps:
  • S11 determines the document area: the document area in the original-data pictures is found through manual annotation.
  • S14 trains the segmentation model, using the original data and the generated annotation files.
  • In step S14 the input picture and the corresponding annotation file have the same size; before training, the json file is converted into a corresponding 0-1 binary mask, in which areas with pixel value 1 represent the document area and areas with pixel value 0 represent the background.
  • the initial certificate inspection in the second step includes the following steps:
  • S21 extracts features: after a picture is input, it is scaled to a size suitable for the segmentation network's input, and the Unet network model then extracts depth features from the input data to obtain a feature map;
  • S22 calculates probabilities: a two-class judgment is made on the feature at each position of the feature map to obtain the probability that each position belongs to the document area, yielding a document-area probability distribution map;
  • S24 produces the rough segmentation mask: the 0-1 mask image is upsampled to the same size as the original input image, giving a preliminary rough document segmentation mask;
  • S25 screens legal areas: count the area a of each isolated document region in the rough segmentation mask; if a < μ - 3σ, the region is considered illegal and is removed from the rough segmentation mask, so that legal-area screening filters out erroneous regions.
  • Fine-grained mask correction is performed on the legal regions in the mask image screened in the second step, including the following steps:
  • The contour feature is extracted from the binary mask image; the contour as a whole is a closed irregular curve, and the binary mask does not change the rectangular convex-set property of the document photo;
  • S32 obtains the convex hull of the contour: the minimum convex hull is computed on the basis of the original contour, filling in areas missed by the segmentation and smoothing the contour edge at the same time;
  • The sorting of the four vertices is determined through the following steps: 1) obtain the coordinates of the centre point from the four vertex coordinates; 2) establish a polar coordinate system at the centre point and construct the vector from the centre to each vertex, recording its polar angle; 3) sort the four vertices by the size of the included angle; 4) find the upper-left corner of the document area: the vertex with the minimum coordinate sum is the upper-left vertex, and the coordinate order is rearranged with it as the starting point, in the order "upper left - upper right - lower right - lower left";
  • In step S33, the minimum detected line length for the Hough-transform straight-line fitting of the convex hull is set to 100, and the maximum gap between line segments is set to 20.
  • step S36 the specific algorithm of K-means is:
  • The invention also provides a document detection device comprising an acquisition input unit, an image processing unit, an information extraction unit and an information output unit connected by telecommunication; the acquisition input unit acquires, through a camera assembly, the picture of the document to be detected and the standard registration picture.
  • The image processing unit processes the input picture through the deep learning algorithm and the image processing algorithm in the processor, sequentially obtaining the preliminary rough document-area mask, the refined document-area mask, the document area extracted from the original image, and the corrected image after affine transformation.
  • The information extraction unit obtains the category and information of the corrected image through the information extraction algorithm in the processor; for the information output unit, the processor displays the category and information extracted from the input picture on the display and stores them in memory.
  • The present invention also provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of the aforementioned method are performed.
  • The present invention also provides a terminal including a memory and a processor; the memory stores a registered picture and computer instructions runnable on the processor, and when the processor runs the computer instructions it performs the steps of the aforementioned method.
  • The beneficial effect of the present invention is that, by combining deep learning technology and traditional image processing methods in the bank card tilt correction (BTC) technology of the present application, the advantages of the two are fully integrated.
  • BTC: Bankcard Tilt Correction
  • Fig. 1 is the flow chart of the bank card tilt correction detection method under the complex background of the present invention
  • Figure 2 is a schematic diagram of model training
  • FIG. 3 is a simplified flow chart of the BTC testing phase
  • Fig. 4 is the method flow chart of the initial inspection of the certificate
  • Figure 5 is a flow chart of document image standardization.
  • As shown in Fig. 1 - Fig. 5, a method for detecting bank card tilt correction under a complex background includes the following steps.
  • First step, model training: label the original data to generate annotations, collect document-size statistics from the generated annotation files, and use the original data and annotation files to train the segmentation model.
  • the second step is the initial inspection of the document.
  • the deep learning model is used to find the corresponding potential document area, and a preliminary and rough document area mask is obtained.
  • Third step, standardization: refine the rough mask obtained in the second step to obtain a high-quality document-area mask, use the mask to extract the document area from the original image, apply an affine correction transformation to the extracted document photo, convert it to the preset ID-photo size, and output the corrected photo.
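The correction step above maps the four detected vertices onto the preset ID-photo rectangle. With four point pairs the general warp is a projective (perspective) transform, of which the affine transform is a special case; the sketch below is a minimal stdlib-only illustration (the helper names and the numeric points in the test are made up, not from the patent):

```python
def solve_homography(src, dst):
    """Solve the 3x3 projective transform (bottom-right entry fixed to 1)
    mapping four source vertices to four destination corners; an affine
    correction is the special case whose bottom row is [0, 0, 1]."""
    # Build the 8x8 linear system A * h = b from the 4 point correspondences.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    # Gaussian elimination with partial pivoting.
    n = 8
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp_point(H, x, y):
    """Apply the transform to one point using homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

In practice the resulting transform would be applied to every pixel of the extracted document area (e.g. via an image-warping routine) to produce the corrected, preset-size card photo.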
  • BTC relies on the powerful feature extraction ability of deep learning, so it needs to train related models before it is officially used.
  • For a batch of raw data to be trained on, first find the areas of documents such as bank cards in each picture by manual annotation. Specifically, for each document in the picture, mark its four vertices and save the vertex coordinates as a json file. Next, from the generated annotation files, collect the area size s of each document region; this statistic serves the subsequent detection phase. It is verified experimentally that the document-photo area sizes in the raw data follow a Gaussian distribution, namely s ~ N(μ, σ²).
  • The mean μ and standard deviation σ of the Gaussian distribution are then calculated.
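The area statistic can be sketched in a few lines; this is a stdlib-only illustration assuming each annotation stores the four labeled vertices in order (the helper names `quad_area` and `area_stats` are hypothetical, not from the patent):

```python
import statistics

def quad_area(pts):
    """Area of a quadrilateral from its 4 (x, y) vertices in order (shoelace formula)."""
    s = 0.0
    for i in range(4):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % 4]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def area_stats(all_quads):
    """Mean mu and standard deviation sigma of the annotated document areas,
    matching the assumption s ~ N(mu, sigma^2)."""
    areas = [quad_area(q) for q in all_quads]
    return statistics.mean(areas), statistics.pstdev(areas)
```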
  • The segmentation model is trained using the raw data and the generated annotation files. It is worth noting that during training the input image and the corresponding annotation must have the same size. Therefore, the marked json file is also converted into a corresponding 0-1 binary mask map, in which areas with pixel value 1 represent the document area and areas with pixel value 0 represent the background.
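The json-to-mask conversion can be sketched as follows. A production pipeline would normally rasterize the polygon with an image library (e.g. OpenCV's `fillPoly`); the stdlib-only version below uses a point-in-polygon test, and the `"vertices"` key is an assumed annotation layout, not the patent's actual schema:

```python
import json

def point_in_poly(x, y, poly):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_hit = (x2 - x1) * (y - y1) / (y2 - y1) + x1
            if x < x_hit:
                inside = not inside
    return inside

def label_to_mask(label_json, width, height):
    """Convert an annotation like {"vertices": [[x, y], ...]} into a 0-1 mask:
    pixel value 1 = document area, pixel value 0 = background."""
    poly = json.loads(label_json)["vertices"]
    return [[1 if point_in_poly(x + 0.5, y + 0.5, poly) else 0
             for x in range(width)] for y in range(height)]
```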
  • model training steps of the first step are as follows.
  • S11 determines the certificate area, and finds the certificate area in the picture of the original data through manual annotation.
  • S12 Vertex labeling generates labels, labels the four vertices of the document in the document area, and saves the coordinate positions of the vertices in the form of json files to generate labels.
  • JSON: JavaScript Object Notation
  • JSON is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition (December 1999). JSON text is a sequence of tokens built from six structural characters, strings, numbers and three literal names, which makes it a good match for the coordinate annotations used in this scheme.
  • S14 trains the segmentation model, and uses the original data and the generated annotation files to train the segmentation model.
  • BTC is a two-stage, coarse-to-fine segmentation optimization model.
  • the deep learning model is used to find the corresponding potential document area for the input picture, and a preliminary and relatively rough document area mask is obtained;
  • the second stage uses traditional image processing technology:
  • the rough mask from the first stage is refined and corrected to obtain a high-quality document-area mask; the mask is used to extract the document photo from the original image, which is then affine-corrected and converted to the preset ID-photo size.
  • the first stage is the initial inspection of documents.
  • The goal of finding the document area is mainly achieved by the sub-operations of feature extraction, probability calculation and threshold truncation, finally producing a preliminary rough segmentation mask.
  • After the user inputs the picture, it is scaled to the input size expected by the segmentation network, and the classical Unet network model then extracts depth features from the input data;
  • The two-class judgment yields the probability that the feature at each position belongs to the document area, producing a document-area probability distribution map; the probability distribution map is then binarized according to the preset threshold.
  • S21 extracts features: after a picture is input, it is scaled to a size suitable for the segmentation network's input, and the Unet network model then extracts depth features from the input data to obtain a feature map.
  • S22 calculates probabilities: a binary classification judgment is made on the feature at each position of the feature map to obtain the probability that each position belongs to the document area, yielding a document-area probability distribution map.
  • the 0-1 mask image is upsampled to the same size as the original input image, and a preliminary document rough segmentation mask image is obtained.
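The threshold truncation and upsampling steps are simple enough to sketch directly; this stdlib-only illustration uses nested lists in place of real image arrays, with nearest-neighbour upsampling as one plausible choice (the patent does not name the interpolation method):

```python
def binarize(prob_map, thresh=0.5):
    """Cut the per-pixel document-area probabilities at a preset threshold:
    probability >= thresh -> 1 (document), otherwise 0 (background)."""
    return [[1 if p >= thresh else 0 for p in row] for row in prob_map]

def upsample_nearest(mask, out_h, out_w):
    """Nearest-neighbour upsampling of the 0-1 mask back to the original image size."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]
```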
  • S25 screens legal areas: count the area a of each isolated document region in the rough segmentation mask; if a < μ - 3σ, the region is considered illegal and is removed from the rough segmentation mask, so that legal-area screening filters out erroneous regions.
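The screening rule itself is a one-line filter against the Gaussian statistics gathered during training; a minimal sketch (function name assumed for illustration):

```python
def screen_legal_areas(region_areas, mu, sigma):
    """Legal-area screening: a connected region whose area a < mu - 3*sigma is
    treated as illegal (segmentation noise) and dropped from the rough mask."""
    return [a for a in region_areas if a >= mu - 3.0 * sigma]
```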
  • the Unet network model belongs to the segmentation network.
  • Unet draws on the FCN network; its structure comprises two symmetrical parts. The first part is like an ordinary convolutional network, using 3x3 convolutions and pooling downsampling to capture the context information in the image (i.e., the relationships between pixels); the second part is essentially symmetrical to the first, using 3x3 convolutions and upsampling to produce the segmentation output.
  • Feature fusion is also used in the network: features from the downsampling part are fused with features from the upsampling part to obtain more accurate context information and a better segmentation result.
  • Unet uses a weighted softmax loss function in which every pixel has its own weight, making the network pay more attention to learning edge pixels. This makes the model well suited to the slight, uneven variations of a document edge that is not perfectly straight.
  • The second stage is standardization. On the basis of the first stage, refined mask correction is performed: as shown in Figure 5, every legal region in the mask map obtained in the first stage is corrected one by one. That is, for each legal document area in the screened mask image, refined mask correction is carried out through the following steps (see FIG. 5).
  • the contour feature is a binary mask image
  • the whole is a closed irregular curve
  • the binary mask image does not change the properties of the rectangular convex set of the ID photo.
  • Convex sets are still convex sets after affine transformation.
  • One of the good properties of an ID photo is that it is a regular rectangle, i.e. a standard convex set; no matter what affine transformation occurs during acquisition, the convex-set property cannot be changed.
  • S32 obtains the convex hull of the contour, obtains the minimum convex hull of the contour on the basis of the original contour, fills in the missing area of the partial segmentation, and smoothes the edge of the contour at the same time.
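The minimum convex hull described in S32 is a standard computation (OpenCV exposes it as `convexHull`); a self-contained sketch using Andrew's monotone chain, which replaces the contour with its smallest enclosing convex polygon and thereby fills segmentation gaps and smooths the edge:

```python
def convex_hull(points):
    """Minimum convex hull of a set of contour points (Andrew's monotone chain).
    Returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise or straight turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```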
  • Step S33, line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the convex hull's line segments, so as to describe the convex hull.
  • the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
  • The Hough transform is a feature extraction technique widely used in image analysis, computer vision and digital image processing to extract features, such as lines, from objects. This scheme uses it to accurately parse the document edge lines.
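The patent's parameters (minimum line length 100, maximum gap 20) correspond to the probabilistic Hough variant found in libraries such as OpenCV's `HoughLinesP`; as a simpler stdlib-only illustration, the classic voting form below accumulates votes in (rho, theta) space and keeps peaks with at least `min_votes` supporting points, which plays the role of the minimum detected line length:

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0, min_votes=100):
    """Minimal Hough voting: each edge point votes for every line
    rho = x*cos(theta) + y*sin(theta) passing through it; accumulator peaks
    with at least min_votes supporters are returned as (rho, theta) lines."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round((x * math.cos(theta) + y * math.sin(theta)) / rho_step)
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return [(rho * rho_step, math.pi * t / n_theta)
            for (rho, t), votes in acc.items() if votes >= min_votes]
```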
  • S34 finds the vertices: all legal straight lines from the line fitting are intersected pairwise, so as to find the distribution range of the four vertices of the document photo.
  • Each legal straight line detected in S33 can be written as an analytic expression. All legal lines are then intersected pairwise; this step finds the distribution range of the four vertices of the ID photo. The case of two parallel lines is not considered when finding the vertices.
  • a filter condition is set to check the legitimacy of the vertex.
  • A tolerance value tol is set in the filter condition: abscissae in [0 - tol, width + tol] and ordinates in [0 - tol, height + tol] are defined as legal vertex coordinates, where width and height are the width and height of the original image.
  • the tolerance value tol is set to 50.
  • min(x_crosspoint, width) caps x_crosspoint at the width of the original image, and max(min(x_crosspoint, width), 0) keeps it from falling below 0;
  • min(y_crosspoint, height) caps y_crosspoint at the height of the original image, and max(min(y_crosspoint, height), 0) keeps it from falling below 0.
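The legality filter and the clamping above translate directly into code; a minimal sketch with the patent's example tolerance of 50 pixels (function names assumed):

```python
def clamp_vertex(x, y, width, height):
    """Clamp an intersection point into the image: max(min(x, width), 0)
    keeps x within [0, width], and likewise for y within [0, height]."""
    return max(min(x, width), 0), max(min(y, height), 0)

def is_legal_vertex(x, y, width, height, tol=50):
    """A vertex is legal if x lies in [0 - tol, width + tol] and
    y lies in [0 - tol, height + tol]."""
    return -tol <= x <= width + tol and -tol <= y <= height + tol
```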
  • S36 vertex clustering: a standard bank card has four vertices. Given all the legal vertices obtained so far, the unsupervised clustering algorithm K-means groups them into four classes; the centroid of each class gives the coordinates of one vertex, yielding four vertex coordinates in total.
  • K-means is the most commonly used Euclidean-distance-based clustering algorithm; it is numerical, unsupervised, non-deterministic and iterative, and it aims to minimize an objective function, the squared-error function (the sum of distances between every observation and its cluster centre): the closer two targets are, the greater their similarity. Owing to its excellent speed and good scalability, K-means may be regarded as the best-known clustering method.
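The clustering step can be sketched with plain Lloyd's iterations; this stdlib-only version takes the initial centroids as an argument (the patent does not specify the initialization scheme), and with k = 4 the final centroids estimate the four card vertices:

```python
import math

def kmeans(points, centroids, iters=20):
    """Lloyd's K-means on 2-D points: assign each point to its nearest
    centroid (Euclidean distance), then move each centroid to the mean
    of its cluster; repeat for a fixed number of iterations."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids
```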
  • In step 4) of step S37, the coordinate sum of the upper-left point is the smallest; the vertex with the smallest coordinate sum is taken as the upper-left vertex, and with it as the starting point the coordinates are rearranged to determine the order of the four vertices.
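The vertex ordering of S37 (centre point, polar angles, angular sort, then rotation so the smallest x + y vertex leads) can be sketched as follows; note that in image coordinates, where y grows downward, ascending polar angle traverses the quadrilateral clockwise as seen on screen:

```python
import math

def order_vertices(verts):
    """Order four vertices as upper-left, upper-right, lower-right, lower-left:
    1) centre = mean of the four vertices; 2) polar angle of each vertex about
    the centre; 3) sort by angle; 4) rotate so the vertex with the smallest
    coordinate sum x + y (the upper-left corner) comes first."""
    cx = sum(x for x, _ in verts) / 4.0
    cy = sum(y for _, y in verts) / 4.0
    ring = sorted(verts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ring[i][0] + ring[i][1])
    return ring[start:] + ring[:start]
```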
  • the invention also provides a certificate detection device, which includes an acquisition input unit, an image processing unit, an information extraction unit, and an information output unit connected by telecommunication.
  • The acquisition input unit obtains the picture of the document to be detected and the standard registration picture through the camera component; the acquisition hardware includes, but is not limited to, mobile phones, iPads, ordinary cameras, CCD industrial cameras, scanners and the like, used to image the front of the document.
  • The collected image should completely include the four borders of the document, its inclination should not exceed plus or minus 20°, and the document number and edge lines should be distinguishable by the human eye.
  • The image processing unit processes the input image through the deep learning algorithm and the image processing algorithm in the processor, sequentially obtaining the preliminary rough document-area mask, the refined document-area mask, the document area extracted from the original image, and the corrected image after affine transformation.
  • The collected image is an image captured by a camera; it can be a static image (captured separately) or an image in a video (selected from the captured video according to preset criteria or at random). Any such image can serve as the source image of the document, and the embodiments of the present invention place no restrictions on attributes of the image such as its source, nature or size.
  • The information extraction unit obtains the category and information of the corrected image through the information extraction algorithm in the processor.
  • the processor displays the category and information result extracted from the input picture on the display and stores it in the memory.
  • The display includes, but is not limited to, the display screen of a tablet computer, computer or mobile phone, on which the documents extracted by the processor are compared and classified.
  • Embodiments of the present disclosure may also use, for example but not limited to, image-processing-based document detection algorithms (e.g., edge detection, mathematical morphology, texture-analysis-based localization, line detection, statistical methods, genetic algorithms, Hough transform and contour methods, wavelet-transform-based methods, etc.) to perform document detection on the collected image.
  • When edge detection is performed on the collected image through a neural network, the network can be trained in advance with sample images, so that the trained network effectively detects the straight edge lines in the image.
  • the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed.
  • a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • The present invention also provides a terminal including a memory and a processor; the memory stores a registered picture and computer instructions runnable on the processor, and when the processor runs the computer instructions it performs the steps of the aforementioned method.
  • a terminal including a memory and a processor
  • the memory stores a registered picture and a computer instruction that can be run on the processor
  • the processor performs the steps of the aforementioned method when it runs the computer instructions.
  • the embodiments of the present application may be provided as methods, apparatuses, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • computer-usable storage media including, but not limited to, disk storage, CD-ROM, optical storage, etc.

Abstract

A bankcard tilt correction (BTC)-based detection method and apparatus, a readable storage medium, and a terminal. By using BTC technology in combination with deep learning technology and a conventional image processing method, the advantages of the two are fully integrated, and for a wide variety of user input images having complex scenes, high-accuracy and high-robustness certificate segmentation and correction results can be obtained, thereby providing a foundation for subsequent certificate detection, classification, and information extraction, and improving the application range of certificate recognition; the present invention can be widely applied in the fields of security, finance, and the like.

Description

Bank card tilt correction detection method, device, readable storage medium, and terminal

Technical Field

The present invention relates to the technical field of information detection and intelligent vision, and in particular to a bank card tilt correction detection method, device, readable storage medium, and terminal.

Background Art

For document image recognition, identity information must be identified quickly and efficiently in the security, finance, and enterprise information management fields. In the early days, most ID card information had to be entered manually, which was highly inefficient, and the prolonged recognition process also strained the operator's eyes; manual entry is therefore no longer suited to the rapid development of modern computing.
With the rise of artificial intelligence, image recognition technology has gradually been applied in security, military, medical, intelligent transportation, and other fields, and technologies such as face recognition and fingerprint recognition are increasingly used in public security, finance, aerospace, and other safety-critical domains. In the military field, image recognition is mainly applied to target reconnaissance and identification, using automated image recognition to identify and strike enemy targets. In the medical field, image recognition enables the analysis and diagnosis of various medical images, which both greatly reduces the cost of medical care and helps improve its quality and efficiency. In the transportation field, it supports not only license plate recognition but also the cutting-edge field of autonomous driving, providing clear identification of roads, vehicles, and pedestrians, improving everyday convenience and lowering travel costs. Although technologies for automatically recognizing or extracting document information have emerged, in complex scenes, such as a document misaligned in the field of view, uneven illumination, external light-field interference, or partial occlusion by clutter, the boundary between the document outline and the image background becomes blurred. This hinders accurate extraction of the document boundary, so that document number detection becomes less efficient or fails outright. Several solutions have emerged to address this, as follows.
Traditional method: an edge detection algorithm is used. An edge detection operator locates the document edges; straight lines are fitted to the edge points, and the intersections of the fitted edge lines determine the document deflection angle. The document is then rotated, and image processing methods detect the position of the document number. Accurate detection of the document edge points is the core step of this method, yet the edge detection operator places high demands on the image background: if the gradient between the background and foreground regions varies little, or the background contains a large amount of edge information, edge point detection fails and the document number cannot be detected.
Deep learning method: in the model training stage, this method trains a deep network on a large amount of labeled data to fit the network parameters and model an OCR (Optical Character Recognition) detection algorithm; in the prediction stage, the entire image is fed to the network, and character regions are detected by forward inference. This is currently a popular character detection approach, but for the document number detection task it has the following drawbacks: (1) images of non-document regions also take part in network inference, which wastes computing resources, and spurious character detections in those regions require additional processing logic to remove; (2) the scheme consumes substantial computing resources, with longer training and inference times than the present proposal; (3) owing to the lack of interpretability of neural networks, the character-region bounding boxes it produces cannot precisely locate the minimum enclosing rectangle of the characters and may even cut off part of the character region. In other words, traditional optical character recognition (OCR) of document images is mainly oriented to high-definition scanned images and requires a clean background, standard print, and high resolution. In natural scenes, however, there is heavy text background noise, irregular text layout, and interference from natural light sources; the detection rate of OCR in real natural scenes is therefore unsatisfactory, and document recognition under such conditions places pressure on the subsequent character recognition step.
In addition, although AI technology has been applied across many industries, and techniques for capturing bank cards and other documents with intelligent terminal devices are mature and widespread enough to meet some practical application scenarios, in bank card detection and recognition scenarios in the financial field, improper handling during photographing frequently deforms the card image, reducing both recognition accuracy and efficiency.
Given the above, the intelligent detection of bank cards (and likewise ID cards, work permits, and the like) cannot respond quickly, accurately, and efficiently to the variability and complexity of actual application scenarios; that is, the diversification and growing complexity of practical scenarios place higher requirements on the detection and recognition of modern documents such as bank cards.
Summary of the Invention

To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a bank card tilt correction detection method, device, readable storage medium, and terminal that can solve the above problems.

Design principle: a Bankcard Tilt Correction (BTC) technique is proposed. BTC combines deep learning with traditional image processing methods, fully fusing the advantages of both, so that for the wide variety of user input images with complex scenes, document segmentation and correction results with high accuracy and high robustness can be obtained.
A method for detecting bank card tilt correction against a complex background, the method comprising the following steps:

First step, model training: annotate the original data and generate labels, compute document size statistics from the generated annotation files, and train the segmentation model using the original data and the annotation files;

Second step, initial document detection: for a picture input through the image acquisition unit, use the deep learning model to find potential document regions, obtaining a preliminary, rough document region mask;

Third step, standardization: refine the rough mask obtained in the preceding step into a high-quality document region mask, use this mask to extract the document region from the original image, apply an affine correction transformation to the extracted document photo to bring it to the preset document photo size, and output the corrected document picture.
Further, the model training of the first step includes the following steps:

S11, determine the document region: find the document region in each picture of the original data by manual annotation;

S12, annotate vertices and generate labels: annotate the four vertices of the document within the document region, and save the vertex coordinates as a json file to generate the label;

S13, compute document size statistics: from the generated annotation files, compute the area s of each document region, to serve the subsequent testing stage;

S14, train the segmentation model: train the segmentation model using the original data and the generated annotation files.
Further, in step S14, the input picture and the corresponding annotation file have the same size; before training, the json file is converted into a corresponding 0-1 binary mask image, in which regions with pixel value 1 represent the document region and regions with pixel value 0 represent the background.
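As a hedged sketch of this json-to-mask conversion (the function name is illustrative, and a convex quadrilateral with vertices ordered top-left, top-right, bottom-right, bottom-left is assumed; a production pipeline would normally rasterize the polygon with an image library such as OpenCV):

```python
import numpy as np

def quad_to_mask(vertices, height, width):
    """Rasterize a convex quadrilateral annotation into a 0-1 mask:
    pixels inside the document region become 1, background stays 0."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.ones((height, width), dtype=np.uint8)
    pts = [(float(x), float(y)) for x, y in vertices]
    for i in range(4):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % 4]
        # half-plane test: keep only pixels on the inner side of each edge
        cross = (x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0)
        mask &= (cross >= 0).astype(np.uint8)
    return mask
```

With vertices listed in the on-screen clockwise order assumed above, every interior pixel lies on the non-negative side of all four directed edges, which is what the half-plane test exploits.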
Further, the initial document detection of the second step includes the following steps:

S21, extract features: after a picture is input, scale it to the input size of the segmentation network, then use a Unet network model to extract deep features from the input data, obtaining a feature map;

S22, compute probabilities: perform a binary classification on the feature at each position of the feature map to obtain the probability that the feature at that position belongs to the document region, yielding a probability distribution map of the document region;

S23, threshold truncation: binarize the probability distribution map according to a preset threshold, setting probabilities greater than the threshold to 1 and probabilities less than the threshold to 0, to obtain a 0-1 mask image;

S24, rough segmentation mask: upsample the 0-1 mask image to the same size as the original input picture, obtaining a preliminary rough document segmentation mask;

S25, legal region screening: compute the area a of each isolated document region in the rough segmentation mask; if a ≤ μ-3σ, the region a is deemed illegal and is removed from the rough segmentation mask, so that some erroneous regions are filtered out by the legal region screening.
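The μ-3σ screening of step S25 reduces to a one-line filter once the per-region areas are known; a minimal sketch (the function name is an assumption, and measuring the isolated-region areas themselves would use a connected-component pass not shown here):

```python
def filter_illegal_regions(region_areas, mu, sigma):
    """Step S25: a candidate region whose area a satisfies a <= mu - 3*sigma
    is deemed illegal and dropped; mu and sigma are the Gaussian statistics
    of the annotated card areas collected during training."""
    threshold = mu - 3.0 * sigma
    return [a for a in region_areas if a > threshold]
```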
Further, in the third step, standardization, refined mask correction is performed on the legal regions of the screened mask image, including the following steps:

S31, extract the region contour feature: the contour feature is a binary mask image whose whole forms a closed irregular curve; the binary mask does not change the property of the document photo being a rectangular convex set;

S32, obtain the contour convex hull: on the basis of the original contour, compute its minimum convex hull, filling regions missed by the segmentation while smoothing the contour edge;

S33, line fitting: use the Hough transform to fit straight lines to the irregular convex polygon formed by the segments of the convex hull, thereby describing the convex hull;

S34, find the vertices: take all pairs of the legal fitted lines and compute their intersections, thereby locating the distribution range of the four vertices of the document photo; in this process, the case of two parallel lines is not considered;

S35, legal vertex screening: set screening conditions to check vertex validity. A tolerance value tol is set; abscissae in [0-tol, width+tol] and ordinates in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height are the width and height of the original image. If a vertex coordinate exceeds the original image size by no more than tol, the vertex coordinate (x_crosspoint, y_crosspoint) is corrected to the original image edge, that is:
x_crosspoint = min(max(x_crosspoint, 0), width), y_crosspoint = min(max(y_crosspoint, 0), height)
S36, vertex clustering: a standard bank card has four vertices; from all legal vertices obtained, cluster the vertices into four classes with the unsupervised K-means clustering algorithm, the centroid of each class being the coordinate of one vertex, yielding four vertex coordinates;

S37, vertex ordering: to facilitate subsequent operations, the order of the four vertices is determined by the following steps: 1) compute the center point from the four vertex coordinates; 2) establish a polar coordinate system at the center point, construct the vector from the center to each vertex, and compute the angle between each vector and the polar axis in turn; 3) sort the four vertices by the size of this angle in descending order; 4) find the upper-left corner of the document region, taking the vertex with the smallest sum of coordinates as the upper-left vertex, and rearrange the coordinate order starting from it, in the order upper-left, upper-right, lower-right, lower-left;

S38, region filling: after the vertex coordinates are found and ordered, binary-fill the quadrilateral formed by the four vertices to form a binary mask;

S39, affine transformation and output of the corrected picture: for the document region whose four vertices have been re-determined, apply an affine transformation according to the preset target document photo size, I_output = W·I_input, where W is the affine transformation matrix between the document region and the target document size. In this way, the corresponding correction is performed on each document region, and each corrected document picture is output as the corrected image and saved to the specified file path.
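The vertex ordering of step S37 can be sketched in a few lines of pure Python (the function name is illustrative). One detail worth noting: in image coordinates the y axis grows downward, so sorting by increasing atan2 angle already walks the quadrilateral clockwise on screen; the cycle is then rotated so the vertex with the smallest coordinate sum, the upper-left corner, comes first:

```python
import math

def order_vertices(pts):
    """Order four corner points as top-left, top-right, bottom-right,
    bottom-left: sort around the centroid by polar angle, then rotate the
    cycle so the point with the smallest x + y comes first (step S37)."""
    cx = sum(p[0] for p in pts) / 4.0
    cy = sum(p[1] for p in pts) / 4.0
    # ascending atan2 in image coordinates (y down) = clockwise on screen
    ordered = sorted(pts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]
```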
Further, in step S33, the minimum detected line length for the Hough-transform line fitting of the convex hull is set to 100, and the maximum gap between lines is set to 20.
Further, in step S36, the specific K-means algorithm is:

1) randomly select 4 cluster centroids μ_0, μ_1, μ_2, μ_3;

2) for each vertex coordinate (x_i, y_i), compute the Euclidean distance to each cluster centroid, take the centroid at minimum distance as its corresponding centroid, and label the vertex with the corresponding class j: argmin_j ||(x_i, y_i) - μ_j||^2, j = 0, 1, 2, 3; here ||(x_i, y_i) - μ_j||^2 is the squared Euclidean norm between vertex (x_i, y_i) and centroid j, and the assignment minimizes the sum of these norms over the four centroids;

3) recompute the coordinates of the 4 centroids;

4) repeat steps 2) and 3) until convergence.
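The steps above can be sketched as plain Lloyd iterations. Note one deliberate deviation: the patent initializes the centroids randomly, while the sketch below uses a deterministic farthest-point initialization so the example is reproducible (the function name is also an assumption):

```python
import numpy as np

def kmeans_vertices(points, k=4, iters=100):
    """Cluster candidate line intersections into k=4 corner estimates
    (step S36) with Lloyd-style K-means."""
    pts = np.asarray(points, dtype=float)
    # deterministic farthest-point initialization (patent uses random init)
    centroids = [pts[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centroids], axis=0)
        centroids.append(pts[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        dists = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to nearest centroid
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return centroids
```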
The present invention also provides a document detection device, comprising an acquisition input unit, an image processing unit, an information extraction unit, and an information output unit, all in telecommunication connection. The acquisition input unit acquires, through a camera assembly, the detection picture of the document to be detected and a standard registered picture; the image processing unit processes the input picture through the deep learning and image processing algorithms in the processor, successively obtaining the preliminary rough document region mask, the refined document region mask, the extracted original-image region, and the corrected image after the affine correction transformation; the information extraction unit obtains the category and information of the corrected image through the information extraction algorithm in the processor; and in the information output unit, the processor displays the category and information extracted from the input picture on a display and stores them in a memory.
The present invention also provides a computer-readable storage medium on which computer instructions are stored, the computer instructions performing the steps of the aforementioned method when run.
The present invention also provides a terminal, comprising a memory and a processor, the memory storing a registered picture and computer instructions executable on the processor, the processor performing the steps of the aforementioned method when running the computer instructions.
Compared with the prior art, the beneficial effect of the present invention is that the Bankcard Tilt Correction (BTC) technique of the present application combines deep learning with traditional image processing methods, fully fusing the advantages of both. For the wide variety of user input images with complex scenes, document segmentation and correction results with high accuracy and high robustness can be obtained, providing a basis for subsequent document detection, classification, and information extraction, broadening the application scope of document recognition, and enabling wide application in security, finance, and other fields.
Brief Description of the Drawings

Fig. 1 is a flowchart of the bank card tilt correction detection method against a complex background of the present invention;

Fig. 2 is a schematic diagram of model training;

Fig. 3 is a simplified flowchart of the BTC testing stage;

Fig. 4 is a flowchart of the initial document detection method;

Fig. 5 is a flowchart of document image standardization.
Detailed Description of the Embodiments

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.

First Embodiment
A method for detecting bank card tilt correction against a complex background, referring to Figs. 1 to 5, includes the following steps.

First step, model training: annotate the original data and generate labels, compute document size statistics from the generated annotation files, and train the segmentation model using the original data and the annotation files.

Second step, initial document detection: for a picture input through the image acquisition unit, use the deep learning model to find potential document regions, obtaining a preliminary, rough document region mask.

Third step, standardization: refine the rough mask into a high-quality document region mask, use this mask to extract the document region from the original image, apply an affine correction transformation to the extracted document photo to bring it to the preset document photo size, and output the corrected document picture.
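The affine correction I_output = W·I_input of the third step can be sketched by estimating W from the four ordered corners. This is a hedged illustration (the helper name and target corner layout are assumptions; with four arbitrary corners the exact mapping is projective, so the least-squares affine fit below is only exact when the card is an affine-distorted rectangle):

```python
import numpy as np

def estimate_affine(src_pts, dst_w, dst_h):
    """Least-squares 2x3 affine matrix W mapping the four ordered card
    corners (TL, TR, BR, BL) onto a dst_w x dst_h target rectangle."""
    dst = np.array([(0, 0), (dst_w - 1, 0), (dst_w - 1, dst_h - 1),
                    (0, dst_h - 1)], dtype=float)
    src = np.asarray(src_pts, dtype=float)
    A = np.hstack([src, np.ones((4, 1))])  # one row [x, y, 1] per corner
    W, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return W.T
```

Each output coordinate is then W @ [x, y, 1]; a warping routine would apply the inverse mapping per target pixel.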
Model training

BTC relies on the powerful feature extraction ability of deep learning, so the relevant model must be trained before formal use. Referring to Fig. 2, for a batch of original data to be trained, the regions of bank cards or other documents in each picture are first found by manual annotation. Specifically, for every document in a picture, its four vertices are annotated, and the vertex coordinates are saved as a json file. Next, from the generated annotation files, the area s of each document region is computed; this serves the subsequent testing stage. Experiments verify that the document photo areas in the original data follow a Gaussian distribution, namely s ~ N(μ, σ^2).

By computing the area of every document region, the mean μ and standard deviation σ of the Gaussian distribution are obtained.
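A minimal sketch of this statistic (the function name is illustrative; the population standard deviation is assumed):

```python
import numpy as np

def area_statistics(areas):
    """Mean mu and standard deviation sigma of the annotated document
    areas, under the assumption s ~ N(mu, sigma^2); mu - 3*sigma later
    serves as the lower bound for legal region screening."""
    a = np.asarray(areas, dtype=float)
    return float(a.mean()), float(a.std())
```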
Finally, the segmentation model is trained using the original data and the generated annotation files. Notably, during training the input picture and the corresponding annotation must have the same size. Therefore, the annotated json file must also be converted into a corresponding 0-1 binary mask image, in which regions with pixel value 1 represent the document region and regions with pixel value 0 represent the background.
Specifically, the model training of the first step proceeds as follows.

S11, determine the document region: find the document region in each picture of the original data by manual annotation.

S12, annotate vertices and generate labels: annotate the four vertices of the document within the document region, and save the vertex coordinates as a json file to generate the label.

JSON (JavaScript Object Notation) is a lightweight data interchange format. It is easy for humans to read and write, and also easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition (December 1999). JSON is a sequence of tokens comprising six structural characters, strings, numbers, and three literal names. For this reason, it is well suited to the coordinate annotation used in this scheme.

S13, compute document size statistics: from the generated annotation files, compute the area s of each document region, to serve the subsequent testing stage.

S14, train the segmentation model: train the segmentation model using the original data and the generated annotation files.

At this point, the BTC training procedure is complete.
Detection stage

The detection stage is divided into initial document detection and standardization. BTC is a two-stage, coarse-to-fine refinement segmentation model. As shown in Fig. 3, in the first stage the deep learning model finds potential document regions in the input picture, yielding a preliminary, relatively rough document region mask; in the second stage, traditional image processing techniques refine the rough mask of the first stage into a high-quality document region mask, which is used to extract the document photo from the original image. Finally, an affine correction transformation is applied to the extracted document photo to bring it to the preset document photo size.

First stage: initial document detection. In the first stage, the goal of finding the document region is achieved mainly by the sub-operations of feature extraction, probability computation, and threshold truncation, finally yielding a preliminary rough segmentation mask. As shown in Fig. 4, after the user inputs a picture, it is scaled to the input size of the segmentation network, and the classic Unet network model extracts deep features from the input data. A binary classification is then performed on the feature at each position of the feature map to obtain the probability that the feature belongs to the document region, yielding a probability distribution map of the document region. This map is then binarized according to a preset threshold, with probabilities above the threshold set to 1 and those below set to 0, and the 0-1 mask is upsampled to the same size as the original input. The first stage is thus complete, producing a preliminary document segmentation mask. The specific steps of the initial detection are as follows.
S21, extract features: after a picture is input, scale it to the input size of the segmentation network, then use a Unet network model to extract deep features from the input data, obtaining a feature map.

S22, compute probabilities: perform a binary classification on the feature at each position of the feature map to obtain the probability that the feature at that position belongs to the document region, yielding a probability distribution map of the document region.

S23, threshold truncation: binarize the probability distribution map according to a preset threshold, setting probabilities greater than the threshold to 1 and probabilities less than the threshold to 0, to obtain a 0-1 mask image.

S24, rough segmentation mask: upsample the 0-1 mask image to the same size as the original input picture, obtaining a preliminary rough document segmentation mask.

S25, legal region screening: compute the area a of each isolated document region in the rough segmentation mask; if a ≤ μ-3σ, the region a is deemed illegal and is removed from the rough segmentation mask, so that some erroneous regions are filtered out by the legal region screening.
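Steps S23 and S24 above can be sketched together. This is a minimal version assuming the original size is an integer multiple of the network output, with nearest-neighbour block replication standing in for whatever interpolation an implementation would actually use:

```python
import numpy as np

def prob_to_mask(prob, thresh, out_h, out_w):
    """Binarize the per-position document probabilities at the preset
    threshold (S23), then upsample the 0-1 mask to the original input
    size by nearest-neighbour block replication (S24)."""
    mask = (prob > thresh).astype(np.uint8)
    fy, fx = out_h // mask.shape[0], out_w // mask.shape[1]
    # kron replicates each mask cell into an fy x fx block
    return np.kron(mask, np.ones((fy, fx), dtype=np.uint8))
```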
Here, the Unet network model is a segmentation network. Unet draws on the FCN network; its architecture comprises two symmetric parts: the first part is the same as an ordinary convolutional network, using 3x3 convolutions and pooled downsampling to capture the contextual information in the image (that is, the relationships between pixels); the second part is essentially symmetric to the first, using 3x3 convolutions and upsampling to produce the output segmentation. In addition, the network uses feature fusion, merging features from the downsampling path with those of the upsampling path to obtain more accurate contextual information and a better segmentation result. Moreover, Unet uses a weighted softmax loss function in which every pixel has its own weight, making the network pay more attention to learning edge pixels. This model is well suited to the small, non-linear irregularities of document edges.
第二阶段,标准化。在第一阶段的基础上,进行第二阶段的精细化掩膜修正(refinement)。如图5所示,对于第一阶段得到的掩膜图中的所有合法区域,都要逐一进行修正处理。在第二步标准化中,对于每一个合法证件区域,即对第一步经筛选后的掩膜图中的合法区域进行精细化掩膜修正,参见图5,包括以下步骤。The second stage is standardization. On the basis of the first stage, a refined mask correction (refinement) is performed. As shown in Figure 5, every legal region in the mask obtained in the first stage is corrected one by one. In this second, standardization step, refined mask correction is applied to each legal document region, i.e. to the legal regions in the screened mask from the first step; see Figure 5, which includes the following steps.
S31提取区域轮廓特征,轮廓特征是一张二值掩膜图,整体是一条闭合的不规则曲线,二值掩膜图不改变证件照矩形凸集的性质。S31 Extract the region contour feature: the contour feature is a binary mask whose outline is a closed irregular curve; the binary mask does not change the rectangular convex-set property of the document photo.
在进行接下来的操作时,首先引入一条性质以保证以下操作的合法性。When performing the following operations, first introduce a property to ensure the legality of the following operations.
性质定义:凸集经过仿射变换作用后仍为凸集。证件照的良好性质之一在于其为规则矩形形状,是一种标准的凸集集合,无论该凸集在采集阶段经过怎样的仿射变换,均不能改变其凸集的性质。Property definition: a convex set remains a convex set under an affine transformation. One good property of a document photo is that it is a regular rectangular shape, a standard convex set; whatever affine transformation the set undergoes during acquisition, its convexity cannot be changed.
S32求取轮廓凸包,在原始轮廓的基础上求取该轮廓的最小凸包,将部分分割缺失的区域进行填补,同时使轮廓边缘平滑。S32 Compute the contour's convex hull: the minimum convex hull of the contour is computed on the basis of the original contour, filling in regions that the segmentation missed while smoothing the contour edge.
由于上一步的轮廓提取完全依赖于分割模型的结果,在某些不平滑的边缘处凹凸不平,这与证件照的性质不吻合。故在原始轮廓的基础上求取该轮廓的最小凸包,将部分分割缺失的区域进行填补,同时使轮廓边缘更加平滑。Since the contour extraction in the previous step depends entirely on the segmentation model's output, some edges are jagged where they should be smooth, which is inconsistent with the nature of a document photo. Therefore the minimum convex hull of the contour is computed on the basis of the original contour, filling in regions that the segmentation missed and making the contour edge smoother.
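The minimum convex hull of a contour can be computed with a standard algorithm; the patent does not name one, and in practice a library routine (e.g., OpenCV's convexHull) would typically be used. The following pure-Python monotone-chain sketch is an illustrative assumption:

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices of a 2-D point set.

    points: list of (x, y) contour points from the binary mask.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a non-left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate the two chains, dropping the duplicated endpoints
    return lower[:-1] + upper[:-1]

# A jagged contour around a square collapses to its 4 corner points,
# filling in the concavities, as S32 describes.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 1), (1, 2)])
```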
S33直线拟合,使用霍夫变换对凸包的多个线段组成的不规则凸多边形进行直线拟合,以对凸包进行描述。具体实施例中,在步骤S33中,通过霍夫变换对凸包进行直线拟合的最小检测直线长度设置为100,直线之间最大间隔设置为20。S33 Line fitting: the Hough transform is used to fit straight lines to the irregular convex polygon formed by the multiple line segments of the convex hull, so as to describe the convex hull. In a specific embodiment, in step S33, the minimum detected line length for Hough line fitting on the convex hull is set to 100, and the maximum gap between lines is set to 20.
其中,霍夫变换是一种特征检测(feature extraction),被广泛应用在图像分析(image analysis)、计算机视觉(computer vision)以及数位影像处理(digital image processing),霍夫变换是用来辨别找出物件中的特征,例如:线条。本方案即用其来精确地解析定义的证件边缘直线。The Hough transform is a feature extraction technique widely used in image analysis, computer vision and digital image processing; it is used to identify features in objects, such as lines. This scheme uses it to precisely resolve the defined document edge lines.
S34求取顶点,对直线拟合中的所有合法直线两两求取交点,以此寻找证件照四个顶点的分布范围,具体的,S33中所有检测得到的合法直线,均可以得到直线的解析式表达。针对所有的合法直线,两两求取交点,这一步操作旨在寻找证件照四个顶点的分布范围。并且在求取顶点的过程中,对于两条直线平行的情况不做考虑。S34 Compute the vertices: all legal lines from the line fitting are taken pairwise and their intersections computed, so as to find the distribution range of the four vertices of the document photo. Specifically, every legal line detected in S33 yields an analytic line expression. For all legal lines, intersections are computed pairwise; this step aims to find the distribution range of the four vertices of the document photo. In computing the vertices, the case of two parallel lines is not considered.
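Pairwise intersection of the fitted lines follows directly from their analytic expressions. A minimal sketch (names illustrative), with each line represented by the coefficients (a, b, c) of ax + by = c and parallel pairs skipped as S34 specifies:

```python
from itertools import combinations

def intersect(l1, l2, eps=1e-9):
    """Intersection of lines a1*x + b1*y = c1 and a2*x + b2*y = c2.

    Returns (x, y), or None when the lines are (nearly) parallel,
    which the method explicitly does not consider.
    """
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None  # parallel: no candidate vertex
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x, y)

def candidate_vertices(lines):
    """All pairwise intersections of the fitted edge lines."""
    pts = []
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is not None:
            pts.append(p)
    return pts

# Two horizontal and two vertical edges of an axis-aligned card:
# y = 0, y = 50, x = 0, x = 80
lines = [(0, 1, 0), (0, 1, 50), (1, 0, 0), (1, 0, 80)]
verts = candidate_vertices(lines)
```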
S35顶点合法筛选,在所有得到的顶点中,并非所有顶点都是合法的,因此,设置了筛选条件对于顶点进行合法性检查,为后续步骤提高了准确率和处理速度。具体的,设置筛选条件对于顶点进行合法性检查,筛选条件中设置了容忍值tol,横坐标[0-tol,width+tol],纵坐标[0-tol,height+tol]定义为合法顶点坐标,其中width、height代表原始图像的宽度和高度,具体实施例中,容忍值tol设为50。且,若某顶点的坐标超出了原始图像尺寸而没有超过tol,则将该顶点坐标(x crosspoint,y crosspoint)纠正到原始图像边缘处,即:
x crosspoint = max(min(x crosspoint , width), 0)
y crosspoint = max(min(y crosspoint , height), 0)
S35 Vertex legality screening: among all the vertices obtained, not all are legal, so a screening condition is set to check vertex legality, which improves accuracy and processing speed for the subsequent steps. Specifically, the screening condition uses a tolerance value tol: abscissas in [0-tol, width+tol] and ordinates in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height represent the width and height of the original image; in a specific embodiment, tol is set to 50. If a vertex coordinate exceeds the original image size but not by more than tol, the vertex coordinate (x crosspoint , y crosspoint ) is corrected to the original image edge, that is:
x crosspoint = max(min(x crosspoint , width), 0)
y crosspoint = max(min(y crosspoint , height), 0)
其中,min(x crosspoint,width)将x crosspoint的最大值限制为不超过原始图片width,max(min(x crosspoint,width),0)将其最小值限制为不小于0; Here, min(x crosspoint , width) caps x crosspoint so that it cannot exceed the original image width, and max(min(x crosspoint , width), 0) ensures it cannot fall below 0;

同理,min(y crosspoint,height)将y crosspoint的最大值限制为不超过原始图片height,max(min(y crosspoint,height),0)将其最小值限制为不小于0。 Likewise, min(y crosspoint , height) caps y crosspoint so that it cannot exceed the original image height, and max(min(y crosspoint , height), 0) ensures it cannot fall below 0.
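The legality check and edge clamping of S35 can be sketched as follows (function name and inputs are illustrative assumptions):

```python
def screen_and_clamp(vertices, width, height, tol=50):
    """Keep vertices inside [0-tol, width+tol] x [0-tol, height+tol],
    clamping any coordinate that overshoots the image by at most tol
    back to the image edge, as the formula in S35 prescribes."""
    legal = []
    for x, y in vertices:
        if -tol <= x <= width + tol and -tol <= y <= height + tol:
            x = max(min(x, width), 0)
            y = max(min(y, height), 0)
            legal.append((x, y))
    return legal

# (-10, 20) overshoots by less than tol and is clamped to the left edge;
# (900, 20) lies far outside the tolerance band and is dropped.
pts = screen_and_clamp([(-10, 20), (900, 20), (100, 50)], width=640, height=480)
```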
S36顶点聚类,对比标准银行卡存在四个顶点,根据已求得的所有合法顶点,通过无监督聚类算法K-means将所有顶点聚为四类,其中每一类的质心即为某一个顶点的坐标,共得到四个顶点坐标。S36 Vertex clustering: a standard bank card has four vertices, so based on all the legal vertices obtained, the unsupervised clustering algorithm K-means groups all vertices into four classes; the centroid of each class is taken as the coordinate of one vertex, giving four vertex coordinates in total.
其中,K-means的具体算法为:Among them, the specific algorithm of K-means is:
1)随机选取4个聚类质心点μ 01231) Randomly select 4 cluster centroid points μ 0 , μ 1 , μ 2 , μ 3 ;
2)对于每一个顶点坐标(x i,y i),通过计算与每个聚类质心的欧氏距离,找到最小距离的质心点作为其对应的质心点并标注为对应类别j:argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3; 2) For each vertex coordinate (x i , y i ), by calculating the Euclidean distance with each cluster centroid, find the centroid point with the smallest distance as its corresponding centroid point and mark it as the corresponding category j: argmin j | |(x i ,y i )-μ j || 2 ,j=0,1,2,3;
其中,||(x i,y i)-μ j|| 2为顶点(x i,y i)与质心μ j之间欧氏距离的平方;argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3即选取距离该顶点最近的质心所对应的类别j,从而使各顶点到其所属质心的距离之和最小。 Here, ||(x i ,y i )-μ j || 2 is the squared Euclidean distance between vertex (x i , y i ) and centroid μ j ; argmin j ||(x i ,y i )-μ j || 2 , j=0,1,2,3 selects the class j whose centroid is nearest to the vertex, so that the sum of distances from the vertices to their assigned centroids is minimized.
3)重新计算4个质心的坐标;3) Recalculate the coordinates of the 4 centroids;
4)重复2)和3)过程直到收敛。4) Repeat 2) and 3) process until convergence.
其中,K-means是最常用的基于欧式距离的聚类算法,它是数值的、非监督的、非确定的、迭代的,该算法旨在最小化一个目标函数——误差平方函数(所有的观测点与其中心点的距离之和),其认为两个目标的距离越近,相似度越大,由于具有出色的速度和良好的可扩展性,K-means聚类算法算得上是最著名的聚类方法。K-means is the most commonly used Euclidean-distance-based clustering algorithm; it is numerical, unsupervised, non-deterministic and iterative. The algorithm minimizes an objective function, the sum-of-squared-errors (the sum of the distances of all observation points to their cluster centers), on the principle that the closer two objects are, the more similar they are. Owing to its excellent speed and good scalability, K-means ranks among the best-known clustering methods.
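A minimal NumPy sketch of the four-class K-means of S36 (illustrative only; in practice a library implementation such as scikit-learn's KMeans would normally be used):

```python
import numpy as np

def kmeans4(points, n_iter=20, seed=0):
    """Cluster 2-D points into 4 classes; return the 4 centroids.

    points: (N, 2) array of candidate vertex coordinates.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # 1) randomly pick 4 distinct points as initial centroids
    centroids = pts[rng.choice(len(pts), size=4, replace=False)]
    for _ in range(n_iter):
        # 2) assign each point to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 3) recompute each centroid as the mean of its assigned points
        for j in range(4):
            if np.any(labels == j):
                centroids[j] = pts[labels == j].mean(axis=0)
    # 4) fixed iteration count stands in for the convergence test
    return centroids

# Candidate intersections scattered around the four corners of a card
corners = np.array([[0, 0], [100, 0], [100, 60], [0, 60]], float)
noisy = np.concatenate([corners + [1, 0], corners - [1, 0], corners])
cents = kmeans4(noisy)
```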
S37顶点排序,为方便后续操作,通过以下步骤确定四个顶点的排序:S37 Vertex sorting, in order to facilitate subsequent operations, the sorting of the four vertices is determined by the following steps:
1)根据四个顶点坐标求取中心点坐标;1) Obtain the coordinates of the center point according to the coordinates of the four vertices;
2)以中心点建立极坐标系,并构造从中心点指向各顶点的向量,依次求出各向量与极轴的夹角;2) Establish a polar coordinate system with the center point, and construct a vector pointing from the center point to each vertex, and obtain the angle between each vector and the polar axis in turn;
3)按照夹角的大小由大到小的顺序对四个顶点进行排序;3) Sort the four vertices according to the size of the included angle from large to small;
4)寻找证件区域的左上角点,并从左上角点开始,按照“左上-右上-右下-左下”的顺序进行排列。4) Find the upper-left corner point of the document area and, starting from it, arrange the vertices in the order "upper left - upper right - lower right - lower left".
其中,在步骤S37的步骤4)中,左上顶点的坐标值之和最小,故以坐标值之和最小的顶点为左上顶点,并以其为起点重新排列坐标顺序,以确定四个顶点的顺序。In step 4) of step S37, the upper-left vertex has the smallest sum of coordinate values, so the vertex with the smallest coordinate sum is taken as the upper-left vertex, and the coordinate order is rearranged starting from it to determine the order of the four vertices.
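The polar-angle ordering of S37 can be sketched as follows. One assumption is made explicit in the comments: the patent sorts angles from large to small in a conventional (y-up) frame, while image coordinates have y pointing down, where ascending atan2 order yields the same upper-left, upper-right, lower-right, lower-left traversal. Names are illustrative:

```python
import math

def order_vertices(vertices):
    """Order 4 vertices as upper-left, upper-right, lower-right, lower-left.

    1) centre = mean of the 4 vertices; 2) angle of each centre-to-vertex
    vector; 3) sort by angle (ascending atan2 in y-down image coordinates
    gives the clockwise-on-screen traversal); 4) rotate so the vertex with
    the smallest coordinate sum (the upper-left corner) comes first.
    """
    cx = sum(x for x, _ in vertices) / 4.0
    cy = sum(y for _, y in vertices) / 4.0
    ordered = sorted(vertices,
                     key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]

# The corners of a 100x60 card, given in arbitrary order
quad = order_vertices([(100, 60), (0, 0), (0, 60), (100, 0)])
```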
S38区域填充,在找到并按顺序排列顶点坐标之后,将四个顶点构成的四边形区域进行二值填充,形成一个二进制掩膜。S38 area filling, after finding and arranging the vertex coordinates in order, the quadrilateral area formed by the four vertices is filled with binary values to form a binary mask.
S39仿射变换输出矫正图片,对重新确定四个顶点的证件区域,根据预先设定的目标证件照大小对证件区域进行仿射变换,I output=WI input,其中,W为证件区域与目标证件大小之间的仿射变换矩阵;以此,对每一个证件区域都进行相应的修正操作,并将修正后得到的证件图片作为矫正图片输出并保存到指定的文件路径处。 S39 Affine transformation outputs the corrected picture: for the document region whose four vertices have been re-determined, an affine transformation maps the region to the preset target document-photo size, I output = W·I input , where W is the affine transformation matrix between the document region and the target document size. In this way, each document region receives the corresponding correction, and the corrected document picture is output as the rectified picture and saved to the specified file path.
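Solving for the transformation matrix W of S39 can be sketched as below. Note one assumption: an affine W is fully determined by three point correspondences (four arbitrary corner pairs would in general require a perspective transform), so the sketch solves for W from three of the ordered corners; the point values are illustrative:

```python
import numpy as np

def affine_matrix(src, dst):
    """Solve for the 2x3 affine matrix W mapping src -> dst.

    src, dst: three (x, y) point pairs, e.g. three ordered card corners
    and the matching corners of the target document-photo size.
    Sets up the 6x6 linear system [x, y, 1 | 0] and [0 | x, y, 1] for the
    six unknown entries of W and solves it directly.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0])
        A.append([0, 0, 0, x, y, 1])
        b.extend([xp, yp])
    w = np.linalg.solve(np.array(A, float), np.array(b, float))
    return w.reshape(2, 3)

def apply_affine(W, pt):
    """I_output = W . I_input applied to a single point."""
    x, y = pt
    v = W @ np.array([x, y, 1.0])
    return (v[0], v[1])

# Map three corners of a tilted card onto a 400x250 target photo
src = [(10, 20), (210, 40), (190, 160)]
dst = [(0, 0), (400, 0), (400, 250)]
W = affine_matrix(src, dst)
```

With W in hand, a library warp routine would resample every pixel of the document region into the target size.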
至此,对于每一个证件区域都可以进行相应的修正操作,并将修正后得到的证件图片保存到指定的文件路径处,至此,银行卡倾斜矫正的全部流程处理完毕。So far, the corresponding correction operation can be performed for each certificate area, and the certificate picture obtained after correction can be saved to the specified file path. At this point, the whole process of bank card tilt correction is completed.
第二实施例Second Embodiment
本发明还提供了一种证件检测装置,所述装置包括电讯连接的获取输入单元、图像处理单元、信息提取单元、和信息输出单元。The present invention also provides a document detection apparatus, which comprises an acquisition input unit, an image processing unit, an information extraction unit and an information output unit that are communicatively connected.
获取输入单元,通过摄像组件获取待检测证件的检测图片及标准的注册图片;获取单元利用硬件设备,包括但不限于手机,IPAD,普通摄像头,CCD工业相机、扫描仪等,对证件正面进行图像信息采集,注意采集到的图像应完全的包含证件的四条边界,并且倾斜不超过正负20°,且人眼能分辨证件号码和边缘直线。The acquisition input unit obtains, via a camera assembly, the detection picture of the document to be detected and the standard registration picture; the acquisition unit uses hardware devices, including but not limited to mobile phones, iPads, ordinary cameras, CCD industrial cameras, scanners, etc., to capture image information from the front of the document. Note that the captured image should fully contain all four borders of the document, be tilted by no more than plus or minus 20°, and keep the document number and edge lines distinguishable to the human eye.
图像处理单元,通过处理器中的深度学习算法和图像处理算法对输入图片进行处理,依次获得初步的粗糙的证件区域掩膜、证件区域精修的掩膜、扣取的原图区域和仿射变换矫正后的矫正图像。The image processing unit processes the input picture with the deep learning and image processing algorithms in the processor, obtaining in turn a preliminary rough document-region mask, a refined document-region mask, the cropped original-image region, and the rectified image after affine transformation.
其中的采集的图像,是通过摄像头采集的图像,可以是一张静态图像(即:单独采集的图像),也可以是一张视频中图像(即从采集的视频中按照预设标准或随机选取的一张图像),均可用于本发明证件的图像源,本发明实施例对于图像的来源、性质、大小等等所有属性均无限制。The captured image is an image collected by a camera; it may be a static image (i.e., a separately captured image) or an image from a video (i.e., one frame selected from the captured video according to a preset criterion or at random). Either may serve as the image source for the document of the present invention; the embodiments of the present invention place no restriction on the source, nature, size or any other attribute of the image.
信息提取单元,通过处理器中的信息提取算法提取矫正图像的类别和信息。The information extraction unit extracts the category and information of the rectified image through the information extraction algorithm in the processor.
信息输出单元,处理器将输入图片提取的类别和信息结果在显示器上显示并存储至存储器。其中,显示器包括但不限于平板电脑、计算机、手机等的显示屏,将处理器提取的证件对比分类显示。In the information output unit, the processor displays the category and information results extracted from the input picture on the display and stores them in the memory. The display includes but is not limited to the screen of a tablet, computer or mobile phone, on which the documents extracted by the processor are displayed by comparison and classification.
本领域技术人员基于本公开实施例的记载可以知悉,除了神经网络外,在本公开实施例还可以利用例如但不限于:基于图像处理的字符检测算法(例如,基于直方图粗分割和奇异值特征的字符/号码检测算法,基于二进小波变换的字符/号码检测算法,等等),对采集图像进行字符检测。另外,除了神经网络外,在本公开实施例也可以利用例如但不限于:基于图像处理的证件检测算法(例如,边缘检测法,数学形态学法,基于纹理分析的定位方法,行检测和边缘统计法,遗传算法,霍夫(Hough)变换和轮廓线法,基于小波变换的方法,等等),对采集图像进行证件检测。Based on the description of the embodiments of the present disclosure, those skilled in the art will appreciate that, besides neural networks, the embodiments may also use, for example but not limited to, image-processing-based character detection algorithms (e.g., character/number detection based on rough histogram segmentation and singular-value features, character/number detection based on the dyadic wavelet transform, etc.) to perform character detection on the captured image. In addition, besides neural networks, the embodiments may also use, for example but not limited to, image-processing-based document detection algorithms (e.g., edge detection, mathematical morphology, texture-analysis-based localization, line detection and edge statistics, genetic algorithms, the Hough transform and contour methods, wavelet-transform-based methods, etc.) to perform document detection on the captured image.
本公开实施例中,通过神经网络对采集图像进行边缘检测时,可以预先利用样本图像对神经网络进行训练,使得训练好的神经网络能够实现对 图像中边缘直线的有效检测。In the embodiment of the present disclosure, when edge detection is performed on the collected image through the neural network, the neural network can be trained by using the sample image in advance, so that the trained neural network can effectively detect the edge straight lines in the image.
第三实施例Third Embodiment
本发明还提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时执行前述方法的步骤。其中,所述方法请参见前述部分的详细介绍,此处不再赘述。The present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed, the steps of the aforementioned method are performed. Wherein, for the method, please refer to the detailed introduction in the foregoing part, and details are not repeated here.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于计算机可读存储介质中,计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program, and the program may be stored in a computer-readable storage medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. Information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
第四实施例Fourth Embodiment
本发明还提供了一种终端,包括存储器和处理器,所述存储器上储存有注册图片和能够在所述处理器上运行的计算机指令,所述处理器运行所述计算机指令时执行前述方法的步骤。其中,所述方法请参见前述部分的 详细介绍,此处不再赘述。The present invention also provides a terminal, including a memory and a processor, the memory stores a registered picture and a computer instruction that can be run on the processor, and the processor executes the method of the foregoing method when the processor runs the computer instruction. step. Wherein, for the method, please refer to the detailed introduction in the foregoing section, and details are not repeated here.
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprising", "comprising" or any other variation thereof are intended to encompass a non-exclusive inclusion such that a process, method, article or device comprising a series of elements includes not only those elements, but also Other elements not expressly listed, or inherent to such a process, method, article of manufacture or apparatus are also included. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article of manufacture or device that includes the element.
本领域技术人员应明白,本申请的实施例可提供为方法、装置、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。It will be appreciated by those skilled in the art that the embodiments of the present application may be provided as methods, apparatuses, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. 复杂背景下银行卡倾斜矫正检测方法,其特征在于,方法包括以下步骤:The bank card tilt correction detection method under complex background is characterized in that, the method comprises the following steps:
    第一步,模型训练:对原始数据进行标注数据并生成标签,根据生成的标注文件统计证件大小,利用原始数据和标注文件对分割模型进行训练;The first step, model training: label the original data and generate labels, count the document size according to the generated label files, and use the original data and label files to train the segmentation model;
    第二步,证件初检,对于通过图像采集单元输入的图片利用深度学习模型寻找相应的潜在证件区域,得到一个初步且粗糙的证件区域掩膜;The second step is the initial inspection of the document. For the picture input through the image acquisition unit, the deep learning model is used to find the corresponding potential document area, and a preliminary and rough document area mask is obtained;
    第三步,标准化,对第一步获得的粗糙掩膜进行精细化修正,得到高质量的证件区域掩膜,利用该掩膜在原图中提取证件区域,对于得到的证件照进行仿射矫正变换,将其变换为预设定的证件照尺寸,输出矫正证件图片。The third step, standardization, fine-tune the rough mask obtained in the first step to obtain a high-quality document area mask, use the mask to extract the document area in the original image, and perform affine correction transformation on the obtained document photo , convert it to the preset ID photo size, and output the corrected ID photo.
  2. 根据权利要求1所述的方法,其特征在于,第一步的模型训练包括以下步骤:The method according to claim 1, wherein the model training of the first step comprises the following steps:
    S11确定证件区域,通过人工标注寻找原始数据的图片中证件区域;S11 Determine the certificate area, and find the certificate area in the picture of the original data through manual annotation;
    S12顶点标注生成标签,对证件区域内的证件四个顶点进行标注,并将顶点的坐标位置以json文件的方式进行保存生成标签;S12 Vertex labeling and generating labels, labeling the four vertices of the document in the document area, and saving the coordinate positions of the vertices in the form of json files to generate labels;
    S13统计证件大小,根据生成的标注文件,统计每个证件区域的面积大小s,以为后续测试阶段服务;S13 Count the size of the certificate, according to the generated annotation file, count the area size s of each certificate area, so as to serve the subsequent testing stage;
    S14训练分割模型,利用原始数据和生成的标注文件对分割模型进行训练。S14 trains the segmentation model, and uses the original data and the generated annotation files to train the segmentation model.
  3. 根据权利要求2所述的方法,其特征在于:在步骤S14中,输入图片和相应的标注文件具有相同的尺寸;且在训练前将json文件转换为对应的0-1二值掩膜图,其中像素为1的区域代表证件区域,像素为0的区域代表背景区域。The method according to claim 2, wherein in step S14 the input picture and the corresponding annotation file have the same size, and the json file is converted into a corresponding 0-1 binary mask image before training, in which the area with pixel value 1 represents the document area and the area with pixel value 0 represents the background area.
  4. 根据权利要求1所述的方法,其特征在于,第二步的证件初检包括以下步骤:The method according to claim 1, wherein the initial inspection of the certificate in the second step comprises the following steps:
    S21提取特征,输入图片后,将图片缩放为适合分割网络的输入图片大小,再用Unet网络模型对于输入数据提取深度特征,得到特征图;S21 extracts features, after inputting a picture, scales the picture to a size suitable for the input picture of the segmentation network, and then uses the Unet network model to extract depth features from the input data to obtain a feature map;
    S22计算概率,对于特征图中的每个位置的特征进行二分类判断,求得每个位置的特征属于证件区域的概率值,得到属于证件区域的概率分布图;S22 calculates the probability, carries out two-class judgment on the feature of each position in the feature map, obtains the probability value that the feature of each position belongs to the document area, and obtains the probability distribution map belonging to the document area;
    S23阈值截断,根据预先设定的阈值将概率分布图进行二值化,将大于阈值的概率设置为1,小于阈值的概率设置为0,获得0-1掩膜图;S23 Threshold truncation, binarize the probability distribution map according to the preset threshold, set the probability greater than the threshold to 1, and set the probability less than the threshold to 0 to obtain a 0-1 mask map;
    S24粗分割掩膜,将0-1掩膜图上采样至与原始输入图片同样大小的尺寸,得到一张初步的证件粗分割掩膜图;S24 rough segmentation mask, the 0-1 mask image is upsampled to the same size as the original input image, and a preliminary document rough segmentation mask image is obtained;
    S25合法区域筛选,在训练阶段对银行卡的面积进行统计,计算训练集中的分布函数,得到平均值μ和标准差σ,统计粗分割掩膜图中每个孤立的证件区域面积a,如果a≤μ-3σ,则认为该区域a为非法区域,从粗分割掩膜中剔除,以此通过合法区域筛选将部分错误区域进行过滤。S25 legal area screening, count the area of the bank card in the training phase, calculate the distribution function in the training set, get the mean μ and standard deviation σ, and count the area a of each isolated document area in the rough segmentation mask map, if a ≤μ-3σ, then the area a is considered to be an illegal area, and it is removed from the rough segmentation mask, so as to filter some error areas through legal area screening.
  5. 根据权利要求1所述的方法,其特征在于,在第三步标准化中,对第一步经筛选后的掩膜图中的合法区域进行精细化掩膜修正,包括以下步骤:The method according to claim 1, characterized in that, in the third step of standardization, refining the mask correction is performed on the legal area in the mask map after the first step of screening, comprising the following steps:
    S31提取区域轮廓特征,轮廓特征是一张二值掩膜图,整体是一条闭合的不规则曲线,二值掩膜图不改变证件照矩形凸集的性质;S31 extracts the regional contour feature, the contour feature is a binary mask image, the whole is a closed irregular curve, and the binary mask image does not change the properties of the rectangular convex set of the document photo;
    S32求取轮廓凸包,在原始轮廓的基础上求取该轮廓的最小凸包,将 部分分割缺失的区域进行填补,同时使轮廓边缘平滑;S32 obtains the contour convex hull, obtains the minimum convex hull of this contour on the basis of the original contour, and fills the missing area of the partial segmentation, and makes the contour edge smooth simultaneously;
    S33直线拟合,使用霍夫变换对凸包的多个线段组成的不规则凸多边形进行直线拟合,以对凸包进行描述;S33 line fitting, using Hough transform to perform straight line fitting on an irregular convex polygon composed of multiple line segments of the convex hull to describe the convex hull;
    S34求取顶点,对直线拟合中的所有合法直线两两求取交点,以此寻找证件照四个顶点的分布范围,并且在求取顶点的过程中,对于两条直线平行的情况不做考虑;S34 Compute the vertices: all legal lines from the line fitting are taken pairwise and their intersections computed, so as to find the distribution range of the four vertices of the document photo; in computing the vertices, the case where two lines are parallel is not considered;
    S35顶点合法筛选,设置筛选条件对于顶点进行合法性检查,筛选条件中设置了容忍值tol,横坐标[0-tol,width+tol]及纵坐标[0-tol,height+tol]定义为合法顶点坐标,其中width,height代表原始图像的宽度和高度,若某顶点的坐标(x crosspoint,y crosspoint)超出了原始图像尺寸而没有超过tol,则将该顶点坐标纠正到原始图像边缘处,即:S35 Vertex legality screening: a screening condition is set to check the legality of the vertices; the condition uses a tolerance value tol, and abscissas in [0-tol, width+tol] and ordinates in [0-tol, height+tol] are defined as legal vertex coordinates, where width and height represent the width and height of the original image; if a vertex coordinate (x crosspoint , y crosspoint ) exceeds the original image size but not by more than tol, the vertex coordinate is corrected to the original image edge, that is:
    x crosspoint = max(min(x crosspoint , width), 0)
    y crosspoint = max(min(y crosspoint , height), 0)
    其中,
    where,
    min(x crosspoint,width)将x crosspoint的最大值限制为不超过原始图片width,max(min(x crosspoint,width),0)将其最小值限制为不小于0; min(x crosspoint , width) caps x crosspoint so that it cannot exceed the original image width, and max(min(x crosspoint , width), 0) ensures it cannot fall below 0;

    同理,min(y crosspoint,height)将y crosspoint的最大值限制为不超过原始图片height,max(min(y crosspoint,height),0)将其最小值限制为不小于0。 Likewise, min(y crosspoint , height) caps y crosspoint so that it cannot exceed the original image height, and max(min(y crosspoint , height), 0) ensures it cannot fall below 0.
    S36顶点聚类,对比标准银行卡存在四个顶点,根据已求得的所有合法顶点,通过无监督聚类算法K-means将所有顶点聚为四类,其中每一类的质心即为某一个顶点的坐标,共得到四个顶点坐标;S36 Vertex clustering: a standard bank card has four vertices, so based on all the legal vertices obtained, the unsupervised clustering algorithm K-means groups all vertices into four classes; the centroid of each class is taken as the coordinate of one vertex, giving four vertex coordinates in total;
    S37顶点排序,为方便后续操作,通过以下步骤确定四个顶点的排序:1)根据四个顶点坐标求取中心点坐标;2)以中心点建立极坐标系,并构造从中心点指向各顶点的向量,依次求出各向量与极轴的夹角;3)按照夹角的大小由大到小的顺序对四个顶点进行排序;4)寻找证件区域的左上角点,以最小坐标值之和的顶点为左上顶点,并以左上顶点为起点重新排列坐标顺序,按照“左上-右上-右下-左下”的顺序进行排列;S37 Vertex ordering: to facilitate subsequent operations, the order of the four vertices is determined as follows: 1) compute the centre-point coordinates from the four vertex coordinates; 2) establish a polar coordinate system at the centre point, construct a vector from the centre point to each vertex, and compute the angle between each vector and the polar axis; 3) sort the four vertices by angle from large to small; 4) find the upper-left corner point of the document area, taking the vertex with the smallest sum of coordinate values as the upper-left vertex, and rearrange the coordinates starting from it in the order "upper left - upper right - lower right - lower left";
    S38区域填充,在找到并按顺序排列顶点坐标之后,将四个顶点构成的四边形区域进行二值填充,形成一个二进制掩膜;S38 area filling, after finding and arranging the vertex coordinates in order, the quadrilateral area formed by the four vertices is filled with binary values to form a binary mask;
    S39仿射变换输出矫正图片,对重新确定四个顶点的证件区域,根据预先设定的目标证件照大小对证件区域进行仿射变换,I output=WI input,其中,W为证件区域与目标证件大小之间的仿射变换矩阵;以此,对每一个证件区域都进行相应的修正操作,并将修正后得到的证件图片作为矫正图片输出并保存到指定的文件路径处。S39 Affine transformation outputs the corrected picture: for the document region whose four vertices have been re-determined, an affine transformation maps the region to the preset target document-photo size, I output = W·I input , where W is the affine transformation matrix between the document region and the target document size; in this way, each document region receives the corresponding correction, and the corrected document picture is output as the rectified picture and saved to the specified file path.
  6. 根据权利要求5所述的方法,其特征在于:在步骤S33中,通过霍夫变换对凸包进行直线拟合的最小检测直线长度设置为100,直线之间最大间隔设置为20。The method according to claim 5, characterized in that: in step S33, the minimum detected straight line length for performing straight line fitting on the convex hull by Hough transform is set to 100, and the maximum interval between straight lines is set to 20.
  7. 根据权利要求5所述的方法,其特征在于:在步骤S36中,K-means的具体算法为:The method according to claim 5, wherein: in step S36, the specific algorithm of K-means is:
    1)随机选取4个聚类质心点μ 01231) Randomly select 4 cluster centroid points μ 0 , μ 1 , μ 2 , μ 3 ;
    2)对于每一个顶点坐标(x i,y i),通过计算与每个聚类质心的欧氏距离,找到最小距离的质心点作为其对应的质心点并标注为对应类别j: 2) For each vertex coordinate (x i , y i ), by calculating the Euclidean distance from each cluster centroid, find the centroid point with the smallest distance as its corresponding centroid point and mark it as the corresponding category j:
    argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3; argmin j ||(x i ,y i )-μ j || 2 ,j=0,1,2,3;
    其中,||(x i,y i)-μ j|| 2为顶点(x i,y i)与质心μ j之间欧氏距离的平方;argmin j||(x i,y i)-μ j|| 2,j=0,1,2,3即选取距离该顶点最近的质心所对应的类别j,从而使各顶点到其所属质心的距离之和最小; Here, ||(x i ,y i )-μ j || 2 is the squared Euclidean distance between vertex (x i , y i ) and centroid μ j ; argmin j ||(x i ,y i )-μ j || 2 , j=0,1,2,3 selects the class j whose centroid is nearest to the vertex, so that the sum of distances from the vertices to their assigned centroids is minimized;
    3)重新计算4个质心的坐标;3) Recalculate the coordinates of the 4 centroids;
    4)重复2)和3)过程直到收敛。4) Repeat 2) and 3) process until convergence.
  8. A certificate detection apparatus, characterized in that: the apparatus comprises an acquisition input unit, an image processing unit, an information extraction unit, and an information output unit, connected by telecommunication; wherein,
    the acquisition input unit obtains, through a camera assembly, the detection picture of the certificate to be detected and a standard registration picture;
    the image processing unit processes the input picture through the deep-learning and image-processing algorithms in the processor, obtaining in turn a preliminary rough document-area mask, a refined document-area mask, the cropped original-image area, and the corrected image after affine-transform rectification;
    the information extraction unit extracts the category and information of the corrected image through the information extraction algorithm in the processor;
    the information output unit: the processor displays the category and information extracted from the input picture on the display and stores them in the memory.
  9. A computer-readable storage medium on which computer instructions are stored, characterized in that: when the computer instructions are run, the steps of the method according to any one of claims 1-7 are executed.
  10. A terminal, comprising a memory and a processor, characterized in that: the memory stores a registration picture and computer instructions that can be run on the processor, and the processor, when running the computer instructions, executes the steps of the method according to any one of claims 1-7.
PCT/CN2020/141443 2020-12-10 2020-12-30 Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal WO2022121039A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011458177.8A CN112686812B (en) 2020-12-10 2020-12-10 Bank card inclination correction detection method and device, readable storage medium and terminal
CN202011458177.8 2020-12-10

Publications (1)

Publication Number Publication Date
WO2022121039A1 true WO2022121039A1 (en) 2022-06-16

Family

ID=75449185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/141443 WO2022121039A1 (en) 2020-12-10 2020-12-30 Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal

Country Status (2)

Country Link
CN (1) CN112686812B (en)
WO (1) WO2022121039A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882489A (en) * 2022-07-07 2022-08-09 浙江智慧视频安防创新中心有限公司 Method, device, equipment and medium for horizontally correcting rotary license plate
CN115272206A (en) * 2022-07-18 2022-11-01 深圳市医未医疗科技有限公司 Medical image processing method, medical image processing device, computer equipment and storage medium
CN115457559A (en) * 2022-08-19 2022-12-09 上海通办信息服务有限公司 Method, device and equipment for intelligently correcting text and license pictures
CN117095423A (en) * 2023-10-20 2023-11-21 上海银行股份有限公司 Bank bill character recognition method and device
CN117315664A (en) * 2023-09-18 2023-12-29 山东博昂信息科技有限公司 Scrap steel bucket number identification method based on image sequence
CN117409261A (en) * 2023-12-14 2024-01-16 成都数之联科技股份有限公司 Element angle classification method and system based on classification model

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033543B (en) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 Curve text recognition method, device, equipment and medium
CN113344000A (en) * 2021-06-29 2021-09-03 南京星云数字技术有限公司 Certificate copying and recognizing method and device, computer equipment and storage medium
CN113870262B (en) * 2021-12-02 2022-04-19 武汉飞恩微电子有限公司 Printed circuit board classification method and device based on image processing and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537219A (en) * 2018-03-20 2018-09-14 上海眼控科技股份有限公司 A kind of intelligent detecting method and device for financial statement outline border
CN108682015A (en) * 2018-05-28 2018-10-19 科大讯飞股份有限公司 Lesion segmentation method, apparatus, equipment and storage medium in a kind of biometric image
JP2018199473A (en) * 2017-05-30 2018-12-20 株式会社Soken Steering-angle determining device and automatic driving vehicle
CN110458161A (en) * 2019-07-15 2019-11-15 天津大学 A kind of mobile robot doorplate location method of combination deep learning
CN111027564A (en) * 2019-12-20 2020-04-17 长沙千视通智能科技有限公司 Low-illumination imaging license plate recognition method and device based on deep learning integration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110715A (en) * 2019-04-30 2019-08-09 北京金山云网络技术有限公司 Text detection model training method, text filed, content determine method and apparatus
CN110866871A (en) * 2019-11-15 2020-03-06 深圳市华云中盛科技股份有限公司 Text image correction method and device, computer equipment and storage medium



Also Published As

Publication number Publication date
CN112686812A (en) 2021-04-20
CN112686812B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
WO2022121039A1 (en) Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal
US20200364443A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
Gou et al. Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines
Zang et al. Vehicle license plate recognition using visual attention model and deep learning
Silva et al. A flexible approach for automatic license plate recognition in unconstrained scenarios
CN101142584B (en) Method for facial features detection
CN108334881B (en) License plate recognition method based on deep learning
CN111310662B (en) Flame detection and identification method and system based on integrated deep network
CN104751142A (en) Natural scene text detection algorithm based on stroke features
Zhang et al. Road recognition from remote sensing imagery using incremental learning
CN110298376A (en) A kind of bank money image classification method based on improvement B-CNN
Dehshibi et al. Persian vehicle license plate recognition using multiclass Adaboost
WO2022121025A1 (en) Certificate category increase and decrease detection method and apparatus, readable storage medium, and terminal
CN112101208A (en) Feature series fusion gesture recognition method and device for elderly people
CN105335760A (en) Image number character recognition method
Gawande et al. SIRA: Scale illumination rotation affine invariant mask R-CNN for pedestrian detection
CN110363196B (en) Method for accurately recognizing characters of inclined text
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
Mei et al. A novel framework for container code-character recognition based on deep learning and template matching
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN111062393B (en) Natural scene Chinese character segmentation method based on spectral clustering
CN109325487B (en) Full-category license plate recognition method based on target detection
CN107330436B (en) Scale criterion-based panoramic image SIFT optimization method
Ning Vehicle license plate detection and recognition
CN108171750A (en) The chest handling positioning identification system of view-based access control model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964968

Country of ref document: EP

Kind code of ref document: A1