CN105825243A - Method and device for certificate image detection - Google Patents


Info

Publication number: CN105825243A
Application number: CN201510007186.8A
Authority: CN (China)
Prior art keywords: lbp, image, sample, optimized, certificate
Legal status: Pending
Inventor: 陈岳峰
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Other languages: Chinese (zh)
Classification: Image Analysis

Abstract

The invention provides a method and a device for certificate image detection. The position of a certificate image in an image to be detected is detected through a trained certificate detector. Compared with a traditional edge-based identity card detection algorithm, the machine-learning-based algorithm provided by the invention is more robust and can also handle certificate images with complicated backgrounds. In addition, through angle correction of the image to be detected, the position of the certificate image is detected more accurately in a rotated image to be detected.

Description

Certificate image detection method and equipment
Technical Field
The present application relates to the field of communications and computers, and in particular, to a method and apparatus for detecting a document image.
Background
At present, in some website identity-authentication procedures, a photo of a user holding his or her own certificate needs to be acquired, and the position of the certificate in the photo then needs to be detected automatically, which provides a technical basis for the subsequent recognition of the certificate information. For example, on a shopping website, when a seller performs real-name authentication to open a shop, the seller needs to upload a photo of himself or herself holding an identity card, and the position of the identity card in that photo then needs to be detected automatically, so that a close-up of the identity card can be shown during authentication and it can be detected automatically whether the identity card is blurred, thereby improving the one-time pass rate of the audit.
However, the traditional edge-based certificate detection algorithm is only suitable for images with a relatively simple background; when the background is complex, existing methods produce many false detection areas. In addition, when the original image is rotated, the identity card cannot be detected by the traditional method.
Disclosure of Invention
The application aims to provide a certificate detection method and system, solving the current problem that the certificate, or the area where the certificate is located, cannot be detected.
In view of the above, the present application provides a method for detecting a document image, including:
training a certificate detector cascaded by a plurality of strong classifiers based on the MB-LBP characteristics;
and detecting the certificate image in the image to be detected through the certificate detector.
Further, training a credential detector cascaded by a plurality of strong classifiers based on the MB-LBP features comprises:
acquiring a plurality of certificate images and non-certificate images as positive samples and negative samples respectively;
randomly selecting different areas from the positive sample and the negative sample, and respectively extracting MB-LBP characteristics of the different areas;
training a credential detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions.
Further, training a credential detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions comprises:
based on MB-LBP characteristics of different regions, using a gentleAdaboost strategy to select characteristics, and generating strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected characteristics;
and cascading the plurality of strong classifiers into the credential detector using a cascadeAdaboost strategy.
Further, the extracting the MB-LBP features of the different regions respectively comprises:
dividing each area into a plurality of sub-areas respectively, and calculating the pixel mean value of each sub-area;
obtaining a binary coded MB-LBP characteristic of each region by comparing the pixel mean value of the central sub-region in each region with the pixel mean values of the surrounding sub-regions;
the binary coded MB-LBP features for each region are converted to decimal MB-LBP features.
Further, based on the MB-LBP features of the different regions and using a gentleAdaboost strategy to perform feature selection, generating strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected features comprises:
repeating the following steps until the number of the multi-branch regression tree weak classifiers in the current strong classifier reaches the preset number, and outputting the generated strong classifier:
setting the weight of each positive sample or negative sample according to the number of the positive samples and the negative samples, and initializing a strong classifier;
training based on decimal MB-LBP characteristics of different regions and weights of positive samples or negative samples to obtain corresponding multi-branch regression tree weak classifiers;
selecting the multi-branch regression tree weak classifier with the minimum loss from the multi-branch regression tree weak classifiers and adding the multi-branch regression tree weak classifier into a strong classifier;
and updating the weights of the positive sample and the negative sample according to the selected multi-branch regression tree weak classifier.
Further, training to obtain the corresponding multi-branch regression tree weak classifier based on the decimal MB-LBP characteristics of the different regions and the weights of the positive samples or the negative samples comprises:
obtaining a first loss function based on the decimal MB-LBP characteristics of the different regions and the weights of the positive samples or the negative samples;
training, by minimizing the first loss function, each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions.
Further, the first loss function is expressed as follows:
$$J_{loss}\big(x_i^{LBP}, f_m(x_i^{LBP})\big) = \sum_{i=1}^{N} w_i \big(y_i - f_m(x_i^{LBP})\big)^2,$$

where $f_m(x_i^{LBP})$ denotes the m-th weak classifier, N denotes the number of positive and negative samples, $x_i^{LBP}$ denotes the decimal MB-LBP feature of the region of the i-th positive or negative sample, $y_i$ indicates the type of sample ($y_i = 1$ for a positive sample, $y_i = -1$ for a negative sample), and $w_i$ represents the weight of the current i-th positive or negative sample. $f_m(x_i^{LBP})$ uses a multi-branch tree, defined as follows:

$$f_m(x^{LBP}) = a_j \quad \text{for } x^{LBP} = j,\ j = 0, 1, \ldots, 255.$$
Further, in training, by minimizing the first loss function, each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions,

$$a_j = \frac{\sum_i w_i y_i\,\delta(x_i^{LBP} = j)}{\sum_i w_i\,\delta(x_i^{LBP} = j)},$$

i.e. $f_m(x_i^{LBP})$, is found by minimizing $J_{loss}(x^{LBP}, f_m(x^{LBP}))$.
Further, the weights of the positive samples and the negative samples are updated according to the selected multi-branch regression tree weak classifier, the weights being corrected according to the following formula:

$$w_i \leftarrow \frac{w_i \exp\!\big(-y_i f_m(x_i^{LBP})\big)}{\sum_i w_i \exp\!\big(-y_i f_m(x_i^{LBP})\big)}.$$
Further, before the detecting of the certificate image in the image to be detected through the certificate detector, the method further includes:
and correcting the angle of the image to be detected.
Further, the angle correction on the image to be detected includes:
training an ImageNet model;
acquiring certificate image training samples of various rotation angles, and extracting high-level semantic features of each certificate image training sample by using the ImageNet model;
training the high-level semantic features of the certificate image training samples by using a linear support vector machine to obtain a classifier for determining the image rotation angle;
and determining the rotation angle of an image to be detected through the classifier, and adjusting the angle of the image to be detected according to the rotation angle of the image to be detected.
Further, training the ImageNet model comprises:
scaling the sample images in the ImageNet database to the same resolution;
carrying out an average value reduction operation on the sample image;
randomly cutting the sample image subjected to the average value reduction operation to a uniform size;
setting a neural network of the ImageNet model as a convolutional layer and a full-link layer which are connected in sequence;
propagating the randomly cropped sample image forward through the neural network, and calculating the loss value of the parameter to be optimized of the ImageNet model;
and solving the gradient of the parameter to be optimized layer by layer backward through the neural network by the chain rule according to the loss value of the parameter to be optimized, and optimizing the parameter to be optimized using a stochastic gradient descent algorithm according to the solved gradient, to obtain the ImageNet model with optimized parameters.
Further, the loss value J(θ) of the parameter to be optimized of the ImageNet model is calculated according to a second loss function,

$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,\log \frac{e^{x_i^j}}{\sum_{l=1}^{k} e^{x_i^l}}\right],$$

where θ represents the parameter to be optimized, n represents the number of sample images, k represents the number of output classes, and $x_i^j$ represents the output value of the j-th class of the last fully-connected layer for the i-th sample.
Further, in solving the gradient of the parameter to be optimized layer by layer backward through the neural network by the chain rule according to the loss value of the parameter to be optimized, the gradient of the parameter to be optimized is solved according to the following formula:

$$g_{t+1} = 0.9 \cdot g_t - 0.0005 \cdot \eta \cdot \theta_t - \eta \left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t},$$

where $g_t$ represents the gradient of the parameter to be optimized at the t-th iteration, $g_{t+1}$ represents the gradient of the parameter to be optimized at the (t+1)-th iteration, $\theta_t$ represents the parameter to be optimized at the t-th iteration, $\left.\partial J(\theta)/\partial \theta\right|_{\theta_t}$ denotes the gradient of the second loss function J(θ) when the parameter to be optimized θ takes the value $\theta_t$, $0.9 \cdot g_t$ represents the moment of inertia (momentum) term, $0.0005 \cdot \theta_t$ is a regularization constraint on θ, and η denotes the learning rate.
Further, in optimizing the parameter to be optimized according to the obtained gradient using a stochastic gradient descent algorithm to obtain the ImageNet model with optimized parameters, the parameter to be optimized is updated by the following formula:

$$\theta_{t+1} = \theta_t + g_{t+1},$$

where $\theta_{t+1}$ represents the parameter to be optimized at the (t+1)-th iteration.
Further, the extracting the high-level semantic features of each certificate image training sample by using the ImageNet model comprises the following steps:
scaling the certificate image training sample to the same size as the randomly cropped sample image in the ImageNet model;
carrying out mean value reduction operation on the zoomed certificate image training sample;
extracting the full-connection-layer high-level semantic features of each certificate image training sample after the mean value reduction operation by using the ImageNet model after the parameters are optimized;
and carrying out sparsifying operation on the high-level semantic features of each certificate image training sample, and carrying out normalization operation on the sparsified high-level semantic features by adopting a two-norm.
There is also provided according to another aspect of the present application, apparatus for document image detection, comprising:
first means for training a credential detector cascaded by a plurality of strong classifiers based on MB-LBP features;
and the second device is used for detecting the certificate image in the image to be detected through the certificate detector.
Further, the first apparatus includes:
a first-one module for acquiring a plurality of certificate images and non-certificate images as positive samples and negative samples respectively;
a first-two module for randomly selecting different regions from the positive samples and the negative samples and respectively extracting the MB-LBP features of the different regions;
a first-three module for training a credential detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions.
Further, the first-three module comprises:
a first-three-one unit for performing feature selection based on the MB-LBP features of the different regions using a gentleAdaboost strategy, and generating strong classifiers each consisting of a plurality of multi-branch regression tree weak classifiers according to the selected features;
a first-three-two unit for cascading the plurality of strong classifiers into the certificate detector using a cascadeAdaboost strategy.
Further, the first-two module includes:
a first-two-one unit for dividing each region into a plurality of sub-regions and calculating the pixel mean of each sub-region;
a first-two-two unit for obtaining the binary-coded MB-LBP feature of each region by comparing the pixel mean of the central sub-region within each region with the pixel means of the surrounding sub-regions;
a first-two-three unit for converting the binary-coded MB-LBP feature of each region into a decimal MB-LBP feature.
Further, the first-three-one unit includes:
a first-three-one-one subunit for setting the weight of each positive or negative sample according to the number of positive and negative samples, and initializing the strong classifier;
a first-three-one-two subunit for training corresponding multi-branch regression tree weak classifiers based on the decimal MB-LBP features of the different regions and the weights of the positive or negative samples;
a first-three-one-three subunit for selecting the multi-branch regression tree weak classifier with the minimum loss from the multi-branch regression tree weak classifiers and adding it to the strong classifier;
a first-three-one-four subunit for updating the weights of the positive samples and the negative samples according to the selected multi-branch regression tree weak classifier and judging whether the number of multi-branch regression tree weak classifiers in the current strong classifier reaches the preset number; if so, execution passes to the first-three-one-five subunit, and if not, back to the first-three-one-two subunit;
a first-three-one-five subunit for outputting the generated strong classifier.
Further, the first-three-one-two subunit is configured to obtain a first loss function based on the decimal MB-LBP features of the different regions and the weights of the positive or negative samples, and to train, by minimizing the first loss function, each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions.
Further, the first loss function obtained by the first-three-one-two subunit is expressed as follows:

$$J_{loss}\big(x_i^{LBP}, f_m(x_i^{LBP})\big) = \sum_{i=1}^{N} w_i \big(y_i - f_m(x_i^{LBP})\big)^2,$$

where $f_m(x_i^{LBP})$ denotes the m-th weak classifier, N denotes the number of positive and negative samples, $x_i^{LBP}$ denotes the decimal MB-LBP feature of the region of the i-th positive or negative sample, $y_i$ indicates the type of sample ($y_i = 1$ for a positive sample, $y_i = -1$ for a negative sample), and $w_i$ represents the weight of the current i-th positive or negative sample. $f_m(x_i^{LBP})$ uses a multi-branch tree, defined as follows:

$$f_m(x^{LBP}) = a_j \quad \text{for } x^{LBP} = j,\ j = 0, 1, \ldots, 255.$$
Further, the first-three-one-two subunit finds, by minimizing $J_{loss}(x^{LBP}, f_m(x^{LBP}))$,

$$a_j = \frac{\sum_i w_i y_i\,\delta(x_i^{LBP} = j)}{\sum_i w_i\,\delta(x_i^{LBP} = j)},$$

i.e. $f_m(x_i^{LBP})$.
Further, the first-three-one-four subunit updates the weights of the positive samples and the negative samples according to the selected multi-branch regression tree weak classifier, the weights being corrected according to the following formula:

$$w_i \leftarrow \frac{w_i \exp\!\big(-y_i f_m(x_i^{LBP})\big)}{\sum_i w_i \exp\!\big(-y_i f_m(x_i^{LBP})\big)}.$$
further, the apparatus further includes a third device for correcting an angle of the image to be detected.
Further, the third apparatus includes:
a third-one module for training an ImageNet model;
a third-two module for acquiring certificate image training samples at various rotation angles and extracting the high-level semantic features of each certificate image training sample using the ImageNet model;
a third-three module for training the high-level semantic features of the certificate image training samples using a linear support vector machine to obtain a classifier for determining the image rotation angle;
a third-four module for determining the rotation angle of an image to be detected through the classifier and adjusting the angle of the image to be detected according to the determined rotation angle.
Further, the third-one module includes:
a third-one-one unit for scaling the sample images in the ImageNet database to the same resolution;
a third-one-two unit for performing a mean value reduction operation on the sample images;
a third-one-three unit for randomly cropping the sample images after the mean value reduction operation to a uniform size;
a third-one-four unit for setting the neural network of the ImageNet model as sequentially connected convolutional layers and fully-connected layers;
a third-one-five unit for propagating the randomly cropped sample images forward through the neural network and calculating the loss value of the parameters to be optimized of the ImageNet model;
a third-one-six unit for solving the gradients of the parameters to be optimized layer by layer backward through the neural network by the chain rule according to the loss value, and optimizing the parameters to be optimized using a stochastic gradient descent algorithm according to the solved gradients to obtain the ImageNet model with optimized parameters.
Further, in the third-one-five unit, the loss value J(θ) of the parameter to be optimized of the ImageNet model is calculated according to the following second loss function:

$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,\log \frac{e^{x_i^j}}{\sum_{l=1}^{k} e^{x_i^l}}\right],$$

where θ represents the parameter to be optimized, n represents the number of sample images, k represents the number of output classes, and $x_i^j$ represents the output value of the j-th class of the last fully-connected layer for the i-th sample.
Further, the third-one-six unit obtains the gradient of the parameter to be optimized according to the following formula:

$$g_{t+1} = 0.9 \cdot g_t - 0.0005 \cdot \eta \cdot \theta_t - \eta \left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t},$$

where $g_t$ represents the gradient of the parameter to be optimized at the t-th iteration, $g_{t+1}$ represents the gradient of the parameter to be optimized at the (t+1)-th iteration, $\theta_t$ represents the parameter to be optimized at the t-th iteration, $\left.\partial J(\theta)/\partial \theta\right|_{\theta_t}$ denotes the gradient of the second loss function J(θ) when the parameter to be optimized θ takes the value $\theta_t$, $0.9 \cdot g_t$ represents the moment of inertia (momentum) term, $0.0005 \cdot \theta_t$ is a regularization constraint on θ, and η denotes the learning rate.
Further, the third-one-six unit updates the parameter to be optimized by the following formula:

$$\theta_{t+1} = \theta_t + g_{t+1},$$

where $\theta_{t+1}$ represents the parameter to be optimized at the (t+1)-th iteration.
Further, the third-two module comprises:
a third-two-one unit for acquiring certificate image training samples at various rotation angles;
a third-two-two unit for scaling the certificate image training samples to the same size as the randomly cropped sample image in the ImageNet model;
a third-two-three unit for performing a mean value reduction operation on the scaled certificate image training samples;
a third-two-four unit for extracting the fully-connected-layer high-level semantic features of each certificate image training sample after the mean value reduction operation, using the ImageNet model with optimized parameters;
a third-two-five unit for performing a sparsification operation on the high-level semantic features of each certificate image training sample and performing a normalization operation on the sparsified high-level semantic features using the two-norm.
Compared with the prior art, the present application detects the position of the certificate image in the image to be detected through a trained certificate detector. Compared with the traditional edge-based identity card detection algorithm, this machine-learning-based algorithm is more robust and can also handle certificate images with complex backgrounds.
In addition, by correcting the angle of the image to be detected, the position of the certificate image in a rotated image to be detected is detected more accurately.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a document image detection method according to one aspect of the present application;
FIG. 2 illustrates a flow chart of a method of credential image detection in a preferred embodiment of the present application;
FIG. 3 shows a flow chart of a document image detection method of a preferred embodiment of the present application;
FIG. 4 illustrates a detection schematic of a credential detector of an embodiment of the present application;
FIG. 5 shows a flow chart of a document image detection method according to another preferred embodiment of the present application;
FIG. 6 illustrates a schematic diagram of binary, decimal coded MB-LBP feature acquisition for each region according to an embodiment of the present application;
FIG. 7 is a flow chart of a document image detection method according to yet another preferred embodiment of the present application;
FIG. 8 is a flow chart of a document image detection method according to yet another preferred embodiment of the present application;
FIG. 9 shows a flow chart of a document image detection method of another preferred embodiment of the present application;
FIG. 10 shows a flow chart of a document image detection method of another preferred embodiment of the present application;
FIG. 11 is a flow chart of a document image detection method according to yet another preferred embodiment of the present application;
FIG. 12 illustrates a block diagram of a neural network of the ImageNet model of an embodiment of the present application;
FIG. 13 shows a flow chart of a document image detection method of yet another preferred embodiment of the present application;
FIG. 14 shows a block diagram of an apparatus for document image inspection in another aspect of the present application;
FIG. 15 shows a block diagram of an apparatus for document image inspection in accordance with a preferred embodiment of the present application;
FIG. 16 shows a block diagram of an apparatus for document image inspection in accordance with a preferred embodiment of the present application;
FIG. 17 shows a block diagram of an apparatus for document image inspection in accordance with another preferred embodiment of the present application;
FIG. 18 shows a block diagram of an apparatus for document image inspection in accordance with yet a further preferred embodiment of the present application;
FIG. 19 shows a block diagram of an apparatus for document image inspection in accordance with another preferred embodiment of the present application;
FIG. 20 shows a block diagram of an apparatus for document image inspection in accordance with another preferred embodiment of the present application;
FIG. 21 shows a block diagram of an apparatus for document image inspection in accordance with yet a further preferred embodiment of the present application;
FIG. 22 shows a block diagram of an apparatus for document image inspection in accordance with yet a further preferred embodiment of the present application;
FIG. 23 shows a schematic diagram of an embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 shows a flow diagram of a document image detection method according to an aspect of the present application, which, in conjunction with fig. 1, proposes a document image detection method comprising:
step S1, training a certificate detector formed by a plurality of strong classifiers in a cascade, based on MB-LBP (Multiscale Block Local Binary Pattern) features;
and step S3, detecting the certificate image in the image to be detected through the certificate detector. Herein, the document image includes, but is not limited to, an identification card image, a passport image, a work card image, a student card image, and a driver card image. In this step, detecting the certificate image in the image to be detected not only means detecting whether the certificate image exists in the image to be detected, but also includes detecting the position of the certificate image in the image to be detected. The position of the certificate image in the image to be detected is detected through the trained certificate detector, compared with the traditional edge-based identity card detection algorithm, the algorithm based on machine learning is more robust, and the problem that the background of the certificate image is complex can be solved.
FIG. 2 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. Referring to fig. 2, step S1 in fig. 1 includes:
step S11, acquiring a plurality of certificate images and non-certificate images as positive samples and negative samples respectively; for example, 103 positive samples and 500 negative samples can be used in this step, where a positive sample is an identification card image with a resolution of 40 × 60, and a negative sample is an arbitrary image without an identification card;
step S12, randomly selecting different areas from the positive sample and the negative sample, and respectively extracting MB-LBP characteristics of the different areas;
step S13, training a certificate detector cascaded by a plurality of strong classifiers based on the MB-LBP characteristics of different areas. Here, each strong classifier is generated by a plurality of multi-branch regression tree weak classifiers each trained from MB-LBP features based on the positive and negative samples.
FIG. 3 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. Referring to fig. 3, step S13 in fig. 2 includes:
s131, based on MB-LBP characteristics of different regions, using a gentleAdaboost strategy to select characteristics, and generating strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected characteristics;
and step S132, cascading the strong classifiers into the certificate detector using a cascadeAdaboost strategy. Since the MB-LBP features of the different regions are binary 0/1 codes, they cannot be measured with an ordinary distance metric. Therefore, the weak classifiers are trained using a multi-branch regression tree strategy, feature selection is then performed with the gentleAdaboost strategy to combine them into strong classifiers, and finally the cascadeAdaboost strategy is introduced to accelerate the certificate detection algorithm.
In a preferred embodiment of the present application, the earlier strong classifiers of the certificate detector have fewer multi-branch regression tree weak classifiers, and the later strong classifiers have more. In order for the trained detector to have a higher detection rate and lower time complexity, the detector for detecting the identity card is trained using a strategy of cascading strong classifiers. The training process of the cascade classifier is mainly a combination of strong classifiers: the earlier strong classifiers are simpler and guarantee a higher detection rate, so they are mainly used to filter out candidate areas that are not identity cards, while the later strong classifiers in the cascade require a higher accuracy rate to ensure that areas that are not identity cards are filtered out. For example, a certificate detector cascaded from 16 strong classifiers can be trained; in step S13, such a detector is generated. Accordingly, in step S3, the image to be detected is input into the certificate detector. When detecting a certificate area, the certificate detector filters non-certificate areas stage by stage: the MB-LBP features of a candidate area are calculated and judged by the cascaded strong classifiers in turn; if a strong classifier judges the area not to be a certificate, the area is directly rejected, and if it is judged to be a certificate, the next strong classifier makes its determination. Only when all strong classifiers judge the area to be a certificate is the area taken as the position of the certificate. It will be understood by those skilled in the art that the above description of the training of credential detectors is by way of example only, and that other existing or future ways of training credential detectors, as applicable to the present application, are intended to be included within the scope of the present application and are hereby incorporated by reference.
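By way of illustration only, the stage-by-stage filtering described above might be sketched in C++ as follows; the structures, field names and thresholding are assumptions made for this sketch, not the actual implementation of the detector:

```cpp
#include <vector>

// Illustrative data layout for a trained cascade; the application does not
// specify its actual structures, so these names are assumptions.
struct WeakClassifier {
    int regionIndex;      // which sampled region's MB-LBP code this tree reads
    double a[256];        // multi-branch regression tree output a_j per LBP code j
};

struct StrongClassifier {
    std::vector<WeakClassifier> trees;
    double threshold;     // stage acceptance threshold fixed at training time
};

// lbp[r] holds the decimal MB-LBP code of region r inside one candidate window.
// A window is taken as a certificate position only if every stage accepts it.
bool passesCascade(const std::vector<StrongClassifier>& stages,
                   const std::vector<int>& lbp) {
    for (const StrongClassifier& stage : stages) {
        double score = 0.0;
        for (const WeakClassifier& tree : stage.trees)
            score += tree.a[lbp[tree.regionIndex]];  // f_m(x^LBP) table lookup
        if (score < stage.threshold)
            return false;  // early rejection of a non-certificate window
    }
    return true;
}
```

The early exit is what keeps the time complexity low: most candidate windows are rejected by the first, simplest stages and never reach the later, more accurate ones.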
FIG. 5 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. Referring to fig. 5, the extracting the MB-LBP features of the different regions in step S12 in fig. 2 includes:
step S121, dividing each area into a plurality of sub-areas, and calculating the pixel mean value of each sub-area;
step S122, obtaining the binary coded MB-LBP characteristics of each area by comparing the pixel mean value of the central sub-area in each area with the pixel mean values of the surrounding sub-areas;
step S123, the binary-coded MB-LBP feature of each region is converted into a decimal MB-LBP feature. For example, as shown in fig. 6, if each of the positive and negative samples is divided into 9 rectangular sub-regions in S121, the binary code of each positive or negative sample can be converted into a decimal eigenvalue $x^{LBP}$ by the following formula:

$$x^{LBP} = \sum_{b=1}^{8} s(g_b - g_c)\,2^{b-1}, \qquad s(g_b - g_c) = \begin{cases} 1, & g_b - g_c > 0 \\ 0, & g_b - g_c \le 0, \end{cases}$$

where b denotes the serial number of one of the 8 surrounding sub-regions, $g_b$ denotes the mean pixel value of that surrounding sub-region, and $g_c$ denotes the mean pixel value of the center sub-region. The binary-coded MB-LBP feature of the region in fig. 6 is 11101001, which converts to the decimal MB-LBP feature 233. It will be understood by those skilled in the art that the foregoing description of extracting MB-LBP features is by way of example only, and that other existing or future ways of extracting MB-LBP features, as applicable to the present application, are intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
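As a sketch of steps S121 to S123, the following C++ function converts the pixel means of the nine sub-regions of one region into its decimal MB-LBP code (the neighbor ordering is an assumption; the text does not fix one):

```cpp
#include <array>

// means[b-1] is the pixel mean g_b of the b-th surrounding sub-region
// (b = 1..8, in a fixed clockwise order), meanCenter is g_c.
int mbLbpDecimal(const std::array<double, 8>& means, double meanCenter) {
    int code = 0;
    for (int b = 1; b <= 8; ++b) {
        // s(g_b - g_c) = 1 if g_b - g_c > 0, otherwise 0
        int s = (means[b - 1] - meanCenter > 0.0) ? 1 : 0;
        code |= s << (b - 1);   // accumulate the binary code as a decimal value
    }
    return code;                // 0..255; e.g. binary 11101001 -> 233
}
```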
FIG. 7 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. With reference to fig. 7, step S131 in fig. 3 includes:
step S1311, setting the weight of each positive or negative sample to $w_i = 1/N$ according to the number N of positive and negative samples, and initializing the strong classifier f(x) = 0;
step S1312, training and obtaining a corresponding multi-branch regression tree weak classifier based on the decimal MB-LBP characteristics of different regions and the weights of the positive samples or the negative samples;
step S1313, selecting the multi-branch regression tree weak classifier with the minimum loss from the multi-branch regression tree weak classifiers and adding the multi-branch regression tree weak classifier into the strong classifier;
step S1314, updating the weights of the positive samples and the negative samples according to the selected multi-branch regression tree weak classifiers, and determining whether the number of the multi-branch regression tree weak classifiers in the current strong classifier reaches a preset number, if so, going to step S1315, and if not, going to step S1312;
step S1315, outputting the generated strong classifier.
FIG. 8 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. Referring to fig. 8, step S1312 in fig. 7 includes:
step S13121, obtaining a first loss function based on the decimal MB-LBP characteristics of the different regions and the weights of the positive samples or the negative samples;
step S13122, training, by minimizing the first loss function, each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions.
In a preferred embodiment of the present application, the first loss function in step S13121 is expressed as follows:
$$J_{loss}\big(x_i^{LBP}, f_m(x_i^{LBP})\big) = \sum_{i=1}^{N} w_i \big(y_i - f_m(x_i^{LBP})\big)^2,$$

where $f_m(x_i^{LBP})$ denotes the m-th weak classifier, N denotes the number of positive and negative samples, $x_i^{LBP}$ denotes the decimal MB-LBP feature of the region of the i-th positive or negative sample, $y_i$ indicates the type of sample ($y_i = 1$ for a positive sample, $y_i = -1$ for a negative sample), and $w_i$ represents the weight of the current i-th positive or negative sample. $f_m(x_i^{LBP})$ uses a multi-branch tree, defined as follows:

$$f_m(x^{LBP}) = a_j \quad \text{for } x^{LBP} = j,\ j = 0, 1, \ldots, 255.$$
In a preferred embodiment of the present application, in step S13122,

$$a_j = \frac{\sum_i w_i y_i\,\delta(x_i^{LBP} = j)}{\sum_i w_i\,\delta(x_i^{LBP} = j)},$$

i.e. $f_m(x_i^{LBP})$, is found by minimizing $J_{loss}(x^{LBP}, f_m(x^{LBP}))$. It will be understood by those skilled in the art that the above description of loss functions is by way of example only and that other existing or future loss functions, as may be applicable to the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In a preferred embodiment of the present application, in step S1314, the weights of the positive samples and the negative samples are updated according to the selected multi-branch regression tree weak classifier, the weights being corrected according to the following formula:

$$w_i \leftarrow \frac{w_i \exp\!\big(-y_i f_m(x_i^{LBP})\big)}{\sum_i w_i \exp\!\big(-y_i f_m(x_i^{LBP})\big)}.$$

It will be understood by those skilled in the art that the foregoing description of weight correction is by way of example only, and that other existing or future weight corrections, as may be applicable to the present application, are intended to be included within the scope of the present application and are hereby incorporated by reference.
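One round of the training loop of steps S1312 to S1314 can be sketched in C++ as follows: the closed-form $a_j$ above is computed for every candidate region, the tree with the minimum loss is kept, and the weights are corrected and renormalized with the update formula. The sample layout is an assumption made for the sketch:

```cpp
#include <cmath>
#include <vector>

// x[i][r]: decimal MB-LBP code of region r for sample i; y[i] in {+1, -1};
// w[i]: current sample weights. Returns the index of the selected region and
// fills aBest[0..255] for the minimum-loss tree.
int gentleBoostRound(const std::vector<std::vector<int>>& x,
                     const std::vector<int>& y,
                     std::vector<double>& w,
                     std::vector<double>& aBest) {
    const int N = static_cast<int>(y.size());
    const int R = static_cast<int>(x[0].size());
    double bestLoss = 1e300;
    int bestRegion = -1;
    for (int r = 0; r < R; ++r) {
        std::vector<double> num(256, 0.0), den(256, 1e-12), a(256, 0.0);
        for (int i = 0; i < N; ++i) {
            num[x[i][r]] += w[i] * y[i];   // sum_i w_i y_i delta(x_i = j)
            den[x[i][r]] += w[i];          // sum_i w_i delta(x_i = j)
        }
        for (int j = 0; j < 256; ++j) a[j] = num[j] / den[j];  // closed-form a_j
        double loss = 0.0;
        for (int i = 0; i < N; ++i) {      // J_loss = sum_i w_i (y_i - f_m(x_i))^2
            const double d = y[i] - a[x[i][r]];
            loss += w[i] * d * d;
        }
        if (loss < bestLoss) { bestLoss = loss; bestRegion = r; aBest = a; }
    }
    double Z = 0.0;                        // reweight with the selected tree
    for (int i = 0; i < N; ++i) {
        w[i] *= std::exp(-y[i] * aBest[x[i][bestRegion]]);
        Z += w[i];
    }
    for (int i = 0; i < N; ++i) w[i] /= Z; // w_i <- w_i e^{-y_i f_m} / sum(...)
    return bestRegion;
}
```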
FIG. 9 shows a flow chart of a document image detection method of a preferred embodiment of the present application. With reference to fig. 9, before step S3 in fig. 1 the method further includes:
and step S2, correcting the angle of the image to be detected, thereby realizing more accurate detection of the position of the certificate image on the rotated image to be detected.
FIG. 10 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. Referring to fig. 10, step S2 in fig. 9 includes:
step S21, training an ImageNet model; specifically, image classification is a very active research direction in the fields of computer vision, pattern recognition and machine learning. In recent years, with the maturation of large-scale image data sets, the accuracy of image classification algorithms has improved greatly, and image classification competitions based on the ImageNet database (over 1,000,000 samples, 1000 classes) are held every year. Deep convolutional neural networks are now widely used in the field of image understanding, including image classification, image retrieval, object detection and the like. In image classification in particular, the accuracy of algorithms based on deep convolutional neural networks is 10 percent higher than that of the traditional bag-of-words (BOW) model; when the sample size is small, however, an image classification model trained with a deep convolutional neural network is prone to overfitting, so that the generalization capability of the model is poor;
step S22, acquiring certificate image training samples at various rotation angles, and extracting the high-level semantic features of each certificate image training sample using the ImageNet model. Specifically, the Internet has users of all ages and characters, and the images to be detected that they upload, such as photos of handheld certificates, vary just as widely. Data analysis shows that the handheld photos uploaded by many users are rotated; in general, an image to be detected is rotated by 0 degrees (normal), 90 degrees, 180 degrees, or 270 degrees. Given these characteristics, the present application determines the rotation angle of the image to be detected by means of image classification. Here, a plurality of unrotated certificate images of the same type can be collected and then randomly rotated, for example by 90, 180 and 270 degrees respectively, so as to obtain certificate images in 4 different rotation directions as the certificate image training samples for determining the rotation direction of this type of certificate image; for example, 8609 samples $(I_i, y_i)$ can be obtained. Specifically, this is formulated as follows:

$$r = \mathrm{random}(0, 4),$$

$$(I_i, y_i) = \begin{cases} (I, 0), & r = 0 \\ (R_{90}(I), 1), & r = 1 \\ (R_{180}(I), 2), & r = 2 \\ (R_{270}(I), 3), & r = 3, \end{cases}$$

where $I_i$ represents a certificate image training sample, $y_i$ represents the label of the certificate image training sample, and $R_a(I)$ represents rotating the certificate image training sample by a degrees. It will be understood by those skilled in the art that the above description of the certificate image training samples is by way of example only, and that other existing or future certificate image training samples, as may be suitable for use in the present application, are also intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
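A sketch of this sample generation, assuming OpenCV for the image handling and a clockwise rotation convention (the text does not specify the direction of $R_a$):

```cpp
#include <cstdlib>
#include <utility>
#include <opencv2/opencv.hpp>

// Produces one training pair (I_i, y_i) from an unrotated certificate image,
// following the formula above; clockwise rotation is an assumption.
std::pair<cv::Mat, int> makeRotatedSample(const cv::Mat& certificate) {
    int r = std::rand() % 4;               // r = random(0, 4)
    cv::Mat rotated;
    switch (r) {
        case 0: rotated = certificate.clone(); break;                                    // R_0
        case 1: cv::rotate(certificate, rotated, cv::ROTATE_90_CLOCKWISE); break;        // R_90
        case 2: cv::rotate(certificate, rotated, cv::ROTATE_180); break;                 // R_180
        case 3: cv::rotate(certificate, rotated, cv::ROTATE_90_COUNTERCLOCKWISE); break; // R_270
    }
    return {rotated, r};                   // the label y_i is r
}
```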
Step S23, training the high-level semantic features of the certificate image training samples by using a linear support vector machine to obtain a classifier for determining the image rotation angle;
and step S24, determining the rotation angle of an image to be detected through the classifier, and adjusting the angle of the image to be detected according to the determined rotation angle. When an image to be detected is input, the classifier for determining the image rotation angle trained in step S23 is used to determine the rotation angle of the image to be detected, the image is adjusted to a rotation angle of 0, and the certificate detector trained in step S1 is then used to detect the position of the certificate in the adjusted image. For example, if the certificate image is detected, the certificate area can be displayed enlarged; if it is not detected, the user is prompted that the uploaded image to be detected does not meet the standard.
FIG. 11 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. Referring to fig. 11, step S21 in fig. 10 includes:

step S211, scaling the sample images in the ImageNet database to the same resolution, for example scaling all sample images in the ImageNet database to 256 × 256 × 3. Preferably, in order to obtain a better scaling effect, an anti-aliasing (ANTI-ALIASING) image scaling algorithm is used when the resolution of a sample image in the ImageNet database is greater than 256, and a bicubic (BICUBIC) scaling algorithm is used when it is less than 256, thereby reducing the loss caused by image scaling. The sample image $I_{new}$ scaled to the same resolution is obtained by the following formula:

$$I_{new} = \begin{cases} \mathrm{Resize}_{\mathrm{ANTIALIAS}}^{256}(I), & I_w \ge 256 \text{ and } I_h \ge 256 \\[4pt] \mathrm{Resize}_{\mathrm{BICUBIC}}^{256}(I), & \text{otherwise,} \end{cases}$$

where I denotes the sample image before scaling, $I_w$ denotes the width of the sample image I before scaling, $I_h$ denotes the height of the sample image I before scaling, $\mathrm{Resize}_{\mathrm{ANTIALIAS}}^{256}$ denotes scaling a sample image to a resolution of 256 × 256 with the anti-aliasing image scaling algorithm, and $\mathrm{Resize}_{\mathrm{BICUBIC}}^{256}$ denotes scaling a sample image to a resolution of 256 × 256 with the bicubic scaling algorithm. It will be understood by those skilled in the art that the above description of scaling sample images is by way of example only, and that other existing or future ways of scaling sample images, as applicable to the present application, are also intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
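Sketched with OpenCV (an assumption: cv::INTER_AREA stands in for the anti-aliasing algorithm and cv::INTER_CUBIC for the bicubic one), the scaling rule reads:

```cpp
#include <opencv2/opencv.hpp>

// Scales one sample image to 256 x 256, choosing the interpolation according
// to the formula above; INTER_AREA/INTER_CUBIC are assumed stand-ins.
cv::Mat scaleTo256(const cv::Mat& img) {
    int interp = (img.cols >= 256 && img.rows >= 256) ? cv::INTER_AREA   // anti-aliased downscale
                                                      : cv::INTER_CUBIC; // bicubic upscale
    cv::Mat out;
    cv::resize(img, out, cv::Size(256, 256), 0, 0, interp);
    return out;
}
```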
Step S212, carrying out a mean value reduction operation on the sample images, where the mean of the sample images is normalized to 0. The sample image $I_{new2}$ after the subtraction operation can be obtained by the following formula:

$$I_{new2}^{i} = I_{new}^{i} - \frac{1}{n}\sum_{i=1}^{n} I_{new}^{i},$$

where n represents the number of sample images, and i represents the serial number of a sample image among the n sample images;
step S213, randomly cropping the sample image after the average value reduction operation to a uniform size, wherein in order to make the model have translation invariance, the zoomed image needs to be randomly cropped when the ImageNet model is trained, the cropping resolution is set to 32, so that the image which is originally zoomed to the resolution 256 × 256 × 3 is cropped to the resolution 224 × 224 × 3, and the randomly cropped sample image I can be obtained by the following formulanew3
x0=rand(0,31),y0=rand(0,31)
Wherein x is0X-direction cropping start point, y, representing random cropping of sample image after the averaging operation0Y-direction crop start point, I, representing random crop of sample image after mean reductionnew3(x, y) represents the pixel value of a certain pixel point of the sample image after random cropping, Inew2(x+x0,y+y0) Representing the pixel value of a certain pixel point of a sample image before random cropping
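Steps S212 and S213 together can be sketched as follows, again assuming OpenCV and a precomputed floating-point mean image:

```cpp
#include <cstdlib>
#include <opencv2/opencv.hpp>

// Subtracts the dataset mean image from a 256 x 256 x 3 sample and takes a
// random 224 x 224 crop, following S212-S213; meanImage256 is assumed CV_32FC3.
cv::Mat meanSubtractAndCrop(const cv::Mat& sample256, const cv::Mat& meanImage256) {
    cv::Mat f, centered;
    sample256.convertTo(f, CV_32FC3);
    cv::subtract(f, meanImage256, centered);      // I_new2 = I_new - mean image
    int x0 = std::rand() % 32;                    // x0 = rand(0, 31)
    int y0 = std::rand() % 32;                    // y0 = rand(0, 31)
    // I_new3(x, y) = I_new2(x + x0, y + y0)
    return centered(cv::Rect(x0, y0, 224, 224)).clone();
}
```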
Step S214, setting the neural network of the ImageNet model as sequentially connected convolutional layers and fully-connected layers. Here, the ImageNet model, a deep convolutional neural network, may be trained on a GPU (graphics processing unit): the convolutional neural network performs convolution operations on each image during computation, and densely sampled convolution operations have very high computational complexity, so the training process may be carried out on a GPU, for example a Tesla K20 series GPU. Compared with a conventional neural network, a deep convolutional neural network mainly has 2 features: one is the convolution used to extract image features, and the other is that the network is relatively deep and the model contains a large number of parameters. As shown in FIG. 12, the neural network of the ImageNet model comprises 5 convolutional layers and 3 fully-connected layers and may contain 6,500,000 parameters to be optimized in total. The 5 convolutional layers are CPN1, CPN2, CONV3, CONV4 and CP5, and the 3 fully-connected layers are FC6, FC7 and FC8: CPN1 sequentially performs a convolution operation (filters, padding, stride), a pooling operation (size, stride) and a normalization operation; CPN2 sequentially performs a convolution operation (filters, grouping, padding, stride), a pooling operation (size, stride) and a normalization operation; CONV3 and CONV4 each perform a convolution operation (filters, padding, stride); CP5 performs a convolution operation and a pooling operation; and FC6 and FC7 each produce 4096 outputs, with FC8 as the 1000-way output layer. It will be understood by those skilled in the art that the foregoing description of the neural network of the ImageNet model is merely exemplary, and that other existing or future neural networks of the ImageNet model, as applicable to the present application, are also intended to be encompassed within the scope of the present application and are hereby incorporated by reference;
Step S215, propagating the randomly cropped sample image forward through the neural network and calculating the loss value of the parameter to be optimized of the ImageNet model;
and step S216, solving the gradient of the parameter to be optimized layer by layer backward through the neural network by the chain rule according to the loss value of the parameter to be optimized, and optimizing the parameter to be optimized according to the obtained gradient using a stochastic gradient descent (SGD) algorithm, to obtain the ImageNet model with optimized parameters.
In a preferred embodiment of the present application, the loss value J(θ) of the parameter to be optimized in step S215 is obtained according to a second loss function,

$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,\log \frac{e^{x_i^j}}{\sum_{l=1}^{k} e^{x_i^l}}\right],$$

where θ represents the parameter to be optimized, n represents the number of sample images, k represents the number of output classes, and $x_i^j$ represents the output value of the j-th class of the last fully-connected layer for the i-th sample. It will be understood by those skilled in the art that the above description of loss functions is by way of example only and that other existing or future loss functions, as may be applicable to the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
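For illustration, the second loss function can be evaluated from the outputs of the last fully-connected layer as below (a plain sketch; in practice the exponentials would be stabilized by subtracting the per-sample maximum):

```cpp
#include <cmath>
#include <vector>

// x[i][j] is the last fully-connected layer's output for class j of sample i,
// label[i] = y^(i) in {0..k-1}. Returns J(theta) of the formula above.
double softmaxLoss(const std::vector<std::vector<double>>& x,
                   const std::vector<int>& label) {
    const int n = static_cast<int>(x.size());
    double J = 0.0;
    for (int i = 0; i < n; ++i) {
        double denom = 0.0;
        for (double xl : x[i]) denom += std::exp(xl);     // sum_l e^{x_i^l}
        J += std::log(std::exp(x[i][label[i]]) / denom);  // only j = y^(i) survives the indicator
    }
    return -J / n;                                        // -(1/n) [ ... ]
}
```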
In a preferred embodiment of the present application, in step S216, the gradient of the parameter to be optimized is obtained according to the following formula:
$$g_{t+1} = 0.9 \cdot g_t - 0.0005 \cdot \eta \cdot \theta_t - \eta \left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t},$$

where $g_t$ represents the gradient of the parameter to be optimized at the t-th iteration, $g_{t+1}$ represents the gradient of the parameter to be optimized at the (t+1)-th iteration, $\theta_t$ represents the parameter to be optimized at the t-th iteration, and $\left.\partial J(\theta)/\partial \theta\right|_{\theta_t}$ denotes the gradient of the second loss function J(θ) when the parameter to be optimized θ takes the value $\theta_t$. The term $0.9 \cdot g_t$ represents the moment of inertia (momentum), whose introduction makes the decrease of the loss function J(θ) and the gradient updates more stable; $0.0005 \cdot \theta_t$ is a regularization constraint on θ; and η denotes the learning rate. Generally, the learning rate is set to 0.01 at the beginning of training and is reduced by a factor of 10, i.e. to 0.001, when the loss function J(θ) no longer decreases during training.
In a more preferred embodiment of the present application, in step S216, the parameter to be optimized is optimized according to the following formula:
$$\theta_{t+1} = \theta_t + g_{t+1},$$

where $\theta_{t+1}$ represents the parameter to be optimized at the (t+1)-th iteration. It will be understood by those skilled in the art that the foregoing description of optimizing a parameter to be optimized is by way of example only, and that other existing or future approaches to optimizing a parameter to be optimized, as may be applicable to the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
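The two update formulas of step S216 amount to the following per-parameter loop (plain arrays are used for the sketch; the gradient is assumed to come from the backward pass of the network):

```cpp
#include <vector>

// One momentum-SGD step: g_{t+1} = 0.9*g_t - 0.0005*eta*theta_t - eta*dJ/dtheta,
// then theta_{t+1} = theta_t + g_{t+1}.
void sgdStep(std::vector<double>& theta,           // parameters theta_t, updated in place
             std::vector<double>& g,               // momentum buffer g_t, updated in place
             const std::vector<double>& grad,      // dJ(theta)/dtheta evaluated at theta_t
             double eta) {                         // learning rate, e.g. 0.01 then 0.001
    for (std::size_t i = 0; i < theta.size(); ++i) {
        g[i] = 0.9 * g[i]                  // inertia (momentum) term
             - 0.0005 * eta * theta[i]     // regularization constraint on theta
             - eta * grad[i];              // gradient term
        theta[i] += g[i];                  // theta_{t+1} = theta_t + g_{t+1}
    }
}
```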
FIG. 13 shows a flow chart of a certificate image detection method of a preferred embodiment of the present application. With reference to fig. 13, in step S22 of fig. 10, extracting the high-level semantic features of each certificate image training sample using the ImageNet model includes:

Step S221, scaling the certificate image training sample to the same size as the randomly cropped sample image in the ImageNet model. Here, bilinear interpolation can be used; for example, if the resolution of the input data of the ImageNet model trained in S213, i.e. the randomly cropped sample image, is 224 × 224 × 3, the certificate image training sample is scaled directly to 224 × 224 × 3;

Step S222, performing a mean value reduction operation on the scaled certificate image training samples. Specifically, in this step the mean of the certificate images is normalized to 0, and the certificate image training sample $I_{new5}$ after the mean value reduction operation can be obtained by the following formula:

$$I_{new5}^{i'} = I_{new4}^{i'} - \frac{1}{n'}\sum_{i'=1}^{n'} I_{new4}^{i'},$$

where $I_{new4}$ represents a certificate image training sample before the mean value reduction operation, n' represents the number of certificate image training samples, and i' represents the serial number of a certificate image training sample among the n' samples;

Step S223, extracting the fully-connected-layer high-level semantic features of each mean-reduced certificate image training sample using the ImageNet model with optimized parameters. Here, the high-level semantic features of the 3 fully-connected layers (FC6, FC7 and FC8) of each certificate image training sample are extracted; for example, a total high-level semantic feature dimension of 9192 may be extracted;

and step S224, performing a sparsification operation on the high-level semantic features of each certificate image training sample, and performing a normalization operation on the sparsified high-level semantic features using the two-norm. Here, the sparsification operation is realized by the following formula:

$$F_{new}^{i} = \begin{cases} F^{i}, & F^{i} \ge 0 \\ 0, & \text{otherwise,} \end{cases}$$

where $F^{i}$ represents the i-th dimension of the high-level semantic features before the sparsification operation, and $F_{new}^{i}$ represents the i-th dimension after the sparsification operation.

The normalization of the sparsified high-level semantic features using the two-norm is realized by the following formula:

$$\hat{F}_{new}^{i} = \frac{F_{new}^{i}}{\lVert F_{new} \rVert_2},$$

where $\hat{F}_{new}^{i}$ represents the high-level semantic feature after the normalization operation. It will be understood by those skilled in the art that the foregoing descriptions of sparsification and normalization are by way of example only, and that other existing or future forms of sparsification and normalization, as applicable to the present application, are intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
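Step S224 can be sketched as a single in-place pass over one feature vector:

```cpp
#include <cmath>
#include <vector>

// Sparsification (negative components clamped to 0) followed by two-norm
// normalization, as in step S224.
void sparsifyAndNormalize(std::vector<double>& feature) {
    double sumSq = 0.0;
    for (double& f : feature) {
        if (f < 0.0) f = 0.0;      // F_new^i = F^i if F^i >= 0, else 0
        sumSq += f * f;
    }
    const double norm = std::sqrt(sumSq);
    if (norm > 0.0)
        for (double& f : feature) f /= norm;   // divide by the two-norm ||F_new||_2
}
```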
Preferably, in S23, a linear support vector machine is used to train the sparsified and normalized high-level semantic features of the certificate image training samples to obtain a classifier for determining the image rotation angle. Here, since the sparsified and normalized high-level semantic features of the certificate image training samples, i.e. those of step S224, are sparse and high-dimensional, such features are often linearly separable. Therefore, a linear support vector machine is used as the classifier for judging the rotation angle of the certificate image. In the actual training process, all training samples can be divided into 5 parts: 4 parts are used for training as certificate image training samples and 1 part is used for testing as certificate image test samples. After cross-validation, the accuracy of the trained classifier for determining the image rotation angle is 99%.
This embodiment can be implemented in the C++ language. In the ImageNet model training stage, a GPU is used for training, and the GPU model may be a Tesla K20. In addition, the determination of the rotation direction of the image to be detected and the detection of the certificate area can also be implemented in the C++ language. Those of ordinary skill in the art should understand that the foregoing description of implementations is by way of example only, and that other existing or future implementations, as may be applicable to the present application, are intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
Fig. 14 shows a block diagram of an apparatus for certificate image detection according to another aspect of the present application. With reference to fig. 14, an apparatus 100 for certificate image detection is proposed, comprising:
a first device 1 for training a certificate detector cascaded by a plurality of strong classifiers based on MB-LBP features;
and a second device 2 for detecting the certificate image in the image to be detected through the certificate detector. Herein, the certificate image includes, but is not limited to, an identity card image, a passport image, a work permit image, a student card image, and a driver's license image. Detecting the certificate image in the image to be detected not only means detecting whether a certificate image exists in the image to be detected, but also includes detecting the position of the certificate image in the image to be detected. The position of the certificate image in the image to be detected is detected through the trained certificate detector; compared with the traditional edge-based identity card detection algorithm, the machine-learning-based algorithm is more robust and can handle certificate images with complex backgrounds.
Fig. 15 shows a block diagram of an apparatus for certificate image detection according to a preferred embodiment of the present application. With reference to fig. 15, the first device 1 in fig. 14 comprises:
a first-one module 11 for acquiring a plurality of certificate images and non-certificate images as positive samples and negative samples respectively; for example, the first-one module 11 may use $10^3$ positive samples and 500 negative samples, where a positive sample is an identity card image with a resolution of 40 × 60 and a negative sample is an arbitrary image containing no identity card;
a first-two module 12, configured to randomly select different regions from the positive samples and the negative samples, and to extract the MB-LBP features of the different regions respectively;
and a first-three module 13 for training a certificate detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions. Here, each strong classifier is generated from a plurality of multi-branch regression tree weak classifiers, each trained on the MB-LBP features of the positive and negative samples.
FIG. 16 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to fig. 16, the first-three module 13 in fig. 15 comprises:
a first-three-one unit 131, configured to perform feature selection based on the MB-LBP features of the different regions using the gentleAdaboost strategy, and to generate strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected features;
and a first-three-two unit 132 for cascading a plurality of strong classifiers into a certificate detector using the cascadeAdaboost strategy. Since the MB-LBP features of the different regions are binary codes, they cannot be compared with an ordinary distance metric. Therefore, a multi-branch regression tree strategy is adopted to train the weak classifiers, the gentleAdaboost strategy is then used to select features and combine the weak classifiers into strong classifiers, and finally the cascadeAdaboost strategy is introduced to accelerate the certificate detection algorithm.
In a preferred embodiment of the present application, the earlier strong classifiers of the certificate detector contain fewer multi-branch regression tree weak classifiers and the later strong classifiers contain more. In order to give the trained detector a high detection rate and a low time complexity, the detector for detecting the identity card is trained with a strategy of cascading strong classifiers. The training of the cascade classifier mainly consists of combining strong classifiers: the earlier strong classifiers are kept simple and must guarantee a high detection rate, so they mainly serve to quickly filter out candidate regions that are clearly not identity cards, while the later strong classifiers in the cascade require a higher accuracy rate to ensure that the remaining non-identity-card regions are filtered out. For example, a certificate detector cascaded from 16 strong classifiers may be trained. For example, the first-three module 13 generates a certificate detector composed of 16 cascaded strong classifiers as shown in fig. 4; accordingly, the second device 2 inputs the image to be detected into the certificate detector, and detecting the certificate region is a process of filtering non-certificate regions stage by stage: each cascaded strong classifier makes a judgment by computing the MB-LBP features of a candidate region; if a strong classifier judges the region not to be a certificate, the region is immediately discarded, and if it judges the region to be a certificate, the region is passed to the next strong classifier; only when all strong classifiers judge the region to be a certificate is the region taken as the position of the certificate. It will be understood by those skilled in the art that the above description of the training of certificate detectors is by way of example only, and that other existing or future ways of training certificate detectors, as applicable to the present application, are intended to be included within the scope of the present application and are hereby incorporated by reference.
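By way of illustration only, the stage-by-stage filtering described above may be sketched in C++ as follows; the structure names WeakClassifier and StrongClassifier, the per-stage threshold field, and the assumption that the MB-LBP codes of all candidate sub-regions have been precomputed are simplifications of this sketch:

#include <vector>

// Hypothetical representation of the cascade: each weak classifier is a
// 256-entry lookup table a_j over the decimal MB-LBP code of one region,
// and each strong classifier (stage) sums its weak classifiers' outputs.
struct WeakClassifier {
    int regionIndex;   // which candidate-region MB-LBP code this tree reads
    float lut[256];    // leaf values a_0 ... a_255 of the multi-branch tree
};

struct StrongClassifier {
    std::vector<WeakClassifier> weak;
    float threshold = 0.0f;  // stage threshold; sign(sum) in the simplest form
};

// A candidate window is taken as a certificate region only if every cascaded
// strong classifier accepts it; any rejection discards the window at once,
// which is how the early, simple stages keep the detector fast.
bool isCertificateRegion(const std::vector<StrongClassifier>& cascade,
                         const std::vector<int>& mbLbpCodes) {
    for (const StrongClassifier& stage : cascade) {
        float score = 0.0f;
        for (const WeakClassifier& w : stage.weak)
            score += w.lut[mbLbpCodes[w.regionIndex]];
        if (score < stage.threshold)
            return false;  // filtered out by this stage
    }
    return true;           // accepted by all stages, e.g. all 16 of them
}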
FIG. 17 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to fig. 17, the first-two module 12 in fig. 15 comprises:
a first-two-one unit 121, configured to divide each region into a plurality of sub-regions and to calculate the pixel mean value of each sub-region;
a first-two-two unit 122 for obtaining the binary-coded MB-LBP feature of each region by comparing the pixel mean of the central sub-region within each region with the pixel means of the surrounding sub-regions;
and a first-two-three unit 123 for converting the binary-coded MB-LBP feature of each region into a decimal MB-LBP feature. For example, as shown in fig. 6, if the first-two-one unit 121 divides each region of the positive and negative samples into 9 rectangular sub-regions, the binary code of each positive or negative sample region can be converted into a decimal feature value $x_{LBP}$ by the following equation:
$$x_{LBP} = \sum_{b=1}^{8} s(g_b - g_c)\, 2^{b-1}, \qquad s(g_b - g_c) = \begin{cases} 1 & g_b - g_c > 0 \\ 0 & g_b - g_c \le 0 \end{cases}$$

wherein $b$ denotes the index of one of the 8 surrounding sub-regions, $g_b$ denotes the average pixel value of that surrounding sub-region, and $g_c$ denotes the average pixel value of the central sub-region. The binary-coded MB-LBP feature of the region in fig. 6 is 11101001, which converts to a decimal MB-LBP feature of 233. It will be understood by those skilled in the art that the foregoing description of extracting MB-LBP features is by way of example only, and that other existing or future ways of extracting MB-LBP features, as applicable to the present application, are intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
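By way of illustration only, the computation of the decimal MB-LBP feature for one region may be sketched in C++ as follows, assuming the 3×3 grid of sub-region pixel means has already been computed; the function name mbLbpCode and the clockwise bit ordering are assumptions of this sketch:

// Decimal MB-LBP code of one region from the 3x3 grid of sub-region means;
// mean[r][c] is the average pixel value of the sub-region at row r, column c.
int mbLbpCode(const float mean[3][3]) {
    const float gc = mean[1][1];  // central sub-region mean g_c
    // The 8 surrounding sub-regions in a fixed clockwise order (an assumed
    // convention); bit b of the code carries s(g_b - g_c) weighted by 2^b,
    // i.e. 2^(b-1) for b = 1..8 in the notation above.
    const int order[8][2] = {{0,0},{0,1},{0,2},{1,2},{2,2},{2,1},{2,0},{1,0}};
    int code = 0;
    for (int b = 0; b < 8; ++b) {
        const float gb = mean[order[b][0]][order[b][1]];
        if (gb > gc) code |= 1 << b;  // s(...) = 1 when g_b - g_c > 0
    }
    return code;                      // decimal MB-LBP value in 0..255
}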
FIG. 18 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to fig. 18, the first-three-one unit 131 in fig. 16 comprises:
a first-three-one-one subunit 1311, configured to set the weight of each positive or negative sample according to the number of positive and negative samples, and to initialize the strong classifier; here, the weight of each positive or negative sample may be set to $w_i = 1/N$ according to the total number $N$ of positive and negative samples, and the strong classifier is initialized as $F(x) = 0$;
a first-three-one-two subunit 1312, configured to train corresponding multi-branch regression tree weak classifiers based on the decimal MB-LBP features of the different regions and the weights of the positive and negative samples;
a first-three-one-three subunit 1313, configured to select the multi-branch regression tree weak classifier with the minimum loss from the trained multi-branch regression tree weak classifiers and to add it to the strong classifier;
a first-three-one-four subunit 1314, configured to update the weights of the positive and negative samples according to the selected multi-branch regression tree weak classifier, and to judge whether the number of multi-branch regression tree weak classifiers in the current strong classifier has reached a preset number; if yes, execution proceeds to the first-three-one-five subunit 1315, and if no, execution returns to the first-three-one-two subunit 1312;
and a first-three-one-five subunit 1315 for outputting the generated strong classifier $F(x) = \operatorname{sign}\left[\sum_{m=1}^{M} f_m(x)\right]$.
In a preferred embodiment of the present application, the first-three-one-two subunit 1312 is configured to obtain a first loss function based on the decimal MB-LBP features of the different regions and the weights of the positive and negative samples, and to train each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions by minimizing the first loss function.
In a preferred embodiment of the present application, the first loss function obtained by the first-three-one-two subunit 1312 is expressed as follows:

$$J_{loss}\left(x_i^{LBP}, f_m(x_i^{LBP})\right) = \sum_{i=1}^{N} w_i \left(y_i - f_m(x_i^{LBP})\right)^2$$

wherein $f_m(x_i^{LBP})$ denotes the $m$-th weak classifier, $N$ denotes the number of positive and negative samples, $x_i^{LBP}$ denotes the decimal MB-LBP feature of a region of the $i$-th positive or negative sample, $y_i$ indicates the type of the sample ($y_i = 1$ for a positive sample and $y_i = -1$ for a negative sample), and $w_i$ denotes the weight of the current $i$-th positive or negative sample. $f_m(x_i^{LBP})$ uses a multi-branch tree, defined as follows:

$$f_m(x^{LBP}) = \begin{cases} a_0 & x^{LBP} = 0 \\ \;\vdots \\ a_j & x^{LBP} = j \\ \;\vdots \\ a_{255} & x^{LBP} = 255 \end{cases}$$
In a preferred embodiment of the present application, the first-three-one-two subunit 1312 finds, by minimizing $J_{loss}(x^{LBP}, f_m(x_i^{LBP}))$,

$$a_j = \frac{\sum_i w_i\, y_i\, \delta(x_i^{LBP} = j)}{\sum_i w_i\, \delta(x_i^{LBP} = j)}$$

thereby obtaining $f_m(x_i^{LBP})$.
It will be understood by those skilled in the art that the above description of loss functions is by way of example only and that other existing or future loss functions, as may be applicable to the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In a preferred embodiment of the present application, the first-three-one-four subunit 1314 updates the weights of the positive and negative samples according to the selected multi-branch regression tree weak classifier, the weights being corrected according to the following formula:

$$w_i \leftarrow \frac{w_i \exp\left(-y_i f_m(x_i^{LBP})\right)}{\sum_i w_i \exp\left(-y_i f_m(x_i^{LBP})\right)}.$$
It will be understood by those skilled in the art that the foregoing description of weight correction is by way of example only, and that other existing or future weight corrections, as may be applicable to the present application, are intended to be included within the scope of the present application and are hereby incorporated by reference.
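By way of illustration only, one boosting round of the above procedure may be sketched in C++ as follows; the structure name FittedTree, the function name fitBestTree, and the in-memory layout of the samples are assumptions of this sketch, not the claimed implementation:

#include <cmath>
#include <vector>

// One gentleAdaboost round: fit a 256-leaf regression tree per candidate
// region, keep the tree with minimum weighted squared loss J_loss, then
// reweight the samples. x[i][r] is the decimal MB-LBP code of region r in
// sample i; y[i] is +1 (certificate) or -1 (non-certificate).
struct FittedTree { int region; std::vector<float> a; double loss; };

FittedTree fitBestTree(const std::vector<std::vector<int>>& x,
                       const std::vector<int>& y, std::vector<double>& w) {
    const int N = static_cast<int>(y.size());
    const int R = static_cast<int>(x[0].size());
    FittedTree best{-1, {}, 1e30};
    for (int r = 0; r < R; ++r) {
        // a_j = sum_i w_i y_i [x_i = j] / sum_i w_i [x_i = j], per bin j.
        std::vector<double> num(256, 0.0), den(256, 1e-12);
        for (int i = 0; i < N; ++i) {
            num[x[i][r]] += w[i] * y[i];
            den[x[i][r]] += w[i];
        }
        std::vector<float> a(256);
        for (int j = 0; j < 256; ++j) a[j] = static_cast<float>(num[j] / den[j]);
        double loss = 0.0;  // J_loss = sum_i w_i (y_i - f_m(x_i))^2
        for (int i = 0; i < N; ++i) {
            const double d = y[i] - a[x[i][r]];
            loss += w[i] * d * d;
        }
        if (loss < best.loss) best = {r, a, loss};
    }
    // Weight update: w_i <- w_i exp(-y_i f_m(x_i)), then renormalize to sum 1.
    double z = 0.0;
    for (int i = 0; i < N; ++i) {
        w[i] *= std::exp(-y[i] * best.a[x[i][best.region]]);
        z += w[i];
    }
    for (double& wi : w) wi /= z;
    return best;
}

Repeating such rounds and accumulating the returned trees until the preset number is reached yields the strong classifier output by the first-three-one-five subunit 1315.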
FIG. 19 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to figs. 19 and 14, the apparatus further comprises a third device 3 for correcting the angle of the image to be detected, so that the position of the certificate image can be detected more accurately in a rotated image to be detected.
FIG. 20 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to fig. 20, the third device 3 of fig. 19 comprises:
a third-one module 31 for training the ImageNet model; specifically, image classification is a very active research direction in the fields of computer vision, pattern recognition and machine learning. In recent years, with the maturation of large-scale image data sets, the accuracy of image classification algorithms has improved greatly, and image classification competitions based on the ImageNet database (over 1,000,000 samples, 1000 classes) are held every year. Deep convolutional neural networks are now widely used in the field of image understanding, including image classification, image retrieval, target detection and the like. In image classification in particular, the accuracy of algorithms based on deep convolutional neural networks is some 10 percentage points higher than that of the traditional BOW model; however, when the sample size is small, an image classification model trained with a deep convolutional neural network is prone to overfitting, giving the model poor generalization capability;
a third-two module 32 for acquiring certificate image training samples at various rotation angles and extracting the high-level semantic features of each certificate image training sample by using the ImageNet model; specifically, in the Internet environment there are users of all ages and habits, and the images to be detected uploaded by users, such as handheld certificate photos, vary widely. Data analysis shows that the handheld photos uploaded by many users are rotated, and in general an image to be detected can be classified as rotated by 0 degrees (normal), 90 degrees, 180 degrees or 270 degrees. In view of these characteristics, the present application determines the rotation angle of the image to be detected by means of image classification. Here, a plurality of unrotated certificate images of the same type may be collected and then rotated at random, e.g. by 90, 180 and 270 degrees respectively, so as to obtain certificate images in 4 different rotation directions as the certificate image training samples for determining the rotation direction of this type of certificate image; for example, 8609 samples $(I_i, y_i)$ may be obtained. This is formulated as follows:
$$r = \mathrm{random}(0, 4),$$

$$(I_i, y_i) = \begin{cases} (I, 0) & r = 0 \\ (R_{90}(I), 1) & r = 1 \\ (R_{180}(I), 2) & r = 2 \\ (R_{270}(I), 3) & r = 3 \end{cases}$$
wherein $I_i$ denotes a certificate image training sample, $y_i$ denotes the label of the certificate image training sample, and $R_a(I)$ denotes rotating the certificate image training sample by $a$ degrees. It will be understood by those skilled in the art that the above description of the certificate image training samples is by way of example only, and that other existing or future certificate image training samples, as may be suitable for use in the present application, are also intended to be encompassed within the scope of the present application and are hereby incorporated by reference.
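By way of illustration only, such a sample generator may be sketched in C++ as follows; OpenCV (cv::rotate, available since OpenCV 3.2) is an assumed dependency, and the convention that $R_{270}$ maps to a 90-degree counterclockwise rotation is an assumption of this sketch:

#include <cstdlib>
#include <utility>
#include <opencv2/opencv.hpp>

// Generates one labeled training sample (I_i, y_i) from an upright certificate
// image I by rotating it 0, 90, 180 or 270 degrees per r = random(0, 4).
std::pair<cv::Mat, int> makeRotatedSample(const cv::Mat& upright) {
    const int r = std::rand() % 4;
    cv::Mat out;
    switch (r) {
        case 1: cv::rotate(upright, out, cv::ROTATE_90_CLOCKWISE);        break; // (R90(I), 1)
        case 2: cv::rotate(upright, out, cv::ROTATE_180);                 break; // (R180(I), 2)
        case 3: cv::rotate(upright, out, cv::ROTATE_90_COUNTERCLOCKWISE); break; // (R270(I), 3)
        default: out = upright.clone();                                          // (I, 0)
    }
    return {out, r};
}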
a third-three module 33 for training on the high-level semantic features of the certificate image training samples by using a linear support vector machine to obtain a classifier for determining the image rotation angle;
and a third-four module 34 for determining the rotation angle of an image to be detected through the classifier and adjusting the angle of the image to be detected according to the determined rotation angle. When an image to be detected is input, the classifier for determining the image rotation angle trained in step S23 is used to determine the rotation angle of the image to be detected, the image to be detected is adjusted back to a rotation angle of 0 degrees, and the certificate detector trained in step S1 is then used to detect the position of the certificate in the adjusted image to be detected. For example, if a certificate image is detected, the certificate area can be displayed in an enlarged manner, and if none is detected, the user is prompted that the uploaded image to be detected does not meet the standard.
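By way of illustration only, the angle-adjustment flow of the third-four module 34 may be sketched in C++ as follows; the helper classifyRotation, which stands in for the trained linear-SVM classifier, and the direction convention used to undo each rotation are assumptions of this sketch:

#include <opencv2/opencv.hpp>

int classifyRotation(const cv::Mat& image);  // assumed helper: the trained
                                             // classifier, returning 0 (0 deg),
                                             // 1 (90), 2 (180) or 3 (270)

// Undoes the detected rotation so that the certificate detector sees an image
// at a rotation angle of 0 degrees; the rotation direction convention must
// match the one used when generating the training samples.
cv::Mat adjustToUpright(const cv::Mat& input) {
    cv::Mat upright;
    switch (classifyRotation(input)) {
        case 1: cv::rotate(input, upright, cv::ROTATE_90_COUNTERCLOCKWISE); break;
        case 2: cv::rotate(input, upright, cv::ROTATE_180);                 break;
        case 3: cv::rotate(input, upright, cv::ROTATE_90_CLOCKWISE);        break;
        default: upright = input.clone();
    }
    return upright;  // ready for the cascaded certificate detector
}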
FIG. 21 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to fig. 21, the third-one module 31 in fig. 20 comprises:
a third-one-one unit 311, configured to scale the sample images in the ImageNet database to the same resolution;
a third-one-two unit 312, configured to perform a mean-reduction operation on the sample images;
a third-one-three unit 313, configured to randomly crop the mean-reduced sample images to a uniform size;
a third-one-four unit 314, configured to set the neural network of the ImageNet model as sequentially connected convolutional layers and fully-connected layers;
a third-one-five unit 315, configured to propagate the randomly cropped sample images forward through the neural network and to calculate the loss value of the parameters to be optimized of the ImageNet model;
and a third-one-six unit 316, configured to find, layer by layer in reverse, the gradient of the parameters to be optimized on the neural network by the chain rule of derivation according to the loss value of the parameters to be optimized, and to optimize the parameters to be optimized according to the found gradient using a stochastic gradient descent algorithm, so as to obtain the parameter-optimized ImageNet model.
In a preferred embodiment of the present application, the third-one-five unit 315 calculates the loss value $J(\theta)$ of the parameters to be optimized of the ImageNet model according to a second loss function,

$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{x_i^j}}{\sum_{l=1}^{k} e^{x_i^l}}\right]$$

wherein $\theta$ denotes the parameters to be optimized, $n$ denotes the number of sample images, $k$ denotes the number of output classes, and $x_i^j$ denotes the output value of the $j$-th class of the last fully-connected layer for the $i$-th sample.
In a preferred embodiment of the present application, the third-one-six unit 316 obtains the gradient of the parameters to be optimized according to the following formula:

$$g_{t+1} = 0.9 \cdot g_t - 0.0005 \cdot \eta \cdot \theta_t - \eta \cdot \left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t}$$

wherein $g_t$ denotes the gradient of the parameters to be optimized at the $t$-th iteration, $g_{t+1}$ denotes the gradient of the parameters to be optimized at the $(t+1)$-th iteration, $\theta_t$ denotes the parameters to be optimized at the $t$-th iteration, $\left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t}$ denotes the gradient of the second loss function $J(\theta)$ with respect to the parameters $\theta$ evaluated at $\theta = \theta_t$, the term $0.9 \cdot g_t$ is the momentum term, the term $0.0005 \cdot \eta \cdot \theta_t$ is a regularization (weight decay) constraint on $\theta$, and $\eta$ denotes the learning rate.
In a preferred embodiment of the present application, the third-one-six unit 316 optimizes the parameters to be optimized according to the following formula:

$$\theta_{t+1} = \theta_t + g_{t+1}$$

wherein $\theta_{t+1}$ denotes the parameters to be optimized at the $(t+1)$-th iteration.
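By way of illustration only, the two update formulas above may be sketched in C++ for a flattened parameter vector as follows; the function name sgdMomentumStep and the flat-vector representation are assumptions of this sketch:

#include <cstddef>
#include <vector>

// One stochastic-gradient-descent step with momentum and weight decay for a
// flattened parameter vector theta, following
//   g_{t+1} = 0.9*g_t - 0.0005*eta*theta_t - eta*dJ/dtheta|theta_t
//   theta_{t+1} = theta_t + g_{t+1}
void sgdMomentumStep(std::vector<float>& theta, std::vector<float>& g,
                     const std::vector<float>& grad,  // dJ/dtheta at theta_t
                     float eta) {                     // learning rate
    for (std::size_t i = 0; i < theta.size(); ++i) {
        g[i] = 0.9f * g[i]               // momentum term
             - 0.0005f * eta * theta[i]  // regularization (weight decay) term
             - eta * grad[i];            // gradient term
        theta[i] += g[i];                // theta_{t+1} = theta_t + g_{t+1}
    }
}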
FIG. 22 shows a block diagram of an apparatus for certificate image detection in accordance with a preferred embodiment of the present application. With reference to fig. 22, the third-two module 32 in fig. 20 comprises:
a third-two-one unit 321, configured to acquire certificate image training samples at various rotation angles;
a third-two-two unit 322, configured to scale the certificate image training samples to the same size as the randomly cropped sample images of the ImageNet model;
a third-two-three unit 323, configured to perform a mean-reduction operation on the scaled certificate image training samples;
a third-two-four unit 324, configured to extract the fully-connected-layer high-level semantic features of each mean-reduced certificate image training sample by using the parameter-optimized ImageNet model;
and a third-two-five unit 325, configured to perform a sparsification operation on the high-level semantic features of each certificate image training sample, and to perform a two-norm normalization operation on the sparsified high-level semantic features.
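By way of illustration only, the scaling and mean-reduction of the third-two-two and third-two-three units may be sketched in C++ as follows; OpenCV is an assumed dependency, and the sketch assumes 3-channel images with a mean image stored as CV_32FC3 at the network's crop size:

#include <opencv2/opencv.hpp>

// Scales a 3-channel certificate image training sample to the network's crop
// size (taken from the mean image) and subtracts the mean image.
cv::Mat preprocessSample(const cv::Mat& sample, const cv::Mat& meanImage) {
    cv::Mat resized, asFloat;
    cv::resize(sample, resized, meanImage.size());  // same size as the random crops
    resized.convertTo(asFloat, CV_32FC3);           // match the mean image's type
    return asFloat - meanImage;                     // the mean-reduction operation
}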
In a specific application embodiment, it is required to detect whether an identity card image exists in a handheld photo provided by a user and, if so, to determine the position of the identity card image in the handheld photo. As shown in fig. 23, with the method of the present application, a handheld photo 201 is first obtained; the rotation angle of the handheld photo 201 is determined by the classifier obtained in step S23, and an angle-adjusted photo 202 is obtained according to that rotation angle; the identity card detector trained in step S1 is then used to detect the identity card image in the image to be detected, yielding a handheld photo 203 in which the position of the identity card image is framed.
In summary, the present application detects the position of the certificate image in the image to be detected through a trained certificate detector; compared with the traditional edge-based identity card detection algorithm, the machine-learning-based algorithm of the present application is more robust and can handle certificate images with complex backgrounds.

In addition, by correcting the angle of the image to be detected, the position of the certificate image can be detected more accurately in a rotated image to be detected.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (32)

1. A method of certificate image detection, comprising:
training a certificate detector cascaded by a plurality of strong classifiers based on MB-LBP features;
and detecting the certificate image in the image to be detected through the certificate detector.
2. The method of claim 1, wherein training a certificate detector cascaded by a plurality of strong classifiers based on MB-LBP features comprises:
acquiring a plurality of certificate images and non-certificate images as positive samples and negative samples respectively;
randomly selecting different areas from the positive sample and the negative sample, and respectively extracting MB-LBP characteristics of the different areas;
training a certificate detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions.
3. The method of claim 2, wherein training the certificate detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions comprises:
performing feature selection based on the MB-LBP features of the different regions using a gentleAdaboost strategy, and generating strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected features;
and cascading a plurality of strong classifiers into a certificate detector using a cascadeAdaboost strategy.
4. The method of claim 3, wherein extracting the MB-LBP features of the different regions respectively comprises:
dividing each region into a plurality of sub-regions respectively, and calculating the pixel mean value of each sub-region;
obtaining a binary coded MB-LBP characteristic of each region by comparing the pixel mean value of the central sub-region in each region with the pixel mean values of the surrounding sub-regions;
the binary coded MB-LBP features for each region are converted to decimal MB-LBP features.
5. The method of claim 4, wherein performing feature selection based on the MB-LBP features of the different regions using a gentleAdaboost strategy, and generating strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected features, comprises:
repeating the following steps until the number of the multi-branch regression tree weak classifiers in the current strong classifier reaches the preset number, and outputting the generated strong classifier:
setting the weight of each positive sample or negative sample according to the number of the positive samples and the negative samples, and initializing a strong classifier;
training based on decimal MB-LBP characteristics of different regions and weights of positive samples or negative samples to obtain corresponding multi-branch regression tree weak classifiers;
selecting the multi-branch regression tree weak classifier with the minimum loss from the multi-branch regression tree weak classifiers and adding the multi-branch regression tree weak classifier into a strong classifier;
and updating the weights of the positive sample and the negative sample according to the selected multi-branch regression tree weak classifier.
6. The method of claim 5, wherein training the corresponding multi-branch regression tree weak classifiers based on the decimal MB-LBP features of the different regions and the weights of the positive or negative samples comprises:
obtaining a first loss function based on the decimal MB-LBP characteristics of the different regions and the weights of the positive samples or the negative samples;
and training, by minimizing the first loss function, each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions.
7. The method of claim 6, wherein the first loss function is expressed as follows:

$$J_{loss}\left(x_i^{LBP}, f_m(x_i^{LBP})\right) = \sum_{i=1}^{N} w_i \left(y_i - f_m(x_i^{LBP})\right)^2$$

wherein $f_m(x_i^{LBP})$ denotes the $m$-th weak classifier, $N$ denotes the number of positive and negative samples, $x_i^{LBP}$ denotes the decimal MB-LBP feature of a region of the $i$-th positive or negative sample, $y_i$ indicates the type of the sample ($y_i = 1$ for a positive sample and $y_i = -1$ for a negative sample), and $w_i$ denotes the weight of the current $i$-th positive or negative sample. $f_m(x_i^{LBP})$ uses a multi-branch tree, defined as follows:

$$f_m(x^{LBP}) = \begin{cases} a_0 & x^{LBP} = 0 \\ \;\vdots \\ a_j & x^{LBP} = j \\ \;\vdots \\ a_{255} & x^{LBP} = 255 \end{cases}$$
8. The method of claim 7, wherein, in training each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions by minimizing the first loss function,

$$a_j = \frac{\sum_i w_i\, y_i\, \delta(x_i^{LBP} = j)}{\sum_i w_i\, \delta(x_i^{LBP} = j)}$$

is found by minimizing $J_{loss}(x^{LBP}, f_m(x_i^{LBP}))$, thereby obtaining $f_m(x_i^{LBP})$.
9. The method of claim 8, wherein the weights of the positive and negative samples are updated according to the selected multi-branch regression tree weak classifier, the weights being corrected according to the following formula:

$$w_i \leftarrow \frac{w_i \exp\left(-y_i f_m(x_i^{LBP})\right)}{\sum_i w_i \exp\left(-y_i f_m(x_i^{LBP})\right)}.$$
10. The method of any one of claims 1 to 9, wherein, before detecting the certificate image in the image to be detected through the certificate detector, the method further comprises:
correcting the angle of the image to be detected.
11. The method of claim 10, wherein correcting the angle for the image to be detected comprises:
training an ImageNet model;
acquiring certificate image training samples of various rotation angles, and extracting high-level semantic features of each certificate image training sample by using the ImageNet model;
training the high-level semantic features of the certificate image training samples by using a linear support vector machine to obtain a classifier for determining the image rotation angle;
and determining the rotation angle of an image to be detected through the classifier, and adjusting the angle of the image to be detected according to the rotation angle of the image to be detected.
12. The method of claim 11, wherein training the ImageNet model comprises:
scaling the sample images in the ImageNet database to the same resolution;
performing a mean-reduction operation on the sample images;
randomly cropping the mean-reduced sample images to a uniform size;
setting the neural network of the ImageNet model as sequentially connected convolutional layers and fully-connected layers;
propagating the randomly cropped sample images forward through the neural network, and calculating a loss value of parameters to be optimized of the ImageNet model;
and finding, layer by layer in reverse, the gradient of the parameters to be optimized on the neural network by the chain rule of derivation according to the loss value of the parameters to be optimized, and optimizing the parameters to be optimized according to the found gradient using a stochastic gradient descent algorithm to obtain the parameter-optimized ImageNet model.
13. The method of claim 12, wherein, in calculating the loss value of the parameters to be optimized of the ImageNet model, the loss value $J(\theta)$ of the parameters to be optimized is obtained according to a second loss function,

$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{x_i^j}}{\sum_{l=1}^{k} e^{x_i^l}}\right]$$

wherein $\theta$ denotes the parameters to be optimized, $n$ denotes the number of sample images, $k$ denotes the number of output classes, and $x_i^j$ denotes the output value of the $j$-th class of the last fully-connected layer for the $i$-th sample.
14. The method according to claim 13, wherein, in finding the gradient of the parameters to be optimized layer by layer in reverse on the neural network by the chain rule of derivation, the gradient of the parameters to be optimized is determined according to the following formula:

$$g_{t+1} = 0.9 \cdot g_t - 0.0005 \cdot \eta \cdot \theta_t - \eta \cdot \left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t}$$

wherein $g_t$ denotes the gradient of the parameters to be optimized at the $t$-th iteration, $g_{t+1}$ denotes the gradient at the $(t+1)$-th iteration, $\theta_t$ denotes the parameters to be optimized at the $t$-th iteration, $\left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t}$ denotes the gradient of the second loss function $J(\theta)$ with respect to $\theta$ evaluated at $\theta = \theta_t$, $0.9 \cdot g_t$ is the momentum term, $0.0005 \cdot \eta \cdot \theta_t$ is a regularization constraint on $\theta$, and $\eta$ denotes the learning rate.
15. The method of claim 14, wherein, in optimizing the parameters to be optimized according to the found gradient using a stochastic gradient descent algorithm to obtain the parameter-optimized ImageNet model, the parameters to be optimized are optimized according to the following formula:

$$\theta_{t+1} = \theta_t + g_{t+1}$$

wherein $\theta_{t+1}$ denotes the parameters to be optimized at the $(t+1)$-th iteration.
16. The method of claim 12, wherein extracting high-level semantic features of each certificate image training sample using the ImageNet model comprises:
scaling the certificate image training samples to the same size as the randomly cropped sample images of the ImageNet model;
performing a mean-reduction operation on the scaled certificate image training samples;
extracting the fully-connected-layer high-level semantic features of each mean-reduced certificate image training sample by using the parameter-optimized ImageNet model;
and performing a sparsification operation on the high-level semantic features of each certificate image training sample, and performing a two-norm normalization operation on the sparsified high-level semantic features.
17. An apparatus for certificate image detection, comprising:
a first device for training a certificate detector cascaded by a plurality of strong classifiers based on MB-LBP features;
and a second device for detecting the certificate image in the image to be detected through the certificate detector.
18. The apparatus of claim 17, wherein the first device comprises:
a first-one module for acquiring a plurality of certificate images and non-certificate images as positive samples and negative samples respectively;
a first-two module for randomly selecting different regions from the positive samples and the negative samples, and extracting the MB-LBP features of the different regions respectively;
and a first-three module for training a certificate detector cascaded by a plurality of strong classifiers based on the MB-LBP features of the different regions.
19. The apparatus of claim 18, wherein the first-three module comprises:
a first-three-one unit for performing feature selection based on the MB-LBP features of the different regions using a gentleAdaboost strategy, and generating strong classifiers each composed of a plurality of multi-branch regression tree weak classifiers according to the selected features;
and a first-three-two unit for cascading a plurality of strong classifiers into a certificate detector using a cascadeAdaboost strategy.
20. The apparatus of claim 19, wherein the first-two module comprises:
a first-two-one unit for dividing each region into a plurality of sub-regions and calculating the pixel mean value of each sub-region;
a first-two-two unit for obtaining the binary-coded MB-LBP feature of each region by comparing the pixel mean of the central sub-region within each region with the pixel means of the surrounding sub-regions;
and a first-two-three unit for converting the binary-coded MB-LBP feature of each region into a decimal MB-LBP feature.
21. The apparatus of claim 20, wherein the first-three-one unit comprises:
a first-three-one-one subunit for setting the weight of each positive or negative sample according to the number of positive and negative samples, and initializing the strong classifier;
a first-three-one-two subunit for training corresponding multi-branch regression tree weak classifiers based on the decimal MB-LBP features of the different regions and the weights of the positive and negative samples;
a first-three-one-three subunit for selecting the multi-branch regression tree weak classifier with the minimum loss from the trained multi-branch regression tree weak classifiers and adding it to the strong classifier;
a first-three-one-four subunit for updating the weights of the positive and negative samples according to the selected multi-branch regression tree weak classifier, and judging whether the number of multi-branch regression tree weak classifiers in the current strong classifier has reached a preset number, execution proceeding to the first-three-one-five subunit if yes and returning to the first-three-one-two subunit if no;
and a first-three-one-five subunit for outputting the generated strong classifier.
22. The apparatus of claim 21, wherein the first-three-one-two subunit is configured to obtain a first loss function based on the decimal MB-LBP features of the different regions and the weights of the positive and negative samples, and to train each multi-branch regression tree weak classifier corresponding to the decimal MB-LBP features of the different regions by minimizing the first loss function.
23. The apparatus of claim 22, wherein the first loss function obtained by the first-three-one-two subunit is expressed as follows:

$$J_{loss}\left(x_i^{LBP}, f_m(x_i^{LBP})\right) = \sum_{i=1}^{N} w_i \left(y_i - f_m(x_i^{LBP})\right)^2$$

wherein $f_m(x_i^{LBP})$ denotes the $m$-th weak classifier, $N$ denotes the number of positive and negative samples, $x_i^{LBP}$ denotes the decimal MB-LBP feature of a region of the $i$-th positive or negative sample, $y_i$ indicates the type of the sample ($y_i = 1$ for a positive sample and $y_i = -1$ for a negative sample), and $w_i$ denotes the weight of the current $i$-th positive or negative sample. $f_m(x_i^{LBP})$ uses a multi-branch tree, defined as follows:

$$f_m(x^{LBP}) = \begin{cases} a_0 & x^{LBP} = 0 \\ \;\vdots \\ a_j & x^{LBP} = j \\ \;\vdots \\ a_{255} & x^{LBP} = 255 \end{cases}$$
24. The apparatus of claim 23, wherein the first-three-one-two subunit finds, by minimizing $J_{loss}(x^{LBP}, f_m(x_i^{LBP}))$,

$$a_j = \frac{\sum_i w_i\, y_i\, \delta(x_i^{LBP} = j)}{\sum_i w_i\, \delta(x_i^{LBP} = j)}$$

thereby obtaining $f_m(x_i^{LBP})$.
25. The apparatus of claim 24, wherein the first-three-one-four subunit updates the weights of the positive and negative samples according to the selected multi-branch regression tree weak classifier, the weights being corrected according to the following formula:

$$w_i \leftarrow \frac{w_i \exp\left(-y_i f_m(x_i^{LBP})\right)}{\sum_i w_i \exp\left(-y_i f_m(x_i^{LBP})\right)}.$$
26. The apparatus of any one of claims 17 to 25, further comprising a third device for correcting the angle of the image to be detected.
27. The apparatus of claim 26, wherein the third device comprises:
a third-one module for training the ImageNet model;
a third-two module for acquiring certificate image training samples at various rotation angles and extracting the high-level semantic features of each certificate image training sample by using the ImageNet model;
a third-three module for training on the high-level semantic features of the certificate image training samples by using a linear support vector machine to obtain a classifier for determining the image rotation angle;
and a third-four module for determining the rotation angle of an image to be detected through the classifier and adjusting the angle of the image to be detected according to the determined rotation angle.
28. The apparatus of claim 27, wherein the third-one module comprises:
a third-one-one unit for scaling the sample images in the ImageNet database to the same resolution;
a third-one-two unit for performing a mean-reduction operation on the sample images;
a third-one-three unit for randomly cropping the mean-reduced sample images to a uniform size;
a third-one-four unit for setting the neural network of the ImageNet model as sequentially connected convolutional layers and fully-connected layers;
a third-one-five unit for propagating the randomly cropped sample images forward through the neural network and calculating a loss value of parameters to be optimized of the ImageNet model;
and a third-one-six unit for finding, layer by layer in reverse, the gradient of the parameters to be optimized on the neural network by the chain rule of derivation according to the loss value of the parameters to be optimized, and optimizing the parameters to be optimized according to the found gradient using a stochastic gradient descent algorithm to obtain the parameter-optimized ImageNet model.
29. The apparatus of claim 28, wherein the third-one-five unit calculates the loss value of the parameters to be optimized of the ImageNet model, the loss value $J(\theta)$ being obtained according to a second loss function,

$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\, \log \frac{e^{x_i^j}}{\sum_{l=1}^{k} e^{x_i^l}}\right]$$

wherein $\theta$ denotes the parameters to be optimized, $n$ denotes the number of sample images, $k$ denotes the number of output classes, and $x_i^j$ denotes the output value of the $j$-th class of the last fully-connected layer for the $i$-th sample.
30. The apparatus of claim 29, wherein the third-one-six unit finds the gradient of the parameters to be optimized according to the following formula:

$$g_{t+1} = 0.9 \cdot g_t - 0.0005 \cdot \eta \cdot \theta_t - \eta \cdot \left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t}$$

wherein $g_t$ denotes the gradient of the parameters to be optimized at the $t$-th iteration, $g_{t+1}$ denotes the gradient at the $(t+1)$-th iteration, $\theta_t$ denotes the parameters to be optimized at the $t$-th iteration, $\left.\frac{\partial J(\theta)}{\partial \theta}\right|_{\theta_t}$ denotes the gradient of the second loss function $J(\theta)$ with respect to $\theta$ evaluated at $\theta = \theta_t$, $0.9 \cdot g_t$ is the momentum term, $0.0005 \cdot \eta \cdot \theta_t$ is a regularization constraint on $\theta$, and $\eta$ denotes the learning rate.
31. The apparatus of claim 30, wherein the third-one-six unit optimizes the parameters to be optimized according to the following formula:

$$\theta_{t+1} = \theta_t + g_{t+1}$$

wherein $\theta_{t+1}$ denotes the parameters to be optimized at the $(t+1)$-th iteration.
32. The apparatus of claim 28, wherein the third-two module comprises:
a third-two-one unit for acquiring certificate image training samples at various rotation angles;
a third-two-two unit for scaling the certificate image training samples to the same size as the randomly cropped sample images of the ImageNet model;
a third-two-three unit for performing a mean-reduction operation on the scaled certificate image training samples;
a third-two-four unit for extracting the fully-connected-layer high-level semantic features of each mean-reduced certificate image training sample by using the parameter-optimized ImageNet model;
and a third-two-five unit for performing a sparsification operation on the high-level semantic features of each certificate image training sample, and performing a two-norm normalization operation on the sparsified high-level semantic features.
CN201510007186.8A 2015-01-07 2015-01-07 Method and device for certificate image detection Pending CN105825243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510007186.8A CN105825243A (en) 2015-01-07 2015-01-07 Method and device for certificate image detection

Publications (1)

Publication Number Publication Date
CN105825243A true CN105825243A (en) 2016-08-03

Family

ID=56513949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510007186.8A Pending CN105825243A (en) 2015-01-07 2015-01-07 Method and device for certificate image detection

Country Status (1)

Country Link
CN (1) CN105825243A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004924A (en) * 2010-11-03 2011-04-06 无锡中星微电子有限公司 Human head detection system and method
CN102136075A (en) * 2011-03-04 2011-07-27 杭州海康威视软件有限公司 Multiple-viewing-angle human face detecting method and device thereof under complex scene
CN102855496A (en) * 2012-08-24 2013-01-02 苏州大学 Method and system for authenticating shielded face
CN103729649A (en) * 2014-01-14 2014-04-16 三星电子(中国)研发中心 Image rotating angle detection method and device
CN103914689A (en) * 2014-04-09 2014-07-09 百度在线网络技术(北京)有限公司 Picture cropping method and device based on face recognition
CN103971096A (en) * 2014-05-09 2014-08-06 哈尔滨工程大学 Multi-pose face recognition method based on MB-LBP features and face energy diagram

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
I138-41: "Research on Content-Based WEB Image Filtering Technology", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Feng Weixing: "Visual C++ Digital Image Pattern Recognition: Typical Cases Explained in Detail", 30 June 2012 *
Zuo Jinglong et al.: "A Face Detection Algorithm Based on Binary Pattern Features", Bulletin of Science and Technology *
Wang Zhanquan: "Advanced Database Technology", 31 August 2011 *
Chen Xianchang: "Research on Deep Learning Algorithms Based on Convolutional Neural Networks and Their Applications", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110603541B (en) * 2017-05-05 2023-04-25 北京嘀嘀无限科技发展有限公司 System and method for image redirection
CN107180413A (en) * 2017-05-05 2017-09-19 平安科技(深圳)有限公司 Car damages picture angle correcting method, electronic installation and readable storage medium storing program for executing
US11386499B2 (en) 2017-05-05 2022-07-12 Ping An Technology (Shenzhen) Co., Ltd. Car damage picture angle correction method, electronic device, and readable storage medium
CN107180413B (en) * 2017-05-05 2019-03-15 平安科技(深圳)有限公司 Vehicle damages picture angle correcting method, electronic device and readable storage medium storing program for executing
CN110603541A (en) * 2017-05-05 2019-12-20 北京嘀嘀无限科技发展有限公司 System and method for image redirection
CN107330429A (en) * 2017-05-17 2017-11-07 北京捷通华声科技股份有限公司 A kind of localization method and device of certificate entry
CN108229494B (en) * 2017-06-16 2020-10-16 北京市商汤科技开发有限公司 Network training method, processing method, device, storage medium and electronic equipment
CN108229494A (en) * 2017-06-16 2018-06-29 北京市商汤科技开发有限公司 network training method, processing method, device, storage medium and electronic equipment
CN107578439A (en) * 2017-07-19 2018-01-12 阿里巴巴集团控股有限公司 Generate the method, apparatus and equipment of target image
CN107578439B (en) * 2017-07-19 2020-04-28 创新先进技术有限公司 Method, device and equipment for generating target image
CN107665354A (en) * 2017-09-19 2018-02-06 北京小米移动软件有限公司 Identify the method and device of identity card
CN107992894A (en) * 2017-12-12 2018-05-04 北京小米移动软件有限公司 Image-recognizing method, device and computer-readable recording medium
CN107992894B (en) * 2017-12-12 2022-02-08 北京小米移动软件有限公司 Image recognition method, image recognition device and computer-readable storage medium
CN108388920A (en) * 2018-03-01 2018-08-10 福州大学 A kind of Copy of ID Card detection method of fusion HOG and LBPH features
CN109345460A (en) * 2018-09-28 2019-02-15 百度在线网络技术(北京)有限公司 Method and apparatus for correcting image
CN109409421B (en) * 2018-10-09 2021-12-07 杭州诚道科技股份有限公司 Motor vehicle and driver archive image identification method based on convolutional neural network
CN109409421A (en) * 2018-10-09 2019-03-01 杭州诚道科技股份有限公司 Motor vehicle, driver's archival image recognition methods based on convolutional neural networks
US11790499B2 (en) 2019-01-10 2023-10-17 Ping An Technology (Shenzhen) Co., Ltd. Certificate image extraction method and terminal device
WO2020143316A1 (en) * 2019-01-10 2020-07-16 平安科技(深圳)有限公司 Certificate image extraction method and terminal device
CN112507889A (en) * 2019-04-29 2021-03-16 众安信息技术服务有限公司 Method and system for verifying certificate and certificate holder
CN110138999B (en) * 2019-05-30 2022-01-07 苏宁金融服务(上海)有限公司 Certificate scanning method and device for mobile terminal
CN110138999A (en) * 2019-05-30 2019-08-16 苏宁金融服务(上海)有限公司 A kind of papers-scanning method and device for mobile terminal
CN110248037A (en) * 2019-05-30 2019-09-17 苏宁金融服务(上海)有限公司 A kind of identity document scan method and device
CN110248037B (en) * 2019-05-30 2022-01-07 苏宁金融服务(上海)有限公司 Identity document scanning method and device
CN110399873A (en) * 2019-07-11 2019-11-01 汉王科技股份有限公司 ID Card Image acquisition methods, device, electronic equipment and storage medium
CN110472664A (en) * 2019-07-17 2019-11-19 杭州有盾网络科技有限公司 A kind of certificate image identification method, device and equipment based on deep learning
CN110610082A (en) * 2019-09-04 2019-12-24 笵成科技南京有限公司 DNN-based system and method for passport to resist fuzzy attack
CN112634189A (en) * 2019-09-24 2021-04-09 广州中国科学院先进技术研究所 Method for detecting armored quality of optical cable
CN110706296A (en) * 2019-10-11 2020-01-17 北京弘远博学科技有限公司 Batch automatic detection method for background color compliance of electronic certificate photos
CN110706296B (en) * 2019-10-11 2023-06-16 北京弘远博学科技有限公司 Batch automatic detection method for background color compliance of electronic certificate photos
CN112232215A (en) * 2020-10-16 2021-01-15 哈尔滨市科佳通用机电股份有限公司 Railway wagon coupler yoke key joist falling fault detection method
CN112364868A (en) * 2020-12-08 2021-02-12 共道网络科技有限公司 Rotation correction method and device for electronic file
CN113436079A (en) * 2021-06-23 2021-09-24 平安科技(深圳)有限公司 Certificate image detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105825243A (en) Method and device for certificate image detection
EP3690742A1 (en) Method for auto-labeling training images for use in deep learning network to analyze images with high precision, and auto-labeling device using the same
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN112801169B (en) Camouflage target detection method, system, device and storage medium based on improved YOLO algorithm
US20240233313A1 (en) Model training method, image processing method, computing and processing device and non-transient computer-readable medium
CN116994140A (en) Cultivated land extraction method, device, equipment and medium based on remote sensing image
CN113269257A (en) Image classification method and device, terminal equipment and storage medium
CN106503727A (en) A kind of method and device of classification hyperspectral imagery
CN112597918B (en) Text detection method and device, electronic equipment and storage medium
CN114861842B (en) Few-sample target detection method and device and electronic equipment
CN114821102A (en) Intensive citrus quantity detection method, equipment, storage medium and device
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN108229432A (en) Face calibration method and device
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN112598055B (en) Helmet wearing detection method, computer-readable storage medium and electronic device
CN112800851B (en) Water body contour automatic extraction method and system based on full convolution neuron network
CN114202473A (en) Image restoration method and device based on multi-scale features and attention mechanism
CN116912924B (en) Target image recognition method and device
CN111881965B (en) Hyperspectral pattern classification and identification method, device and equipment for medicinal material production place grade
CN111931557B (en) Method and device for identifying specification of bottled drink, terminal equipment and readable storage medium
Ali et al. A new design based-fusion of features to recognize Arabic handwritten characters
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN116229073A (en) Remote sensing image segmentation method and device based on improved ERFNet network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160803
