CN110414592A - Digit verification code recognition method based on multi-task learning

Info

Publication number: CN110414592A
Application number: CN201910672921.5A
Authority: CN
Inventors: 宋晓茹; 吴雪; 高嵩; 陈超波; 李继超; 彭雨豪
Original assignee: Xian Technological University
Current assignee: Xian Technological University
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2019-11-05

Abstract

The present invention relates to a digit verification code recognition method based on multi-task learning. The method mainly comprises: first generating a simulated training sample set of 4-digit verification codes; normalizing the digit verification code images; designing a convolutional neural network model in which 2 shared convolution-pooling layers perform feature extraction and 4 parallel groups of 2 fully connected layers predict the 4 digits of the verification code respectively; randomly initializing the convolutional neural network model and training it iteratively with a multi-task loss function to obtain the final digit verification code recognition model; and inputting the digit verification code to be recognized into the trained recognition model to obtain the predicted digits. The method needs only one recognition pass and avoids segmenting the digits in the verification code, which reduces the low recognition rate caused by poor segmentation algorithms and improves the robustness of verification code recognition.

Description

Digit verification code recognition method based on multi-task learning

Technical field

The invention belongs to the fields of computer vision and artificial intelligence and relates to a digit verification code recognition method, specifically a convolutional neural network digit verification code recognition method that uses multi-task learning.

Background

With the development of science and technology, especially computer science, the network has brought great convenience to people's lives, but security problems have also become increasingly prominent. Network verification codes serve as the first line of defense of network account security systems and are mainly used to resist malicious programs and prevent abuse of network resources. Automatic verification code recognition technology can improve the security of existing verification codes and help design safer ones, thereby effectively safeguarding network security; it has become an important problem.

With the development of deep learning and artificial intelligence, image recognition with convolutional neural networks has become a research hotspot. The most common approach to verification code recognition with convolutional neural networks is to preprocess the verification code image, segment it into individual digits, and finally feed the digits into a convolutional neural network for recognition. Segmenting a digit verification code into single digits before recognition is time-consuming, and the quality of the segmentation method directly affects the accuracy of the subsequent recognition. Studying an efficient verification code recognition method therefore has important practical value.

Summary of the invention

The present invention provides a digit verification code recognition method based on multi-task learning, to solve the problem that prior-art recognition of digit verification codes with convolutional neural networks requires segmentation and is time-consuming.

To achieve the purpose of the present invention, the technical solution provided is as follows:

A digit verification code recognition method based on multi-task learning, comprising the following steps:

Step (1), generating a simulated training sample set of verification codes containing 4 digits;

Step (2), one-hot encoding each digit label of the verification codes;

Step (3), normalizing the digit verification code training sample set;

Step (4), designing the convolutional neural network model for multi-task learning;

Step (5), training with the normalized training sample set to obtain a trained digit verification code recognition model;

Step (6), normalizing new unknown images and recognizing them with the trained digit verification code recognition model.

Further, step (1) includes obtaining the MNIST data set, synthesizing the 4-digit verification code data set, and dividing it into a training set and a test set.

Further, step (3) includes the normalization of the digit verification code images.

Further, step (4) includes the design of the convolutional neural network model architecture and the choice of activation function: 2 convolution-pooling layers are designed as the feature extraction layers, 4 parallel groups of 2 fully connected layers serve as the multi-task learning layers that predict the 4 digits of the verification code, and relu is chosen as the activation function.

Further, step (5) includes setting the number of training iterations and selecting the parameter initialization method, the loss function, and the backpropagation optimization algorithm.

Beneficial effects of the present invention:

The proposed method introduces multi-task learning into digit verification code recognition. First, the verification code image is normalized, which reduces adverse effects on subsequent recognition. Second, using shared convolution together with multi-task learning, 2 convolution-pooling layers extract features from the whole image and 4 parallel groups of 2 fully connected layers predict each digit of the verification code. This avoids the digit segmentation step of conventional digit verification code recognition, greatly reduces recognition time, and improves the robustness of verification code recognition.

Brief description of the drawings

Fig. 1 shows synthesized verification code digits, where Fig. 1(a) is a digit verification code image without blank characters and Fig. 1(b) is a digit verification code image containing blank characters;

Fig. 2 is the multi-task verification code recognition model based on a convolutional neural network designed by the present invention;

Fig. 3 is a prediction result figure.

Detailed description of the embodiments

The invention is described in further detail below with reference to the accompanying drawings and specific implementations.

The present invention provides a digit verification code recognition method based on multi-task learning, comprising the following steps:

Step 1: download the MNIST data set and synthesize verification code images containing 4 digits; some of the synthesized images are shown in Fig. 1(a) and Fig. 1(b). A blank position is represented by the number 10.

Download address: http://yann.lecun.com/exdb/mnist/

Step 2: one-hot encode the numbers 0 to 10; for example, 0 is encoded as 10000000000, 1 is encoded as 01000000000, and blank is encoded as 00000000001.

Step 3: normalize the synthesized verification code images.

Step 4: design the convolutional neural network model for multi-task learning. It comprises 2 convolution-pooling layers as a shared feature extractor and 4 parallel groups of 2 fully connected layers as the multi-task output model, each group predicting one of the 4 digits of the verification code. The nonlinear activation function in the convolutional neural network model is relu.

Step 5: train with the normalized training sample set to obtain a trained digit verification code recognition model. This includes choosing the number of training iterations, the parameter initialization method, the cost function, and the backpropagation parameter update method.

Step 6: prediction: normalize new unknown images and recognize them with the trained digit verification code recognition model.
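
The cost function of Step 5 is described as a multi-task loss built on cross entropy. One plausible form, assuming the four per-position cross-entropy losses are simply summed with equal weight (the weighting is an assumption, as are the names `cross_entropy` and `multitask_loss`), can be sketched in Python:

```python
import math

NUM_CLASSES = 11    # digits 0-9 plus the blank class 10
NUM_POSITIONS = 4   # one prediction task per character position


def cross_entropy(probs, label):
    """Cross entropy of one probability vector against an integer class label."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)


def multitask_loss(head_probs, labels):
    """Sum the cross entropy of the four heads (equal weights are an assumption)."""
    assert len(head_probs) == len(labels) == NUM_POSITIONS
    return sum(cross_entropy(p, y) for p, y in zip(head_probs, labels))


# Toy check: four uniform predictions give a loss of 4 * ln(11).
uniform = [[1.0 / NUM_CLASSES] * NUM_CLASSES for _ in range(NUM_POSITIONS)]
loss = multitask_loss(uniform, [3, 10, 0, 7])
```

Because every head contributes to one scalar loss, a single backward pass would update the shared convolution-pooling layers with gradients from all four digit positions.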

Specific embodiments of the present invention are as follows:

Step 1: download the handwritten digit data set from http://yann.lecun.com/exdb/mnist/. It contains 70000 images of 28*28 pixels in total, of which 60000 are training images and 10000 are test images, covering 10 digit classes. The existing MNIST data are then used to synthesize new verification code images containing 4 digits, which are divided into a training set of 50000, a validation set of 10000, and a test set of 10000. To account for blank characters, a blank character is represented as 10.
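
The synthesis in Step 1 might look like the following minimal sketch. The text does not say how the four digits are combined, so joining four 28*28 tiles left to right into a 28*112 image and drawing a blank position as an all-zero tile are both assumptions, and `fake_digits` is a hypothetical stand-in for real MNIST images.

```python
IMG_SIZE = 28      # MNIST digits are 28x28 pixels
BLANK_LABEL = 10   # label used for a blank position
CODE_LENGTH = 4


def blank_image():
    """An all-zero 28x28 tile standing in for a blank position (assumption)."""
    return [[0] * IMG_SIZE for _ in range(IMG_SIZE)]


def synthesize_code(digit_images, labels):
    """Join four 28x28 tiles left to right into one 28x112 image.

    digit_images maps a digit 0-9 to a 28x28 list of pixel rows; in practice
    these tiles would be sampled from MNIST.
    """
    tiles = [blank_image() if y == BLANK_LABEL else digit_images[y] for y in labels]
    # Concatenate row r of every tile to form row r of the output image.
    return [sum((tile[r] for tile in tiles), []) for r in range(IMG_SIZE)]


# Hypothetical stand-in for MNIST: digit d is a flat tile of grey level 50 + 20*d.
fake_digits = {d: [[50 + 20 * d] * IMG_SIZE for _ in range(IMG_SIZE)] for d in range(10)}
labels = [3, BLANK_LABEL, 0, 7]
image = synthesize_code(fake_digits, labels)
```

The label list `[3, 10, 0, 7]` is kept alongside the image so that each position later has its own classification target.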

Step 2: one-hot encode the digits 0 to 9 and the blank character, as shown in the table below.

Character	One-hot code
0	10000000000
1	01000000000
2	00100000000
3	00010000000
4	00001000000
5	00000100000
6	00000010000
7	00000001000
8	00000000100
9	00000000010
Blank character (10)	00000000001
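
The encoding in the table can be reproduced with a few lines of Python. This is a minimal sketch; the helper names `one_hot`, `encode_code`, and `as_bits` are illustrative and not part of the patent.

```python
NUM_CLASSES = 11   # digits 0-9 plus the blank class
BLANK_LABEL = 10


def one_hot(label):
    """Return the 11-element one-hot vector for a label in 0..10."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label must be in 0..{NUM_CLASSES - 1}, got {label}")
    return [1 if i == label else 0 for i in range(NUM_CLASSES)]


def encode_code(labels):
    """Encode a 4-character code as four one-hot vectors, one per position."""
    return [one_hot(y) for y in labels]


def as_bits(vector):
    """Format a one-hot vector as the bit string used in the table."""
    return "".join(str(v) for v in vector)
```

For example, `as_bits(one_hot(BLANK_LABEL))` gives `00000000001`, matching the last row of the table.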

Step 3: divide each pixel value of the digit verification code image by 255 to normalize it to the range [0,1].
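
As a small illustration of this normalization, assuming the image is held as nested lists of 8-bit pixel values (the function name `normalize` and the toy image are illustrative):

```python
def normalize(image):
    """Scale 8-bit pixel values (0..255) into the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


raw = [[0, 51, 255], [102, 204, 153]]   # toy 2x3 image, not real data
scaled = normalize(raw)
```

The same function would be applied to training images and to new unknown images at prediction time, so that both see the same input scale.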

Step 4: design the convolutional neural network model for multi-task learning. It comprises 2 convolution-pooling layers as a shared feature extractor and 4 parallel groups of 2 fully connected layers as the multi-task output model, each group predicting one of the 4 digits of the verification code. The nonlinear activation function in the convolutional neural network model is relu. The convolutional neural network digit verification code recognition model based on multi-task learning is shown in Fig. 2.
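
The multi-task output part of this design can be sketched in plain Python. The sketch covers only the four parallel heads; the shared convolution-pooling layers are replaced by a random feature vector. The layer sizes `FEATURE_DIM` and `HIDDEN_DIM`, the Gaussian initialization, and the softmax on each head's output are assumptions, not values taken from the patent.

```python
import math
import random

NUM_CLASSES = 11   # digits 0-9 plus blank
NUM_HEADS = 4      # one head per character position
FEATURE_DIM = 32   # hypothetical size of the flattened shared features
HIDDEN_DIM = 16    # hypothetical width of the first fully connected layer


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    peak = max(x)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_heads(rng):
    """Four independent heads, each with two fully connected layers."""
    return [(init_layer(rng, HIDDEN_DIM, FEATURE_DIM),
             init_layer(rng, NUM_CLASSES, HIDDEN_DIM)) for _ in range(NUM_HEADS)]


def predict(features, heads):
    """Feed the same shared features to every head; return 4 distributions."""
    outputs = []
    for (w1, b1), (w2, b2) in heads:
        hidden = relu(dense(features, w1, b1))
        outputs.append(softmax(dense(hidden, w2, b2)))
    return outputs


rng = random.Random(0)
heads = make_heads(rng)
features = [rng.random() for _ in range(FEATURE_DIM)]  # stand-in for conv output
probs = predict(features, heads)
```

Each head sees the whole-image features, which is why no per-digit segmentation is needed before classification.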

Step 5: train with the normalized training sample set to obtain a trained digit verification code recognition model. Specifically, the number of training iterations is set to 20, the parameter initialization method is the truncated normal distribution method, the cost function to be optimized is the cross-entropy loss function, and backpropagation parameter updates use the Adam optimization algorithm. The Adam algorithm updates the parameters in the following steps:

(1) Take a mini-batch of m samples $\{x_1, x_2, \ldots, x_m\}$ from the training data, where the target corresponding to $x_i$ is denoted $y_i$.

(2) Compute the gradient with respect to each weight parameter over the m training samples at time step t:

$$g = \frac{1}{m} \nabla_w \sum_{i=1}^{m} L_w(x_i, y_i)$$

where $L_w$ is the cross-entropy loss function and $w$ denotes the parameters of the convolutional neural network.

(3) Compute the exponentially weighted average of the momentum:

$$s = \rho_1 s + (1 - \rho_1) g$$

where $\rho_1$ is typically 0.9 and $s$ is the first moment, initialized to 0.

(4) Compute the accumulated squared gradient:

$$r = \rho_2 r + (1 - \rho_2)\, g \odot g$$

where $\rho_2$ is typically 0.990 and $r$ is the second moment, initialized to 0.

(5) Apply bias correction to the exponentially weighted average of the momentum:

$$\hat{s} = \frac{s}{1 - \rho_1^{t}}$$

(6) Apply bias correction to the accumulated squared gradient:

$$\hat{r} = \frac{r}{1 - \rho_2^{t}}$$

(7) Compute the update of the weight parameters:

$$\Delta w = -\epsilon \frac{\hat{s}}{\sqrt{\hat{r}} + \delta}$$

where $\epsilon$ is the learning rate and $\delta$ is a constant introduced for numerical stability, typically $10^{-7}$.

(8) Update the weight parameters:

$$w = w + \Delta w$$

In this algorithm, the update of each parameter depends on its own exponentially weighted momentum average and accumulated squared gradient, so that different parameters adapt to different learning rates.
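
The eight steps can be condensed into a short Python sketch. It follows the standard Adam update with the values of ρ₁, ρ₂, and δ quoted above; the learning rate `lr` is not given in the text and is an assumed value, and the quadratic toy objective is purely illustrative.

```python
import math


def adam_step(w, g, s, r, t, lr=0.001, rho1=0.9, rho2=0.990, delta=1e-7):
    """One Adam update for a flat list of parameters.

    w, g, s, r are lists of equal length; t is the 1-based step count.
    lr (the learning rate) is not given in the text and is an assumed value.
    """
    new_w, new_s, new_r = [], [], []
    for w_i, g_i, s_i, r_i in zip(w, g, s, r):
        s_i = rho1 * s_i + (1 - rho1) * g_i          # first-moment estimate
        r_i = rho2 * r_i + (1 - rho2) * g_i * g_i    # second-moment estimate
        s_hat = s_i / (1 - rho1 ** t)                # bias corrections
        r_hat = r_i / (1 - rho2 ** t)
        delta_w = -lr * s_hat / (math.sqrt(r_hat) + delta)
        new_w.append(w_i + delta_w)
        new_s.append(s_i)
        new_r.append(r_i)
    return new_w, new_s, new_r


# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, s, r = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    g = [2 * (w[0] - 3.0)]
    w, s, r = adam_step(w, g, s, r, t, lr=0.05)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude; this is the per-parameter step scaling that the paragraph above describes.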

Step 6: prediction: normalize new unknown images and predict on them with the trained digit verification code model; one of the prediction results is shown in Fig. 3.
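
Turning the four head outputs into a final code could be done as in the sketch below. Taking the highest-scoring class per position follows from the 11-way classification; dropping blank positions (label 10) from the output string is an assumption about the output format, and the helper names are illustrative.

```python
BLANK_LABEL = 10


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode(head_outputs):
    """Turn four 11-way score vectors into the predicted digit string."""
    labels = [argmax(scores) for scores in head_outputs]
    # Dropping blanks from the string is an assumption about the output format.
    text = "".join(str(y) for y in labels if y != BLANK_LABEL)
    return labels, text


def peaked(label, n=11):
    """Toy score vector with most of its mass on one class."""
    return [0.9 if i == label else 0.01 for i in range(n)]


labels, text = decode([peaked(4), peaked(10), peaked(0), peaked(9)])
```

Here the second position is predicted as blank, so the decoded string contains only the three remaining digits.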

The method provided by the invention was evaluated by simulation on the synthesized digit verification code data set, achieving a recognition rate of 96.9% on the training set and 95.2% on the validation set. Compared with methods that apply image preprocessing and segmentation and then recognize single characters with a convolutional neural network, the invention avoids segmenting the digit verification code, shortens recognition time through shared convolution and multi-task learning, reduces the low recognition rate caused by poor segmentation algorithms, and increases the robustness of verification code recognition.

The present invention has been described above using a specific example, which is intended only to aid understanding of the invention and not to limit it. Those skilled in the art may make several simple deductions, variations, or substitutions according to the idea of the invention.

Claims

1. A digit verification code recognition method based on multi-task learning, characterized by comprising the following steps:

Step (5), training with the normalized training sample set to obtain a trained digit verification code recognition model;

Step (6), normalizing new unknown images and recognizing them with the trained digit verification code recognition model.

2. The digit verification code recognition method based on multi-task learning according to claim 1, characterized in that step (1) includes obtaining the MNIST data set, synthesizing the 4-digit verification code data set, and dividing it into a training set and a test set.

3. The digit verification code recognition method based on multi-task learning according to claim 1, characterized in that step (3) includes the normalization of the digit verification code images.

4. The digit verification code recognition method based on multi-task learning according to claim 1, characterized in that step (4) includes the design of the convolutional neural network model architecture and the choice of activation function: 2 convolution-pooling layers are designed as the feature extraction layers, 4 parallel groups of 2 fully connected layers serve as the multi-task learning layers that predict the 4 digits of the verification code, and relu is chosen as the activation function.

5. The digit verification code recognition method based on multi-task learning according to claim 1, characterized in that step (5) includes setting the number of training iterations and selecting the parameter initialization method, the loss function, and the backpropagation optimization algorithm.