CN109993169A

CN109993169A - End-to-end character-based CAPTCHA recognition method

Info

Publication number: CN109993169A
Application number: CN201910288585.4A
Authority: CN
Inventors: 梁延灼 (Liang Yanzhuo)
Original assignee: Shandong Inspur Cloud Information Technology Co Ltd
Current assignee: Shandong Inspur Cloud Information Technology Co Ltd
Priority date: 2019-04-11
Filing date: 2019-04-11
Publication date: 2019-07-09

Abstract

The present invention relates to an end-to-end method for recognizing character-based CAPTCHAs. First, CAPTCHA images are collected and preprocessed, and labeled CAPTCHA images similar to the original CAPTCHAs are generated as training samples. Next, a convolutional neural network model is built using deep learning techniques and trained on the prepared training sample set. Finally, the trained convolutional neural network model is used to recognize the CAPTCHA images to be identified. The method can quickly and accurately recognize character-based CAPTCHAs that contain distortion, adhesion (touching characters), interference lines, and noise, and greatly improves the efficiency of automated testing.

Description

End-to-end character-based CAPTCHA recognition method

Technical field

The present invention relates to the fields of image processing and character recognition, and in particular to an end-to-end method for recognizing character-based CAPTCHAs.

Background art

A verification code, or CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), is a public Turing test for distinguishing humans from computers. To prevent computer programs from imitating human behavior in order to brute-force passwords, rig votes, send large volumes of spam, and so on, many web systems (such as online banking, trading systems, and community forums) deploy a CAPTCHA mechanism.

At present, common CAPTCHAs contain distorted or touching (adhesive) letters and digits, or include interference lines and many noise dots; some go further and use sliders or clicks to defeat machine recognition. Although CAPTCHAs block malicious automated behavior to some extent, they also add considerable labor, time, and cost when automated tests are run against platforms that use them, which seriously reduces the efficiency of automated testing.

Therefore, how to recognize CAPTCHAs automatically, quickly, and accurately has become a key problem to be solved in automated testing.

In view of the above, and targeting character CAPTCHAs that contain distortion, adhesion, interference lines, and noise dots, the present invention combines deep learning methods to propose an end-to-end method for recognizing character-based CAPTCHAs. The method does not require the characters to be segmented, and recognizes the CAPTCHA automatically all the way from the input image to the output text.

Summary of the invention

To overcome the shortcomings of the prior art, the present invention provides a simple and efficient end-to-end method for recognizing character-based CAPTCHAs.

The present invention is achieved through the following technical solutions:

An end-to-end character-based CAPTCHA recognition method, characterized by comprising the following steps:

(A) First, collect CAPTCHA images, preprocess them, and generate labeled CAPTCHA images similar to the original CAPTCHAs as training samples;

(B) Then, build a convolutional neural network model based on deep learning techniques and train it on the prepared training sample set;

(C) Recognize the CAPTCHA images to be identified using the trained convolutional neural network model.

In step (A), to make it easier to generate CAPTCHA images and to speed up subsequent model training, the original CAPTCHA images are preprocessed with a thresholding algorithm: each original image is converted to grayscale and then binarized to remove most of the noise in the image.
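A minimal sketch of this preprocessing step with Pillow follows. It assumes a fixed global threshold of 128, since the description names neither the thresholding algorithm nor its threshold. The 3x3 median filter that clears isolated noise dots left after binarization is also an assumption, not something the description specifies.

```python
from PIL import Image, ImageFilter


def preprocess(img: Image.Image, threshold: int = 128, denoise: bool = True) -> Image.Image:
    """Grayscale conversion plus global-threshold binarization of a CAPTCHA image.

    Returns an "L"-mode image whose pixels are only 0 (text) or 255 (background).
    """
    gray = img.convert("L")  # one 0-255 intensity value per pixel
    # Pixels darker than the threshold become black text, the rest white background
    binary = gray.point(lambda p: 0 if p < threshold else 255)
    if denoise:
        # Assumption: a 3x3 median filter removes isolated dots that survive thresholding
        binary = binary.filter(ImageFilter.MedianFilter(3))
    return binary
```

For example, `preprocess(Image.open("captcha.png"))` would return the cleaned image; the file name is only illustrative.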

In step (A), the Python image-processing package Pillow is used to generate labeled CAPTCHA images that resemble the original CAPTCHAs and contain distortion, adhesion, and/or interference lines.

The ImageFont module of Pillow is used to set a font and text size similar to those of the original CAPTCHA; the transform() method is used to rotate the image so that it becomes distorted; and the Draw module (ImageDraw.Draw) is used to add interference lines to the image.

Step (B) specifically comprises the following steps:

(1) Build a convolutional neural network model and pre-train it using the labeled CAPTCHA images that are similar to the original CAPTCHAs as training samples;

(2) Train the model again using manually labeled original CAPTCHA images as training samples to obtain the final model;

(3) Convert the CAPTCHA image to be identified to grayscale, binarize it, and input it into the trained convolutional neural network model to obtain the final recognition result.

In step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers, and one Softmax output layer.

The three convolutional layers extract deep features from the image; the four activation functions use ReLU as the nonlinear activation function; the three pooling layers downsample the feature maps while retaining the most salient features; the four Dropout layers randomly drop the outputs of a portion of the nodes during training to reduce overfitting; and the Softmax output layer produces the final recognition result.
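One common way to let a single Softmax output layer emit a whole CAPTCHA string without segmenting characters, assumed here rather than stated in the description, is to predict one probability distribution per character position. The sketch below shows the matching label encoding and decoding, assuming CAPTCHAs drawn from digits and uppercase letters (36 classes).

```python
import string

import numpy as np

CHARSET = string.digits + string.ascii_uppercase  # assumed: 36 classes
CHAR_TO_IDX = {c: i for i, c in enumerate(CHARSET)}


def encode_label(text: str) -> np.ndarray:
    """One-hot encode a whole CAPTCHA string as an (n_chars, n_classes) array."""
    y = np.zeros((len(text), len(CHARSET)), dtype=np.float32)
    for pos, ch in enumerate(text):
        y[pos, CHAR_TO_IDX[ch]] = 1.0
    return y


def decode_prediction(probs: np.ndarray) -> str:
    """Take the most likely class at every position of a per-position softmax output."""
    return "".join(CHARSET[i] for i in np.argmax(probs, axis=-1))
```

Because each position is scored independently, the network never needs to locate character boundaries; it only needs to learn which class belongs in each slot.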

In both model pre-training and model training, the initial learning rate is set to 0.001, the model is trained for 20 epochs in total, and the batch size for each epoch is 128.
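A sketch of the two training stages with tf.keras, using the stated hyperparameters, is given below. The optimizer (Adam), the loss, and reading the value 128 as the batch size are assumptions; the description gives only the initial learning rate, the number of epochs, and the value 128. The model and the data arrays are supplied by the caller.

```python
import tensorflow as tf

INITIAL_LR = 0.001  # initial learning rate given in the description
EPOCHS = 20         # total training rounds (epochs) per stage
BATCH_SIZE = 128    # the value 128, interpreted here as the batch size


def fit_stage(model, x, y, epochs=EPOCHS):
    """Compile with the stated hyperparameters and train for one stage."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=INITIAL_LR),  # assumed optimizer
        loss="categorical_crossentropy",  # one-hot targets, softmax outputs
        metrics=["accuracy"],
    )
    return model.fit(x, y, epochs=epochs, batch_size=BATCH_SIZE, verbose=0)


def pretrain_then_finetune(model, synthetic, real, epochs=EPOCHS):
    """Stage 1: generated look-alike CAPTCHAs; stage 2: hand-labelled originals.

    Re-compiling before stage 2 creates a fresh optimizer but keeps the learned
    weights, so stage 2 continues from the pre-trained model.
    """
    x_syn, y_syn = synthetic
    x_real, y_real = real
    fit_stage(model, x_syn, y_syn, epochs)
    return fit_stage(model, x_real, y_real, epochs)
```

The design choice worth noting is that the second stage reuses the same model object: the small real set only has to correct the gap between synthetic and real images rather than teach character shapes from scratch.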

The beneficial effects of the present invention are as follows: the end-to-end character-based CAPTCHA recognition method can quickly and accurately recognize character-based CAPTCHAs that contain distortion, adhesion, interference lines, and noise, greatly improving the efficiency of automated testing.

Description of the drawings

Figure 1 is a schematic diagram of the end-to-end character-based CAPTCHA recognition method of the present invention.

Detailed description of the embodiments

To make the technical problems to be solved, the technical solutions, and the advantages clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be noted that the specific embodiments described here are only intended to explain the present invention, not to limit it.

The end-to-end character-based CAPTCHA recognition method comprises the following steps:

Step (B) specifically comprises the following steps:

To save time and money while still giving the trained model good recognition performance, a large number of labeled CAPTCHA images must be collected as training samples. There are currently two common ways to label CAPTCHA images: manual labeling, or paid labeling through a CAPTCHA-solving platform. Because a large number of training samples is required (about 20,000), the first method adds considerable labor cost and the second adds considerable monetary cost. The present method therefore first automatically generates CAPTCHAs that are as similar as possible to the CAPTCHAs to be recognized and uses them as training samples to pre-train the model. It then obtains a small number of original CAPTCHA images (about 1,000), labels them manually, and uses this sample set to train the pre-trained model again, fine-tuning the model parameters so that the model achieves better recognition performance. The specific process is as follows:

A. Generating a large number of similar CAPTCHA images

(1) To make CAPTCHA generation easier and to speed up subsequent model training, the original CAPTCHA images are preprocessed: each original image is converted to grayscale and binarized to remove most of the noise.

(2) CAPTCHAs are generated with the Python image-processing package Pillow. Specifically, its ImageFont module is used to set a font and text size similar to those of the original CAPTCHA; the transform() method is used to rotate the image so that it becomes distorted; and the Draw module is used to add interference lines to the image.
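The description does not say how the binarization threshold in the preprocessing step is chosen. One standard option, swapped in here as an assumption, is Otsu's method, which picks the gray level that maximizes the between-class variance of the image histogram. A minimal sketch on top of Pillow's histogram:

```python
from PIL import Image


def otsu_threshold(hist):
    """Gray level t maximising between-class variance; class 0 is levels <= t."""
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(hist):
        w0 += count          # pixels at or below t
        sum0 += t * count
        w1 = total - w0      # pixels above t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1  # class means
        var = w0 * w1 * (m0 - m1) ** 2             # between-class variance (unnormalised)
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize_otsu(img: Image.Image) -> Image.Image:
    """Grayscale the image, then binarize it at the Otsu threshold."""
    gray = img.convert("L")
    t = otsu_threshold(gray.histogram())  # 256 bins for an "L" image
    return gray.point(lambda p: 0 if p <= t else 255)
```

An adaptive threshold like this can help when CAPTCHA backgrounds vary in brightness, where a single fixed value would either keep noise or erase faint strokes.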

B. Building the convolutional neural network model

The convolutional neural network model is built with TensorFlow.

The convolutional neural network model mainly comprises: (1) three convolutional layers, whose main function is to extract deep features of the image; (2) four activation functions, which use ReLU as the nonlinear activation function; (3) three pooling layers, whose main function is to downsample while retaining the most salient features; (4) four Dropout layers, which randomly drop the outputs of some nodes during training to reduce overfitting; and (5) one Softmax output layer, which produces the final recognition result.
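A tf.keras sketch with exactly the listed layer counts is shown below. The input size, filter counts, kernel size, dropout rates, the 512-unit dense layer that carries the fourth ReLU, and the per-position softmax output (4 characters, 36 classes) are assumptions not given in the description.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_captcha_cnn(height=60, width=160, n_chars=4, n_classes=36):
    """3 conv, 4 ReLU, 3 max-pool, 4 dropout, 1 softmax output (sizes are assumptions)."""
    inputs = tf.keras.Input(shape=(height, width, 1))  # binarized single-channel image
    x = inputs
    for filters in (32, 64, 128):               # three convolutional blocks
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.ReLU()(x)                    # activations 1 to 3
        x = layers.MaxPooling2D(2)(x)           # downsample, keep strongest responses
        x = layers.Dropout(0.25)(x)             # dropout layers 1 to 3
    x = layers.Flatten()(x)
    x = layers.Dense(512)(x)
    x = layers.ReLU()(x)                        # activation 4
    x = layers.Dropout(0.5)(x)                  # dropout layer 4
    # One distribution per character position, so the whole string is read at once
    x = layers.Dense(n_chars * n_classes)(x)
    x = layers.Reshape((n_chars, n_classes))(x)
    outputs = layers.Softmax(axis=-1)(x)        # the single Softmax output layer
    return tf.keras.Model(inputs, outputs)
```

Applying the softmax over the last axis after the reshape makes each character slot an independent classification, which matches one-hot string labels of shape (n_chars, n_classes).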

C. Training the convolutional neural network model

The similar CAPTCHA images generated in step A are used as the training sample set. The initial learning rate is set to 0.001, training runs for 20 epochs in total with a batch size of 128 per epoch, and model training then begins. After pre-training, the manually labeled original CAPTCHA images are input as training samples and the pre-trained model is trained again to obtain the final model.

D. Recognizing CAPTCHAs with the trained model

The CAPTCHA image to be identified is converted to grayscale and binarized, then input into the trained model, and the final recognition result is obtained from the model's prediction.
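A sketch of this recognition step follows. It assumes the model expects a 160x60 single-channel input scaled to [0, 1], emits one softmax per character position over digits and uppercase letters, and that a fixed threshold of 128 is used. predict_fn is a hypothetical parameter: any callable mapping an input batch to probabilities, such as a Keras model's predict method.

```python
import string

import numpy as np
from PIL import Image

CHARSET = string.digits + string.ascii_uppercase  # assumed class order used in training


def recognize(img: Image.Image, predict_fn, size=(160, 60), threshold=128) -> str:
    """Grayscale and binarize an image, run the trained model, and decode the text."""
    gray = img.convert("L").resize(size)               # size is (width, height)
    binary = gray.point(lambda p: 0 if p < threshold else 255)
    x = np.asarray(binary, dtype=np.float32) / 255.0   # shape (height, width), values 0 or 1
    x = x[np.newaxis, :, :, np.newaxis]                # add batch and channel axes
    probs = predict_fn(x)[0]                           # (n_chars, n_classes)
    return "".join(CHARSET[i] for i in np.argmax(probs, axis=-1))
```

With a trained Keras model this would be called as `recognize(Image.open(path), model.predict)`; the preprocessing must match what was used during training, or accuracy will drop.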

Claims

1. An end-to-end character-based CAPTCHA recognition method, characterized by comprising the following steps:

(A) first, collecting CAPTCHA images, preprocessing the CAPTCHA images, and generating labeled CAPTCHA images similar to the original CAPTCHAs as training samples;

(B) then, building a convolutional neural network model based on deep learning techniques and training it on the prepared training sample set;

(C) recognizing the CAPTCHA images to be identified using the trained convolutional neural network model.

2. The end-to-end character-based CAPTCHA recognition method according to claim 1, characterized in that: in step (A), to make it easier to generate CAPTCHA images and to speed up subsequent model training, the original CAPTCHA images are preprocessed with a thresholding algorithm, each original image being converted to grayscale and binarized to remove most of the noise in the image.

3. The end-to-end character-based CAPTCHA recognition method according to claim 1, characterized in that: in step (A), the Python image-processing package Pillow is used to generate labeled CAPTCHA images that resemble the original CAPTCHAs and contain distortion, adhesion, and/or interference lines.

4. The end-to-end character-based CAPTCHA recognition method according to claim 3, characterized in that: the ImageFont module of the image-processing package Pillow is used to set a font and text size similar to those of the original CAPTCHA; the transform() method is used to rotate the image so that it becomes distorted; and the Draw module is used to add interference lines to the image.

5. The end-to-end character-based CAPTCHA recognition method according to any one of claims 1 to 4, characterized in that step (B) specifically comprises the following steps:

(1) building a convolutional neural network model and pre-training it using labeled CAPTCHA images similar to the original CAPTCHAs as training samples;

(2) training the model again using manually labeled original CAPTCHA images as training samples to obtain the final model;

(3) converting the CAPTCHA image to be identified to grayscale, binarizing it, and inputting it into the trained convolutional neural network model to obtain the final recognition result.

6. The end-to-end character-based CAPTCHA recognition method according to claim 5, characterized in that: in step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers, and one Softmax output layer.

7. The end-to-end character-based CAPTCHA recognition method according to claim 6, characterized in that: the three convolutional layers extract deep features of the image; the four activation functions use the ReLU function as the nonlinear activation function; the three pooling layers downsample while retaining the most salient features; the four Dropout layers randomly drop the outputs of some nodes during training to reduce overfitting; and the Softmax output layer produces the final recognition result.

8. The end-to-end character-based CAPTCHA recognition method according to claim 5, characterized in that: in the model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch size for each epoch is 128.