CN109993169A - End-to-end character-type verification code recognition method - Google Patents

End-to-end character-type verification code recognition method

Info

Publication number
CN109993169A
Authority
CN
China
Prior art keywords
verification code
picture
character type
training
type method
Prior art date
Legal status
Pending
Application number
CN201910288585.4A
Other languages
Chinese (zh)
Inventor
梁延灼
Current Assignee
Shandong Inspur Cloud Information Technology Co Ltd
Original Assignee
Shandong Inspur Cloud Information Technology Co Ltd
Priority date
2019-04-11
Filing date
2019-04-11
Publication date
2019-07-09
Application filed by Shandong Inspur Cloud Information Technology Co Ltd
Priority to CN201910288585.4A
Publication of CN109993169A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention relates in particular to an end-to-end character-type verification code recognition method. The method first collects verification code pictures, preprocesses them, and generates labeled verification code pictures similar to the original verification codes to serve as training samples. It then builds a convolutional neural network model based on deep learning technology and trains it on the prepared training sample set. Finally, it recognizes verification code pictures with the trained convolutional neural network model. The method can quickly and accurately recognize character-type verification codes that are distorted, have touching characters, or contain interference lines and noise, and it greatly improves the efficiency of automated testing.

Description

End-to-end character-type verification code recognition method
Technical field
The present invention relates to the technical fields of image processing and character recognition, and in particular to an end-to-end character-type verification code recognition method.
Background art
A verification code (Completely Automated Public Turing test to tell Computers and Humans Apart, abbreviated CAPTCHA) is a public Turing test for distinguishing humans from computers. To prevent computer programs from imitating human behavior to brute-force passwords, stuff votes, send large volumes of spam and so on, many web systems (such as online banking systems, trading systems and community forums) deploy a verification code mechanism.
Common verification codes today contain letters and digits that are distorted or touching, or contain interference lines and large numbers of noise points; some go further and use sliders or click challenges to prevent machine recognition. Although verification codes block malicious machine behavior to some extent, they also add considerable labor, time and cost whenever automated tests are run against a platform that uses them, which seriously reduces automated testing efficiency.
Recognizing verification codes automatically, quickly and accurately has therefore become a key problem to be solved in automated testing.
In view of the above, and targeting character verification codes that are distorted, have touching characters, or contain interference lines and noise points, the invention combines deep learning methods to propose an end-to-end character-type verification code recognition method that needs no character segmentation and recognizes the verification code automatically, from the input picture straight through to the final result.
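Recognizing the whole code without separating characters usually means the network is trained against the entire string at once. One common way to build such a target is sketched below under assumptions the patent does not state: a fixed code length of 4 and a 36-symbol alphabet of digits and upper-case letters (both hypothetical). The target concatenates one one-hot block per character position:

```python
import string

# Hypothetical alphabet and code length; the patent specifies neither.
CHARSET = string.digits + string.ascii_uppercase  # 36 symbols
CODE_LEN = 4


def encode_label(text: str) -> list[float]:
    """Return CODE_LEN concatenated one-hot blocks, one block per character position."""
    if len(text) != CODE_LEN:
        raise ValueError(f"expected {CODE_LEN} characters, got {len(text)}")
    n = len(CHARSET)
    target = [0.0] * (CODE_LEN * n)
    for pos, ch in enumerate(text):
        # str.index raises ValueError for characters outside the alphabet.
        target[pos * n + CHARSET.index(ch)] = 1.0
    return target


vec = encode_label("A7K2")
print(len(vec), [i for i, v in enumerate(vec) if v == 1.0])  # prints 144 [10, 43, 92, 110]
```

A segmentation-free network then emits a vector of the same length and applies a softmax separately inside each 36-wide block, so all characters are predicted in one forward pass.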
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a simple and efficient end-to-end character-type verification code recognition method.
The present invention is achieved through the following technical solutions:
An end-to-end character-type verification code recognition method, characterized by comprising the following steps:
(A) first, collect verification code pictures, preprocess them, and generate labeled verification code pictures similar to the original verification codes to serve as training samples;
(B) then, build a convolutional neural network model based on deep learning technology and train it on the prepared training sample set;
(C) recognize the verification code pictures to be identified using the trained convolutional neural network model.
In step (A), to reduce the difficulty of generating verification code pictures and to speed up subsequent model training, the original verification code images are preprocessed with a thresholding algorithm: each original picture is converted to a grayscale image and then binarized to remove most of the noise in it.
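The patent does not name its thresholding algorithm. The sketch below shows the two preprocessing steps on a flat list of pixels, without dependencies. It uses the ITU-R 601-2 luma weights that Pillow documents for `convert("L")`, and Otsu's method as an assumed choice of threshold:

```python
def to_grayscale(rgb_pixels):
    """ITU-R 601-2 luma (L = 0.299 R + 0.587 G + 0.114 B) in integer arithmetic."""
    return [(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in rgb_pixels]


def otsu_threshold(gray):
    """Return the grey level t that maximises between-class variance (Otsu's method).

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_low = 0    # number of pixels at or below t
    sum_low = 0  # sum of grey levels at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray, threshold):
    """Map pixels at or below the threshold to 0 and pixels above it to 255."""
    return [255 if v > threshold else 0 for v in gray]


gray = to_grayscale([(20, 20, 20)] * 3 + [(230, 230, 230)] * 5)
t = otsu_threshold(gray)
print(t, binarize(gray, t))  # prints 20 [0, 0, 0, 255, 255, 255, 255, 255]
```

On a real Pillow image the same two steps would be `img.convert("L")` followed by `img.point(...)` with this threshold rule. Otsu is only one option; a fixed, hand-tuned threshold is equally compatible with the text.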
In step (A), the Python image processing package Pillow is used to generate labeled verification code pictures that resemble the original verification codes and contain distortion, touching characters and/or interference lines.
The ImageFont module of Pillow is used to set a font and text size similar to those of the original verification code; the transform() method is used to rotate the image so that it appears distorted; and the Draw interface of the ImageDraw module is used to add interference lines to the image.
Step (B) specifically comprises the following steps:
(1) Build a convolutional neural network model and pre-train it, using the labeled verification code pictures that resemble the original verification codes as training samples;
(2) Train the model again, using manually labeled original verification code pictures as training samples, to obtain the final model;
(3) Convert the verification code picture to be recognized to grayscale, binarize it, and input it into the trained convolutional neural network model to obtain the final recognition result.
In step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers and one Softmax output layer.
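The patent gives only layer counts, not sizes. The sketch below traces feature-map sizes through three convolution and pooling stages under hypothetical settings: a 60x160 grayscale input, stride-1 convolutions with "same" padding, 2x2 max pooling with stride 2 and "same" padding, and 32/64/128 filters. With "same" padding TensorFlow's output length is ceil(size / stride), so the kernel size does not affect the spatial shape:

```python
import math


def same_out(size: int, stride: int) -> int:
    """Output length for TensorFlow/Keras padding="same": ceil(size / stride)."""
    return math.ceil(size / stride)


def feature_map_shape(height, width, filters=(32, 64, 128), conv_stride=1, pool=2):
    """Trace (H, W, C) through one conv + max-pool stage per entry in `filters`."""
    for _ in filters:
        height, width = same_out(height, conv_stride), same_out(width, conv_stride)  # convolution
        height, width = same_out(height, pool), same_out(width, pool)                # pooling
    return height, width, filters[-1]


h, w, c = feature_map_shape(60, 160)
print((h, w, c), h * w * c)  # prints (8, 20, 128) 20480
```

The flattened length (20,480 here) is what the layers after the last pooling stage receive. Choosing the input size so it halves cleanly three times avoids the ceil rounding seen on the height (15 becomes 8).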
The three convolutional layers extract deep features from the picture; the four activation functions use ReLU as the nonlinear activation function; the three pooling layers downsample the feature maps while retaining the most salient features; the four Dropout layers randomly drop the outputs of a portion of the nodes during training to mitigate overfitting; and the Softmax output layer produces the final recognition result.
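Each non-convolutional layer type listed above performs one simple operation. The pure-Python sketch below shows them on a 1-D list for intuition only; real layers act on 2-D feature maps with channels. The inverted-dropout scaling follows what `tf.keras.layers.Dropout` documents: surviving units are scaled by 1/(1 - rate) during training, and the layer is the identity at inference:

```python
import math
import random


def relu(xs):
    """ReLU: max(0, x) element-wise."""
    return [max(0.0, x) for x in xs]


def max_pool_1d(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped ("valid")."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def dropout(xs, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, scale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(xs)  # identity at inference time
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = relu([-1.5, 0.5, 2.0, -0.2, 3.0, 1.0])  # [0.0, 0.5, 2.0, 0.0, 3.0, 1.0]
pooled = max_pool_1d(features)                     # [0.5, 2.0, 3.0]
dropped = dropout(pooled, rate=0.25, rng=random.Random(0))
probs = softmax(pooled)
print(features, pooled, dropped, probs)
```

For a multi-character code the Softmax is normally applied per character position, so each position yields its own probability distribution over the alphabet.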
In both model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch number per epoch is 128.
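The training settings can be written as a small configuration. The text is ambiguous about the value 128: it may mean 128 samples per batch or 128 batches per epoch. This sketch assumes the former, which corresponds to the `batch_size` argument of Keras `Model.fit`. It uses the embodiment's approximate sample counts (about 20,000 generated and about 1,000 hand-labeled pictures) to show how many optimizer updates each stage would make under that reading:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 0.001  # initial learning rate given in the patent
    epochs: int = 20              # number of training rounds given in the patent
    batch_size: int = 128         # ASSUMPTION: "batch number 128" read as samples per batch


def steps_per_epoch(num_samples: int, cfg: TrainConfig) -> int:
    """Optimizer updates per epoch; a final partial batch still counts as one step."""
    return math.ceil(num_samples / cfg.batch_size)


cfg = TrainConfig()
for name, n in [("pre-training (generated)", 20_000), ("fine-tuning (hand-labeled)", 1_000)]:
    steps = steps_per_epoch(n, cfg)
    print(name, steps, "steps/epoch,", steps * cfg.epochs, "total")
```

Under this reading pre-training makes 157 updates per epoch (3,140 in total) and fine-tuning only 8 (160 in total). Under the other reading (128 batches per epoch), each pre-training batch would hold roughly 156 samples.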
Beneficial effects of the present invention: the end-to-end character-type verification code recognition method can quickly and accurately recognize character-type verification codes that are distorted, have touching characters, or contain interference lines and noise, greatly improving automated testing efficiency.
Description of the drawings
Figure 1 is a schematic diagram of the end-to-end character-type verification code recognition method of the present invention.
Detailed description of the embodiments
To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, the invention is described in detail below with reference to the drawings and embodiments. It should be noted that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The end-to-end character-type verification code recognition method comprises the following steps:
(A) first, collect verification code pictures, preprocess them, and generate labeled verification code pictures similar to the original verification codes to serve as training samples;
(B) then, build a convolutional neural network model based on deep learning technology and train it on the prepared training sample set;
(C) recognize the verification code pictures to be identified using the trained convolutional neural network model.
In step (A), to reduce the difficulty of generating verification code pictures and to speed up subsequent model training, the original verification code images are preprocessed with a thresholding algorithm: each original picture is converted to a grayscale image and then binarized to remove most of the noise in it.
In step (A), the Python image processing package Pillow is used to generate labeled verification code pictures that resemble the original verification codes and contain distortion, touching characters and/or interference lines.
The ImageFont module of Pillow is used to set a font and text size similar to those of the original verification code; the transform() method is used to rotate the image so that it appears distorted; and the Draw interface of the ImageDraw module is used to add interference lines to the image.
Step (B) specifically comprises the following steps:
(1) Build a convolutional neural network model and pre-train it, using the labeled verification code pictures that resemble the original verification codes as training samples;
(2) Train the model again, using manually labeled original verification code pictures as training samples, to obtain the final model;
(3) Convert the verification code picture to be recognized to grayscale, binarize it, and input it into the trained convolutional neural network model to obtain the final recognition result.
In step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers and one Softmax output layer.
The three convolutional layers extract deep features from the picture; the four activation functions use ReLU as the nonlinear activation function; the three pooling layers downsample the feature maps while retaining the most salient features; the four Dropout layers randomly drop the outputs of a portion of the nodes during training to mitigate overfitting; and the Softmax output layer produces the final recognition result.
In both model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch number per epoch is 128.
To save time and money while still giving the trained model good recognition performance, a large number of labeled verification code pictures must be collected as training samples. There are currently two common ways to label verification code pictures: labeling them manually, or paying a CAPTCHA-solving service to label them. Because the required number of training samples is large (about 20,000), the first method would add a large labor cost and the second a large monetary cost. This method therefore first automatically generates verification codes that are as similar as possible to the codes to be recognized and uses them as training samples to pre-train the model. It then obtains a small number of original verification code pictures (about 1,000), labels them manually, and retrains the pre-trained model on this sample set, fine-tuning the model parameters so that the model achieves better recognition performance. The specific recognition process is as follows:
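The two-stage schedule does not depend on the network type: stage 2 simply starts from the stage-1 weights instead of from scratch. The sketch below illustrates that idea at toy scale. A one-feature logistic-regression classifier stands in for the CNN, and the toy data has a class boundary that differs between the "generated" and "hand-labeled" sets to mimic the domain gap. Everything here is hypothetical and illustrates the schedule only, not the patent's model or results:

```python
import math
import random


def make_data(n, boundary, rng):
    """Toy 1-D binary task: label 1 iff x > boundary. Moving the boundary mimics a domain gap."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [(x, 1 if x > boundary else 0) for x in xs]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    """log(1 + e^z), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def mean_loss(w, b, data):
    """Average cross-entropy of the model p(y=1 | x) = sigmoid(w*x + b)."""
    return sum(softplus(-(w * x + b)) if y else softplus(w * x + b) for x, y in data) / len(data)


def train(w, b, data, lr, epochs):
    """Full-batch gradient descent that continues from the given weights (w, b)."""
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b


rng = random.Random(0)
generated = make_data(2000, boundary=0.0, rng=rng)    # plentiful but slightly off-domain
hand_labeled = make_data(100, boundary=0.5, rng=rng)  # scarce but on-domain

w, b = train(0.0, 0.0, generated, lr=0.5, epochs=200)  # stage 1: pre-train
before = mean_loss(w, b, hand_labeled)
w, b = train(w, b, hand_labeled, lr=0.5, epochs=200)   # stage 2: fine-tune from stage-1 weights
after = mean_loss(w, b, hand_labeled)
print(f"loss on hand-labeled set: {before:.3f} -> {after:.3f}")
```

With a Keras model, the same effect comes from calling `model.fit` on the generated set and then calling `model.fit` again on the hand-labeled set, because a second `fit` call continues from the current weights.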
A, a large amount of similar identifying code pictures are generated
(1) identifying code picture difficulty and the subsequent model training speed of raising are generated in order to reduce, to origin authentication code figure Piece is pre-processed.Original picture is converted into grayscale image, and carries out a large amount of noises in binaryzation removal picture.
(2) identifying code is generated using the picture processing packet Pillow of python.Specifically utilize ImageFont mould therein Font similar with origin authentication code and text size is arranged in block;Carrying out rotation to image using transform () method makes it Generate distortion effects;It is that image increases interfering line using Draw module.
B. Build the convolutional neural network model
The convolutional neural network model is built with TensorFlow.
The convolutional neural network model mainly comprises: (1) three convolutional layers, whose main role is to extract deep features from the picture; (2) four activation functions, using ReLU as the nonlinear activation function; (3) three pooling layers, whose main role is to downsample while retaining the most salient features; (4) four Dropout layers, which randomly drop the outputs of a portion of the nodes during training to mitigate overfitting; and (5) one Softmax output layer, which produces the final recognition result.
C. Train the convolutional neural network model
The similar verification code pictures generated in step A are used as the training sample set; the initial learning rate is set to 0.001, training runs for 20 epochs in total with a batch number of 128 per epoch, and model training then starts. After pre-training, the manually labeled original verification code pictures are input as training samples and the pre-trained model is trained again to obtain the final model.
D. Recognize verification codes with the trained model
The verification code picture to be recognized is converted to grayscale and binarized, then input into the trained model; the model's prediction gives the final recognition result.

Claims (8)

1. An end-to-end character-type verification code recognition method, characterized by comprising the following steps:
(A) first, collecting verification code pictures, preprocessing them, and generating labeled verification code pictures similar to the original verification codes as training samples;
(B) then, building a convolutional neural network model based on deep learning technology and training it on the prepared training sample set;
(C) recognizing the verification code picture to be recognized using the trained convolutional neural network model.
2. The end-to-end character-type verification code recognition method according to claim 1, characterized in that: in step (A), to reduce the difficulty of generating verification code pictures and to speed up subsequent model training, the original verification code images are preprocessed with a thresholding algorithm, each original picture being converted to a grayscale image and binarized to remove most of the noise in it.
3. The end-to-end character-type verification code recognition method according to claim 1, characterized in that: in step (A), the Python image processing package Pillow is used to generate labeled verification code pictures similar to the original verification codes, with distortion, touching characters and/or interference lines.
4. The end-to-end character-type verification code recognition method according to claim 3, characterized in that: the ImageFont module of the image processing package Pillow is used to set a font and text size similar to those of the original verification code; the transform() method is used to rotate the image to produce a distortion effect; and the Draw module is used to add interference lines to the image.
5. The end-to-end character-type verification code recognition method according to any one of claims 1, 2, 3 or 4, characterized in that step (B) specifically comprises the following steps:
(1) building a convolutional neural network model and pre-training it, using labeled verification code pictures similar to the original verification codes as training samples;
(2) training the model again, using manually labeled original verification code pictures as training samples, to obtain the final model;
(3) converting the verification code picture to be recognized to grayscale, binarizing it, and inputting it into the trained convolutional neural network model to obtain the final recognition result.
6. The end-to-end character-type verification code recognition method according to claim 5, characterized in that: in step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers and one Softmax output layer.
7. The end-to-end character-type verification code recognition method according to claim 6, characterized in that: the three convolutional layers extract deep features from the picture; the four activation functions use the ReLU function as the nonlinear activation function; the three pooling layers downsample while retaining the most salient features; the four Dropout layers randomly drop the outputs of a portion of the nodes during training to mitigate overfitting; and the Softmax output layer produces the final recognition result.
8. The end-to-end character-type verification code recognition method according to claim 5, characterized in that: in the model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch number per epoch is 128.
CN201910288585.4A 2019-04-11 2019-04-11 End-to-end character-type verification code recognition method Pending CN109993169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910288585.4A CN109993169A (en) 2019-04-11 2019-04-11 End-to-end character-type verification code recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910288585.4A CN109993169A (en) 2019-04-11 2019-04-11 End-to-end character-type verification code recognition method

Publications (1)

Publication Number Publication Date
CN109993169A 2019-07-09

Family

ID=67133215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910288585.4A Pending CN109993169A (en) 2019-04-11 2019-04-11 End-to-end character-type verification code recognition method

Country Status (1)

Country Link
CN (1) CN109993169A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555298A (en) * 2019-08-30 2019-12-10 阿里巴巴(中国)有限公司 Verification code recognition model training and recognition method, medium, device and computing equipment
CN110555462A (en) * 2019-08-02 2019-12-10 深圳索信达数据技术有限公司 non-fixed multi-character verification code identification method based on convolutional neural network
CN111079117A (en) * 2019-11-28 2020-04-28 上海三零卫士信息安全有限公司 LeNet and SSD-based point-contact type verification code automatic identification method
CN111667549A (en) * 2020-04-28 2020-09-15 华东师范大学 Method, device and storage medium for generating graphic verification code based on countermeasure sample and random transformation
CN111753846A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Website verification method, device, equipment and storage medium based on RPA and AI
CN112380409A (en) * 2020-10-26 2021-02-19 武汉天宝莱信息技术有限公司 Verification code identification method based on automatic crawler
CN117132989A (en) * 2023-10-23 2023-11-28 山东大学 Character verification code identification method, system and equipment based on convolutional neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899571A (en) * 2015-06-12 2015-09-09 成都数联铭品科技有限公司 Random sample generation method for recognition of complex character
CN107085730A (en) * 2017-03-24 2017-08-22 深圳爱拼信息科技有限公司 A kind of deep learning method and device of character identifying code identification
CN108765333A (en) * 2018-05-24 2018-11-06 华南理工大学 A kind of depth map improving method based on depth convolutional neural networks
CN109086772A (en) * 2018-08-16 2018-12-25 成都市映潮科技股份有限公司 A kind of recognition methods and system distorting adhesion character picture validation code
US10192148B1 (en) * 2017-08-22 2019-01-29 Gyrfalcon Technology Inc. Machine learning of written Latin-alphabet based languages via super-character

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555462A (en) * 2019-08-02 2019-12-10 深圳索信达数据技术有限公司 non-fixed multi-character verification code identification method based on convolutional neural network
CN110555298A (en) * 2019-08-30 2019-12-10 阿里巴巴(中国)有限公司 Verification code recognition model training and recognition method, medium, device and computing equipment
CN110555298B (en) * 2019-08-30 2021-10-26 阿里巴巴(中国)有限公司 Verification code recognition model training and recognition method, medium, device and computing equipment
CN111079117A (en) * 2019-11-28 2020-04-28 上海三零卫士信息安全有限公司 LeNet and SSD-based point-contact type verification code automatic identification method
CN111079117B (en) * 2019-11-28 2024-02-13 上海三零卫士信息安全有限公司 Automatic point-contact verification code identification method based on LeNet and SSD
CN111667549A (en) * 2020-04-28 2020-09-15 华东师范大学 Method, device and storage medium for generating graphic verification code based on countermeasure sample and random transformation
CN111667549B (en) * 2020-04-28 2023-04-07 华东师范大学 Method, device and storage medium for generating graphic verification code based on countermeasure sample and random transformation
CN111753846A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Website verification method, device, equipment and storage medium based on RPA and AI
CN112380409A (en) * 2020-10-26 2021-02-19 武汉天宝莱信息技术有限公司 Verification code identification method based on automatic crawler
CN117132989A (en) * 2023-10-23 2023-11-28 山东大学 Character verification code identification method, system and equipment based on convolutional neural network
CN117132989B (en) * 2023-10-23 2024-01-26 山东大学 Character verification code identification method, system and equipment based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190709