CN109993169A - End-to-end character-type verification code recognition method - Google Patents
- Publication number
- CN109993169A (application number CN201910288585.4A)
- Authority
- CN
- China
- Prior art keywords
- verification code
- picture
- character type
- training
- type method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The present invention relates to an end-to-end method for recognizing character-type verification codes. The method first collects verification code images, preprocesses them, and generates labeled verification code images that resemble the originals for use as training samples. It then builds a convolutional neural network model based on deep learning and trains it on the prepared training sample set. Finally, the trained convolutional neural network model recognizes the verification code images to be identified. The method can quickly and accurately recognize character-type verification codes that are distorted, that contain touching characters, or that contain interference lines and noise, and it greatly improves automated testing efficiency.
Description
Technical field
The present invention relates to the technical fields of image processing and character recognition, and in particular to an end-to-end method for recognizing character-type verification codes.
Background technique
A verification code (Completely Automated Public Turing test to tell Computers and Humans Apart, abbreviated CAPTCHA) is a public Turing test for distinguishing humans from computers. To prevent computer programs from imitating human behavior in order to brute-force passwords, rig votes, or send large volumes of spam, many website systems (such as online banking systems, trading systems, and community forums) have adopted verification code mechanisms.
At present, common verification codes consist of distorted or touching letters and digits, or contain interference lines and a large number of noise points; some go further and block machine recognition with slider or click challenges. Although verification codes prevent malicious machine behavior to some extent, they also greatly increase the labor and time costs of automated testing on platforms that use them, which seriously reduces automated testing efficiency.
Therefore, how to recognize verification codes quickly, accurately, and automatically has become a key problem to be solved in automated testing.
In view of the above, for character verification codes that are distorted, that contain touching characters, or that contain interference lines and noise points, the present invention combines deep learning methods and proposes an end-to-end method for recognizing character-type verification codes. The method requires no character segmentation and recognizes the verification code automatically, from the input image through to the final result.
Summary of the invention
To remedy the shortcomings of the prior art, the present invention provides a simple and efficient end-to-end method for recognizing character-type verification codes.
The present invention is achieved through the following technical solution:
An end-to-end method for recognizing character-type verification codes, characterized by comprising the following steps:
(A) first, collecting verification code images, preprocessing them, and generating labeled verification code images similar to the originals as training samples;
(B) then, building a convolutional neural network model based on deep learning and training it on the prepared training sample set;
(C) recognizing the verification code images to be identified with the trained convolutional neural network model.
In step (A), to make verification code images easier to generate and to speed up subsequent model training, the original verification code images are preprocessed with a thresholding algorithm: each original image is converted to grayscale and then binarized to remove most of the noise.
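The preprocessing described above can be sketched in plain Python. This is a minimal illustration rather than the patented implementation: the luma weights (the ITU-R BT.601 coefficients), the fixed threshold of 128, and the helper names `to_grayscale` and `binarize` are all assumptions, since the text only says that a thresholding algorithm is used.

```python
# Sketch of grayscale conversion followed by fixed-threshold binarization.
# Assumptions (not from the patent): BT.601 luma weights, threshold of 128.

def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def binarize(gray_image, threshold=128):
    """Map pixels below the threshold to 0 (ink) and the rest to 255 (paper)."""
    return [[0 if p < threshold else 255 for p in row] for row in gray_image]

# Tiny 2x3 example: dark glyph pixels, light gray speckle, white background.
rgb = [
    [(20, 20, 20), (200, 200, 200), (255, 255, 255)],
    [(190, 180, 170), (10, 30, 20), (255, 255, 255)],
]
binary = binarize(to_grayscale(rgb))
```

In this toy input the light gray speckle ends up as background, which is the noise-removal effect the step relies on; real images would need a threshold tuned to the target verification codes.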
In step (A), the Python image processing package Pillow is used to generate labeled verification code images that resemble the originals and that are distorted, contain touching characters, and/or contain interference lines.
The ImageFont module of Pillow sets a font and text size similar to those of the original verification codes; the transform() method rotates the image to produce a distortion effect; and the Draw module adds interference lines to the image.
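A hedged sketch of this generation step with Pillow follows. The canvas size, the 4-character length, the character set, the use of the default bitmap font, the choice of a horizontal shear as the distortion, and the function name `generate_sample` are all assumptions for illustration; the patent names only the ImageFont module, the transform() method, and the Draw module. Because the text is chosen by the generator, every image comes with its label for free.

```python
# Sketch of synthetic, pre-labeled verification code generation with Pillow.
# Assumptions (not from the patent): 160x60 canvas, 4 characters, default
# bitmap font, a horizontal shear as the "distortion", 3 interference lines.
import random
import string

from PIL import Image, ImageDraw, ImageFont

CHARSET = string.digits + string.ascii_uppercase

def generate_sample(rng, size=(160, 60), length=4, n_lines=3, shear=0.3):
    """Return (image, label); the label is known because we chose the text."""
    width, height = size
    label = "".join(rng.choice(CHARSET) for _ in range(length))

    image = Image.new("L", size, color=255)          # white grayscale canvas
    draw = ImageDraw.Draw(image)
    # A real setup would load a font resembling the target codes with
    # ImageFont.truetype(path, font_size); the default font needs no file.
    font = ImageFont.load_default()
    for i, ch in enumerate(label):
        x = 15 + i * (width - 30) // length
        y = rng.randint(10, height - 25)
        draw.text((x, y), ch, fill=0, font=font)

    # Affine shear via Image.transform: output pixel (x, y) samples input
    # (x + shear*y - shear*height/2, y), which slants the glyphs.
    affine = getattr(Image, "Transform", Image).AFFINE
    image = image.transform(
        size, affine, (1, shear, -shear * height / 2, 0, 1, 0), fillcolor=255
    )

    # Interference lines drawn across the distorted image.
    draw = ImageDraw.Draw(image)
    for _ in range(n_lines):
        start = (rng.randint(0, width // 4), rng.randint(0, height - 1))
        end = (rng.randint(3 * width // 4, width - 1), rng.randint(0, height - 1))
        draw.line([start, end], fill=0, width=1)
    return image, label

image, label = generate_sample(random.Random(0))
```

Passing a seeded `random.Random` keeps the generated set reproducible, which helps when comparing training runs.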
Step (B) specifically comprises the following steps:
(1) building a convolutional neural network model and pre-training it with the labeled verification code images similar to the originals as training samples;
(2) training the model again with manually labeled original verification code images as training samples to obtain the final model;
(3) converting the verification code image to be identified to grayscale and binarizing it, then feeding it to the trained convolutional neural network model to obtain the final recognition result.
In step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers, and one Softmax output layer.
The three convolutional layers extract deep features of the image; the four activation functions use ReLU as the nonlinear activation function; the three pooling layers downsample the feature maps and retain the most salient features; the four Dropout layers randomly drop the outputs of a fraction of the nodes during training to mitigate overfitting; and the Softmax output layer produces the final recognition result.
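For reference, the standard definitions of the two named functions are given below. The symbols $z$ (the vector of scores fed to the output layer for one character position) and $K$ (the number of character classes) are notation introduced here and do not appear in the patent.

```latex
\mathrm{ReLU}(x) = \max(0, x), \qquad
\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \quad k = 1, \dots, K
```

The softmax outputs are non-negative and sum to 1, so the largest entry can be read as the predicted character class.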
In both model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch size is 128.
The beneficial effect of the present invention is that the end-to-end method for recognizing character-type verification codes can quickly and accurately recognize character-type verification codes that are distorted, that contain touching characters, or that contain interference lines and noise, which greatly improves automated testing efficiency.
Brief description of the drawings
Figure 1 is a schematic diagram of the end-to-end character-type verification code recognition method of the present invention.
Specific embodiment
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, the invention is described in detail below with reference to the drawing and the embodiment. It should be noted that the specific embodiment described here only explains the invention and does not limit it.
The end-to-end method for recognizing character-type verification codes comprises the following steps:
(A) first, collecting verification code images, preprocessing them, and generating labeled verification code images similar to the originals as training samples;
(B) then, building a convolutional neural network model based on deep learning and training it on the prepared training sample set;
(C) recognizing the verification code images to be identified with the trained convolutional neural network model.
In step (A), to make verification code images easier to generate and to speed up subsequent model training, the original verification code images are preprocessed with a thresholding algorithm: each original image is converted to grayscale and then binarized to remove most of the noise.
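The text does not say how the binarization threshold is chosen. One common option, shown here purely as an assumption, is Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. The function name `otsu_threshold` is hypothetical, and the sample pixel values are made up.

```python
# Otsu's method on a flat list of 0-255 gray levels.
# Assumption: the patent names no specific thresholding algorithm.

def otsu_threshold(pixels):
    """Return t such that pixels < t form one class and pixels >= t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0        # number of pixels in the dark class so far
    sum0 = 0      # sum of gray levels in the dark class so far
    for t in range(1, 256):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Bimodal toy data: 20 dark "ink" pixels and 80 light "paper" pixels.
gray = [10, 12, 15, 11] * 5 + [240, 250, 245, 238] * 20
t = otsu_threshold(gray)
binary = [0 if p < t else 255 for p in gray]
```

A data-driven threshold of this kind avoids hand-tuning a constant for each verification code style, at the cost of assuming the histogram is roughly bimodal.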
In step (A), the Python image processing package Pillow is used to generate labeled verification code images that resemble the originals and that are distorted, contain touching characters, and/or contain interference lines.
The ImageFont module of Pillow sets a font and text size similar to those of the original verification codes; the transform() method rotates the image to produce a distortion effect; and the Draw module adds interference lines to the image.
Step (B) specifically comprises the following steps:
(1) building a convolutional neural network model and pre-training it with the labeled verification code images similar to the originals as training samples;
(2) training the model again with manually labeled original verification code images as training samples to obtain the final model;
(3) converting the verification code image to be identified to grayscale and binarizing it, then feeding it to the trained convolutional neural network model to obtain the final recognition result.
In step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers, and one Softmax output layer.
The three convolutional layers extract deep features of the image; the four activation functions use ReLU as the nonlinear activation function; the three pooling layers downsample the feature maps and retain the most salient features; the four Dropout layers randomly drop the outputs of a fraction of the nodes during training to mitigate overfitting; and the Softmax output layer produces the final recognition result.
In both model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch size is 128.
For the trained model to recognize verification codes well while keeping time and monetary costs low, a large number of labeled verification code images must be obtained as training samples. At present there are two common ways to label verification code images: one is manual labeling, and the other is paying a captcha-labeling platform to do the labeling. Because many training samples are required (about 20,000), the first method incurs a high labor cost and the second a high monetary cost. The end-to-end method for recognizing character-type verification codes therefore first automatically generates verification codes that resemble the target verification codes as closely as possible and uses them as training samples to pre-train the model. It then collects a small number of original verification code images (about 1,000), labels them manually, and trains the pre-trained model again on this sample set, fine-tuning the model parameters so that the model recognizes the target codes better. The specific recognition process is as follows:
A. Generating a large number of similar verification code images
(1) To make verification code images easier to generate and to speed up subsequent model training, the original verification code images are preprocessed: each original image is converted to grayscale and then binarized to remove most of the noise.
(2) Verification codes are generated with the Python image processing package Pillow. Specifically, its ImageFont module sets a font and text size similar to those of the original verification codes; the transform() method rotates the image to produce a distortion effect; and the Draw module adds interference lines to the image.
B. Building the convolutional neural network model
The convolutional neural network model is built with TensorFlow.
The model mainly comprises: (1) three convolutional layers, whose main role is to extract deep features of the image; (2) four activation functions, with ReLU chosen as the nonlinear activation function; (3) three pooling layers, whose main role is to downsample and retain the most salient features; (4) four Dropout layers, which randomly drop the outputs of a fraction of the nodes during training to mitigate overfitting; and (5) a Softmax output layer, which produces the final recognition result.
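To make the layer list concrete, the sketch below traces feature-map shapes through one possible arrangement of the listed layers. The patent gives only the layer counts; the input size (60x160 grayscale), the filter counts (32, 64, 64), the 'same'-padded convolutions, the 2x2 pooling, the 1024-unit fully connected layer that carries the fourth ReLU and the fourth Dropout, and the 4-position, 36-class output are all assumptions. It uses plain Python rather than TensorFlow so that it runs without extra dependencies.

```python
import math

def trace_shapes(height=60, width=160, channels=1,
                 filters=(32, 64, 64), dense_units=1024,
                 n_chars=4, n_classes=36):
    """Return a list of (layer name, output shape) for the assumed network."""
    shapes = [("input", (height, width, channels))]
    h, w, c = height, width, channels
    for i, f in enumerate(filters, start=1):
        c = f                                       # 'same' conv keeps h and w
        shapes.append((f"conv{i} + ReLU", (h, w, c)))
        h, w = math.ceil(h / 2), math.ceil(w / 2)   # 2x2 pool, 'same' padding
        shapes.append((f"pool{i} + Dropout", (h, w, c)))
    shapes.append(("flatten", (h * w * c,)))
    shapes.append(("dense + ReLU + Dropout", (dense_units,)))
    # One softmax over n_classes for each of the n_chars positions.
    shapes.append(("softmax output", (n_chars, n_classes)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:<24}{shape}")
```

Under these assumptions the three pooling stages shrink the image from 60x160 to 8x20, so the fully connected layer sees 8 * 20 * 64 = 10240 features; this arrangement accounts for three convolutions, four ReLUs, three pools, four Dropouts, and one softmax output, matching the counts in the text.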
C. Training the convolutional neural network model
The similar verification code images generated in step A are used as the training sample set. The initial learning rate is set to 0.001, training runs for 20 epochs in total with a batch size of 128, and then training begins. After pre-training, the manually labeled original verification code images are fed in as training samples, and the pre-trained model is trained again to obtain the final model.
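The size of the two training stages can be worked out with a little arithmetic. The sketch below reads the stated "batch number" of 128 as the batch size, and it assumes that about 20,000 generated images are used for pre-training and about 1,000 hand-labeled originals for the second stage, following the approximate counts given earlier in the description. The function name `steps_per_epoch` is hypothetical. Both stages use the same initial learning rate of 0.001, which does not enter the step count.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

# Hyperparameters stated in the text (reading "batch number" as batch size).
EPOCHS = 20
BATCH_SIZE = 128

# Approximate sample counts; which count belongs to which stage is assumed.
stages = [
    ("pre-training on generated images", 20_000),
    ("training again on hand-labeled originals", 1_000),
]

plan = []
for name, n_samples in stages:
    per_epoch = steps_per_epoch(n_samples, BATCH_SIZE)
    plan.append((name, per_epoch, per_epoch * EPOCHS))

for name, per_epoch, total in plan:
    print(f"{name}: {per_epoch} steps/epoch, {total} steps in total")
```

Under these assumptions the second stage costs roughly one twentieth of the pre-training steps, which is consistent with its role as a short fine-tuning pass.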
D. Recognizing verification codes with the trained model
The verification code image to be identified is converted to grayscale and binarized, then fed to the trained model; the model's prediction gives the final recognition result.
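One way to turn the model's output into a character string without segmenting the image is to treat the output as one probability row per character position and take the most probable class in each row. The patent does not describe the label encoding, so the 36-character set, the per-position layout, and the helper names `encode_label` and `decode_prediction` are assumptions, and the "model output" below is made up.

```python
import string

CHARSET = string.digits + string.ascii_uppercase   # assumed 36 classes

def encode_label(text):
    """One-hot encode each character position of a label."""
    return [[1.0 if ch == c else 0.0 for c in CHARSET] for ch in text]

def decode_prediction(probabilities):
    """Pick the most probable class at each position and join the characters."""
    return "".join(
        CHARSET[max(range(len(row)), key=row.__getitem__)]
        for row in probabilities
    )

# A made-up model output for a 2-character code: position 0 favors "7",
# position 1 favors "K".
fake_output = [[0.01] * len(CHARSET) for _ in range(2)]
fake_output[0][CHARSET.index("7")] = 0.65
fake_output[1][CHARSET.index("K")] = 0.65
decoded = decode_prediction(fake_output)
```

The same encoding serves both directions: `encode_label` produces training targets from the known text of a generated image, and `decode_prediction` inverts it at recognition time.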
Claims (8)
1. An end-to-end method for recognizing character-type verification codes, characterized by comprising the following steps:
(A) first, collecting verification code images, preprocessing them, and generating labeled verification code images similar to the originals as training samples;
(B) then, building a convolutional neural network model based on deep learning and training it on the prepared training sample set;
(C) recognizing the verification code images to be identified with the trained convolutional neural network model.
2. The end-to-end method for recognizing character-type verification codes according to claim 1, characterized in that: in step (A), to make verification code images easier to generate and to speed up subsequent model training, the original verification code images are preprocessed with a thresholding algorithm, whereby each original image is converted to grayscale and then binarized to remove most of the noise.
3. The end-to-end method for recognizing character-type verification codes according to claim 1, characterized in that: in step (A), the Python image processing package Pillow is used to generate labeled verification code images that resemble the originals and that are distorted, contain touching characters, and/or contain interference lines.
4. The end-to-end method for recognizing character-type verification codes according to claim 3, characterized in that: the ImageFont module of Pillow sets a font and text size similar to those of the original verification codes; the transform() method rotates the image to produce a distortion effect; and the Draw module adds interference lines to the image.
5. The end-to-end method for recognizing character-type verification codes according to any one of claims 1, 2, 3 or 4, characterized in that step (B) specifically comprises the following steps:
(1) building a convolutional neural network model and pre-training it with the labeled verification code images similar to the originals as training samples;
(2) training the model again with manually labeled original verification code images as training samples to obtain the final model;
(3) converting the verification code image to be identified to grayscale and binarizing it, then feeding it to the trained convolutional neural network model to obtain the final recognition result.
6. The end-to-end method for recognizing character-type verification codes according to claim 5, characterized in that: in step (1), the convolutional neural network model is built with TensorFlow and comprises three convolutional layers, four activation functions, three pooling layers, four Dropout layers, and one Softmax output layer.
7. The end-to-end method for recognizing character-type verification codes according to claim 6, characterized in that: the three convolutional layers extract deep features of the image; the four activation functions use ReLU as the nonlinear activation function; the three pooling layers downsample the feature maps and retain the most salient features; the four Dropout layers randomly drop the outputs of a fraction of the nodes during training to mitigate overfitting; and the Softmax output layer produces the final recognition result.
8. The end-to-end method for recognizing character-type verification codes according to claim 5, characterized in that: in both model pre-training and model training, the initial learning rate is set to 0.001, training runs for 20 epochs in total, and the batch size is 128.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910288585.4A CN109993169A (en) | 2019-04-11 | 2019-04-11 | End-to-end character-type verification code recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910288585.4A CN109993169A (en) | 2019-04-11 | 2019-04-11 | End-to-end character-type verification code recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109993169A true CN109993169A (en) | 2019-07-09 |
Family
ID=67133215
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910288585.4A Pending CN109993169A (en) | 2019-04-11 | 2019-04-11 | End-to-end character-type verification code recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109993169A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555298A (en) * | 2019-08-30 | 2019-12-10 | 阿里巴巴(中国)有限公司 | Verification code recognition model training and recognition method, medium, device and computing equipment |
CN110555462A (en) * | 2019-08-02 | 2019-12-10 | 深圳索信达数据技术有限公司 | non-fixed multi-character verification code identification method based on convolutional neural network |
CN111079117A (en) * | 2019-11-28 | 2020-04-28 | 上海三零卫士信息安全有限公司 | LeNet and SSD-based point-contact type verification code automatic identification method |
CN111667549A (en) * | 2020-04-28 | 2020-09-15 | 华东师范大学 | Method, device and storage medium for generating graphic verification code based on countermeasure sample and random transformation |
CN111753846A (en) * | 2020-06-30 | 2020-10-09 | 北京来也网络科技有限公司 | Website verification method, device, equipment and storage medium based on RPA and AI |
CN112380409A (en) * | 2020-10-26 | 2021-02-19 | 武汉天宝莱信息技术有限公司 | Verification code identification method based on automatic crawler |
CN117132989A (en) * | 2023-10-23 | 2023-11-28 | 山东大学 | Character verification code identification method, system and equipment based on convolutional neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899571A (en) * | 2015-06-12 | 2015-09-09 | 成都数联铭品科技有限公司 | Random sample generation method for recognition of complex character |
CN107085730A (en) * | 2017-03-24 | 2017-08-22 | 深圳爱拼信息科技有限公司 | A kind of deep learning method and device of character identifying code identification |
CN108765333A (en) * | 2018-05-24 | 2018-11-06 | 华南理工大学 | A kind of depth map improving method based on depth convolutional neural networks |
CN109086772A (en) * | 2018-08-16 | 2018-12-25 | 成都市映潮科技股份有限公司 | A kind of recognition methods and system distorting adhesion character picture validation code |
US10192148B1 (en) * | 2017-08-22 | 2019-01-29 | Gyrfalcon Technology Inc. | Machine learning of written Latin-alphabet based languages via super-character |
-
2019
- 2019-04-11 CN CN201910288585.4A patent/CN109993169A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899571A (en) * | 2015-06-12 | 2015-09-09 | 成都数联铭品科技有限公司 | Random sample generation method for recognition of complex character |
CN107085730A (en) * | 2017-03-24 | 2017-08-22 | 深圳爱拼信息科技有限公司 | A kind of deep learning method and device of character identifying code identification |
US10192148B1 (en) * | 2017-08-22 | 2019-01-29 | Gyrfalcon Technology Inc. | Machine learning of written Latin-alphabet based languages via super-character |
CN108765333A (en) * | 2018-05-24 | 2018-11-06 | 华南理工大学 | A kind of depth map improving method based on depth convolutional neural networks |
CN109086772A (en) * | 2018-08-16 | 2018-12-25 | 成都市映潮科技股份有限公司 | A kind of recognition methods and system distorting adhesion character picture validation code |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555462A (en) * | 2019-08-02 | 2019-12-10 | 深圳索信达数据技术有限公司 | non-fixed multi-character verification code identification method based on convolutional neural network |
CN110555298A (en) * | 2019-08-30 | 2019-12-10 | 阿里巴巴(中国)有限公司 | Verification code recognition model training and recognition method, medium, device and computing equipment |
CN110555298B (en) * | 2019-08-30 | 2021-10-26 | 阿里巴巴(中国)有限公司 | Verification code recognition model training and recognition method, medium, device and computing equipment |
CN111079117A (en) * | 2019-11-28 | 2020-04-28 | 上海三零卫士信息安全有限公司 | LeNet and SSD-based point-contact type verification code automatic identification method |
CN111079117B (en) * | 2019-11-28 | 2024-02-13 | 上海三零卫士信息安全有限公司 | Automatic point-contact verification code identification method based on LeNet and SSD |
CN111667549A (en) * | 2020-04-28 | 2020-09-15 | 华东师范大学 | Method, device and storage medium for generating graphic verification code based on countermeasure sample and random transformation |
CN111667549B (en) * | 2020-04-28 | 2023-04-07 | 华东师范大学 | Method, device and storage medium for generating graphic verification code based on countermeasure sample and random transformation |
CN111753846A (en) * | 2020-06-30 | 2020-10-09 | 北京来也网络科技有限公司 | Website verification method, device, equipment and storage medium based on RPA and AI |
CN112380409A (en) * | 2020-10-26 | 2021-02-19 | 武汉天宝莱信息技术有限公司 | Verification code identification method based on automatic crawler |
CN117132989A (en) * | 2023-10-23 | 2023-11-28 | 山东大学 | Character verification code identification method, system and equipment based on convolutional neural network |
CN117132989B (en) * | 2023-10-23 | 2024-01-26 | 山东大学 | Character verification code identification method, system and equipment based on convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109993169A (en) | End-to-end character-type verification code recognition method | |
CN110009057B (en) | Graphic verification code identification method based on deep learning | |
CN109241383B (en) | A kind of type of webpage intelligent identification Method and system based on deep learning | |
CN106951832B (en) | Verification method and device based on handwritten character recognition | |
CN111968193B (en) | Text image generation method based on StackGAN (secure gas network) | |
CN104966097A (en) | Complex character recognition method based on deep learning | |
CN111177366A (en) | Method, device and system for automatically generating extraction type document abstract based on query mechanism | |
CN111078978A (en) | Web credit website entity identification method and system based on website text content | |
CN105118509A (en) | Security authentication method based on voiceprint two-dimensional code | |
CN110969681A (en) | Method for generating handwriting characters based on GAN network | |
Das et al. | Multi‐script versus single‐script scenarios in automatic off‐line signature verification | |
CN110517696A (en) | A kind of offline Voiceprint Recognition System of implantable | |
CN113886792A (en) | Application method and system of print control instrument combining voiceprint recognition and face recognition | |
CN109145723A (en) | A kind of seal recognition methods, system, terminal installation and storage medium | |
Laishram et al. | A neural network based handwritten Meitei Mayek alphabet optical character recognition system | |
AlKhateeb et al. | Word-based handwritten Arabic scripts recognition using DCT features and neural network classifier | |
Fallah et al. | Detecting features of human personality based on handwriting using learning algorithms | |
CN111199208A (en) | Head portrait gender identification method and system based on deep learning framework | |
CN103136546A (en) | Multi-dimension authentication method and authentication device of on-line signature | |
CN107122653A (en) | A kind of picture validation code processing method and processing device | |
CN116564315A (en) | Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium | |
CN114461779A (en) | Case writing element extraction method | |
CN115455144A (en) | Data enhancement method of completion type space filling type for small sample intention recognition | |
CN111460105B (en) | Topic mining method, system, equipment and storage medium based on short text | |
CN114281966A (en) | Question template generation method, question answering device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190709 |