CN106446897A - Identification method of hollow verification code - Google Patents

Identification method of hollow verification code

Info

Publication number
CN106446897A
Authority
CN
China
Prior art keywords
character
hollow
verification code
region
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610812124.9A
Other languages
Chinese (zh)
Inventor
王本强
郭运艳
陈安猛
衣秀
房善华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-09-09
Filing date
2016-09-09
Publication date
2017-02-22
Application filed by Inspur Software Co Ltd
Priority to CN201610812124.9A
Publication of CN106446897A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion removing elements interfering with the pattern to be recognised
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method for recognizing hollow verification codes, comprising the following steps: first, extracting the character image, preprocessing the extracted characters, and extracting the character regions with the interference removed; cutting the character regions to segment the individual characters; and training on the characters to obtain a corresponding recognition model that is then used for recognition. Compared with the prior art, when a data analyst needs to acquire large amounts of network data, the method can automatically and rapidly recognize the characters in hollow verification codes and acquire them accurately, so that data acquisition is not blocked by the verification code or by recognition errors.

Description

Recognition method for hollow verification codes
Technical field
The present invention relates to the field of verification code recognition, and specifically to a practical method for recognizing hollow verification codes.
Background art
Verification codes (CAPTCHAs), a security technique for distinguishing whether a user is a computer or a human, are adopted by most websites. They were introduced to stop malicious programs from damaging websites, for example through batch registration or batch posting, and also to stop crawlers from harvesting resources. Image-based character verification codes are currently the most common type. The hollow character verification code is one such design: an interference line passes through hollow (outlined) characters, so that, unlike solid touching characters, the interference does not degrade the user's visual experience. As a relatively new design, hollow character verification codes have been adopted on the websites of major companies in modules such as user login, mailbox verification and forum comments.
Therefore, while verification codes bring security to websites, they also cause considerable inconvenience to people engaged in data analysis and collection, and verification code recognition algorithms are intended to solve this problem. Hollow character verification codes are relatively novel, and there is not yet an effective method for recognizing them. On this basis, a method for recognizing hollow verification codes is provided.
Summary of the invention
The technical task of the present invention is to address the above shortcomings by providing a practical method for recognizing hollow verification codes.
A method for recognizing hollow verification codes is implemented as follows:
First, extract the character image, preprocess the extracted characters, and extract the character regions with the character interference removed;
Cut the character regions to segment the individual characters;
Train on the characters to obtain a corresponding recognition model, and use it for recognition.
The character preprocessing is performed by an image preprocessing module, which operates as follows:
Step 1: binarize the extracted characters and extract the interference line region according to the character features;
Step 2: remove the influence of the character regions from the interference line region to obtain the position of the interference line;
Step 3: along the direction of the interference line, remove part of the interference line according to the features of the surrounding characters;
Step 4: perform connected component labeling, find all broken character parts and the interference regions formed by the interference line and the characters, and remove the interference regions.
Step 1 is carried out as follows:
First, filter the noise points out of the extracted character image using filtering algorithms such as Gaussian filtering and median filtering;
Then, according to the pixel characteristics of the image, binarize it using an adaptive-threshold or fixed-threshold algorithm;
According to how hollow verification codes are generated, repair the outlines of any broken characters to keep the character edges intact, thereby completing the extraction of the interference line region.
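The filtering and binarization in Step 1 can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: images are assumed to be lists of rows of 8-bit grayscale values with dark characters on a light background, only the median filter is shown (a Gaussian filter would slot in the same way), and the helper names and default parameters (`threshold=128`, `radius=1`, `offset=10`) are hypothetical.

```python
from statistics import median

Image = list[list[int]]


def median_filter3(img: Image) -> Image:
    """3x3 median filter; border pixels use whichever neighbors exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = int(median(window))
    return out


def binarize_fixed(img: Image, threshold: int = 128) -> Image:
    """Fixed threshold: pixels darker than `threshold` become foreground (1)."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def binarize_adaptive(img: Image, radius: int = 1, offset: int = 10) -> Image:
    """Mean-adaptive threshold: foreground if darker than local mean minus `offset`."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            if img[r][c] < sum(window) / len(window) - offset:
                out[r][c] = 1
    return out
```

One design caveat: a 3x3 median filter also erases strokes that are only one pixel wide, so for thin hollow outlines the window size (or Gaussian smoothing instead) has to be chosen with care.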
In Step 2, extracting the interference line from the interference line region means extracting the interference line according to the pixel characteristics of the hollow verification code and recording its position.
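The patent does not say how the interference line is followed, so the sketch below assumes a roughly horizontal, one-pixel-wide line that enters at the left edge; `trace_line` and its `start_row` parameter are hypothetical. It walks column by column, preferring to stay on the current row and otherwise stepping one row up or down, and records the (row, column) positions it visits.

```python
Binary = list[list[int]]


def trace_line(binary: Binary, start_row: int) -> list[tuple[int, int]]:
    """Follow a roughly horizontal line from column 0 and record its pixels."""
    h, w = len(binary), len(binary[0])
    path: list[tuple[int, int]] = []
    row = start_row
    for col in range(w):
        # Candidate rows: same row first, then one row up, then one row down.
        candidates = [r for r in (row, row - 1, row + 1)
                      if 0 <= r < h and binary[r][col]]
        if candidates:
            row = candidates[0]
            path.append((row, col))
        # With no candidate the line has a gap here; keep the last row.
    return path
```

The recorded path is the "position" that the following step works on.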
In Step 3, removing part of the interference line means breaking it: according to the pixel characteristics of the hollow verification code, the neighborhood pixels of the interference line are analyzed, and the line is broken at positions that are clearly interference.
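One possible reading of breaking the line at positions that are clearly interference is sketched below, under an assumed rule: a pixel on the recorded line path is cleared when the pixels directly above and below it are background, i.e. the line is locally one pixel thick and does not overlap a character stroke. The rule and the name `break_interference_line` are assumptions for illustration; the patent only says that neighborhood pixels are analyzed.

```python
Binary = list[list[int]]


def break_interference_line(binary: Binary,
                            line_path: list[tuple[int, int]]) -> Binary:
    """Return a copy of `binary` with 'obvious' line pixels cleared.

    Hypothetical rule: a line pixel whose vertical neighbors are both
    background is pure interference; pixels where the line crosses a
    character stroke (vertical neighbor set) are kept.
    """
    h = len(binary)
    out = [row[:] for row in binary]
    for r, c in line_path:
        above = binary[r - 1][c] if r > 0 else 0
        below = binary[r + 1][c] if r + 1 < h else 0
        if binary[r][c] and not above and not below:
            out[r][c] = 0
    return out
```

Pixels where the line crosses a stroke survive, which is why the later connected-component and template steps are still needed.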
In Step 4, the labeled connected regions include the connected regions of broken characters, those formed where characters touch each other, those formed by characters and the interference line, and those inside the characters themselves;
Removing the interference connected regions means removing all connected regions except those of the broken characters, specifically: by analyzing the distribution characteristics of the connected regions, the broken-character connected regions are distinguished from the interference connected regions, classified, and the latter are removed.
The preprocessing steps above further include a break-connection step and a rotation-correction step, specifically:
The break-connection step connects the parts of broken-character connected regions: the finite-length effect of the interference line on the characters is analyzed, corresponding connection templates are designed, and the broken parts are joined;
The rotation-correction step finds the minimum bounding rectangle of each part of each character, removes interfering rectangles, statistically computes the rotation angle, and rotates the verification code image accordingly.
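The rotation-correction step could be approximated as follows. Note the substitution: instead of minimum bounding rectangles, this sketch estimates each part's orientation from its second-order central moments, folds that angle to its deviation from the nearest horizontal or vertical axis, and takes the median as the statistically computed skew; parts smaller than `min_pixels` are discarded as a crude stand-in for removing interfering rectangles. All function names and thresholds are hypothetical. Coordinates are (row, column) with rows growing downward.

```python
import math
from statistics import median

Pixel = tuple[int, int]


def part_angle(pixels: list[Pixel]) -> float:
    """Major-axis orientation in degrees (x right, y down) from central moments."""
    n = len(pixels)
    my = sum(y for y, _ in pixels) / n
    mx = sum(x for _, x in pixels) / n
    mu20 = sum((x - mx) ** 2 for _, x in pixels) / n
    mu02 = sum((y - my) ** 2 for y, _ in pixels) / n
    mu11 = sum((x - mx) * (y - my) for y, x in pixels) / n
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def estimate_skew(parts: list[list[Pixel]], min_pixels: int = 3) -> float:
    """Median deviation of part orientations from the nearest image axis."""
    deviations = [((part_angle(p) + 45) % 90) - 45
                  for p in parts if len(p) >= min_pixels]
    return median(deviations) if deviations else 0.0


def rotate_pixels(pixels: list[Pixel], degrees: float,
                  cy: float, cx: float) -> list[Pixel]:
    """Rotate pixel coordinates about (cy, cx) and round to the pixel grid."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = set()
    for y, x in pixels:
        dy, dx = y - cy, x - cx
        out.add((round(cy + dx * sin_t + dy * cos_t),
                 round(cx + dx * cos_t - dy * sin_t)))
    return sorted(out)
```

To deskew, rotate every foreground pixel by `-estimate_skew(parts)` about the image center. Rotating coordinates one by one can leave holes in thick strokes; a fuller version would inverse-map from the output grid instead.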
Cutting of the character regions is performed by a hollow character segmentation module, which merges and classifies the character regions to obtain the region occupied by each character and then cuts out and outputs each character, specifically:
The obtained character connected regions are merged and classified: the center position of each character is computed and a threshold is set around that position; the distance from each region to the character position is computed, and if it is less than the preset threshold the region is assigned to that character, otherwise it belongs to another character. This yields the specific position of each character. Using these computed positions, each character is cut out and output for character recognition in the next step.
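The merge-and-classify step can be sketched as a greedy left-to-right grouping. Assumptions: each connected region is a list of (row, column) pixels, only the horizontal distance between centers is compared, and `threshold` is a tuning parameter the patent leaves open; the function names are hypothetical.

```python
Pixel = tuple[int, int]


def centroid_x(pixels: list[Pixel]) -> float:
    return sum(x for _, x in pixels) / len(pixels)


def group_components(components: list[list[Pixel]],
                     threshold: float) -> list[list[Pixel]]:
    """Assign each region to the current character if its center is close enough."""
    groups: list[list[Pixel]] = []
    for comp in sorted(components, key=centroid_x):
        if groups and abs(centroid_x(comp) - centroid_x(groups[-1])) < threshold:
            groups[-1].extend(comp)      # same character: merge
        else:
            groups.append(list(comp))    # start a new character
    return groups


def bounding_box(pixels: list[Pixel]) -> tuple[int, int, int, int]:
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def crop(binary: list[list[int]],
         box: tuple[int, int, int, int]) -> list[list[int]]:
    """Cut out one character given (top, left, bottom, right), inclusive."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

Because each group's center is recomputed after every merge, a character split into several pieces by the interference line is gathered back into one box before cropping.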
Character recognition is performed by a character recognition module, which trains a character neural network to obtain a model that fits the features of this class of characters, and uses this model to classify and recognize the characters.
The method for recognizing hollow verification codes of the present invention has the following advantages:
When a data analyst needs to acquire large amounts of network data, the method can automatically and rapidly recognize the characters in hollow verification codes and acquire them accurately, so that data acquisition is not blocked by the verification code or by recognition errors. Moreover, because the character model is trained specifically on this class of characters, the recognition rate is greatly improved; the method is practical, widely applicable and easy to promote.
Brief description of the drawings
Figure 1 is a schematic diagram of the implementation structure of the present invention.
Figure 2 is a flowchart of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
As shown in Figures 1 and 2, the present invention provides a method for recognizing hollow verification codes, i.e. verification codes whose hollow characters are crossed by an interference line. The method binarizes the image to extract the character edges, extracts the interference line region according to character features, removes the influence of the character regions to obtain the interference line position, and, along the interference line direction and according to the features of the surrounding characters, removes part of the interference line. Connected component labeling is then performed to find all broken character parts and the interference regions formed by the interference line and the characters, and an algorithm proposed by the present invention removes the interference regions. Several neighborhood models proposed by the present invention connect the broken parts; statistical analysis of the connected regions yields the rotation angle, by which the image is rotated; the character regions are classified and merged to obtain the region of each character; and finally each character is cut out and output. A character neural network is then trained to obtain a model that fits the features of this class of characters, and this model is used to classify and recognize the characters.
The implementation process is:
First, extract the character image, preprocess the extracted characters, and extract the character regions with the character interference removed;
Cut the character regions to segment the individual characters;
Train on the characters to obtain a corresponding recognition model, and use it for recognition.
The character preprocessing is performed by an image preprocessing module, which operates as follows:
Filter the noise points out of the image using filtering algorithms such as Gaussian filtering and median filtering.
According to the pixel characteristics of the image, binarize it using an adaptive-threshold or fixed-threshold algorithm.
According to how hollow verification codes are generated, where a character is broken, its outline must be repaired to keep the character edges intact.
Interference line extraction: according to the pixel characteristics of the hollow verification code, extract the interference line and record its position.
Interference line breaking: according to the pixel characteristics of the hollow verification code, analyze the neighborhood pixels of the interference line and break the line at positions that are clearly interference.
Connected component labeling: find all connected regions and mark their positions. At this stage the connected regions include not only those of broken characters but also those formed where characters touch each other, those formed by characters and the interference line, and those inside the characters themselves; all of them are labeled.
Interference connected region removal: this mainly covers all connected regions other than the broken characters above. By analyzing the distribution characteristics of the connected regions according to a certain strategy, the broken-character connected regions are distinguished from the interference connected regions, classified, and the latter are removed.
Broken-character connection: the parts of broken-character connected regions are connected by analyzing the finite-length effect of the interference line on the characters, designing corresponding connection templates, and joining the broken parts. (This step generally cannot reconnect all broken character parts.)
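The connection templates are not given in the text. The sketch below uses four hypothetical 3x3 templates (horizontal, vertical and the two diagonals): a background pixel is filled when both ends of any template are foreground, which closes one-pixel breaks. As the text itself notes, simple templates like these will not reconnect every break.

```python
Binary = list[list[int]]

# Hypothetical templates: pairs of opposite neighbor offsets (dy, dx).
TEMPLATES = [((0, -1), (0, 1)),    # horizontal gap
             ((-1, 0), (1, 0)),    # vertical gap
             ((-1, -1), (1, 1)),   # main-diagonal gap
             ((-1, 1), (1, -1))]   # anti-diagonal gap


def bridge_gaps(binary: Binary) -> Binary:
    """Fill single-pixel breaks whose opposite neighbors are both foreground."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]

    def fg(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and binary[y][x] == 1

    for r in range(h):
        for c in range(w):
            if binary[r][c]:
                continue
            for (dy1, dx1), (dy2, dx2) in TEMPLATES:
                if fg(r + dy1, c + dx1) and fg(r + dy2, c + dx2):
                    out[r][c] = 1
                    break
    return out
```

Templates are matched against the input image rather than the partially updated output, so one pass never grows a stroke by more than a pixel.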
Rotation correction: find the minimum bounding rectangle of each part of each character, remove interfering rectangles, statistically compute the rotation angle, and rotate the verification code image.
Cutting of the character regions is performed by a hollow character segmentation module, which merges and classifies the character regions to obtain the region occupied by each character and then cuts out and outputs each character, specifically:
The character connected regions obtained above are merged and classified: typically the approximate center of each character is computed, a threshold is set around that position according to a certain strategy, and the distance from each region to the character position is computed; if it is less than the preset threshold the region is assigned to that character, otherwise it belongs to another character. This yields the specific position of each character.
Using the computed positions, each character is cut out and output for character recognition in the next step.
Character recognition is performed by a character recognition module, which trains a character neural network to obtain a model that fits the features of this class of characters, and uses this model to classify and recognize the characters. This module is mainly responsible for recognizing each character obtained above. Many character recognition modules already exist, but their recognition performance on these characters is unsatisfactory. Therefore, a neural network is trained on this class of hollow characters so that it learns a model reflecting their inherent features; this model is used for recognition and outputs the recognized characters. Compared with existing recognition modules, the character recognition rate is substantially improved.
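To illustrate the train-then-recognize idea without assuming any particular framework, the sketch below trains the simplest possible neural network, a single softmax layer, with stochastic gradient descent on cross-entropy loss. The patent does not specify the network architecture, input size or training settings; the class name, the 3x3 toy glyphs, `epochs=200` and `lr=0.5` are all hypothetical, and a real system would use larger character crops and a deeper network.

```python
import math
import random


def softmax(z: list[float]) -> list[float]:
    m = max(z)                         # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def flatten(glyph: list[list[int]]) -> list[float]:
    """Turn a cropped binary character into a feature vector."""
    return [float(v) for row in glyph for v in row]


class SoftmaxClassifier:
    """Single-layer network: logits = W x + b, trained with SGD on cross-entropy."""

    def __init__(self, n_inputs: int, labels: list[str], seed: int = 0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
                  for _ in self.labels]
        self.b = [0.0] * len(self.labels)

    def _probs(self, x: list[float]) -> list[float]:
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) + bj
                        for row, bj in zip(self.w, self.b)])

    def fit(self, samples, targets, epochs: int = 200, lr: float = 0.5):
        index = {lab: j for j, lab in enumerate(self.labels)}
        for _ in range(epochs):
            for x, t in zip(samples, targets):
                p = self._probs(x)
                for j in range(len(self.labels)):
                    # d(loss)/d(logit_j) = p_j - 1[j == target]
                    grad = p[j] - (1.0 if j == index[t] else 0.0)
                    self.b[j] -= lr * grad
                    for i, xi in enumerate(x):
                        self.w[j][i] -= lr * grad * xi
        return self

    def predict(self, x: list[float]) -> str:
        p = self._probs(x)
        return self.labels[p.index(max(p))]
```

Training once on labeled character crops and then calling `predict` on each segmented character mirrors the flow described above; swapping in a hidden layer changes only `_probs` and the gradient computation.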
The specific embodiment above is only a concrete example of the present invention. The scope of patent protection of the present invention includes but is not limited to it; any method for recognizing hollow verification codes that conforms to the claims of the present invention, and any suitable change or replacement made to it by a person of ordinary skill in the art, shall fall within the scope of patent protection of the present invention.

Claims (9)

1. A method for recognizing hollow verification codes, characterized in that the method is implemented as follows:
First, extract the character image, preprocess the extracted characters, and extract the character regions with the character interference removed;
Cut the character regions to segment the individual characters;
Train on the characters to obtain a corresponding recognition model, and use it for recognition.
2. The method for recognizing hollow verification codes according to claim 1, characterized in that the character preprocessing is performed by an image preprocessing module, which operates as follows:
Step 1: binarize the extracted characters and extract the interference line region according to the character features;
Step 2: remove the influence of the character regions from the interference line region to obtain the position of the interference line;
Step 3: along the direction of the interference line, remove part of the interference line according to the features of the surrounding characters;
Step 4: perform connected component labeling, find all broken character parts and the interference regions formed by the interference line and the characters, and remove the interference regions.
3. The method for recognizing hollow verification codes according to claim 2, characterized in that Step 1 is carried out as follows:
First, filter the noise points out of the extracted character image using filtering algorithms such as Gaussian filtering and median filtering;
Then, according to the pixel characteristics of the image, binarize it using an adaptive-threshold or fixed-threshold algorithm;
According to how hollow verification codes are generated, repair the outlines of any broken characters to keep the character edges intact, thereby completing the extraction of the interference line region.
4. The method for recognizing hollow verification codes according to claim 2, characterized in that, in Step 2, extracting the interference line from the interference line region means extracting the interference line according to the pixel characteristics of the hollow verification code and recording its position.
5. The method for recognizing hollow verification codes according to claim 2, characterized in that, in Step 3, removing part of the interference line means breaking it: according to the pixel characteristics of the hollow verification code, the neighborhood pixels of the interference line are analyzed, and the line is broken at positions that are clearly interference.
6. The method for recognizing hollow verification codes according to claim 2, characterized in that, in Step 4, the labeled connected regions include the connected regions of broken characters, those formed where characters touch each other, those formed by characters and the interference line, and those inside the characters themselves;
Removing the interference connected regions means removing all connected regions except those of the broken characters, specifically: by analyzing the distribution characteristics of the connected regions, the broken-character connected regions are distinguished from the interference connected regions, classified, and the latter are removed.
7. The method for recognizing hollow verification codes according to any one of claims 2-6, characterized in that the preprocessing steps further include a break-connection step and a rotation-correction step, specifically:
The break-connection step connects the parts of broken-character connected regions: the finite-length effect of the interference line on the characters is analyzed, corresponding connection templates are designed, and the broken parts are joined;
The rotation-correction step finds the minimum bounding rectangle of each part of each character, removes interfering rectangles, statistically computes the rotation angle, and rotates the verification code image accordingly.
8. The method for recognizing hollow verification codes according to claim 1, characterized in that cutting of the character regions is performed by a hollow character segmentation module, which merges and classifies the character regions to obtain the region occupied by each character and then cuts out and outputs each character, specifically:
The obtained character connected regions are merged and classified: the center position of each character is computed and a threshold is set around that position; the distance from each region to the character position is computed, and if it is less than the preset threshold the region is assigned to that character, otherwise it belongs to another character. This yields the specific position of each character. Using these computed positions, each character is cut out and output for character recognition in the next step.
9. The method for recognizing hollow verification codes according to claim 1, characterized in that character recognition is performed by a character recognition module, which trains a character neural network to obtain a model that fits the features of this class of characters, and uses this model to classify and recognize the characters.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610812124.9A CN106446897A 2016-09-09 2016-09-09 Identification method of hollow verification code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610812124.9A CN106446897A 2016-09-09 2016-09-09 Identification method of hollow verification code

Publications (1)

Publication Number Publication Date
CN106446897A 2017-02-22

Family

ID=58164575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610812124.9A Pending CN106446897A 2016-09-09 2016-09-09 Identification method of hollow verification code

Country Status (1)

Country Link
CN (1) CN106446897A

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067006A (en) * 2017-04-20 2017-08-18 金电联行(北京)信息技术有限公司 A kind of method for recognizing verification code and system for serving data acquisition
CN107292311A (en) * 2017-08-10 2017-10-24 河南科技大学 A kind of recognition methods of the Characters Stuck identifying code based on neutral net
CN107292307A (en) * 2017-07-21 2017-10-24 华中科技大学 One kind is inverted Chinese character identifying code automatic identifying method and system
CN107360137A (en) * 2017-06-15 2017-11-17 深圳市牛鼎丰科技有限公司 Construction method and device for the neural network model of identifying code identification
CN108038484A (en) * 2017-12-11 2018-05-15 中国人民解放军战略支援部队信息工程大学 Hollow identifying code method for quickly identifying
CN108171229A (en) * 2017-12-27 2018-06-15 广州多益网络股份有限公司 A kind of recognition methods of hollow adhesion identifying code and system
CN108537225A (en) * 2017-03-01 2018-09-14 重庆邮电大学 A method of for hollow character in automatic identification identifying code
CN109086772A (en) * 2018-08-16 2018-12-25 成都市映潮科技股份有限公司 A kind of recognition methods and system distorting adhesion character picture validation code
CN109101810A (en) * 2018-08-14 2018-12-28 电子科技大学 A kind of text method for recognizing verification code based on OCR technique
CN109150817A (en) * 2017-11-24 2019-01-04 新华三信息安全技术有限公司 A kind of web-page requests recognition methods and device
CN110069915A (en) * 2019-03-12 2019-07-30 杭州电子科技大学 A kind of nine grids figure method for recognizing verification code based on contours extract
CN110232375A (en) * 2018-03-05 2019-09-13 重庆邮电大学 A kind of method for recognizing verification code hollow end to end
CN110490056A (en) * 2019-07-08 2019-11-22 北京三快在线科技有限公司 The method and apparatus that image comprising formula is handled

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923702A (en) * 2010-08-25 2010-12-22 郝红卫 Image valid code generating method
CN104331688A (en) * 2014-11-05 2015-02-04 中北大学 Detonator shell dot character identifying method
CN105447508A (en) * 2015-11-10 2016-03-30 上海珍岛信息技术有限公司 Identification method and system for character image verification codes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
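柳红刚 (Liu Honggang): "Research on recognition technology for verification codes with distorted, touching characters", China Master's Theses Full-text Database, Information Science and Technology (monthly) *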

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537225A (en) * 2017-03-01 2018-09-14 重庆邮电大学 A method of for hollow character in automatic identification identifying code
CN107067006A (en) * 2017-04-20 2017-08-18 金电联行(北京)信息技术有限公司 A kind of method for recognizing verification code and system for serving data acquisition
CN107360137A (en) * 2017-06-15 2017-11-17 深圳市牛鼎丰科技有限公司 Construction method and device for the neural network model of identifying code identification
CN107292307B (en) * 2017-07-21 2019-12-17 华中科技大学 Automatic identification method and system for inverted Chinese character verification code
CN107292307A (en) * 2017-07-21 2017-10-24 华中科技大学 One kind is inverted Chinese character identifying code automatic identifying method and system
CN107292311A (en) * 2017-08-10 2017-10-24 河南科技大学 A kind of recognition methods of the Characters Stuck identifying code based on neutral net
CN109150817A (en) * 2017-11-24 2019-01-04 新华三信息安全技术有限公司 A kind of web-page requests recognition methods and device
CN109150817B (en) * 2017-11-24 2020-11-27 新华三信息安全技术有限公司 Webpage request identification method and device
CN108038484A (en) * 2017-12-11 2018-05-15 中国人民解放军战略支援部队信息工程大学 Hollow identifying code method for quickly identifying
CN108038484B (en) * 2017-12-11 2020-05-05 中国人民解放军战略支援部队信息工程大学 Method for quickly identifying hollow verification code
CN108171229A (en) * 2017-12-27 2018-06-15 广州多益网络股份有限公司 A kind of recognition methods of hollow adhesion identifying code and system
CN108171229B (en) * 2017-12-27 2021-11-16 广州多益网络股份有限公司 Method and system for identifying hollow adhesion verification code
CN110232375A (en) * 2018-03-05 2019-09-13 重庆邮电大学 A kind of method for recognizing verification code hollow end to end
CN109101810A (en) * 2018-08-14 2018-12-28 电子科技大学 A kind of text method for recognizing verification code based on OCR technique
CN109101810B (en) * 2018-08-14 2021-07-06 电子科技大学 Character verification code recognition method based on OCR technology
CN109086772A (en) * 2018-08-16 2018-12-25 成都市映潮科技股份有限公司 A kind of recognition methods and system distorting adhesion character picture validation code
CN110069915A (en) * 2019-03-12 2019-07-30 杭州电子科技大学 A kind of nine grids figure method for recognizing verification code based on contours extract
CN110069915B (en) * 2019-03-12 2021-04-13 杭州电子科技大学 Sudoku graphic verification code identification method based on contour extraction
CN110490056A (en) * 2019-07-08 2019-11-22 北京三快在线科技有限公司 The method and apparatus that image comprising formula is handled

Similar Documents

Publication Publication Date Title
CN106446897A Identification method of hollow verification code
CN107967475B Verification code identification method based on window sliding and convolutional neural network
CN104252620B Graphic verification code recognition method for touching characters
CN107067006B Verification code identification method and system serving for data acquisition
CN106778457A Fingerprint identification method and system capable of improving the fingerprint recognition rate
CN109145742B Pedestrian identification method and system
CN103824055B Face recognition method based on a cascaded neural network
CN107292311A Neural-network-based recognition method for verification codes with touching characters
CN104021376A Verification code identifying method and device
US10430687B2 Trademark graph element identification method, apparatus and system, and computer storage medium
CN106407980A Image processing-based bank card number recognition method
CN101599124A Method and apparatus for separating characters from video images
CN107454118A Verification code acquisition method and device, and login method and system
CN110008931A Hybrid recognition method combining fingerprint and finger vein information
CN104933138A Webpage crawler system and webpage crawling method
CN105184932A Method and device for recognizing persons through an intelligent access control machine
CN105117707A Regional image-based facial expression recognition method
CN109101810A Character verification code recognition method based on OCR technology
CN109993142A Two-dimensional code identity authentication method based on multi-modal finger biometric features
CN103136522A Finger vein identification technical scheme
CN110988137A Abnormal sound detection system and method based on time-frequency domain characteristics
CN104715250B Cross laser detection method and device
CN108171229A Method and system for recognizing hollow touching-character verification codes
CN110502745A Text information evaluation method, device, computer equipment and storage medium
CN108112026B WiFi identification method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170222