CN110909807A - Network verification code identification method and device based on deep learning and computer equipment - Google Patents

Info

Publication number
CN110909807A
Authority
CN
China
Prior art keywords
verification code
identification
data
sample
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911179062.2A
Other languages
Chinese (zh)
Inventor
邱富根
王彪
刘龙辉
赵海诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinlian Credit Reporting Co ltd
Original Assignee
Shenzhen Xinlian Credit Reporting Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xinlian Credit Reporting Co ltd filed Critical Shenzhen Xinlian Credit Reporting Co ltd
Priority to CN201911179062.2A priority Critical patent/CN110909807A/en
Publication of CN110909807A publication Critical patent/CN110909807A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep learning-based network verification code identification method, device and computer equipment. The method comprises the following steps: acquiring verification code data to be identified; inputting the verification code data to be identified into a verification code identification model for character recognition to obtain an identification result; judging whether the identification result is correct; if the identification result is correct, automatically labeling the corresponding verification code data and adding it to a training set and a test set; if the identification result is incorrect, re-labeling the verification code data; and putting the automatically labeled and the re-labeled verification code data into the verification code identification model for retraining so as to update the model, stopping when the identification rate of the verification code identification model reaches a preset threshold. By training on a small amount of verification code data as verification code samples and updating the verification code identification model according to the training result, the number of labeled samples required for training/learning is effectively reduced and the identification accuracy of the model is improved.

Description

Network verification code identification method and device based on deep learning and computer equipment
Technical Field
The invention relates to the field of machine learning, in particular to a network verification code identification method and device based on deep learning and computer equipment.
Background
CAPTCHA is an abbreviation of "Completely Automated Public Turing test to tell Computers and Humans Apart", a common fully automated program that distinguishes a computer from a human. Research on verification code recognition falls within the field of artificial intelligence; through it, a machine can approach the recognition ability of the human eye, which promotes the development of artificial intelligence. Scientific research institutions and researchers engaged in big data analysis need to acquire large amounts of data from the internet, but verification codes hinder smooth data acquisition. Developing verification code recognition technology helps overcome this obstacle and thereby promotes the development of technologies related to big data analysis.
At present, deep learning for verification codes requires a sufficient number of labeled samples before training can begin. Across the whole deep learning process, data preprocessing takes up most of the project time. For network verification codes, the most common format, a four-character combination of letters and digits, has 36^4 = 1,679,616 possible combinations. For distorted, touching-character verification codes that cannot be segmented, each combination needs at least 10 pictures, so a complete sample set needs at least 1,679,616 × 10 = 16,796,160 pictures. Such a labeling workload is enormous, and using a manual coding platform is also very costly. If the sample size is too small, however, the recognition accuracy of the model suffers greatly.
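The sample-count arithmetic above can be checked with a few lines of Python. This is only a worked example; the case-insensitive 36-character alphabet is an assumption read from the "36^4" figure in the text.

```python
import string

# Assumed alphabet: 26 letters plus 10 digits, case-insensitive (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits
CODE_LENGTH = 4               # four-character verification code
IMAGES_PER_COMBINATION = 10   # minimum pictures per combination quoted in the text

combinations = len(ALPHABET) ** CODE_LENGTH
total_images = combinations * IMAGES_PER_COMBINATION

print(combinations)   # 1679616
print(total_images)   # 16796160
```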
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a network verification code identification method and device based on deep learning and computer equipment.
In order to achieve the purpose, the invention adopts the following technical scheme: a network verification code identification method based on deep learning comprises the following steps:
acquiring verification code data to be identified;
inputting verification code data to be recognized into a verification code recognition model for character recognition to obtain a recognition result;
judging whether the identification result is correct or not;
if the identification result is correct, automatically marking the corresponding verification code data to enter a training set and a test set;
if the identification result is incorrect, re-marking the verification code data;
putting the automatically marked verification code data and the re-marked verification code data into a verification code identification model for retraining so as to update the verification code identification model until the identification rate of the verification code identification model reaches a preset threshold value;
the verification code identification model is obtained by training a convolutional neural network using labeled verification code data as sample data.
Further, obtaining the verification code identification model by training a convolutional neural network using labeled verification code data as sample data comprises the following steps:
acquiring partial verification code data of a website as a verification code sample;
inputting the labeled sample into the existing convolutional neural network for training to obtain a sample output result;
inputting the sample output result and the image data with the identification into a loss function to obtain a loss value;
adjusting parameters of the convolutional neural network according to the loss value;
and training the convolutional neural network with the sample data using a deep learning framework to obtain the verification code identification model.
Further, the step of obtaining partial verification code data of the website as a verification code sample includes:
crawling website part verification code data as a verification code sample;
preprocessing the picture of the verification code sample;
and marking the verification code sample, and segmenting the training set and the test set.
Further, the step of preprocessing the picture of the verification code sample includes:
performing scaling, noise reduction, binarization and normalization preprocessing on the picture of the verification code sample.
Further, the step of determining whether the recognition result is correct includes:
submitting the identification result to a website and receiving the return state of the website;
and judging the recognition result according to the return state.
The invention also provides a network verification code identification device based on deep learning, which comprises:
the data acquisition unit is used for acquiring verification code data to be identified;
the identification unit is used for inputting the verification code data to be identified into the verification code identification model for character identification so as to obtain an identification result;
a result judging unit for judging whether the recognition result is correct;
the marking unit is used for automatically marking the corresponding verification code data to enter a training set and a test set when the identification result is correct; when the identification result is incorrect, re-marking the verification code data;
and the retraining unit is used for putting the automatically marked verification code data and the re-marked verification code data into the verification code recognition model for retraining so as to obtain a new verification code recognition model, and repeating the steps by using the new model until the recognition rate of the verification code recognition model reaches an expected threshold value.
Further, the device also comprises a model training unit, wherein the model training unit comprises:
the data acquisition subunit is used for acquiring partial verification code data of the website as a verification code sample;
the sample training subunit is used for inputting the labeled sample into the existing convolutional neural network for training to obtain a sample output result;
the loss value acquisition subunit is used for inputting the sample output result and the image data with the identification into a loss function to obtain a loss value;
the parameter adjusting subunit is used for adjusting the parameters of the convolutional neural network according to the loss value;
and the learning subunit is used for training the convolutional neural network with the sample data using a deep learning framework to obtain the verification code identification model.
Further, the data acquisition subunit comprises a data crawling module, a preprocessing module and a sample labeling module;
the data crawling module is used for crawling the verification code data of the website part as a verification code sample;
the preprocessing module is used for preprocessing the picture of the verification code sample;
and the sample marking module is used for marking the verification code sample and segmenting the training set and the test set.
The invention also provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the deep learning-based network verification code identification method described in any one of the above items when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, can implement the deep learning-based network verification code identification method described in any one of the above items.
Compared with the prior art, the invention has the beneficial effects that: according to the scheme, a small amount of verification code data is taken as the verification code sample to train the verification code recognition model, and the verification code recognition model is updated according to the training result, so that the number of labeled samples required by training/learning of the verification code recognition model is effectively reduced, and meanwhile, the model recognition precision is improved.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of a network verification code identification method based on deep learning according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a network verification code identification method based on deep learning according to an embodiment of the present invention;
Fig. 3 is a schematic sub-flow diagram of a network verification code identification method based on deep learning according to an embodiment of the present invention;
Fig. 4 is a schematic sub-flow chart of a network verification code identification method based on deep learning according to an embodiment of the present invention;
Fig. 5 is a schematic sub-flow chart of a network verification code identification method based on deep learning according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a network verification code identification device based on deep learning according to an embodiment of the present invention;
Fig. 7 is a schematic block diagram of a model training unit of a deep learning-based network verification code identification device according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a data acquisition subunit of a deep learning-based network verification code identification device according to an embodiment of the present invention;
Fig. 9 is a schematic block diagram of a result determination unit of a deep learning-based network verification code identification device according to an embodiment of the present invention;
Fig. 10 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Training set: used to fit the model; the classification model is trained by setting the parameters of the classifier. When subsequently combined with a validation set, different values of the same parameter can be selected to fit a plurality of classifiers.
Test set: after an optimal model has been obtained using the training set and the validation set, the test set is used for model prediction, to measure the performance and classification capability of the optimal model. That is, the test set is treated as a data set that does not exist during training; only after the model parameters have been determined is it used to evaluate model performance.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a network verification code identification method based on deep learning according to an embodiment of the present invention, and fig. 2 is a schematic flowchart of the method. In the application scenario, a server and a terminal exchange data: the terminal obtains the verification code data to be identified from a corresponding website and transmits it to the server, the verification code recognition model in the server recognizes the data, and whether the recognition result is correct is then verified.
As shown in fig. 2, the method includes the following steps S10 to S60.
S10, acquiring the verification code data to be identified.
In one embodiment, most websites require verification with a verification code to distinguish whether the current requester is a person or a computer program. Specifically, the verification code data may be acquired by crawling the verification codes of a website with a crawler; the acquired data may then be recognized, or part of it may be used as labeled samples for training the verification code recognition model.
S20, inputting the verification code data to be recognized into the verification code recognition model for character recognition to obtain a recognition result.
The obtained verification code data can be recognized through the verification code recognition model, and a recognition result is output.
Specifically, in this embodiment, the above-mentioned verification code identification model is obtained by training a convolutional neural network with the verification code data with identification as sample data.
In one embodiment, referring to FIG. 3, the training step of the captcha recognition model includes steps S210-S250.
S210, acquiring partial verification code data of the website as a verification code sample.
In this embodiment, a verification code sample refers to verification code image data with a text label. The verification code samples can be divided into a larger training set and a smaller test set; the convolutional neural network is trained on the training set so as to select the network with the smaller loss value, and the test set is used for testing.
Referring to FIG. 4, in one embodiment, step S210 includes steps S211-S213.
S211, crawling part of the verification code data of the website as a verification code sample;
S212, preprocessing the picture of the verification code sample;
S213, labeling the verification code sample, and splitting it into the training set and the test set.
In this embodiment, a small amount of verification code data is automatically crawled by a crawler program as verification code samples. The samples are labeled and split into a training set and a test set, and their pictures are preprocessed so that they can be used directly for training the verification code recognition model. In the early stage, a small number of verification code samples are fed into an existing convolutional neural network for training, and the parameters of the verification code recognition model are continuously updated using the training result. When the verification code recognition model is subsequently used to recognize verification code data, the successfully recognized data are automatically labeled and added to the training samples to train the model again. This reduces the amount of labeled verification code sample data required during model training/learning while maintaining the recognition rate, and lowers the time and resource cost of training.
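The labeling-and-splitting step (S213) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_samples`, the 20% test fraction and the file names are all hypothetical, since the source does not specify a split ratio.

```python
import random

def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle labeled (image, text) pairs and split them into train/test lists.

    test_fraction is an assumed value; the source only says the test set
    is the smaller part.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle for repeatability
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled samples: (picture file name, text shown in the picture).
samples = [(f"img_{i:03d}.png", f"a{i:03d}") for i in range(50)]
train_set, test_set = split_samples(samples)
print(len(train_set), len(test_set))   # 40 10
```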
In one embodiment, step S212 specifically includes: performing scaling, noise reduction, binarization and normalization preprocessing on the picture of the verification code sample.
Applying scaling, noise reduction, binarization and normalization preprocessing to the picture of the verification code sample can improve both the efficiency and the accuracy of verification code recognition.
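The four preprocessing operations can be illustrated with a dependency-free sketch on a grayscale image stored as a list of rows. The concrete choices here (nearest-neighbour scaling, a 3x3 median filter, a fixed threshold of 128, division by 255) are assumptions for illustration; the source names the operations but not the algorithms.

```python
from statistics import median_low

def scale(img, new_h, new_w):
    """Nearest-neighbour resize of a grayscale image (rows of 0-255 values)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def denoise(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(median_low(window))
        out.append(row)
    return out

def binarize(img, threshold=128):
    """Map every pixel to 0 or 255 using a fixed (assumed) threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]

def normalize(img):
    """Rescale pixel values from 0-255 to the range 0.0-1.0."""
    return [[p / 255.0 for p in row] for row in img]

def preprocess(img, size=(4, 4)):
    """Scaling, noise reduction, binarization, normalization, in that order."""
    return normalize(binarize(denoise(scale(img, *size))))

# 8x8 test picture: dark left half, bright right half, one bright noise pixel.
picture = [[0] * 4 + [255] * 4 for _ in range(8)]
picture[2][2] = 255
result = preprocess(picture)
```

In this toy picture the isolated bright pixel is removed by the median filter, so every row of `result` is `[0.0, 0.0, 1.0, 1.0]`.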
S220, inputting the verification code sample into the existing convolutional neural network for training to obtain a sample output result.
In this embodiment, the loss function and the convolutional neural network need to be constructed first. The convolutional neural network performs convolution calculations on the image data to achieve classification and target localization. During training, each network uses a loss function to calculate a loss value, which represents the difference between the output result and the actual result: the smaller the loss value, the smaller the difference and the better the network is trained, and vice versa. Convolutional neural networks are widely applied to computer vision tasks such as target detection, semantic segmentation and object classification, where they achieve very good results and show good adaptability.
S230, inputting the sample output result and the image data with the identification into a loss function to obtain a loss value.
In this embodiment, the sample output result refers to a probability sequence, that is, the text sequence predicted for the sample data. The loss value is calculated through the loss function and represents the difference between the output result and the actual result: the smaller the loss value, the smaller the difference and the better the network is trained, and vice versa.
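The source does not name a specific loss function. As one common choice, the sketch below assumes a per-character softmax followed by mean cross-entropy over the character positions; the alphabet and the helper names (`captcha_loss`, `logits_for`) are hypothetical.

```python
import math

# Assumed 36-symbol alphabet (digits plus lowercase letters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def softmax(logits):
    """Numerically stable softmax over one character position."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def captcha_loss(logits_per_char, label):
    """Mean cross-entropy between predicted probabilities and the true text."""
    if len(logits_per_char) != len(label):
        raise ValueError("one logit vector is needed per label character")
    loss = 0.0
    for logits, ch in zip(logits_per_char, label):
        probs = softmax(logits)
        loss -= math.log(probs[ALPHABET.index(ch)])
    return loss / len(label)

def logits_for(text, confidence):
    """Toy network output: `confidence` on the given characters, 0 elsewhere."""
    return [[confidence if c == ch else 0.0 for c in ALPHABET] for ch in text]

label = "ab12"
uninformed = captcha_loss(logits_for(label, 0.0), label)   # uniform guess
confident = captcha_loss(logits_for(label, 10.0), label)   # confident and right
wrong = captcha_loss(logits_for("zz99", 10.0), label)      # confident and wrong
```

A uniform guess gives a loss of ln(36), a confident correct prediction gives a loss close to zero, and a confident wrong prediction gives the largest loss, matching the statement that a smaller loss means a smaller difference from the actual result.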
S240, adjusting parameters of the convolutional neural network according to the loss value.
In this embodiment, the parameters of the convolutional neural network are continuously adjusted, and learning and training are performed multiple times to obtain a convolutional neural network that meets the requirement. Specifically, the convolutional neural network is trained with TensorFlow and, after being converted into a corresponding character recognition model, is easily deployed on a server or a terminal through TensorFlow tflite and TensorFlow map. It not only supports normal controller operation but can also perform controller acceleration on corresponding devices through the Open Computing Language (OpenCL).
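How a loss value drives parameter adjustment can be shown with plain gradient descent. The sketch below deliberately replaces the convolutional neural network with a tiny linear softmax classifier so that it stays self-contained; the learning rate, feature vector and function names are assumptions, and a real model would be trained through a deep learning framework instead.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(weights, x, target):
    """Cross-entropy loss and its gradient for a linear softmax classifier.

    weights: one row of coefficients per class; x: feature vector;
    target: index of the correct class.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    grad = [[(p - (1.0 if k == target else 0.0)) * x_i for x_i in x]
            for k, p in enumerate(probs)]
    return loss, grad

def sgd_step(weights, grad, lr=0.5):
    """Move every parameter a small step against its gradient."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]

weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # 3 classes, 2 features
x, target = [1.0, 0.5], 2                          # one toy training example
losses = []
for _ in range(20):
    loss, grad = loss_and_grad(weights, x, target)
    losses.append(loss)
    weights = sgd_step(weights, grad)
```

The first recorded loss is ln(3) because the untrained classifier guesses uniformly over three classes; repeated adjustment then lowers the loss step by step.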
S250, training the convolutional neural network with the sample data using a deep learning framework to obtain the verification code recognition model.
In this embodiment, the verification code recognition model obtained by training the convolutional neural network can subsequently be used to recognize website verification codes and output a recognition result.
S30, judging whether the identification result is correct.
In this embodiment, the verification code data to be recognized is input into the verification code recognition model, a recognition result is output, and the recognition result is submitted to the corresponding website. This makes it possible to determine whether the recognition result is correct, that is, whether the parameters of the verification code recognition model need to be adjusted or whether the verification code data needs to be recognized again.
Referring to fig. 5, in one embodiment, step S30 includes steps S310 and S320.
S310, submitting the identification result to a website and receiving the return state of the website;
S320, judging the identification result according to the return state.
In an embodiment, the verification code data of the website is acquired as the verification code data to be recognized. After the recognition result is submitted to the corresponding website, the website judges the submitted result and returns its judgment, namely the return state, which is either a verification-success state or a verification-failure state. Whether the recognition result is correct can therefore be judged directly from the return state.
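Steps S310 and S320 can be sketched with the website abstracted as a callable. Everything named here (`ReturnState`, `judge_recognition`, `FakeSite`) is hypothetical: the source does not describe the website's interface, so an in-memory stand-in is used instead of real network requests.

```python
from enum import Enum

class ReturnState(Enum):
    """Hypothetical return states reported by the website."""
    SUCCESS = "verification_success"
    FAILURE = "verification_failure"

def judge_recognition(submit, captcha_id, predicted_text):
    """S310: submit the prediction; S320: judge it from the return state."""
    state = submit(captcha_id, predicted_text)
    return state is ReturnState.SUCCESS

class FakeSite:
    """In-memory stand-in for a website that checks verification codes."""

    def __init__(self, answers):
        self.answers = answers

    def submit(self, captcha_id, text):
        expected = self.answers.get(captcha_id)
        if expected is not None and expected.lower() == text.lower():
            return ReturnState.SUCCESS
        return ReturnState.FAILURE

site = FakeSite({"c1": "ab12"})
print(judge_recognition(site.submit, "c1", "ab12"))   # True
print(judge_recognition(site.submit, "c1", "ab1z"))   # False
```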
S40, if the identification result is correct, automatically labeling the corresponding verification code data and adding it to a training set and a test set.
In this embodiment, when the recognition result is correct, the corresponding verification code data is automatically labeled, subjected to the corresponding image processing (scaling, noise reduction and normalization), and then added to the training set and the test set. Successfully recognized verification code data is thus labeled and reused as verification code samples for subsequent training of the verification code recognition model. Using successfully recognized data as training samples reduces the number of verification code samples needed for training while maintaining the recognition rate of the model, which lowers the model training/learning cost and improves training efficiency.
When the recognition accuracy of the verification code recognition model reaches a preset value, a final verification code recognition model can be generated from the existing parameters and used directly for subsequent verification code recognition.
S50, if the recognition result is incorrect, re-labeling the verification code data.
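The routing of a recognized sample can be sketched as below. The `Dataset` container, the `relabel_queue` for failed recognitions and the 20% test fraction are assumptions made for illustration, and the image processing mentioned in the text is omitted for brevity.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Dataset:
    train: list = field(default_factory=list)
    test: list = field(default_factory=list)
    relabel_queue: list = field(default_factory=list)   # awaiting re-labeling

def route_sample(dataset, image, predicted_text, is_correct, rng,
                 test_fraction=0.2):
    """Auto-label a correctly recognized sample; queue a failed one."""
    if is_correct:
        # The verified prediction becomes the label, so no manual work is needed.
        target = dataset.test if rng.random() < test_fraction else dataset.train
        target.append((image, predicted_text))
    else:
        dataset.relabel_queue.append(image)

dataset = Dataset()
rng = random.Random(0)
# Hypothetical recognition outcomes: every third picture was misrecognized.
for i in range(30):
    route_sample(dataset, f"img_{i}.png", f"t{i:03d}", i % 3 != 0, rng)
```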
In this embodiment, when the recognition result is incorrect, the verification code data is re-labeled, and the re-labeled verification code data is used as a sample to retrain the verification code recognition model.
S60, putting the automatically labeled verification code data and the re-labeled verification code data into the verification code recognition model for retraining so as to update the verification code recognition model until the recognition rate of the verification code recognition model reaches a preset threshold value.
In this embodiment, after recognition is completed, the verification code data is labeled according to the feedback of the website (whether the recognition result is correct) to obtain labeled verification code samples. The automatically labeled and the re-labeled verification code data are put into the verification code recognition model for retraining so as to update the model, and the updated model is then used to recognize new verification code data. Once the recognition rate of the verification code recognition model reaches the preset threshold, the model is used directly as the final recognition model and training stops. The preset threshold is set in advance, for example 95% or another specific value.
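The control flow of steps S10 to S60 can be summarized in a short loop. This is a sketch under stated assumptions: the callables `recognize`, `verify`, `relabel` and `retrain` are hypothetical placeholders, and the "model" in the demo is a lookup table standing in for the convolutional neural network so that the loop can run on its own.

```python
def active_learning_loop(batches, recognize, verify, relabel, retrain,
                         threshold=0.95, max_rounds=100):
    """Recognize, verify, label and retrain until the rate reaches threshold.

    Returns the recognition rate measured in each round.
    """
    labeled, history = [], []
    for batch in batches:
        if not batch:
            continue
        correct = 0
        for image in batch:
            text = recognize(image)                       # S20
            if verify(image, text):                       # S30
                labeled.append((image, text))             # S40: automatic label
                correct += 1
            else:
                labeled.append((image, relabel(image)))   # S50: re-label
        history.append(correct / len(batch))
        if history[-1] >= threshold or len(history) >= max_rounds:
            break
        retrain(labeled)                                  # S60
    return history

# Toy stand-ins: 20 distinct pictures served in alternating batches of 10.
truth = {f"img{i}": f"t{i:03d}" for i in range(20)}
memory = {}                       # the "model": remembers labeled pictures
images = list(truth)

def batches():
    start = 0
    while True:
        yield images[start:start + 10]
        start = (start + 10) % len(images)

history = active_learning_loop(
    batches(),
    recognize=lambda image: memory.get(image, ""),
    verify=lambda image, text: truth[image] == text,
    relabel=truth.__getitem__,
    retrain=lambda labeled: memory.update(labeled),
)
print(history)   # [0.0, 0.0, 1.0]
```

In the demo the first two batches are entirely unknown to the toy model, so both rounds score 0.0; after retraining, the third batch repeats the first and is fully recognized, which exceeds the 95% threshold and stops the loop.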
According to the scheme, a small amount of verification code data is taken as a verification code sample to train the verification code recognition model, and the verification code recognition model is updated according to the training result, so that labeled samples required by the verification code recognition model training/learning are effectively reduced; meanwhile, the verification code data identified is marked according to the identification result so as to be used for subsequently training a verification code identification model, and the verification code identification success rate can be improved.
Fig. 6 is a schematic block diagram of a network authentication code recognition apparatus based on deep learning according to an embodiment of the present invention. As shown in fig. 6, the present invention also provides a network authentication code identification device based on deep learning, corresponding to the above network authentication code identification method based on deep learning. The device for identifying the network verification code based on the deep learning comprises a unit for executing the method for identifying the network verification code based on the deep learning, and the device can be configured in a desktop computer, a tablet computer, a portable computer, and the like. Specifically, referring to fig. 6, in the present embodiment, an automatic download driving apparatus includes a data obtaining unit 10, a recognition unit 20, a result determining unit 30, a labeling unit 40, and a retraining unit 60.
The data acquisition unit 10 is configured to acquire the verification code data to be identified.
In one embodiment, most websites require verification with a verification code in order to determine whether the current requester is a person or a computer program. Specifically, the verification code data may be acquired by crawling the verification code data of a website with a crawler and then identifying the acquired data; alternatively, part of the acquired verification code data may be used as labeled samples for training the verification code identification model.
The identification unit 20 is configured to input the verification code data to be identified into the verification code identification model for character identification, so as to obtain an identification result.
The obtained verification code data can be recognized through the verification code recognition model, and a recognition result is output.
Specifically, in this embodiment, the above verification code identification model is obtained by training a convolutional neural network with labeled verification code data as sample data.
In this embodiment, the deep learning-based network verification code recognition device of the present invention further includes a model training unit 50. Referring to fig. 7, the model training unit 50 includes a data acquiring subunit 51, a sample training subunit 52, a loss value obtaining subunit 53, a parameter adjusting subunit 54, and a learning subunit 55.
The data acquiring subunit 51 is configured to acquire part of the verification code data of the website as verification code samples.
In this embodiment, a verification code sample is verification code image data carrying a text label. The verification code samples can be divided into several training sets and a small test set; the convolutional neural network is trained on the training sets so as to select the network with the smaller loss value, and the test set is used for testing.
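The division into several training sets and a small test set could be sketched as follows. This is an illustration only: the function name `split_samples`, the 10% test ratio, the number of training sets, and the fixed shuffle seed are assumptions, not values given in this embodiment.

```python
import random


def split_samples(samples, test_ratio=0.1, n_train_sets=3, seed=0):
    """Split labeled samples into several training sets and one small test set.

    samples: iterable of (image, label) pairs.
    test_ratio, n_train_sets and seed are illustrative assumptions.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)          # shuffle a copy, reproducibly
    n_test = max(1, int(len(items) * test_ratio))
    test_set = items[:n_test]                   # small held-out part
    train = items[n_test:]
    # Deal the remaining samples round-robin into n_train_sets subsets.
    train_sets = [train[i::n_train_sets] for i in range(n_train_sets)]
    return train_sets, test_set


# Hypothetical labeled samples: (picture identifier, text label).
demo = [(f"img{i}", str(i)) for i in range(20)]
train_sets, test_set = split_samples(demo)
```

Each training set can then be used to train a candidate network, and the candidate with the smaller loss value is kept, as the paragraph above describes.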
Referring to fig. 8, the data acquisition subunit 51 includes a data crawling module 511, a preprocessing module 512, and a sample labeling module 513.
The data crawling module 511 is configured to crawl part of the verification code data of the website as verification code samples.
The preprocessing module 512 is configured to preprocess the picture of the verification code sample.
The sample labeling module 513 is configured to label the verification code samples and to split them into the training set and the test set.
In this embodiment, a small portion of the verification code data is automatically crawled by a crawler program and used as verification code samples. The samples are labeled and split into a training set and a test set, and the sample pictures are then preprocessed so that they can be used directly to train the verification code recognition model. In the early stage, a small number of verification code samples are fed into an existing convolutional neural network for training, and the parameters of the verification code identification model are continuously updated with the training results. When the model is subsequently used to identify verification code data, the successfully identified data is automatically labeled and added to the training samples to train the model again. This reduces the amount of verification code sample data needed for model training/learning while maintaining the identification rate, and lowers the time and resource cost of model training/learning.
In an embodiment, the preprocessing module 512 is specifically configured to perform scaling, noise reduction, binarization, and normalization preprocessing on the picture of the verification code sample. This preprocessing can improve both the efficiency and the accuracy with which the verification code sample pictures are identified.
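The four preprocessing steps named above could be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the nearest-neighbour scaling, the 3x3 median filter, the fixed threshold of 128, and all function names are illustrative choices. A grayscale picture is represented as a list of rows of pixel values in the range 0 to 255.

```python
from statistics import median


def scale(img, new_h, new_w):
    """Nearest-neighbour resize of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def denoise(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(median(window))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Map every pixel to 0 or 255 (threshold value is an assumption)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def normalize(img):
    """Rescale pixel values from 0..255 to 0.0..1.0."""
    return [[p / 255.0 for p in row] for row in img]


def preprocess(img, size=(4, 4), threshold=128):
    """Apply scaling, noise reduction, binarization, normalization in order."""
    return normalize(binarize(denoise(scale(img, *size)), threshold))
```

In this sketch an isolated dark pixel on a bright background is removed by the median filter before binarization, which is the kind of speckle noise the noise-reduction step is meant to suppress.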
The sample training subunit 52 is configured to input the labeled samples into an existing convolutional neural network for training, so as to obtain a sample output result.
In this embodiment, the loss function and the convolutional neural network need to be constructed first. The convolutional neural network performs convolution calculations on the image data to achieve classification and target localization. During training, each network uses a loss function to calculate a loss value, which represents the difference between the output result and the actual result: the smaller the loss value, the smaller the difference and the better the network is trained, and vice versa. Convolutional neural networks are widely used in computer vision tasks such as object detection, semantic segmentation, and object classification, where they achieve very good results and adapt well.
The loss value obtaining subunit 53 is configured to input the sample output result and the labeled image data into a loss function to obtain a loss value.
In this embodiment, the sample output result is a probability sequence, that is, the text sequence predicted for the sample data. The loss value is calculated through the loss function and represents the difference between the output result and the actual result: the smaller the loss value, the smaller the difference and the better the network is trained, and vice versa.
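The embodiment does not name a specific loss function. As one common choice, the following sketch computes a per-character cross-entropy between a predicted probability sequence and the labeled text; the function name, the alphabet, and the clamping constant are assumptions made for illustration.

```python
import math


def sequence_cross_entropy(probs, target, alphabet):
    """Mean negative log-likelihood of the labeled text.

    probs[i][k] is the predicted probability that character i of the
    verification code is alphabet[k]; target is the labeled text.
    """
    if len(probs) != len(target):
        raise ValueError("prediction and label must have the same length")
    total = 0.0
    for dist, ch in zip(probs, target):
        p = dist[alphabet.index(ch)]
        total += -math.log(max(p, 1e-12))   # clamp to avoid log(0)
    return total / len(target)


alphabet = "ab"
confident = [[0.9, 0.1], [0.1, 0.9]]   # close to the label "ab"
uncertain = [[0.5, 0.5], [0.5, 0.5]]   # no preference between characters
loss_confident = sequence_cross_entropy(confident, "ab", alphabet)
loss_uncertain = sequence_cross_entropy(uncertain, "ab", alphabet)
```

Here `loss_confident` is smaller than `loss_uncertain`, matching the statement that a smaller loss value means a smaller difference between the output result and the actual result.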
The parameter adjusting subunit 54 is configured to adjust the parameters of the convolutional neural network according to the loss value.
In this embodiment, the parameters of the convolutional neural network are adjusted continuously, and learning and training are repeated until a convolutional neural network meeting the requirements is obtained. Specifically, the convolutional neural network is trained with TensorFlow; after being converted into the corresponding character recognition model, it can be deployed very easily on a server or a terminal through TensorFlow tflite and TensorFlow map. The model not only supports normal processor operation, but can also be accelerated on the corresponding device through the Open Computing Language (OpenCL).
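How parameters are adjusted according to the loss value can be illustrated with a single gradient-descent update. The sketch below deliberately swaps the convolutional neural network for a tiny linear softmax classifier so that it stays self-contained; the learning rate, the function names, and the feature vector are assumptions, and a real embodiment would rely on the optimizer of the deep learning framework instead.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sgd_step(weights, x, label, lr=0.1):
    """One gradient-descent update of a linear softmax classifier.

    weights[k][j] is the weight from feature j to class k.
    Returns the updated weights and the loss measured before the update.
    """
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    new_weights = []
    for k, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. logit k is (p_k - one_hot_k).
        grad_k = probs[k] - (1.0 if k == label else 0.0)
        new_weights.append([w - lr * grad_k * xj for w, xj in zip(row, x)])
    return new_weights, loss


weights = [[0.0, 0.0], [0.0, 0.0]]
history = []
for _ in range(20):
    weights, loss = sgd_step(weights, [1.0, 0.5], label=0)
    history.append(loss)
```

Repeating the update drives the recorded loss down, which is the "continuously adjusted" behaviour the paragraph describes.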
The learning subunit 55 is configured to train the convolutional neural network with the sample data using a deep learning framework, so as to obtain the verification code identification model.
In this embodiment, the verification code identification model obtained by training the convolutional neural network can subsequently be used to identify website verification codes and output the verification code identification result.
The result judging unit 30 is configured to judge whether the recognition result is correct.
In this embodiment, the verification code data to be recognized is input into the verification code recognition model, a recognition result is output, and the recognition result is submitted to the corresponding website. Whether the recognition result is correct can then be determined, that is, whether the parameters of the verification code recognition model need to be adjusted or whether the verification code data needs to be re-recognized.
Referring to fig. 9, in one embodiment, the result judging unit 30 includes a result submitting module 31 and a state judgment module 32.
The result submitting module 31 is configured to submit the identification result to the website and receive the return state of the website.
The state judgment module 32 is configured to judge the identification result according to the return state.
In an embodiment, the verification code data of the website is acquired as the verification code data to be recognized. After the recognition result is submitted to the corresponding website, the website judges the submitted result and returns its judgment, namely the return state, which is either a verification success state or a verification failure state. Whether the recognition result is correct can therefore be judged directly from the return state.
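Judging the recognition result from the website's return state could be sketched as follows. No real website interface is described in this embodiment, so `ReturnState`, `judge_recognition`, and the in-memory `make_fake_site` stand-in are all hypothetical names introduced only for illustration.

```python
from enum import Enum


class ReturnState(Enum):
    """The two return states named in the text."""
    VERIFICATION_SUCCESS = "success"
    VERIFICATION_FAILURE = "failure"


def judge_recognition(submit, captcha_id, recognized_text):
    """Submit the recognized text and report whether the website accepted it.

    submit is any callable (captcha_id, text) -> ReturnState.
    """
    state = submit(captcha_id, recognized_text)
    return state is ReturnState.VERIFICATION_SUCCESS


def make_fake_site(answers):
    """Stand-in for a real website: compares the text with a known answer."""
    def submit(captcha_id, text):
        if answers.get(captcha_id) == text:
            return ReturnState.VERIFICATION_SUCCESS
        return ReturnState.VERIFICATION_FAILURE
    return submit


site = make_fake_site({"c1": "7k3p"})
accepted = judge_recognition(site, "c1", "7k3p")   # website accepts
rejected = judge_recognition(site, "c1", "7k3q")   # website rejects
```

Passing the submit function in as a parameter keeps the judgment logic independent of how a particular website reports success or failure.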
The marking unit 40 is configured to automatically label the corresponding verification code data and add it to the training set and the test set when the identification result is correct, and to re-label the verification code data when the identification result is incorrect.
In this embodiment, when the identification result is correct, the corresponding verification code data is automatically labeled and added to the training set and the test set, so that successfully identified verification code data serves as verification code samples for subsequent training of the verification code identification model. Using successfully identified data as training samples reduces the number of verification code samples that must be prepared for training while maintaining the identification rate of the model, which lowers the model training/learning cost and improves training efficiency. When the identification result is incorrect, the verification code data is re-labeled, and the re-labeled verification code data is used as a sample for re-identification.
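The routing performed by the marking unit could be sketched as below. The embodiment does not say how automatically labeled data is divided between the training set and the test set, so the random assignment with a `test_ratio` parameter, as well as the function and queue names, are assumptions.

```python
import random


def route_sample(image, predicted, is_correct,
                 train_set, test_set, relabel_queue,
                 test_ratio=0.1, rng=random):
    """Auto-label a correctly recognized sample or queue it for re-labeling.

    Correct results keep the predicted text as their label and go to the
    test set with probability test_ratio, otherwise to the training set.
    Incorrect results are queued so that they can be labeled again.
    """
    if is_correct:
        target = test_set if rng.random() < test_ratio else train_set
        target.append((image, predicted))
    else:
        relabel_queue.append(image)


train_set, test_set, relabel_queue = [], [], []
route_sample("img_a", "7k3p", True, train_set, test_set, relabel_queue,
             test_ratio=0.0)
route_sample("img_b", "x9qq", False, train_set, test_set, relabel_queue)
```

After these two calls the accepted sample sits in `train_set` with its predicted text as the label, and the rejected picture waits in `relabel_queue`.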
When the identification accuracy of the verification code identification model reaches a preset value, a final verification code identification model can be generated from the existing parameters and used directly for subsequent verification code identification.
The retraining unit 60 is configured to feed the automatically labeled verification code data and the re-labeled verification code data into the verification code recognition model for retraining to obtain a new verification code recognition model, and to repeat the above steps with the new model until the recognition rate of the verification code recognition model reaches an expected threshold.
In this embodiment, after identification is completed, the verification code data is labeled according to the feedback from the website (whether the identification result is correct) to obtain labeled verification code samples. The automatically labeled verification code data and the re-labeled verification code data are then fed into the verification code identification model for retraining, so as to update the model, and the updated model is used to identify new verification code data and obtain new identification results. Once the identification rate of the verification code identification model reaches a preset threshold, the model is used directly as the final identification model and training stops. The preset threshold is set in advance, and may be, for example, 95% or another specific value.
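The identify, judge, label, and retrain cycle described above could be sketched as the following loop. It is a toy illustration under stated assumptions: the model is any callable from picture to text, `retrain`, `fetch_batch`, `check`, and `relabel` are hypothetical callbacks, `max_rounds` is an added safety limit, and the 0.95 default mirrors the 95% example threshold.

```python
def recognition_loop(model, retrain, fetch_batch, check, relabel,
                     threshold=0.95, max_rounds=50):
    """Repeat identify -> judge -> label -> retrain until the rate is reached.

    model:       callable image -> text
    retrain:     callable labeled_samples -> new model
    fetch_batch: callable returning new verification code pictures
    check:       callable (image, text) -> bool, the website feedback
    relabel:     callable image -> correct text for a failed picture
    """
    labeled = []
    rate = 0.0
    round_no = 0
    for round_no in range(1, max_rounds + 1):
        batch = fetch_batch()
        if not batch:
            raise ValueError("fetch_batch returned no verification codes")
        correct = 0
        for image in batch:
            text = model(image)
            if check(image, text):
                labeled.append((image, text))            # automatic label
                correct += 1
            else:
                labeled.append((image, relabel(image)))  # re-labeled sample
        rate = correct / len(batch)
        if rate >= threshold:
            break                    # final model reached, stop training
        model = retrain(labeled)     # update the model with all labels so far
    return model, rate, round_no
```

A toy run with a lookup-table "model" shows the control flow: in the first round every picture fails and is re-labeled, the retrained model then recognizes the next batch, and the loop stops once the rate reaches the threshold.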
According to this scheme, a small amount of verification code data is used as verification code samples to train the verification code recognition model, and the model is updated according to the training result, which effectively reduces the number of labeled samples required for training/learning. Meanwhile, the identified verification code data is labeled according to the identification result and used for subsequent training of the verification code identification model, which can improve the verification code identification success rate.
It should be noted that, as those skilled in the art will clearly understand, for the specific implementation of the deep learning-based network verification code recognition device and of each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity, they are not repeated here.
Referring to fig. 10, fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal or a server, where the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device. The server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 10, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a deep learning-based network verification code identification method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 is caused to perform a deep learning-based network verification code identification method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 10 is a block diagram of only the part of the configuration relevant to the present application and does not limit the computer device 500 to which the present application is applied; a particular computer device 500 may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
It should be understood that, in the embodiment of the present application, the processor 502 may be a Central Processing Unit (CPU), and the processor 502 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program.
The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A network verification code identification method based on deep learning is characterized by comprising the following steps:
acquiring verification code data to be identified;
inputting verification code data to be recognized into a verification code recognition model for character recognition to obtain a recognition result;
judging whether the identification result is correct or not;
if the identification result is correct, automatically marking the corresponding verification code data to enter a training set and a test set;
if the identification result is incorrect, re-marking the verification code data;
putting the automatically marked verification code data and the re-marked verification code data into a verification code identification model for retraining so as to update the verification code identification model until the identification rate of the verification code identification model reaches a preset threshold value;
wherein the verification code identification model is obtained by training a convolutional neural network with labeled verification code data as sample data.
2. The deep learning-based network verification code identification method according to claim 1, wherein obtaining the verification code identification model by training a convolutional neural network with labeled verification code data as sample data comprises:
acquiring partial verification code data of a website as a verification code sample;
inputting the verification code sample into an existing convolutional neural network for training to obtain a sample output result;
inputting the sample output result and the labeled image data into a loss function to obtain a loss value;
adjusting parameters of the convolutional neural network according to the loss value;
and training the convolutional neural network with the sample data using a deep learning framework to obtain the verification code identification model.
3. The deep learning-based network verification code identification method according to claim 2, wherein the step of acquiring partial verification code data of the website as a verification code sample comprises:
crawling website part verification code data as a verification code sample;
preprocessing the picture of the verification code sample;
and marking the verification code sample, and segmenting the training set and the test set.
4. The deep learning-based network verification code identification method according to claim 3, wherein the step of preprocessing the picture of the verification code sample comprises:
and carrying out scaling, noise reduction, binarization and normalization pretreatment on the picture of the verification code sample.
5. The method for identifying the network verification code based on the deep learning of claim 1, wherein the step of judging whether the identification result is correct comprises the following steps:
submitting the identification result to a website and receiving the return state of the website;
and judging the recognition result according to the return state.
6. A network verification code recognition device based on deep learning is characterized by comprising:
the data acquisition unit is used for acquiring verification code data to be identified;
the identification unit is used for inputting the verification code data to be identified into the verification code identification model for character identification so as to obtain an identification result;
a result judging unit for judging whether the recognition result is correct;
the marking unit is used for automatically marking the corresponding verification code data to enter a training set and a test set when the identification result is correct; when the identification result is incorrect, re-marking the verification code data;
and the retraining unit is used for putting the automatically marked verification code data and the re-marked verification code data into the verification code recognition model for retraining so as to obtain a new verification code recognition model, and repeating the steps by using the new model until the recognition rate of the verification code recognition model reaches an expected threshold value.
7. The deep learning-based network verification code recognition device according to claim 6, further comprising a model training unit, wherein the model training unit comprises:
the data acquisition subunit is used for acquiring partial verification code data of the website as a verification code sample;
the sample training subunit is used for inputting the verification code sample into the existing convolutional neural network for training to obtain a sample output result;
the loss value acquisition subunit is used for inputting the sample output result and the labeled image data into a loss function to obtain a loss value;
the parameter adjusting subunit is used for adjusting the parameters of the convolutional neural network according to the loss value;
and the learning subunit is used for training the convolutional neural network with the sample data using a deep learning framework to obtain the verification code identification model.
8. The deep learning-based network verification code recognition device according to claim 7, wherein the data acquisition subunit comprises a data crawling module, a sample labeling module and a preprocessing module;
the data crawling module is used for crawling the verification code data of the website part as a verification code sample;
the preprocessing module is used for preprocessing the picture of the verification code sample;
and the sample marking module is used for marking the verification code sample and segmenting the training set and the test set.
9. A computer device comprising a memory and a processor, the memory having a computer program stored thereon, wherein the processor implements the deep learning-based network verification code identification method according to any one of claims 1 to 5 when executing the computer program.
10. A storage medium storing a computer program which, when executed by a processor, implements the deep learning-based network verification code identification method according to any one of claims 1 to 5.
CN201911179062.2A 2019-11-26 2019-11-26 Network verification code identification method and device based on deep learning and computer equipment Pending CN110909807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911179062.2A CN110909807A (en) 2019-11-26 2019-11-26 Network verification code identification method and device based on deep learning and computer equipment

Publications (1)

Publication Number Publication Date
CN110909807A true CN110909807A (en) 2020-03-24

Family

ID=69819927

Country Status (1)

Country Link
CN (1) CN110909807A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107085730A (en) * 2017-03-24 2017-08-22 深圳爱拼信息科技有限公司 A kind of deep learning method and device of character identifying code identification
CN107360137A (en) * 2017-06-15 2017-11-17 深圳市牛鼎丰科技有限公司 Construction method and device for the neural network model of identifying code identification
US20190080475A1 (en) * 2016-03-14 2019-03-14 Siemens Aktiengesellschaft Method and system for efficiently mining dataset essentials with bootstrapping strategy in 6dof pose estimate of 3d objects
CN109919160A (en) * 2019-03-04 2019-06-21 深圳先进技术研究院 Method for recognizing verification code, device, terminal and storage medium
CN109933975A (en) * 2019-03-20 2019-06-25 山东浪潮云信息技术有限公司 A kind of method for recognizing verification code and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杜绪晗: "基于少量样本的深度学习图像识别算法研究", 《中国优秀博硕士学位论文全文数据库(硕士)》 *
杨汉宏等: "《露天矿交通运输安全预警预控原理及实践》", 31 January 2017 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626309A (en) * 2020-05-26 2020-09-04 北京墨云科技有限公司 Website fingerprint identification method based on deep learning
CN111767380A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Model adaptive retraining method and device, electronic equipment and storage medium
CN111753845A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 AI-based verification code picture identification method, device, equipment and storage medium
CN111753846A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Website verification method, device, equipment and storage medium based on RPA and AI
CN112417918A (en) * 2020-11-13 2021-02-26 珠海格力电器股份有限公司 Two-dimensional code identification method and device, storage medium and electronic equipment
CN112434806A (en) * 2020-11-18 2021-03-02 浙江大华技术股份有限公司 Deep learning training method and device, computer equipment and storage medium
CN112487398A (en) * 2020-12-15 2021-03-12 厦门市美亚柏科信息股份有限公司 Automatic character type identifying code identifying method, terminal equipment and storage medium
CN112561902A (en) * 2020-12-23 2021-03-26 天津光电通信技术有限公司 Chip inverse reduction method and system based on deep learning
CN113158842A (en) * 2021-03-31 2021-07-23 中国工商银行股份有限公司 Identification method, system, device and medium
CN113360881A (en) * 2021-07-22 2021-09-07 大象慧云信息科技(江苏)有限公司 Verification code identification method and system based on deep learning, electronic equipment and medium
CN115001771A (en) * 2022-05-25 2022-09-02 武汉极意网络科技有限公司 Verification code defense method, system, equipment and storage medium based on automatic updating
CN115001771B (en) * 2022-05-25 2024-01-26 武汉极意网络科技有限公司 Verification code defending method, system, equipment and storage medium based on automatic updating
CN117764101A (en) * 2024-02-22 2024-03-26 成都普什信息自动化有限公司 RFID (radio frequency identification) tag-based wine product anti-counterfeiting verification method, system and medium
CN117764101B (en) * 2024-02-22 2024-05-07 成都普什信息自动化有限公司 RFID (radio frequency identification) tag-based wine product anti-counterfeiting verification method, system and medium

Similar Documents

Publication Publication Date Title
CN110909807A (en) Network verification code identification method and device based on deep learning and computer equipment
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN111160569A (en) Application development method and device based on machine learning model and electronic equipment
EP2806374A1 (en) Method and system for automatic selection of one or more image processing algorithm
CN110598686B (en) Invoice identification method, system, electronic equipment and medium
CN111428485B (en) Judicial document paragraph classifying method, device, computer equipment and storage medium
CN108491866B (en) Pornographic picture identification method, electronic device and readable storage medium
CN111626177B (en) PCB element identification method and device
US20210390370A1 (en) Data processing method and apparatus, storage medium and electronic device
CN111222588B (en) Back door sample detection method, system and device
CN111680753A (en) Data labeling method and device, electronic equipment and storage medium
CN115810135A (en) Method, electronic device, storage medium, and program product for sample analysis
CN111046394A (en) Method and system for enhancing anti-attack capability of model based on adversarial samples
CN111444986A (en) Building drawing component classification method and device, electronic equipment and storage medium
CN111382740A (en) Text picture analysis method and device, computer equipment and storage medium
CN110796210A (en) Method and device for identifying label information
CN112989256B (en) Method and device for identifying web fingerprint in response information
CN113221601A (en) Character recognition method, device and computer readable storage medium
CN110490056A (en) Method and apparatus for processing an image containing a formula
CN114386013A (en) Automatic student status authentication method and device, computer equipment and storage medium
CN114022738A (en) Training sample acquisition method and device, computer equipment and readable storage medium
CN112560463A (en) Text multi-labeling method, device, equipment and storage medium
CN111507420A (en) Tire information acquisition method, tire information acquisition device, computer device, and storage medium
CN109657710B (en) Data screening method and device, server and storage medium
CN110321883A (en) Verification code recognition method and device, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200324