CN112270325A - Character verification code recognition model training method, recognition method, system, device and medium - Google Patents


Info

Publication number
CN112270325A
Authority
CN
China
Prior art keywords
character
verification code
target
characters
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011238297.7A
Other languages
Chinese (zh)
Other versions
CN112270325B (en)
Inventor
魏小文
何晓力
李可玮
张芸蜻
孙晨阳
黄小云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Travel Network Technology Shanghai Co Ltd
Original Assignee
Ctrip Travel Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Travel Network Technology Shanghai Co Ltd
Priority to CN202011238297.7A
Publication of CN112270325A
Application granted
Publication of CN112270325B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a training method, a recognition method, a system, a device and a medium for a character verification code recognition model. The training method comprises: establishing a standard character category library containing a plurality of characters and the category vector corresponding to each character; acquiring a plurality of character verification code sample images, each named by the characters it contains; acquiring a target position offset of each character in each sample image through a character position prediction model; matching the characters in the name of each sample image against the characters in the standard character category library to obtain a target category vector for each character in the image; and training the character verification code recognition model according to the target category vector and the target position offset of each character in each sample image to obtain a target character verification code recognition model. The invention can improve the accuracy and efficiency of character verification code recognition as well as the efficiency of generating training samples.

Description

Character verification code recognition model training method, recognition method, system, device and medium
Technical Field
The invention relates to the field of deep learning, in particular to a training method, a recognition method, a system, equipment and a medium for a character verification code recognition model.
Background
A verification code (CAPTCHA) is a common fully automated program that distinguishes whether a user is a computer or a human. Character verification codes are widely used in internet services as a tool for judging whether a network request comes from a legitimate user. This blocks large volumes of automated machine requests and keeps website servers running stably. Character verification codes are currently one of the most commonly used types of verification code. They typically require the user to perform a text recognition task: to pass verification, the user must correctly recognize the individual characters in a character image generated with computer graphics techniques. To make machine recognition harder, the verification code image is usually overlaid with interference such as noise and distracting lines. In addition, some verification codes use Chinese characters as the characters to be recognized. Chinese characters have many strokes and complex line structures, so traditional image recognition methods struggle to separate the character foreground from the interfering background, which results in a low recognition success rate and long processing times. Moreover, traditional image recognition methods usually depend on large amounts of labeled training data, with every image labeled and verified manually. The whole process is time-consuming, labor-intensive and costly.
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, an object of the present invention is to provide a method, a system, a device and a medium for training a character verification code recognition model, so as to improve the accuracy and efficiency of character verification code recognition and improve the generation efficiency of training samples.
In order to achieve the above object, the present invention provides a training method for a character verification code recognition model, comprising:
establishing a standard character category library, wherein the standard character category library comprises a plurality of characters and category vectors corresponding to the characters;
acquiring a plurality of character verification code sample images, wherein each character verification code sample image is named by characters contained in a corresponding image;
acquiring a target position offset of each character in each character verification code sample image through a pre-trained character position prediction model;
matching the characters in the name of each character verification code sample image against the characters in the standard character category library to obtain a target category vector for each character in each character verification code sample image;
and training a character verification code recognition model according to the target category vector and the target position offset of each character in each character verification code sample image to obtain a target character verification code recognition model.
In a preferred embodiment of the present invention, the step of establishing a standard character category library comprises:
acquiring a plurality of first character verification code images, wherein each first character verification code image is named by characters contained in a corresponding image;
preprocessing characters contained in the names of the first character verification code images to obtain a plurality of target characters;
and establishing the standard character category library according to each target character and the category vector corresponding to each target character.
In a preferred embodiment of the present invention, the step of preprocessing the characters included in the names of the first character verification code images to obtain a plurality of target characters includes:
performing deduplication on the characters contained in the names of the first character verification code images;
counting the occurrence frequency of each character contained in the names of the first character verification code images, and filtering the characters with the frequency lower than a preset threshold value;
and, after deduplication and filtering, taking the remaining characters in the names of the first character verification code images as the target characters.
In a preferred embodiment of the present invention, the training process of the character position prediction model is as follows:
acquiring a plurality of second character verification code images, and marking the positions of all characters in the second character verification code images;
and training to obtain a character position prediction model according to the second character verification code image and the position of each character marked in the second character verification code image.
In a preferred embodiment of the present invention, the step of training the character verification code recognition model according to the target category vector and the target position offset of each character in each of the character verification code sample images includes:
dividing the plurality of character verification code sample images into a training set and a verification set;
inputting each character verification code sample image in the training set into a pre-established character verification code recognition model for processing; calculating a loss function value from the category vector prediction result and position offset prediction result that the model outputs for each character, together with the corresponding target category vector and target position offset; and then adjusting the weights of the character verification code recognition model based on the loss function value until the loss function value meets a preset condition;
and validating the current character verification code recognition model on the verification set, ending training when validation passes, and taking the current character verification code recognition model as the target character verification code recognition model.
In order to achieve the above object, the present invention further provides a character verification code recognition method, including:
acquiring a target character verification code image to be recognized;
inputting the target character verification code image into the target character verification code recognition model trained by the above method for processing, to obtain a category vector prediction result and a position offset prediction result of each character in the target character verification code image;
acquiring a character recognition result of each character in the target character verification code image according to the category vector prediction result of each character and the category vector corresponding to each character in the standard character category library;
and acquiring the absolute position of each character in the target character verification code image according to the position offset prediction result of each character in the target character verification code image.
In order to achieve the above object, the present invention provides a training system for a character verification code recognition model, comprising:
the standard character category library establishing module is used for establishing a standard character category library, and the standard character category library comprises a plurality of characters and category vectors corresponding to the characters;
the sample image acquisition module is used for acquiring a plurality of character verification code sample images, wherein each character verification code sample image is named by the characters contained in the corresponding image;
the target position offset obtaining module is used for obtaining the target position offset of each character in each character verification code sample image through a pre-trained character position prediction model;
a target category vector obtaining module, configured to match characters in the name of each character verification code sample image with characters in the standard character category library, so as to obtain a target category vector of the characters in each character verification code sample image;
and the character verification code recognition model training module is used for training the character verification code recognition model according to the target category vector and the target position offset of each character in each character verification code sample image to obtain a target character verification code recognition model.
In a preferred embodiment of the present invention, the standard character category library establishing module is specifically configured for:
acquiring a plurality of first character verification code images, wherein each first character verification code image is named by characters contained in a corresponding image;
preprocessing characters contained in the names of the first character verification code images to obtain a plurality of target characters;
and establishing the standard character category library according to each target character and the category vector corresponding to each target character.
In a preferred embodiment of the present invention, the pre-processing comprises:
performing deduplication on the characters contained in the names of the first character verification code images;
and counting the occurrence frequency of each character contained in the names of the first character verification code images, and filtering the characters with the frequency lower than a preset threshold value.
In a preferred embodiment of the present invention, the system further includes a character position prediction model training module, which is specifically configured for:
acquiring a plurality of second character verification code images, and marking the positions of all characters in the second character verification code images;
and training to obtain a character position prediction model according to the second character verification code image and the position of each character marked in the second character verification code image.
In a preferred embodiment of the present invention, the character verification code recognition model training module is specifically configured for:
dividing the plurality of character verification code sample images into a training set and a verification set;
inputting each character verification code sample image in the training set into a pre-established character verification code recognition model for processing; calculating a loss function value from the category vector prediction result and position offset prediction result that the model outputs for each character, together with the corresponding target category vector and target position offset; and then adjusting the weights of the character verification code recognition model based on the loss function value until the loss function value meets a preset condition;
and validating the current character verification code recognition model on the verification set, ending training when validation passes, and taking the current character verification code recognition model as the target character verification code recognition model.
In order to achieve the above object, the present invention also provides a character verification code recognition system, including:
the target image acquisition module is used for acquiring a target character verification code image to be recognized;
the model processing module is used for inputting the target character verification code image into the target character verification code recognition model trained by the above training system and processing it to obtain a category vector prediction result and a position offset prediction result of each character in the target character verification code image;
the character recognition module is used for acquiring the character recognition result of each character in the target character verification code image according to the category vector prediction result of each character and the category vector corresponding to each character in the standard character category library;
and the position acquisition module is used for acquiring the absolute position of each character in the target character verification code image according to the position offset prediction result of each character in the target character verification code image.
In order to achieve the above object, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the character verification code recognition model training method or the character verification code recognition method when executing the computer program.
In order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned steps of the character verification code recognition model training method or the character verification code recognition method.
By adopting the technical scheme, the invention has the following beneficial effects:
first, a standard character category library is established and a plurality of character verification code sample images are acquired, each named by the characters contained in the image. Next, the target position offset of each character in each sample image is obtained through a pre-trained character position prediction model, and the characters in the name of each sample image are matched against the characters in the standard character category library to obtain the target category vector of each character. Finally, a character verification code recognition model is trained according to the target category vector and target position offset of each character in each sample image to obtain a target character verification code recognition model. The target character verification code recognition model trained in this way can improve the accuracy and efficiency of character verification code recognition. The scheme also improves the efficiency of generating training samples, because the sample images do not need to be manually labeled and verified one by one.
Drawings
FIG. 1 is a flowchart of a method for training a character verification code recognition model according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a character verification code recognition method according to Embodiment 2 of the present invention;
FIG. 3 is a block diagram of a training system for a character verification code recognition model according to Embodiment 3 of the present invention;
FIG. 4 is a block diagram of a character verification code recognition system according to Embodiment 4 of the present invention;
FIG. 5 is a hardware architecture diagram of an electronic device according to Embodiment 5 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Example 1
This embodiment provides a method for training a character verification code recognition model that recognizes each character (for example, a Chinese character) in a character verification code image. As shown in FIG. 1, the method comprises the following steps:
S11: Establish a standard character category library containing a plurality of (for example, 1000) characters and the category vector corresponding to each character.
In this embodiment, the process of creating the standard character category library is as follows:
S111: Acquire a plurality of first character verification code images, each named by the characters contained in the image.
For example, when a first character verification code image contains four characters reading "then will plus", the image is named "then will plus", and the position of each character in the name corresponds one-to-one to its position in the image.
S112: Preprocess the characters contained in the names of the first character verification code images to obtain a plurality of target characters.
The preprocessing may comprise: deduplicating the characters contained in the names of the first character verification code images to eliminate repeated characters; and counting the occurrence frequency of each character in those names and filtering out characters whose frequency is lower than a preset threshold, or the characters ranked lowest by frequency. After deduplication and filtering, the remaining characters in the names of the first character verification code images are the target characters.
S113: Build the standard character category library from the target characters and the category vector assigned to each, so that the library contains a plurality of characters and the category vector corresponding to each character.
For example, in the standard character category library, the character "less" corresponds to category vector "196" and the character "only" corresponds to category vector "254".
In this embodiment, the standard character category library is preferably stored in dictionary form as a JSON file.
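Steps S111 to S113 can be sketched in Python as follows. This is a minimal illustration that assumes each image file name stem is exactly the character string shown in the image. The helper names `build_category_library` and `save_library` and the threshold parameter `min_count` are hypothetical. The description does not say how category vectors are assigned, so this sketch uses consecutive integer indices over the sorted characters as an assumption.

```python
import json
from collections import Counter
from pathlib import Path


def build_category_library(image_names, min_count=2):
    """Build a {character: category_index} dictionary from captcha file names.

    Assumes each file-name stem is exactly the character string in the image;
    `min_count` stands in for the unspecified "preset threshold".
    """
    counts = Counter()
    for name in image_names:
        counts.update(Path(name).stem)  # count every character occurrence
    # Counter keys are already deduplicated; drop characters seen fewer than min_count times.
    kept = sorted(ch for ch, n in counts.items() if n >= min_count)
    # Assign consecutive integer indices as the "category vectors" (an assumption).
    return {ch: idx for idx, ch in enumerate(kept)}


def save_library(library, path):
    """Store the library as a JSON dictionary, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(library, f, ensure_ascii=False, indent=2)
```

For example, `build_category_library(["ABCD.png", "ABCE.png", "ABXY.png"])` keeps only A, B and C, because D, E, X and Y each appear once. `ensure_ascii=False` keeps Chinese characters human-readable in the JSON file.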
S12: Train a character position prediction model. The training process is as follows:
S121: Acquire a plurality of (for example, 1000) second character verification code images and annotate the position of each character in them.
In this embodiment, the position of each character in the second character verification code images can be annotated with the labelImg tool. When annotating with this tool, both the sample class and the position must be specified. The position of each character in the image is marked, and the class of every character is set to 0 by default.
Preferably, the second character verification code images may be the same images as the first character verification code images, so that the images are reused.
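labelImg can save annotations in Pascal VOC XML format, with one `<object>` element holding a `<bndbox>` per marked region. The sketch below reads such a file into boxes with class 0, as described above. The function name `parse_labelimg_voc` is hypothetical. Sorting the boxes left to right is an added assumption, intended to line them up with the character order in the file name.

```python
import xml.etree.ElementTree as ET


def parse_labelimg_voc(xml_text):
    """Return (width, height, boxes) from a labelImg Pascal VOC annotation.

    Each box is (class_id, xmin, ymin, xmax, ymax); class_id is forced to 0
    because every character is treated as one generic class.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = [int(float(bb.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((0, *coords))
    boxes.sort(key=lambda b: b[1])  # left-to-right, to line up with the file-name order
    return width, height, boxes
```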
S122: Train the character position prediction model on the second character verification code images and the character positions annotated in them.
In the present embodiment, the character position prediction model preferably employs a convolutional neural network.
S13: Acquire a plurality of character verification code sample images, each named by the characters contained in the image.
In this embodiment, the number of character verification code sample images is generally far greater than the number of second character verification code images; for example, about 8000 sample images are acquired.
S14: Obtain the target position offset of each character in each character verification code sample image through the pre-trained character position prediction model.
Specifically, each character verification code sample image is input into the character position prediction model, which outputs the target position offset of each character in that image.
S15: Match the characters in the name of each character verification code sample image against the characters in the standard character category library to obtain the target category vector of each character in each sample image.
For example, when a character verification code sample image is named "you-me-they", the image contains the four characters "you", "me", "he" and "people". Assume that in the standard character category library the category vector of "you" is "48", of "me" is "39", of "he" is "68" and of "people" is "30". The target category vectors of the characters in that sample image are then "48", "39", "68" and "30".
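The matching in S15 reduces to a per-character dictionary lookup on the file name. In the sketch below, the Chinese characters and their category values are illustrative only and mirror the example above. The helper `target_category_vectors` is hypothetical. The description does not say what happens when a name contains a character that was filtered out in S112, so raising `KeyError` (letting the caller skip that sample) is an assumption.

```python
from pathlib import Path


def target_category_vectors(image_name, library):
    """Map each character in a sample's file name to its category index.

    Raises KeyError when a character is absent from the library (for example,
    it was filtered out as rare), so the caller can decide to skip the sample.
    """
    chars = list(Path(image_name).stem)
    missing = [c for c in chars if c not in library]
    if missing:
        raise KeyError(f"characters not in library: {missing}")
    return [library[c] for c in chars]
```

With `library = {"你": 48, "我": 39, "他": 68, "们": 30}`, the name `"你我他们.png"` maps to `[48, 39, 68, 30]`.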
S16: Train the character verification code recognition model according to the target category vector and target position offset of each character in each sample image to obtain a target character verification code recognition model. The character verification code recognition model is an object detection model, preferably a YOLOv3 model.
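The description does not define the "position offset" precisely. For background, the standard YOLOv3 formulation predicts offsets $t_x, t_y, t_w, t_h$ for each anchor box and decodes them into a box as shown below. Here $(c_x, c_y)$ is the offset of the grid cell from the top-left corner of the image, $(p_w, p_h)$ is the width and height of the anchor prior, and $\sigma$ is the logistic sigmoid. It is an assumption that the offsets used in this embodiment follow this parameterization exactly.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x, & b_y &= \sigma(t_y) + c_y, \\
b_w &= p_w \, e^{t_w},    & b_h &= p_h \, e^{t_h}.
\end{aligned}
```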
In this embodiment, the process of training the character verification code recognition model is as follows:
S161: Divide the plurality of character verification code sample images into a training set and a verification set.
For example, a proportion M of the character verification code sample images may be selected as the training set, with the remainder used as the verification set. When M is 0.8, 80% of the sample images form the training set and the remaining 20% form the verification set.
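Step S161 can be sketched as a seeded random split. The description fixes only the proportion M. The shuffle, the fixed seed and the function name `split_samples` are assumptions made for this illustration.

```python
import random


def split_samples(samples, m=0.8, seed=0):
    """Split samples into (training set, verification set) with proportion m for training."""
    if not 0 < m < 1:
        raise ValueError("m must be in (0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * m)
    return shuffled[:cut], shuffled[cut:]
```

With about 8000 sample images and M = 0.8, this yields 6400 training images and 1600 verification images.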
S162: Input the character verification code sample images in the training set into a pre-established character verification code recognition model for processing. Calculate a loss function value from the category vector prediction result and position offset prediction result that the model outputs for each character, together with the corresponding target category vector and target position offset. Then adjust the weights of the model based on the loss function value until the loss function value meets a preset condition. In this embodiment, the preset condition is that the loss function value converges or falls below a preset value, and the loss function value is a weighted sum of a position loss and a category loss.
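The loss and stopping condition of S162 can be illustrated in plain Python. The description states only that the loss is a weighted sum of a position loss and a category loss. The specific terms below (mean squared error for boxes, softmax cross-entropy for categories), the weights `w_pos` and `w_cls`, and the thresholds in `should_stop` are assumptions. YOLOv3 itself uses different loss terms, including binary cross-entropy for class predictions.

```python
import math


def combined_loss(pred_boxes, target_boxes, pred_logits, target_classes, w_pos=1.0, w_cls=1.0):
    """Weighted sum of a position loss and a category loss over n characters.

    Position loss: squared error over box coordinates, averaged per character.
    Category loss: softmax cross-entropy, averaged per character.
    """
    n = len(target_boxes)
    if n == 0 or n != len(pred_boxes) or n != len(pred_logits) or n != len(target_classes):
        raise ValueError("inputs must be non-empty and of equal length")
    pos = sum((p - t) ** 2 for pb, tb in zip(pred_boxes, target_boxes) for p, t in zip(pb, tb)) / n
    cls = 0.0
    for logits, k in zip(pred_logits, target_classes):
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        cls += log_sum_exp - logits[k]
    return w_pos * pos + w_cls * cls / n


def should_stop(history, threshold=0.05, tol=1e-4, patience=3):
    """Stop when the latest loss is below `threshold` or has stopped changing (converged)."""
    if history and history[-1] < threshold:
        return True
    if len(history) > patience:
        recent = history[-(patience + 1):]
        return max(recent) - min(recent) < tol
    return False
```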
S163: Validate the current character verification code recognition model on the verification set. If validation passes, end training and take the current model as the target character verification code recognition model. If it fails, increase the number of images in the training set and retrain.
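The control flow of S162 and S163 (train, validate, enlarge the training set on failure, retrain) can be sketched with caller-supplied callables. `train_fn`, `validate_fn`, `more_data_fn` and `max_rounds` are all hypothetical. The description sets no limit on retraining rounds; `max_rounds` is added here only to guarantee termination.

```python
def train_until_validated(train_set, val_set, train_fn, validate_fn, more_data_fn, max_rounds=5):
    """Retrain with a growing training set until validation passes.

    train_fn(train_set) -> model        trains until the loss condition is met (S162)
    validate_fn(model, val_set) -> bool  checks the model on the verification set (S163)
    more_data_fn() -> iterable           supplies extra training images on failure
    Returns (model, number_of_rounds_used).
    """
    for round_idx in range(1, max_rounds + 1):
        model = train_fn(train_set)
        if validate_fn(model, val_set):
            return model, round_idx
        train_set = list(train_set) + list(more_data_fn())  # enlarge training set, then retrain
    raise RuntimeError("validation not passed within max_rounds")
```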
Preferably, before step S16 is performed, this embodiment normalizes the target position offset of each character in each character verification code sample image to the range (0,1). This reduces the influence of unimportant features and improves computational accuracy.
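The description gives neither the exact offset representation nor the normalization formula. A common convention, assumed here, is the YOLO label format. Each pixel box (xmin, ymin, xmax, ymax) becomes (center x, center y, width, height) divided by the image width and height, so every value falls within the unit interval. The inverse function shows one way an absolute pixel position could be recovered from a normalized prediction at recognition time. Both function names are hypothetical.

```python
def normalize_box(box, img_w, img_h):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def denormalize_box(norm_box, img_w, img_h):
    """Invert normalize_box: map normalized (cx, cy, w, h) back to pixel corners."""
    cx, cy, w, h = norm_box
    half_w, half_h = w * img_w / 2, h * img_h / 2
    return (cx * img_w - half_w, cy * img_h - half_h, cx * img_w + half_w, cy * img_h + half_h)
```

For a 100 x 40 image, the box (10, 4, 30, 36) normalizes to (0.2, 0.5, 0.2, 0.8), and `denormalize_box` maps it back to the original corners.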
In this embodiment, the first character verification code images, the second character verification code images and the character verification code sample images all contain the same number of characters.
The target character verification code recognition model trained in this way can improve the accuracy and efficiency of character verification code recognition. The scheme also improves the efficiency of generating training samples, because the sample images do not need to be manually labeled and verified one by one.
Example 2
This embodiment provides a character verification code recognition method for recognizing the characters in a verification code image, which is particularly suitable for Chinese characters. As shown in fig. 2, the method comprises the following steps:
S21, acquiring the target character verification code image to be recognized. In this embodiment, the number of characters in the target character verification code image is the same as the number of characters in the first character verification code images, the second character verification code images and the character verification code sample images.
S22, inputting the target character verification code image into the target character verification code recognition model trained by the method of embodiment 1 for processing, to obtain a category vector prediction result and a position offset prediction result for each character in the target character verification code image.
S23, acquiring the character recognition result of each character in the target character verification code image according to its category vector prediction result and the category vectors corresponding to the characters in the standard character category library.
Specifically, the category vector matching the category vector prediction result of each character in the target character verification code image is first looked up in the standard character category library, and the character corresponding to that category vector is then taken as the recognition result for that character. For example, suppose the category vector prediction results of the characters in the target character verification code image are "76", "35", "168" and "243" in order, and in the standard character category library the category vector of "square" is "76", that of "large" is "35", that of "pack" is "168" and that of "how" is "243"; the character recognition results are then "square", "large", "pack" and "how" in order.
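The lookup in S23 amounts to inverting the character-to-category mapping. The sketch below assumes the model emits one score vector per character, from which the category ID is the index of the highest score; the helper names are hypothetical, and the English words stand in for the translated Chinese characters of the example.

```python
from typing import Dict, List, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score in one character's class-score vector."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_categories(pred_ids: List[int], library: Dict[str, int]) -> List[str]:
    """Map predicted category IDs back to characters via the inverted library."""
    id_to_char = {cid: ch for ch, cid in library.items()}
    return [id_to_char[cid] for cid in pred_ids]


# Placeholder entries mirroring the worked example in the text.
library = {"square": 76, "large": 35, "pack": 168, "how": 243}
print(decode_categories([76, 35, 168, 243], library))
```

Running it prints `['square', 'large', 'pack', 'how']`, matching the example above.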
S24, acquiring the absolute position of each character in the target character verification code image according to its position offset prediction result.
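Because the recognition model is preferably YOLOv3, one plausible reading of S24 is standard YOLOv3 box decoding: the center is the sigmoid of the predicted offset plus the grid-cell index (scaled by the cell stride), and the size is the anchor prior scaled by the exponential of the predicted offset. This is a sketch assuming anchors given in input-image pixels; the patent does not spell out its decoding, and if the image was resized before inference the result must be scaled back to the original size.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov3_box(
    tx: float, ty: float, tw: float, th: float,  # predicted offsets for one anchor
    cx: int, cy: int,                            # grid-cell column and row
    anchor_w: float, anchor_h: float,            # anchor prior size in pixels (assumed)
    stride: int,                                 # pixels per grid cell
) -> Tuple[float, float, float, float]:
    """Return the absolute box (x_min, y_min, x_max, y_max) in input-image pixels."""
    bx = (sigmoid(tx) + cx) * stride  # center x
    by = (sigmoid(ty) + cy) * stride  # center y
    bw = anchor_w * math.exp(tw)      # width
    bh = anchor_h * math.exp(th)      # height
    return bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
```

With all offsets zero, cell `(2, 1)`, a 16 x 16 anchor and stride 32, the box is centered at `(80, 48)` and spans `(72.0, 40.0, 88.0, 56.0)`.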
With the above technical solution, both the recognition result and the absolute position of each character in the target character verification code image can be obtained, with high recognition efficiency and accuracy.
Example 3
This embodiment provides a training system for a character verification code recognition model, used to train a model that recognizes each character (for example, each Chinese character) in a character verification code image. As shown in fig. 3, the system comprises a standard character category library establishing module 11, a character position prediction model training module 12, a sample image obtaining module 13, a target position offset obtaining module 14, a target category vector obtaining module 15 and a character verification code recognition model training module 16. Each module is described in detail below:
the standard character category library establishing module 11 is configured to establish a standard character category library, where the standard character category library includes a plurality of (e.g., 1000) characters and a category vector corresponding to each character.
In this embodiment, the standard character category library establishing module 11 is specifically configured to perform the following:
first, a plurality of first character verification code images are obtained, wherein each first character verification code image is named by characters contained in a corresponding image.
For example, if a first character verification code image contains four characters (rendered in translation as "then", "will", "plus"), the image is named "then will plus", and the order of the characters in the name corresponds one-to-one to their positions in the image.
Then, the characters contained in the names of the first character verification code images are preprocessed to obtain a plurality of target characters.
The preprocessing may comprise: deduplicating the characters contained in the names of the first character verification code images to eliminate repeated characters; and counting how often each character occurs in those names and filtering out the characters whose frequency is below a preset threshold, or those ranked lowest by frequency. The characters remaining after deduplication and filtering are the target characters.
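The deduplication and frequency filtering can be sketched with a counter over all characters in the image names. This sketch implements only the threshold variant; `min_count` is a hypothetical parameter name for the preset threshold, and it assumes names are plain strings of characters without separators.

```python
from collections import Counter
from typing import Iterable, List


def build_target_characters(names: Iterable[str], min_count: int = 2) -> List[str]:
    """Deduplicate the characters in the image names and drop rare ones."""
    counts: Counter = Counter()
    for name in names:
        counts.update(name)  # each character in a name counts as one occurrence
    # Counter keys are already unique (deduplication); keep only frequent characters.
    kept = [ch for ch, n in counts.items() if n >= min_count]
    return sorted(kept)  # deterministic order for assigning category IDs later
```

Sorting the result is a design choice so that rebuilding the library from the same names always yields the same character order.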
Finally, the standard character category library is established from the target characters and the category vector corresponding to each of them, so that the library contains a plurality of characters and their one-to-one corresponding category vectors.
For example, in the standard character category library, the character "less" corresponds to category vector "196" and the character "only" corresponds to category vector "254".
In this embodiment, the standard character category library is preferably stored in dictionary form as a JSON file.
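A dictionary-form JSON library can be sketched as below. Assigning consecutive integer IDs in list order is an assumption (the patent's example IDs such as "196" and "254" only show that each character has one ID); the function names and file path are hypothetical.

```python
import json
from typing import Dict, List


def build_category_library(target_chars: List[str]) -> Dict[str, int]:
    """Assign each target character a consecutive integer category ID."""
    return {ch: idx for idx, ch in enumerate(target_chars)}


def save_library(library: Dict[str, int], path: str) -> None:
    # ensure_ascii=False keeps Chinese characters human-readable in the file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(library, f, ensure_ascii=False, indent=2)


def load_library(path: str) -> Dict[str, int]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Writing with an explicit UTF-8 encoding avoids platform-dependent defaults when the file contains Chinese characters.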
The character position prediction model training module 12 is configured to train a character position prediction model, and is specifically configured to perform the following:
first, a plurality of (for example, 1000) second character verification code images are acquired, and the positions of the characters are marked in the second character verification code images.
In this embodiment, the position of each character in the second character verification code images may be annotated with the labelImg tool. When annotating, the sample class and position must be specified: the position of each character is marked in the image, and the character class is set to 0 by default.
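labelImg saves annotations in Pascal VOC XML by default (it can also write YOLO-format text files). Assuming the VOC format was used, the marked character boxes can be read back with the standard library as sketched here; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_voc_boxes(xml_text: str) -> List[Tuple[int, ...]]:
    """Extract (xmin, ymin, xmax, ymax) boxes from a Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):  # one <object> per annotated character
        bb = obj.find("bndbox")
        # int(float(...)) tolerates coordinates written as "10" or "10.0"
        boxes.append(tuple(int(float(bb.find(tag).text))
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes
```

The class name in each `<object>` is ignored here because, as described above, every character is labeled with the single class 0.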
Preferably, the second character verification code images may be the same as the first character verification code images, which improves image utilization.
Then, the character position prediction model is trained on the second character verification code images and the character positions marked in them.
In the present embodiment, the character position prediction model preferably employs a convolutional neural network.
The sample image obtaining module 13 is configured to obtain a plurality of character verification code sample images, where each character verification code sample image is named by a character included in a corresponding image.
In this embodiment, the number of character verification code sample images generally far exceeds the number of second character verification code images; for example, about 8000 character verification code sample images are obtained.
The target position offset obtaining module 14 is configured to obtain a target position offset of each character in each character verification code sample image through a pre-trained character position prediction model.
Specifically, the target position offset of each character in the character verification code sample image can be obtained by inputting each character verification code sample image into the character position prediction model.
The target category vector obtaining module 15 is configured to match characters in the names of the character verification code sample images with characters in the standard character category library, so as to obtain target category vectors of the characters in the character verification code sample images.
For example, if a character verification code sample image is named "you me he people", the image contains the four characters "you", "me", "he" and "people". Assuming that in the standard character category library the category vector of "you" is "48", that of "me" is "39", that of "he" is "68" and that of "people" is "30", the target category vectors of the characters in that sample image are "48", "39", "68" and "30".
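Building one training label per character requires pairing each character in the name with its category vector and with a position box from the character position prediction model. The sketch below assumes the characters are laid out left to right, so sorting the predicted boxes by `x_min` aligns them with the order of characters in the name; that ordering rule and the function name are assumptions, not stated in the patent.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def make_labels(name: str, boxes: Sequence[Box], library: Dict[str, int]) -> List[Tuple[int, Box]]:
    """Pair each character in the image name with its category ID and a box.

    Assumes left-to-right layout: the i-th character in the name matches
    the i-th box when boxes are sorted by x_min.
    """
    if len(name) != len(boxes):
        raise ValueError("character count and box count differ")
    ordered = sorted(boxes, key=lambda b: b[0])
    return [(library[ch], box) for ch, box in zip(name, ordered)]
```

Raising on a count mismatch lets a pipeline discard samples where the position model missed or duplicated a character instead of silently mislabeling them.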
The character verification code recognition model training module 16 is configured to train the character verification code recognition model according to the target category vector and the target position offset of each character in each of the character verification code sample images, so as to obtain a target character verification code recognition model. The character verification code recognition model adopts a target detection model, preferably a YOLOv3 model.
In this embodiment, the character verification code recognition model training module 16 is specifically configured to:
first, the character verification code sample images are divided into a training set and a verification set.
For example, a proportion M of the character verification code sample images may be selected as the training set and the rest used as the verification set; when M is 0.8, 80% of the sample images form the training set and the remaining 20% form the verification set.
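A shuffled split by proportion M can be sketched as follows. The fixed `seed` is an added assumption for reproducibility; the patent does not say whether the split is random.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], m: float = 0.8, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle and split into a training set (fraction m) and a verification set."""
    if not 0.0 < m < 1.0:
        raise ValueError("m must be in (0, 1)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global state
    cut = int(len(shuffled) * m)
    return shuffled[:cut], shuffled[cut:]
```

With about 8000 sample images and M = 0.8, this yields 6400 training images and 1600 verification images.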
Then, the character verification code sample images in the training set are input into a pre-established character verification code recognition model for processing; a loss function value is calculated from the category vector prediction result and the position offset prediction result of each character output by the model together with the corresponding target category vector and target position offset; and the weights of the model are adjusted based on the loss function value until it meets a preset condition. In this embodiment, the preset condition is that the loss function value converges or falls below a preset value, and the loss function value is a weighted sum of a position loss and a category loss.
Finally, the current character verification code recognition model is verified on the verification set: if verification passes, training ends and the current model is taken as the target character verification code recognition model; if it fails, the number of images in the training set is increased and the model is retrained.
Preferably, the system of this embodiment further comprises a normalization module, configured to normalize the target position offset of each character in the character verification code sample images into the range (0,1) before the character verification code recognition model training module 16 operates, so as to reduce the influence of unimportant features and improve computational accuracy.
In the present embodiment, the number of characters in the first character captcha image, the second character captcha image, and the character captcha sample image is the same.
The target character verification code recognition model trained with this scheme improves the accuracy and efficiency of character verification code recognition. The scheme also speeds up the generation of training samples, since the sample images do not need to be manually annotated and checked one by one.
Example 4
This embodiment provides a character verification code recognition system for recognizing the characters in a verification code image, which is particularly suitable for Chinese characters. As shown in fig. 4, the system comprises a target image acquisition module 21, a model processing module 22, a character recognition module 23 and a position acquisition module 24. Each module is described in detail below:
the target image obtaining module 21 is configured to obtain a target character verification code image to be recognized. In the present embodiment, the number of characters in the target character captcha image is the same as the number of characters in the first character captcha image, the second character captcha image, and the character captcha sample image.
The model processing module 22 is configured to input the target character verification code image into the target character verification code recognition model trained with the system of embodiment 3 for processing, to obtain a category vector prediction result and a position offset prediction result for each character in the target character verification code image.
The character recognition module 23 is configured to obtain the character recognition result of each character in the target character verification code image according to its category vector prediction result and the category vectors corresponding to the characters in the standard character category library.
Specifically, the category vector matching the category vector prediction result of each character in the target character verification code image is first looked up in the standard character category library, and the character corresponding to that category vector is then taken as the recognition result for that character. For example, suppose the category vector prediction results of the characters in the target character verification code image are "76", "35", "168" and "243" in order, and in the standard character category library the category vector of "square" is "76", that of "large" is "35", that of "pack" is "168" and that of "how" is "243"; the character recognition results are then "square", "large", "pack" and "how" in order.
The position obtaining module 24 is configured to obtain the absolute position of each character in the target character verification code image according to its position offset prediction result.
With the above technical solution, both the recognition result and the absolute position of each character in the target character verification code image can be obtained, with high recognition efficiency and accuracy.
Example 5
This embodiment provides an electronic device, which may take the form of a computing device (for example, a server device) and comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the character verification code recognition model training method of embodiment 1 or the character verification code recognition method of embodiment 2.
Fig. 5 shows a schematic diagram of a hardware structure of the present embodiment, and as shown in fig. 5, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the various system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
Memory 92 includes volatile memory, such as random access memory (RAM) 921 and/or cache memory 922, and may further include read-only memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set of (at least one) program modules 924. Such program modules 924 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these, or some combination of them, may include an implementation of a network environment.
The processor 91 runs the computer program stored in the memory 92 to perform various functional applications and data processing, such as the character verification code recognition model training method or the character verification code recognition method provided in embodiment 1 or 2 of the present invention.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., a keyboard, a pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 9 via the bus 93. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the character verification code recognition model training method or the character verification code recognition method provided in embodiment 1 or 2.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present invention may also be implemented as a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to execute the steps of the character verification code recognition model training method or the character verification code recognition method described in embodiment 1 or 2.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (14)

1. A training method for a character verification code recognition model is characterized by comprising the following steps:
establishing a standard character category library, wherein the standard character category library comprises a plurality of characters and category vectors corresponding to the characters;
acquiring a plurality of character verification code sample images, wherein each character verification code sample image is named by characters contained in a corresponding image;
acquiring a target position offset of each character in each character verification code sample image through a pre-trained character position prediction model;
matching the characters in the name of each character verification code sample image with the characters in the standard character category library to obtain target category vectors of the characters in each character verification code sample image;
and training a character verification code recognition model according to the target category vector and the target position offset of each character in each character verification code sample image to obtain a target character verification code recognition model.
2. The method for training a character verification code recognition model according to claim 1, wherein the step of establishing a standard character category library comprises:
acquiring a plurality of first character verification code images, wherein each first character verification code image is named by characters contained in a corresponding image;
preprocessing characters contained in the names of the first character verification code images to obtain a plurality of target characters;
and establishing the standard character category library according to each target character and the category vector corresponding to each target character.
3. The method for training the character verification code recognition model according to claim 2, wherein the step of preprocessing the characters included in the names of the first character verification code images to obtain the target characters comprises:
deduplicating the characters contained in the names of the first character verification code images;
counting the occurrence frequency of each character contained in the names of the first character verification code images, and filtering the characters with the frequency lower than a preset threshold value;
and after the deduplication and the filtering, taking the remaining characters in the names of the first character verification code images as the target characters.
4. The method for training the character verification code recognition model according to claim 1, wherein the training process of the character position prediction model is as follows:
acquiring a plurality of second character verification code images, and marking the positions of all characters in the second character verification code images;
and training to obtain a character position prediction model according to the second character verification code image and the position of each character marked in the second character verification code image.
5. The method for training a character verification code recognition model according to claim 1, wherein the step of training a character verification code recognition model according to the target category vector and the target position offset of each character in each of the character verification code sample images comprises:
dividing the plurality of character verification code sample images into a training set and a verification set;
inputting each character verification code sample image in the training set into a pre-established character verification code recognition model for processing, calculating a loss function value based on a class vector prediction result and a position offset prediction result of each character output by the character verification code recognition model, a corresponding target class vector and a target position offset, and then adjusting the weight of the character verification code recognition model based on the loss function value until the loss function value meets a preset condition;
and verifying the current character verification code recognition model according to the verification set, finishing training when the verification is passed, and taking the current character verification code recognition model as a target character verification code recognition model.
6. A character verification code recognition method is characterized by comprising the following steps:
acquiring a target character verification code image to be recognized;
inputting the target character verification code image into the target character verification code recognition model obtained by training according to the method of any one of claims 1-5, and processing to obtain a category vector prediction result and a position offset prediction result of each character in the target character verification code image;
acquiring a character recognition result of each character in the target character verification code image according to the category vector prediction result of each character in the target character verification code image and the category vectors corresponding to the characters in the standard character category library;
and acquiring the absolute position of each character in the target character verification code image according to the position offset prediction result of each character in the target character verification code image.
7. A character verification code recognition model training system is characterized by comprising:
the standard character category library establishing module is used for establishing a standard character category library, and the standard character category library comprises a plurality of characters and category vectors corresponding to the characters;
a sample image acquisition module, used for acquiring a plurality of character verification code sample images, wherein each character verification code sample image is named by the characters contained in the corresponding image;
the target position offset obtaining module is used for obtaining the target position offset of each character in each character verification code sample image through a pre-trained character position prediction model;
a target category vector obtaining module, configured to match characters in the name of each character verification code sample image with characters in the standard character category library, so as to obtain a target category vector of the characters in each character verification code sample image;
and a character verification code recognition model training module, used for training the character verification code recognition model according to the target category vector and the target position offset of each character in each character verification code sample image to obtain a target character verification code recognition model.
8. The system for training a character verification code recognition model according to claim 7, wherein the standard character category library establishing module is specifically configured to perform:
acquiring a plurality of first character verification code images, wherein each first character verification code image is named by characters contained in a corresponding image;
preprocessing characters contained in the names of the first character verification code images to obtain a plurality of target characters;
and establishing the standard character category library according to each target character and the category vector corresponding to each target character.
9. The character verification code recognition model training system of claim 8, wherein the preprocessing comprises:
deduplicating the characters contained in the names of the first character verification code images;
and counting the occurrence frequency of each character contained in the names of the first character verification code images, and filtering the characters with the frequency lower than a preset threshold value.
10. The system for training a character verification code recognition model according to claim 7, further comprising a character position prediction model training module, the character position prediction model training module being specifically configured to perform:
acquiring a plurality of second character verification code images, and marking the positions of all characters in the second character verification code images;
and training to obtain a character position prediction model according to the second character verification code image and the position of each character marked in the second character verification code image.
11. The system for training a character verification code recognition model according to claim 7, wherein the character verification code recognition model training module is specifically configured to perform:
dividing the plurality of character verification code sample images into a training set and a verification set;
inputting each character verification code sample image in the training set into a pre-established character verification code recognition model for processing, calculating a loss function value based on a class vector prediction result and a position offset prediction result of each character output by the character verification code recognition model, a corresponding target class vector and a target position offset, and then adjusting the weight of the character verification code recognition model based on the loss function value until the loss function value meets a preset condition;
and verifying the current character verification code recognition model according to the verification set, finishing training when the verification is passed, and taking the current character verification code recognition model as a target character verification code recognition model.
12. A character verification code recognition system, comprising:
the target image acquisition module is used for acquiring a target character verification code image to be identified;
a model processing module, configured to input the target character verification code image into the target character verification code recognition model trained with the system according to any one of claims 7 to 11 for processing, to obtain a category vector prediction result and a position offset prediction result of each character in the target character verification code image;
a character recognition module, used for acquiring the character recognition result of each character in the target character verification code image according to the category vector prediction result of each character in the target character verification code image and the category vectors corresponding to the characters in the standard character category library;
and the position acquisition module is used for acquiring the absolute position of each character in the target character verification code image according to the position offset prediction result of each character in the target character verification code image.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the character verification code recognition model training method according to any one of claims 1 to 5 or the steps of the character verification code recognition method according to claim 6.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the character verification code recognition model training method according to any one of claims 1 to 5 or the steps of the character verification code recognition method according to claim 6.
CN202011238297.7A 2020-11-09 2020-11-09 Character verification code recognition model training method, recognition method, system, equipment and medium Active CN112270325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011238297.7A CN112270325B (en) 2020-11-09 2020-11-09 Character verification code recognition model training method, recognition method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011238297.7A CN112270325B (en) 2020-11-09 2020-11-09 Character verification code recognition model training method, recognition method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN112270325A 2021-01-26
CN112270325B 2024-05-24

Family

ID=74339780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011238297.7A Active CN112270325B (en) 2020-11-09 2020-11-09 Character verification code recognition model training method, recognition method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN112270325B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014071813A (en) * 2012-10-01 2014-04-21 Fuji Xerox Co Ltd Character recognition device and program
CN104021376A (en) * 2014-06-05 2014-09-03 北京乐动卓越科技有限公司 Verification code identifying method and device
CN105760891A (en) * 2016-03-02 2016-07-13 上海源庐加佳信息科技有限公司 Chinese character verification code recognition method
CN107067006A (en) * 2017-04-20 2017-08-18 金电联行(北京)信息技术有限公司 A kind of method for recognizing verification code and system for serving data acquisition
CN107360137A (en) * 2017-06-15 2017-11-17 深圳市牛鼎丰科技有限公司 Construction method and device for the neural network model of identifying code identification
WO2019237549A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Verification code recognition method and apparatus, computer device, and storage medium
CN109086772A (en) * 2018-08-16 2018-12-25 成都市映潮科技股份有限公司 A kind of recognition methods and system distorting adhesion character picture validation code
WO2020215573A1 (en) * 2019-04-22 2020-10-29 平安科技(深圳)有限公司 Captcha identification method and apparatus, and computer device and storage medium
CN110555298A (en) * 2019-08-30 2019-12-10 阿里巴巴(中国)有限公司 Verification code recognition model training and recognition method, medium, device and computing equipment
CN110674488A (en) * 2019-09-06 2020-01-10 深圳壹账通智能科技有限公司 Verification code identification method and system based on neural network and computer equipment
CN111259366A (en) * 2020-01-22 2020-06-09 支付宝(杭州)信息技术有限公司 Verification code recognizer training method and device based on self-supervision learning
CN111461979A (en) * 2020-03-30 2020-07-28 招商局金融科技有限公司 Verification code image denoising and identifying method, electronic device and storage medium

Non-Patent Citations (2)

Title
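何福泉, 林培, 李俊华 et al.: "Analysis and Research on Verification Code Recognition Technology", 甘肃科技纵横, vol. 48, no. 2, 25 February 2019 (2019-02-25), pages 2 - 4 *
刘欢; 邵蔚元; 郭跃飞: "Application and Research of Convolutional Neural Networks in Verification Code Recognition", 计算机工程与应用, no. 18, 15 September 2016 (2016-09-15) *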

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113627395A (en) * 2021-09-17 2021-11-09 平安银行股份有限公司 Text recognition method, text recognition device, text recognition medium and electronic equipment
CN113627395B (en) * 2021-09-17 2023-11-17 平安银行股份有限公司 Text recognition method, device, medium and electronic equipment
CN114022887A (en) * 2022-01-04 2022-02-08 北京世纪好未来教育科技有限公司 Text recognition model training and text recognition method and device, and electronic equipment
CN115909019A (en) * 2022-10-26 2023-04-04 吉林省吉林祥云信息技术有限公司 Scheduling method in multi-model node scene of identifying code image
CN115909019B (en) * 2022-10-26 2024-02-09 吉林省吉林祥云信息技术有限公司 Scheduling method in multi-model node scene for identifying verification code image

Also Published As

Publication number Publication date
CN112270325B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN112270325B (en) Character verification code recognition model training method, recognition method, system, equipment and medium
CN107729300B (en) Text similarity processing method, device and equipment and computer storage medium
US20190311114A1 (en) Man-machine identification method and device for captcha
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN110472675A (en) Image classification method, image classification device, storage medium and electronic equipment
CN111125317A (en) Model training, classification, system, device and medium for conversational text classification
CN111125658B (en) Method, apparatus, server and storage medium for identifying fraudulent user
CN112148937B (en) Method and system for pushing dynamic epidemic prevention knowledge
CN112990035A (en) Text recognition method, device, equipment and storage medium
CN113239702A (en) Intention recognition method and device and electronic equipment
CN113707157B (en) Voiceprint recognition-based identity verification method and device, electronic equipment and medium
CN112989829B (en) Named entity recognition method, device, equipment and storage medium
CN111460811A (en) Crowdsourcing task answer verification method and device, computer equipment and storage medium
CN111723583A (en) Statement processing method, device, equipment and storage medium based on intention role
KR102363958B1 (en) Method, apparatus and program for analyzing customer perception based on double clustering
US20230186668A1 (en) Polar relative distance transformer
CN113555005B (en) Model training method, model training device, confidence determining method, confidence determining device, electronic equipment and storage medium
CN115730590A (en) Intention recognition method and related equipment
CN115858776A (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN112733645A (en) Handwritten signature verification method and device, computer equipment and storage medium
CN113159107A (en) Exception handling method and device
CN112381458A (en) Project evaluation method, project evaluation device, equipment and storage medium
CN111914536B (en) Viewpoint analysis method, viewpoint analysis device, viewpoint analysis equipment and storage medium
CN113542202B (en) Domain name identification method, device, equipment and computer readable storage medium
CN113408661B (en) Method, apparatus, device and medium for determining mismatching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant