CN108932533A - Recognition model construction method and device, character recognition method and device - Google Patents


Info

Publication number
CN108932533A
CN108932533A (application number CN201810763049.0A)
Authority
CN
China
Prior art keywords
character
identification model
obtains
image
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810763049.0A
Other languages
Chinese (zh)
Inventor
闫博飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Papaya Mobile Technology Co Ltd
Original Assignee
Beijing Papaya Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Papaya Mobile Technology Co Ltd
Priority: CN201810763049.0A
Publication: CN108932533A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

An embodiment of the present invention provides a recognition model construction method and device, and a character recognition method and device. The recognition model construction method includes: a. obtaining a training data set, where the training data set includes multiple images, each containing a character-string segment to be recognized; b. cropping the images in the training data set to obtain a processed data set; c. feeding the processed data set into a neural network model, which contains parameters to be determined, and computing a result; d. comparing the computed result with pre-stored annotation results corresponding to the training data set to obtain a calculation error; e. adjusting the parameters to be determined according to the calculation error; and repeating steps c-e until the calculation error falls within a defined range, yielding a target recognition model.

Description

Recognition model construction method and device, character recognition method and device
Technical field
The present invention relates to the field of image processing, and in particular to a recognition model construction method and device and a character recognition method and device.
Background technique
To prevent malicious attacks on a website by scripts rather than real users, such as frequent robot access, brute-force password cracking, or ballot stuffing, a verification code (CAPTCHA) recognizer can be set at the entry point to apply a Turing test to the user. This determines the authenticity of the visitor to a certain degree and thus helps secure the website, but it also degrades the user experience to some extent.
To increase the difficulty of recognition, a wide variety of verification codes now exist, such as letter-based and picture-based ones. The verification code addressed by the present invention combines English letters and digits. It differs from most character verification codes in that the number of characters is variable and the characters are distorted and stuck together, as in Google's verification code.
For some well-intentioned crawlers or scripts, the verification code is an obstacle that makes further data collection inconvenient. Many methods and interfaces for recognizing verification codes (anti-CAPTCHA) are therefore available online.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a recognition model construction method and device, and a character recognition method and device.
In a first aspect, an embodiment of the present invention provides a recognition model construction method, comprising:

a. obtaining a training data set, where the training data set includes multiple images, each containing a character-string segment to be recognized;

b. cropping the images in the training data set to obtain a processed data set;

c. feeding the processed data set into a neural network model, which contains parameters to be determined, and computing a result;

d. comparing the computed result with pre-stored annotation results corresponding to the training data set to obtain a calculation error;

e. adjusting the parameters to be determined according to the calculation error;

repeating steps c-e until the calculation error falls within a defined range, yielding a target recognition model.
In a second aspect, an embodiment of the present invention also provides a character recognition method that recognizes character strings using a recognition model obtained by the above construction method. The method comprises:

cropping an image to be recognized to obtain a standard image;

feeding the standard image into the recognition model for computation to obtain a recognition result.
In a third aspect, an embodiment of the present invention also provides a recognition model construction device, comprising:

an obtaining module, for obtaining a training data set, where the training data set includes multiple images, each containing a character-string segment to be recognized;

a cropping module, for cropping the images in the training data set to obtain a processed data set;

a training module, for:

feeding the processed data set into a neural network model, which contains parameters to be determined, and computing a result;

comparing the computed result with pre-stored annotation results corresponding to the training data set to obtain a calculation error;

adjusting the parameters to be determined according to the calculation error;

repeating the above process until the calculation error falls within a defined range, yielding a target recognition model.
In a fourth aspect, an embodiment of the present invention also provides a character recognition device that recognizes character strings using a recognition model obtained by the above construction method. The device comprises:

an image obtaining module, for cropping an image to be recognized to obtain a standard image;

a recognition module, for feeding the standard image into the recognition model for computation to obtain a recognition result.
Compared with the prior art, the recognition model construction method and device and the character recognition method and device of the embodiments of the present invention obtain a recognition model by training a neural network model, which lets the model better handle images whose character strings contain a variable number of characters and distorted characters. Further, cropping the images in the training data set before training better fits the input requirements of the neural network model and improves the recognition accuracy of the trained model.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should not be construed as limiting its scope; those of ordinary skill in the art can derive other relevant drawings from them without creative effort.
Fig. 1 is a block diagram of an electronic device provided by an embodiment of the present invention.
Fig. 2 is a flowchart of the recognition model construction method provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the neural network model used in the recognition model construction method provided by an embodiment of the present invention.
Fig. 4 is a detailed flowchart of step S202 of the recognition model construction method provided by an embodiment of the present invention.
Fig. 5 is a flowchart of the character recognition method provided by an embodiment of the present invention.
Fig. 6 is a functional block diagram of the recognition model construction device provided by an embodiment of the present invention.
Fig. 7 is a functional block diagram of the character recognition device provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings, may be arranged and designed in a variety of different configurations. The following detailed description of the embodiments is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
As shown in Fig. 1, an electronic device 100 includes a memory 111, a storage controller 112, a processor 113, a peripheral interface 114, an input/output unit 115, and a display unit 116. Those of ordinary skill in the art will appreciate that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the electronic device 100; for example, the electronic device 100 may include more or fewer components than shown in Fig. 1, or a configuration different from that shown in Fig. 1. The electronic device 100 described in this embodiment may be a computing device with image processing capability, such as a personal computer, an image processing server, a vehicle-mounted device, or a mobile electronic device.
The memory 111, storage controller 112, processor 113, peripheral interface 114, input/output unit 115, and display unit 116 are electrically connected to one another, directly or indirectly, to enable the transmission or exchange of data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The memory 111 stores at least one software function module in the form of software or firmware, or a software function module is solidified in the operating system (OS) of the electronic device 100. The processor 113 executes the executable modules stored in the memory.
The memory 111 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM). The memory 111 stores programs, and the processor 113 executes them after receiving an execution instruction. The method performed by the electronic device 100 as defined by any process disclosed in the embodiments of the present invention may be applied in, or implemented by, the processor 113.
The processor 113 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111. In some embodiments, the peripheral interface 114, the processor 113, and the storage controller 112 may be implemented in a single chip; in other examples, they may each be implemented by an independent chip.
The input/output unit 115 is used to receive input data from the user. The input/output unit 115 may be, but is not limited to, a mouse, a keyboard, and the like.
The display unit 116 provides an interactive interface (such as a user operation interface) between the electronic device 100 and the user, or displays image data for the user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen supporting single-point and multi-point touch operations, which means the touch display can sense touch operations occurring simultaneously at one or more positions on it and hands the sensed touch operations to the processor for computation and processing.
In one example, the electronic device 100 may have the following configuration: operating system: Ubuntu 16.04; software framework: TensorFlow 1.7; hardware: two GeForce GTX 1080 Ti GPUs, an 8-core Intel(R) Core(TM) i7-7700K CPU @ 4.20 GHz, and two 8 GB memory modules. Of course, the electronic device 100 may also be another device of the same or higher configuration.
Based on graphics processing techniques, the inventor studied traditional methods for breaking verification codes and found that their principle is similar to license plate recognition or OCR (Optical Character Recognition). These methods proceed as follows:

First, preprocess the picture, e.g. character segmentation, rotation, and skew correction;

Second, manually extract features from the characters, such as edge and texture information;

Third, select a suitable classifier model (random forest, SVM classifier, etc.) and train the model with the features extracted in the second step;

Finally, predict on verification code pictures with the trained model.
Unlike the traditional recognition methods, the present application uses neural network techniques to recognize verification codes.
In traditional verification code recognition, the preprocessing is not only cumbersome; when the number of characters is variable, their positions are unfixed, or the degree of adhesion or distortion is large, as in Google's verification code, the characters often cannot be segmented, which degrades the recognition result. In addition, manual feature extraction consumes considerable resources. In short, although traditional methods can crack verification codes, they are cumbersome and not very adaptable.
Based on these defects in the prior art, the inventor found through research that the following embodiments can effectively solve the above technical problems, as described in detail below.
Embodiment one
Referring to Fig. 2, a flowchart of the recognition model construction method applied to an electronic device according to an embodiment of the present invention, the detailed process shown in Fig. 2 is described below.
Step S201 obtains training dataset.
In the present embodiment, the training dataset can be the identifying code figure obtained on network in history designated time period Picture.The training dataset can also be to be collected and stored in designated memory space in advance, need using when from described specified It is obtained in memory space.
In the present embodiment, it includes character string figure to be identified in each image that it includes multiple images that the training data, which is concentrated, Block.
Data volume needed for training dataset in the present embodiment can meet training pattern, for example, the training dataset In include character picture up to a million.
In this embodiment, the training data set may be divided into multiple groups, including a training group, a validation group, and a test group, where the number of images in the training group is greater than those of the validation group and the test group. In one example, the ratio of the training, validation, and test groups may be 1:0.3:0.3; of course, other ratios are possible as needed, such as 1:0.2:0.3, 1:0.3:0.2, or 1:0.4:0.4.
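As a concrete illustration, the grouping above can be sketched as follows. The function name, the shuffling, and the seed are our own assumptions; the patent does not specify how the split is performed. The default integer ratio 10:3:3 is the 1:0.3:0.3 of the text.

```python
import random

def split_dataset(samples, ratios=(10, 3, 3), seed=0):
    """Shuffle and split samples into training, validation, and test groups."""
    rng = random.Random(seed)
    shuffled = samples[:]                     # leave the caller's list untouched
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]         # test absorbs any remainder
    return train, val, test
```

With 160 labelled images and the default ratio this yields groups of 100, 30, and 30 images.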
In one embodiment, each image in the training data set may consist of a character-string segment and blank space.
Step S202: crop the images in the training data set to obtain a processed data set.

In this embodiment, the blank space in the images of the training data set can be cropped away.
Step S203: feed the processed data set into the neural network model for computation to obtain a computed result.

In this embodiment, the neural network model is a convolutional neural network model.

In this embodiment, the convolutional neural network model may include: one input layer, four hidden layers, one fully connected layer, one dropout layer, and one output layer.

Further, the numbers of convolution kernels of the four hidden layers are 32, 64, 96, and 128, respectively; the convolution kernel size is 5*5, and the pooling layer is 2*2.
As shown in Fig. 3, the neural network model includes:

One input layer; in one example, the input dimensions may be 64*160.

Four hidden layers, denoted layer1, layer2, layer3, and layer4. The structure of layer1 includes: convolution kernel (filter): 32*5*5, max_pool: 2*2, activation: relu; its output is layer1out: 32*80*32. The structure of layer2 includes: convolution kernel (filter): 64*5*5, max_pool: 2*2, activation: relu; its output is layer2out: 16*40*64. The structure of layer3 includes: convolution kernel (filter): 96*5*5, max_pool: 2*2, activation: relu; its output is layer3out: 8*20*96. The structure of layer4 includes: convolution kernel (filter): 128*5*5, max_pool: 2*2, activation: relu; its output is layer4out: 4*10*128.

One flatten layer, which unrolls the multidimensional tensor output of convolutional layer layer4 into a one-dimensional vector; its output is flatten_out: 4*10*128 = 5120.

One fully connected layer (dense1) with 2048 neurons.

One dropout layer with a dropout rate of 0.4.

One output layer with N*(M + 1 special character) neurons, where N is the set maximum verification code length and M is the number of characters that may appear in the verification code. For example, when verification codes are generated from the 26 letters only, M takes the value 26; when they are generated from the ten digits and the 26 letters, M takes the value 36. In the figure, N is 8 and M is 26.

Loss function: cross-entropy loss, tf.losses.softmax_cross_entropy.

Optimizer: Adam.
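The layer stack above can be sketched in modern TensorFlow/Keras roughly as follows. This is an assumed re-expression, not the original code (the patent used TensorFlow 1.7 and tf.losses.softmax_cross_entropy); the per-position softmax over the M+1 classes is applied inside the loss, and all layer names are our own.

```python
import tensorflow as tf

N, M = 8, 26  # max verification code length and alphabet size from the text

def build_model():
    """4 conv blocks (32/64/96/128 5x5 kernels, 2x2 max-pool), dense 2048,
    dropout 0.4, and N*(M+1) output logits reshaped to one row per position."""
    inp = tf.keras.Input(shape=(64, 160, 1))        # 64*160 grayscale input
    x = inp
    for filters in (32, 64, 96, 128):               # layer1 .. layer4
        x = tf.keras.layers.Conv2D(filters, 5, padding="same",
                                   activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D(2)(x)      # halves height and width
    x = tf.keras.layers.Flatten()(x)                # 4*10*128 = 5120
    x = tf.keras.layers.Dense(2048, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.4)(x)
    x = tf.keras.layers.Dense(N * (M + 1))(x)       # +1 for the special character
    out = tf.keras.layers.Reshape((N, M + 1))(x)    # logits per character position
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))
    return model
```

With N = 8 and M = 26 the output has shape (8, 27) per image, matching the 64*160 input being halved four times to 4*10 before flattening.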
In this embodiment, the neural network model contains the parameters to be determined.
Step S204: compare the computed result with the pre-stored annotation results corresponding to the training data set to obtain a calculation error.

In this embodiment, the computed result can yield the character string recognized by the neural network model, and matching this character string against the character strings of the annotation results corresponding to the training data set yields the calculation error.
Step S205: adjust the parameters to be determined according to the calculation error.

Adjusting the parameters lets the neural network model better recognize the character strings in the images.
Repeat steps S203-S205 until the calculation error falls within the defined range, obtaining the target recognition model.

In this embodiment, the defined range can be set by those skilled in the art according to actual needs; for example, it may be a value such as less than 0.03%.
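The repeat-until-within-limit loop of steps S203-S205 can be sketched generically as follows. The step function, the iteration cap, and the 0.0003 (0.03%) default stand in for the real forward pass, comparison, and parameter update, which the patent leaves to the neural network framework.

```python
def train_until_within_limit(training_step, error_limit=0.0003, max_iters=10000):
    """Run one compute/compare/adjust step (S203-S205) repeatedly until the
    calculation error falls within the defined range."""
    for i in range(1, max_iters + 1):
        error = training_step()        # forward pass, compare, adjust parameters
        if error < error_limit:
            return i, error            # converged: the target model is ready
    raise RuntimeError("error did not fall within the limit")
```

Here training_step would wrap one pass of the neural network over the processed data set plus the comparison against the pre-stored annotation results.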
With the recognition model construction method of this embodiment of the present invention, obtaining the recognition model by training a neural network model lets the model better handle images whose character strings contain a variable number of characters and distorted characters. Further, cropping the images in the training data set before training better fits the input requirements of the neural network model and improves the recognition accuracy of the trained model.

Moreover, before training, the training data set needs no complicated preprocessing: the characters need not be segmented, features need not be extracted manually, and no classifier needs to be selected as in conventional machine learning. Feature extraction and classification are completed entirely by the neural network itself; the data only needs to be fed to the model.
As shown in Fig. 4, step S202 may include the following steps.

Step S2021: convert each image in the training data set into a grayscale image.
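Step S2021 is a standard color-to-grayscale conversion. A minimal sketch using the common ITU-R BT.601 luminance weights follows; the patent does not state which weighting it uses, so the coefficients are an assumption.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W) grayscale array."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma coefficients
    return rgb @ weights
```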
Step S2022: crop the blank space in each grayscale image to obtain the character image corresponding to each grayscale image.
In this embodiment, the step of cropping the blank space in each grayscale image to obtain its corresponding character image includes: projecting each grayscale image from top to bottom and from left to right to obtain two grayscale sequences, and cutting off all points whose pixel values in the two grayscale sequences are below a set threshold.

In one embodiment, the boundary points of the non-blank region can be obtained, and the blank space outside those boundary points can be cut away to complete the cropping. For example, suppose that among all points of the non-blank region of a grayscale image, the minimum and maximum x-coordinates are 2 cm and 5.2 cm, and the minimum and maximum y-coordinates are 0.9 cm and 2.1 cm. Then all points of the grayscale image whose x-coordinate is less than 2 cm or greater than 5.2 cm need to be cut away, as do all points whose y-coordinate is less than 0.9 cm or greater than 2.1 cm. Of course, the minimum and maximum x- and y-coordinates of the non-blank region differ from picture to picture.
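A minimal numpy sketch of the projection-based cropping follows. The threshold value and the use of per-row/per-column minima as the projection are our assumptions; the idea is that a row or column whose darkest pixel is still near-white contains no character ink and lies outside the non-blank region's boundary points.

```python
import numpy as np

def trim_blank_space(gray, thresh=250):
    """Crop near-white margins from a grayscale image (white background)."""
    rows = gray.min(axis=1)             # darkest pixel in each row
    cols = gray.min(axis=0)             # darkest pixel in each column
    ink_rows = np.where(rows < thresh)[0]
    ink_cols = np.where(cols < thresh)[0]
    if ink_rows.size == 0 or ink_cols.size == 0:
        return gray                     # fully blank image: nothing to crop
    return gray[ink_rows[0]:ink_rows[-1] + 1,
                ink_cols[0]:ink_cols[-1] + 1]
```

For a white 10x10 image with a dark block in rows 3-5 and columns 2-7, the result is the 3x6 block itself.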
Step S2023: resize the character images so that all character images have the same size, forming the processed data set.

Step S2023 includes:

obtaining the set character length, i.e. the length of the longest character string among the character images;

padding character images whose character length is less than the set character length with a designated character, so that the character length in every character image equals the set character length, yielding character images of identical size.
In one example, the length of the longest character string among the character images may be eight; the output layer then has 8*(M + 1 special character) neurons, and M may, for example, take the value 26.

In this embodiment, when the set character length is eight and an image in the training data set is detected to contain fewer than eight characters, it is padded to eight with the designated character. For example, when an image contains five characters, three designated characters can be appended to the image to form an eight-character string.
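Padding short labels to the set character length can be sketched as follows. The choice of "_" as the designated/special character is an assumption; the patent only requires one extra symbol outside the verification code alphabet.

```python
MAX_LEN = 8    # the set character length from the text
PAD = "_"      # assumed designated character, outside the alphabet

def pad_label(label, max_len=MAX_LEN, pad=PAD):
    """Right-pad a character string to the set character length."""
    if len(label) > max_len:
        raise ValueError("label exceeds the set character length")
    return label + pad * (max_len - len(label))
```

A five-character label such as "abcde" gains three padding characters, matching the example in the text.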
Embodiment two
The method in this embodiment recognizes character strings using the recognition model obtained by the recognition model construction method of embodiment one. Referring to Fig. 5, a flowchart of the character recognition method applied to an electronic device according to an embodiment of the present invention, the detailed process shown in Fig. 5 is described below.
Step S301: crop the image to be recognized to obtain a standard image.

Step S302: feed the standard image into the recognition model for computation to obtain the recognition result.
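Turning the model's output back into a character string can be sketched as follows. The alphabet ordering and the padding character are assumptions consistent with the N = 8, M = 26 example above: each of the N positions takes the class with the highest score, and trailing padding is dropped.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz_"  # 26 letters plus an assumed pad char

def decode(logits):
    """Map an (N, M+1) score array to the recognized character string."""
    indices = logits.argmax(axis=-1)       # best class per character position
    return "".join(ALPHABET[i] for i in indices).rstrip("_")
```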
With the character recognition method of this embodiment of the present invention, obtaining the recognition model by training a neural network model lets the model better handle images whose character strings contain a variable number of characters and distorted characters. Further, cropping the images in the training data set before training better fits the input requirements of the neural network model and improves the recognition accuracy of the trained model.
Embodiment three
Referring to Fig. 6, a functional block diagram of the recognition model construction device provided by an embodiment of the present invention. The modules of the recognition model construction device in this embodiment are used to perform the steps of embodiment one. The recognition model construction device comprises the following modules.
An obtaining module 401, for obtaining a training data set, where the training data set includes multiple images, each containing a character-string segment to be recognized.

A cropping module 402, for cropping the images in the training data set to obtain a processed data set.

A training module 403, for:

feeding the processed data set into the neural network model, which contains parameters to be determined, and computing a result;

comparing the computed result with the pre-stored annotation results corresponding to the training data set to obtain a calculation error;

adjusting the parameters to be determined according to the calculation error;

repeating the above process until the calculation error falls within the defined range, obtaining the target recognition model.
In this embodiment, the cropping module 402 is also used to:

convert each image in the training data set into a grayscale image;

crop the blank space in each grayscale image to obtain the character image corresponding to each grayscale image;

resize the character images so that all character images have the same size, forming the processed data set.

In this embodiment, the cropping module 402 is also used to:

obtain the set character length, i.e. the length of the longest character string among the character images;

pad character images whose character length is less than the set character length with a designated character, so that the character length in every character image equals the set character length, yielding character images of identical size.

In this embodiment, the cropping module 402 is also used to:

project each grayscale image from top to bottom and from left to right to obtain two grayscale sequences;

cut off all points whose pixel values in the two grayscale sequences are below the set threshold.
In this embodiment, the neural network model is a convolutional neural network model.

In this embodiment, the convolutional neural network model includes: one input layer, four hidden layers, one fully connected layer, one dropout layer, and one output layer.

In this embodiment, the numbers of convolution kernels of the four hidden layers are 32, 64, 96, and 128, respectively; the convolution kernel size is 5*5, and the pooling layer is 2*2.
Other details of this embodiment can be found in the descriptions of the foregoing embodiments and are not repeated here.
With the recognition model construction device of this embodiment of the present invention, obtaining the recognition model by training a neural network model lets the model better handle images whose character strings contain a variable number of characters and distorted characters. Further, cropping the images in the training data set before training better fits the input requirements of the neural network model and improves the recognition accuracy of the trained model.
Example IV
Referring to Fig. 7, a functional block diagram of the character recognition device provided by an embodiment of the present invention. The character recognition device in this embodiment recognizes character strings using the recognition model obtained by the recognition model construction method provided in embodiment one. The modules of the character recognition device in this embodiment are used to perform the steps of embodiment two. The character recognition device comprises the following modules.
Image obtains module 501, for images to be recognized to be cut out, obtains standard picture.
Identification module 502 calculates for the standard picture to be inputted the identification model, obtains recognition result.
Other details about the present embodiment can be further with reference to the description in several embodiments above, herein no longer It repeats.
The character recognition device of the embodiment of the present invention, obtaining identification model by training neural network model can make to identify Model can better adapt to that character number in character string is indefinite, and character distorts the figure to be formed, further, in training, The image first concentrated to training data is cut out the demand that can better adapt to neural network model, improves what training obtained Identification model recognition accuracy.
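The training procedure behind this device (obtain a calculation result, compare it with pre-stored annotation results to get a calculation error, adjust the undetermined parameters, and repeat until the error falls within a defined range) is a standard iterative loop. The sketch below illustrates that control flow on a deliberately tiny stand-in: a one-parameter linear model fitted by gradient descent. The model, data, learning rate, and error bound are illustrative assumptions, not the patent's convolutional network.

```python
def train_until_within_range(xs, ys, error_bound=1e-4, lr=0.05,
                             max_iters=10_000):
    """Fit the toy model y = w * x by repeating the compare/adjust loop.

    Each iteration: compute the calculation result, compare it with the
    stored labels to obtain a calculation error (mean squared error),
    and adjust the undetermined parameter w from the error gradient,
    stopping once the error is within the defined range.
    """
    w = 0.0  # the "parameter to be determined"
    error = float("inf")
    for _ in range(max_iters):
        preds = [w * x for x in xs]                                # result
        error = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        if error <= error_bound:                                   # in range?
            break
        grad = sum(2 * (p - y) * x
                   for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad                                             # adjust
    return w, error
```

On data generated from y = 2x, this converges to a w close to 2 within a handful of iterations.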
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other manners. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and all such changes or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An identification model construction method, comprising:
a. obtaining a training data set, wherein the training data set includes a plurality of images, each image containing a character string segment to be recognized;
b. cropping the images in the training data set to obtain a processed data set;
c. inputting the processed data set into a neural network model for calculation to obtain a calculation result, wherein the neural network model includes parameters to be determined;
d. comparing the calculation result with pre-stored annotation results corresponding to the training data set to obtain a calculation error;
e. adjusting the parameters to be determined according to the calculation error; and
repeating steps c-e until the calculation error is within a defined range, to obtain a target identification model.
2. The identification model construction method according to claim 1, wherein step b comprises:
converting each image in the training data set into a grayscale image;
cropping out the blank region in each grayscale image to obtain a character figure corresponding to each grayscale image; and
performing size conversion on the character figures so that all character figures have the same size, forming the processed data set.
3. The identification model construction method according to claim 2, wherein the step of performing size conversion on the character figures to obtain character figures of the same size and form the processed data set comprises:
obtaining a set character length, which is the length of the longest character string among the plurality of character figures; and
padding, with a designated character, each character figure whose character length is less than the set character length, so that the character length in each character figure equals the set character length, thereby obtaining character figures of the same size.
4. The identification model construction method according to claim 2, wherein the step of cropping out the blank region in each grayscale image to obtain the character figure corresponding to each grayscale image comprises:
projecting each grayscale image from top to bottom and from left to right to obtain two gray-level sequences; and
cropping all points in the two gray-level sequences whose pixel values are below a set threshold.
5. The identification model construction method according to claim 1, wherein the neural network model is a convolutional neural network model.
6. The identification model construction method according to claim 5, wherein the convolutional neural network model comprises: one input layer, four hidden layers, one fully connected layer, one dropout layer, and one output layer.
7. The identification model construction method according to claim 6, wherein the convolution kernel counts of the four hidden layers are 32, 64, 96, and 128, respectively; each convolution kernel is 5*5, and each pooling layer is 2*2.
8. A character identifying method, wherein a character string is recognized using the identification model obtained by the identification model construction method according to any one of claims 1-7, the method comprising:
cropping an image to be recognized to obtain a standard image; and
inputting the standard image into the identification model for calculation to obtain a recognition result.
9. An identification model construction device, comprising:
an obtaining module, configured to obtain a training data set, wherein the training data set includes a plurality of images, each image containing a character string segment to be recognized;
a cropping module, configured to crop the images in the training data set to obtain a processed data set; and
a training module, configured to:
input the processed data set into a neural network model for calculation to obtain a calculation result, wherein the neural network model includes parameters to be determined;
compare the calculation result with pre-stored annotation results corresponding to the training data set to obtain a calculation error;
adjust the parameters to be determined according to the calculation error; and
repeat the above process until the calculation error is within a defined range, to obtain a target identification model.
10. A character recognition device, configured to recognize a character string using the identification model obtained by the identification model construction method according to any one of claims 1-7, the device comprising:
an image obtaining module, configured to crop an image to be recognized to obtain a standard image; and
an identification module, configured to input the standard image into the identification model for calculation to obtain a recognition result.
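The length normalization of claim 3 can be sketched at the string level: take the longest character string as the set character length and pad shorter ones with a designated character. The underscore filler below is an arbitrary illustrative choice; the claim does not say which character is designated.

```python
def pad_to_longest(strings, fill="_"):
    """Pad every string to the length of the longest one.

    The longest length among the inputs serves as the set character
    length; any shorter string is right-padded with the designated
    fill character so that all results have identical length.
    """
    target = max(len(s) for s in strings)  # the set character length
    return [s.ljust(target, fill) for s in strings]
```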
CN201810763049.0A 2018-07-12 2018-07-12 Identification model construction method and device, character identifying method and device Pending CN108932533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810763049.0A CN108932533A (en) 2018-07-12 2018-07-12 Identification model construction method and device, character identifying method and device

Publications (1)

Publication Number Publication Date
CN108932533A true CN108932533A (en) 2018-12-04

Family

ID=64447462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810763049.0A Pending CN108932533A (en) 2018-07-12 2018-07-12 Identification model construction method and device, character identifying method and device

Country Status (1)

Country Link
CN (1) CN108932533A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654127A (en) * 2015-12-30 2016-06-08 成都数联铭品科技有限公司 End-to-end-based picture character sequence continuous recognition method
CN107085730A (en) * 2017-03-24 2017-08-22 深圳爱拼信息科技有限公司 A kind of deep learning method and device of character identifying code identification
CN107967475A (en) * 2017-11-16 2018-04-27 广州探迹科技有限公司 A kind of method for recognizing verification code based on window sliding and convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Huan et al., "Application and Research of Convolutional Neural Network in CAPTCHA Recognition", Computer Engineering and Applications *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399912A (en) * 2019-07-12 2019-11-01 广东浪潮大数据研究有限公司 A kind of method of character recognition, system, equipment and computer readable storage medium
CN110956133A (en) * 2019-11-29 2020-04-03 上海眼控科技股份有限公司 Training method of single character text normalization model, text recognition method and device
CN112149668A (en) * 2020-09-23 2020-12-29 北京智通云联科技有限公司 Method and system for identifying code spraying with edge marks
CN112598114A (en) * 2020-12-17 2021-04-02 海光信息技术股份有限公司 Power consumption model construction method, power consumption measurement method and device and electronic equipment
CN112598114B (en) * 2020-12-17 2023-11-03 海光信息技术股份有限公司 Power consumption model construction method, power consumption measurement method, device and electronic equipment
WO2022179361A1 (en) * 2021-02-24 2022-09-01 嘉楠明芯(北京)科技有限公司 Model translation method and device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN108932533A (en) Identification model construction method and device, character identifying method and device
US10685462B2 (en) Automatic data extraction from a digital image
CN108416198B (en) Device and method for establishing human-machine recognition model and computer readable storage medium
US10445569B1 (en) Combination of heterogeneous recognizer for image-based character recognition
KR102326395B1 (en) System and method and product for recognizing multiple object inputs
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
CN106156766B (en) Method and device for generating text line classifier
CN108399386A (en) Information extracting method in pie chart and device
EP4172803A1 (en) Computerized information extraction from tables
CN110046622B (en) Targeted attack sample generation method, device, equipment and storage medium
CN111414888A (en) Low-resolution face recognition method, system, device and storage medium
CN105809090A (en) Method and system for face sex characteristic extraction
CN107862785A (en) Bill authentication method and device
CN111860309A (en) Face recognition method and system
CN110135889A (en) Method, server and the storage medium of intelligent recommendation book list
CN116311214B (en) License plate recognition method and device
Lv et al. Chinese character CAPTCHA recognition based on convolution neural network
CN111951283A (en) Medical image identification method and system based on deep learning
CN107330430A (en) Tibetan character recognition apparatus and method
CN111492407B (en) System and method for map beautification
CN114386013A (en) Automatic student status authentication method and device, computer equipment and storage medium
CN114282258A (en) Screen capture data desensitization method and device, computer equipment and storage medium
CN109446780A (en) A kind of identity identifying method, device and its storage medium
WO2022126917A1 (en) Deep learning-based face image evaluation method and apparatus, device, and medium
CN111931229B (en) Data identification method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181204