CN107368831B - Method for recognizing English words and digits in natural scene images - Google Patents


Info

Publication number
CN107368831B
CN107368831B (application number CN201710592890.3A)
Authority
CN
China
Prior art keywords
layer
character
image
feature
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710592890.3A
Other languages
Chinese (zh)
Other versions
CN107368831A (en)
Inventor
张军
涂丹
李硕豪
陈旭
雷军
郭强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201710592890.3A priority Critical patent/CN107368831B/en
Publication of CN107368831A publication Critical patent/CN107368831A/en
Application granted granted Critical
Publication of CN107368831B publication Critical patent/CN107368831B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

The present invention provides a method for recognizing English words and digits in natural scene images. The recognition problem is divided into three steps: feature extraction, feature focusing, and feature recognition. A convolutional neural network extracts features from the input image, an attention mechanism focuses on the useful information in the feature sequence, and a long short-term memory network recognizes the feature vectors. By combining the deep neural network with the attention mechanism, the final recognition result is obtained directly once an image is input to the network. The invention requires neither a sliding-window operation over the input image with recognition of the characters in each window, nor a merging algorithm to integrate the recognized characters: the output character string is the final recognition result.

Description

Method for recognizing English words and digits in natural scene images
Technical field
The invention belongs to the technical field of character recognition, and relates to a method that uses a deep neural network and an attention mechanism to recognize English words and digits in natural scene images.
Background art
Text in natural scenes often carries important information and can be used to describe the content of an image. Automatically extracting the text in an image helps people understand the image more effectively and supports processing such as storage, compression, and retrieval. In contrast to natural scene text detection, natural scene text recognition identifies the characters in regions that have already been detected. English and digits are used worldwide and appear widely in the scenes of many countries, so recognizing English words and digits is of great significance. However, unlike handwritten digit recognition, the position, size, font, illumination, viewing angle, and shape of text and digits in natural scenes are highly variable, and the background of natural scene text is very complex, so recognizing English words and digits in natural scenes presents many technical difficulties.
Existing natural scene text recognition algorithms are usually bottom-up, see [Neumann L, Matas J. 'Real-time lexicon-free scene text localization and recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38, (9), pp. 1872-1885]: a sliding-window operation and a traditional classifier first recognize each character of the English words and digits in the image, and, since a window does not necessarily contain a character, a merging algorithm is then needed to integrate the recognized characters into strings. This method has two limitations: 1. the accuracy of character recognition with a sliding window and a traditional classifier is not high; 2. the character recognizer and the merging algorithm are trained separately, so the errors produced by each are passed directly into the final recognition result, lowering the overall text recognition precision.
Summary of the invention
It is an object of the invention to overcome these limitations. A deep neural network and an attention mechanism are combined, and the combined neural network is trained and used for recognition as a whole; without any sliding-window operation, the recognition result is output directly for a given image containing English words and digits.
The principle of the invention is as follows. First, a convolutional neural network, widely applied in the field of computer vision, extracts a two-dimensional feature matrix from the input image; under the action of the convolutional network, each column of the matrix represents the deep features of the corresponding region of the input image, and the matrix is serialized column by column into a feature sequence. Then, an attention mechanism extracts the character-related information in the feature sequence and filters out redundant information to obtain feature vectors; the attention mechanism, a common model in deep learning, observes things in a focused way according to the observation pattern of human vision and filters out useless information. Finally, a long short-term memory network recognizes the English text and digits in the image one by one, in left-to-right spatial order.
The technical scheme of the invention is a method for recognizing English words and digits in natural scene images. The input image is a grayscale image containing English words and digits. The method combines a deep neural network with an attention mechanism, trains and applies the combined neural network as a whole, and, without any sliding-window operation, directly outputs the recognition result for a given image containing English words and digits. The method specifically includes the following steps:
Step (1): feature extraction from the input image. The invention uses the convolutional neural network in the deep neural network to extract features from the input image, and takes the output of the network as the feature extraction result. Unlike a traditional convolutional neural network, which outputs a three-dimensional feature matrix, the network designed here outputs a two-dimensional feature matrix. From input to output the network consists of: convolutional layer 1, batch normalization layer 1, pooling layer 1, convolutional layer 2, batch normalization layer 2, pooling layer 2, convolutional layer 3, batch normalization layer 3, convolutional layer 4, batch normalization layer 4, pooling layer 4, convolutional layer 5, batch normalization layer 5, convolutional layer 6, batch normalization layer 6, pooling layer 6, convolutional layer 7, batch normalization layer 7. The parameters of the convolutional layers, in the order (kernel size, number of channels, stride, padding), are: (3,64,1,1), (3,128,1,1), (3,256,1,1), (3,256,1,1), (3,512,1,1), (3,512,1,1) and (2,512,1,0). The purpose of the batch normalization layers is to adjust the distribution of intermediate result data; no parameters are listed for them. The parameters of the pooling layers, in the order (window size, horizontal stride, vertical stride, horizontal padding, vertical padding), are: (2*2,2,2,0,0), (2*2,2,2,0,0), (1*2,1,2,0,0) and (1*2,1,2,0,0). The resolution of the image is adjusted to 80 × 32 before it is input to the network, so the two-dimensional matrix output by the network has size 512 × 19. Serializing this two-dimensional feature matrix yields a feature sequence of 19 vectors of size 1 × 512, denoted S = {s1, s2, ..., sL}, where si ∈ R512 (R512 denotes a 1 × 512 vector), i = 1, 2, ..., L, and L = 19 is the length of the sequence.
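As a cross-check of the 512 × 19 output size, the layer parameters above can be traced with the standard output-size formula out = (in + 2·pad − kernel) / stride + 1. The following sketch is an illustration of that arithmetic only, not the patented implementation:

```python
def out_size(size, kernel, stride, pad):
    # Standard formula for the output size of a convolution or pooling layer.
    return (size + 2 * pad - kernel) // stride + 1

def trace_shapes(width=80, height=32):
    # Convolutional layers: (kernel, channels, stride, pad); layer 7 has no padding.
    convs = [(3, 64, 1, 1), (3, 128, 1, 1), (3, 256, 1, 1), (3, 256, 1, 1),
             (3, 512, 1, 1), (3, 512, 1, 1), (2, 512, 1, 0)]
    # Pooling layers keyed by the conv index they follow (convs 1, 2, 4, 6):
    # (window_w, window_h, stride_w, stride_h), zero padding.
    pools = {0: (2, 2, 2, 2), 1: (2, 2, 2, 2), 3: (1, 2, 1, 2), 5: (1, 2, 1, 2)}
    channels = 1
    for i, (k, c, s, p) in enumerate(convs):
        width, height, channels = out_size(width, k, s, p), out_size(height, k, s, p), c
        if i in pools:
            ww, wh, sw, sh = pools[i]
            width, height = out_size(width, ww, sw, 0), out_size(height, wh, sh, 0)
    return channels, width, height

print(trace_shapes())  # → (512, 19, 1): 19 columns, each a 512-dimensional feature
```

The height collapses to 1, so the 512 × 19 × 1 output is exactly the two-dimensional feature matrix described above.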
Step (2): feature focusing. The attention mechanism is applied to the feature sequence S of 19 vectors of size 1 × 512, and the set of feature vectors output by the attention mechanism is taken as the result of feature focusing. The invention recognizes the characters in the image one by one in left-to-right spatial order, and the maximum character length in the training dataset Synth [Jaderberg M, Simonyan K, Vedaldi A, et al. 'Reading text in the wild with convolutional neural networks', International Journal of Computer Vision, 2016, 116, (1), pp. 1-20] is 24, so the output of the invention is a combination of English words and digits of length 24, and the algorithm performs 24 feature focusings, each regarded as one time step. The final output is the set of 24 focused feature vectors Vf = {V1, V2, ..., VT}, T = 24, in which the feature vector Vt denotes the result of the t-th feature focusing:
Vt = Σi=1..L αt,i si

where αt = (αt,1, ..., αt,L) is the attention coefficient vector of the t-th feature focusing and Σi αt,i = 1. Each element of this coefficient vector is obtained from:

et,i = wT tanh(Wa si + Ua ht-1 + ba), αt,i = exp(et,i) / Σj=1..L exp(et,j)

where ht-1 denotes the hidden variable of the long short-term memory unit at time t-1 in the third step, and wT, Wa, Ua and ba are the parameters of the attention model, trained by the back-propagation algorithm based on stochastic gradient descent.
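A minimal NumPy sketch of one focusing step, following the equations above. Dimensions and parameter names follow the text; the random parameter values and the attention size 256 are illustrative assumptions, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, a = 19, 512, 256                    # sequence length, feature size, attention size (assumed)

S = rng.standard_normal((L, d))           # feature sequence s_1..s_L from step (1)
h_prev = rng.standard_normal(a)           # LSTM hidden variable h_{t-1} from step (3)
W_a = rng.standard_normal((a, d)) * 0.01  # attention parameters (untrained here)
U_a = rng.standard_normal((a, a)) * 0.01
b_a = np.zeros(a)
w = rng.standard_normal(a) * 0.01

def focus(S, h_prev):
    # e_{t,i} = w^T tanh(W_a s_i + U_a h_{t-1} + b_a)
    e = np.array([w @ np.tanh(W_a @ s + U_a @ h_prev + b_a) for s in S])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                  # softmax: coefficients sum to 1
    return alpha @ S                      # V_t = sum_i alpha_{t,i} s_i

V_t = focus(S, h_prev)
print(V_t.shape)                          # one focused 1 × 512 feature vector per time step
```

Running this 24 times, with h_prev updated by the LSTM of step (3) after each focusing, produces the set Vf.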
Step (3): recognition of the focused feature vectors. The invention uses the long short-term memory network in the deep neural network to recognize the focused feature vectors. Following the assumption of a maximum string length, the long short-term memory network contains 24 units; the output of each unit is a recognized character, and each character has 37 classes (26 English letters, the 10 digits 0-9, and the end mark "-", which indicates that recognition of the character string has finished). The input of the long short-term memory unit at time t is the feature vector Vt of the t-th feature focusing, and its output is the recognized character class Jt. At each time step, the class with the largest probability is chosen as the output of the long short-term memory unit, according to:
Jt = argmaxi zi, where zi = softmax(ht)
where ht denotes the hidden variable of the long short-term memory unit at time t (see the description of Fig. 3). After recognition finishes, the output of the whole network is a combination of 24 characters; the invention takes the character string before the end mark as the final recognition result.
The input of step (1) is the image containing English words and digits, and its output is the feature sequence; step (2) computes from the feature sequence the feature vectors required as input to step (3); and step (3) outputs the recognized character string. After the three steps are integrated into one framework, the parameters of the whole model need to be trained. Let X = {Ii, Li} be the training dataset, where Ii denotes the i-th image and Li its corresponding label, that is, the true value of the character string in the image. The objective function of the training process can then be expressed as:

W* = argminW Σi -log p(J = Li | Ii; W)
where W denotes the parameters of the whole model, comprising the parameters of the convolutional neural network, the attention mechanism and the long short-term memory network, and W* denotes the optimum of these parameters. J = {J1, ..., JT} denotes the string recognized by the model, a character string of 24 characters; the probability that the whole string is recognized correctly equals the product of the probabilities that each character in the string is recognized correctly, so -log p(J = Li | Ii) can be expressed as:

-log p(J = Li | Ii) = -Σt=1..T log p(Jt = Li,t | Ii, J1, ..., Jt-1)
where Li,t denotes the t-th character of the label of the i-th image; the objective function can therefore be expressed as:

W* = argminW Σi Σt=1..T -log p(Jt = Li,t | Ii, J1, ..., Jt-1)
After the objective function has been obtained, the network parameters W are trained with the back-propagation algorithm based on stochastic gradient descent; see [Shi B, Bai X, Yao C. 'An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition', arXiv preprint arXiv:1507.05717, 2015].
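The per-image loss above is an ordinary sum of per-character negative log-likelihoods. A small illustrative computation on a made-up 3-class toy alphabet (in the invention T = 24 and there are 37 classes):

```python
import math

# Hypothetical per-character distributions z_t = softmax(h_t) for the toy
# alphabet {'a', 'b', '-'} and the label "ab-". The probabilities are invented
# for the example; they are not outputs of the patented model.
z = [
    {'a': 0.7, 'b': 0.2, '-': 0.1},    # time step 1
    {'a': 0.1, 'b': 0.8, '-': 0.1},    # time step 2
    {'a': 0.05, 'b': 0.05, '-': 0.9},  # time step 3
]
label = 'ab-'

# -log p(J = L | I) = -sum_t log p(J_t = L_t | I, J_1..J_{t-1})
loss = -sum(math.log(z_t[c]) for z_t, c in zip(z, label))
print(round(loss, 4))  # → 0.6852
```

Summing this quantity over the training set gives the objective minimized over W.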
If the input image is a color image, it is converted to grayscale before the above steps are executed.
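The patent does not specify the conversion formula; a common choice (an assumption here, not part of the patent text) is the ITU-R BT.601 luma weighting:

```python
def to_grayscale(pixel_rgb):
    # BT.601 luma: gray = 0.299 R + 0.587 G + 0.114 B (assumed; the patent only
    # requires that a color input be converted to a grayscale image).
    r, g, b = pixel_rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

print(round(to_grayscale((255, 255, 255))))  # pure white keeps full intensity: 255
```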
Compared with the prior art, the beneficial effects of the invention are:
The invention combines a deep neural network with an attention mechanism, so the final recognition result is obtained directly when an image is input to the network. The invention therefore requires neither a sliding-window operation over the input image with recognition of the characters in each window, nor a merging algorithm to integrate the recognized characters: the output character string is the final recognition result.
Description of the drawings
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are clearly only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative labor.
Fig. 1 is the overall flowchart of the invention;
Fig. 2 is the design diagram of the convolutional neural network of the invention;
Fig. 3 is the internal structure diagram of a long short-term memory network unit;
Fig. 4 is a first example of the invention recognizing English words and digits;
Fig. 5 is a second example of the invention recognizing English words and digits.
Specific embodiment
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. The described embodiments are clearly only a part of the embodiments of the invention, not all of them; all other embodiments obtained by those of ordinary skill in the art from these embodiments without creative labor fall within the protection scope of the invention.
The overall flowchart of the invention, "a method for recognizing English words and digits in natural scene images", is shown in Fig. 1. The recognition problem of English words and digits in natural scenes is divided into three steps: feature extraction, feature focusing and feature recognition.
Step (1): feature extraction. The invention uses a convolutional neural network to extract features from the input image. The input is an image containing English characters and digits under a natural scene, adjusted to a size of 80 × 32 before being input to the network. As shown in Fig. 2, unlike a traditional convolutional neural network, which can only output a three-dimensional feature matrix, the network designed here can output a two-dimensional feature matrix. As shown in the figure, the network consists, from top to bottom, of: convolutional layer 1, batch normalization layer 1, pooling layer 1, convolutional layer 2, batch normalization layer 2, pooling layer 2, convolutional layer 3, batch normalization layer 3, convolutional layer 4, batch normalization layer 4, pooling layer 4, convolutional layer 5, batch normalization layer 5, convolutional layer 6, batch normalization layer 6, pooling layer 6, convolutional layer 7, batch normalization layer 7. The parameters of the convolutional layers, in the order (kernel size, number of channels, stride, padding), are: (3,64,1,1), (3,128,1,1), (3,256,1,1), (3,256,1,1), (3,512,1,1), (3,512,1,1) and (2,512,1,0). The purpose of the batch normalization layers is to adjust the distribution of intermediate result data; no parameters are listed for them. The parameters of the pooling layers, in the order (window size, horizontal stride, vertical stride, horizontal padding, vertical padding), are: (2*2,2,2,0,0), (2*2,2,2,0,0), (1*2,1,2,0,0) and (1*2,1,2,0,0). The output is a two-dimensional 512 × 19 feature matrix; serializing it by columns yields a feature sequence of 19 vectors of size 1 × 512, denoted S = {s1, s2, ..., sL}, where si ∈ R512, i = 1, 2, ..., L, and L = 19 is the length of the sequence.
Step (2): feature focusing. The invention uses the attention mechanism to focus on the useful information in the feature sequence. The input is the feature sequence of 19 vectors of size 1 × 512 obtained in the feature extraction stage, and the output is a set of feature vectors. The algorithm recognizes the characters in the image one by one in left-to-right spatial order, and the maximum string length in an image is set to 24, so the algorithm performs T = 24 feature focusings. The final output is the set of 24 focused feature vectors Vf = {V1, V2, ..., VT}, where the feature vector Vt denotes the result of the t-th feature focusing:
Vt = Σi=1..L αt,i si

where αt = (αt,1, ..., αt,L) is the attention coefficient vector of the t-th feature focusing and Σi αt,i = 1. Each element of this coefficient vector can be obtained from:

et,i = wT tanh(Wa si + Ua ht-1 + ba), αt,i = exp(et,i) / Σj=1..L exp(et,j)

where wT, Wa, Ua and ba are the parameters of the attention model, trained by the back-propagation algorithm based on stochastic gradient descent, and ht-1 denotes the hidden variable of the long short-term memory unit at time t-1 in the third step, as shown in Fig. 3.
Fig. 3 is the internal structure diagram of a long short-term memory unit. The long short-term memory network, an improved variant of the recurrent neural network, limits the vanishing-gradient problem that occurs when training conventional recurrent neural networks by means of gate operations. As shown in the figure for time t, one long short-term memory unit consists of a memory cell ct and three gate operations it, ot, ft. Here it is the input gate, which determines how much information of the current time step is input into the unit; ot is the output gate, which determines how much information the unit outputs at this time step; and ft is the forget gate, which determines how much of the previous time step's unit output the current unit receives. The specific calculation process is as follows:
it=σ (WixVt+Wimht-1+bi)
ft=σ (WfxVt+Wfmht-1+bf)
ot=σ (WoxVt+Womht-1+bo)
gt=tanh(WgxVt+Wgmht-1+bg)
ct=ft⊙ct-1+it⊙gt
ht=ot⊙tanh(ct)
Here ht is the hidden variable of the long short-term memory unit at time t, σ denotes the sigmoid function, and ⊙ denotes element-wise multiplication. Wix, Wim, Wfx, Wfm, Wox, Wom, Wgx, Wgm, bi, bf, bo, bg denote the parameters of the long short-term memory unit; since the parameters of all units in the long short-term memory network are shared, these parameters also serve as the parameters of the network, and in the training stage the invention trains them using the back-propagation algorithm based on stochastic gradient descent.
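One such unit can be sketched in NumPy directly from the gate equations above. The parameter values are random and untrained, and the state size of 256 is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 512, 256                 # input size (one focused vector V_t), state size (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Shared unit parameters W_ix ... W_gm and biases (untrained random values here).
P = {k: rng.standard_normal((n, d if k.endswith('x') else n)) * 0.01
     for k in ('Wix', 'Wim', 'Wfx', 'Wfm', 'Wox', 'Wom', 'Wgx', 'Wgm')}
B = {k: np.zeros(n) for k in ('bi', 'bf', 'bo', 'bg')}

def lstm_step(V_t, h_prev, c_prev):
    i = sigmoid(P['Wix'] @ V_t + P['Wim'] @ h_prev + B['bi'])  # input gate i_t
    f = sigmoid(P['Wfx'] @ V_t + P['Wfm'] @ h_prev + B['bf'])  # forget gate f_t
    o = sigmoid(P['Wox'] @ V_t + P['Wom'] @ h_prev + B['bo'])  # output gate o_t
    g = np.tanh(P['Wgx'] @ V_t + P['Wgm'] @ h_prev + B['bg'])  # candidate g_t
    c = f * c_prev + i * g                                     # memory cell c_t
    h = o * np.tanh(c)                                         # hidden variable h_t
    return h, c

h, c = np.zeros(n), np.zeros(n)
for _ in range(24):             # 24 time steps, one per focused feature vector
    h, c = lstm_step(rng.standard_normal(d), h, c)
print(h.shape)
```

Because the same P and B are reused at every step, the sketch also illustrates the parameter sharing across the 24 units noted above.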
Step (3): character recognition. The invention uses the long short-term memory network to recognize the feature vectors; the input is the 24 focused feature vectors and the output is a character string of length 24. In the invention, the long short-term memory network contains 24 long short-term memory units, that is, the recognition of the whole string takes 24 time steps. The input of the unit at time t is the feature vector Vt of the t-th feature focusing, and its output is the recognized character class Jt. Jt has 37 classes (26 English letters, the ten digits 0-9, and the end mark "-"). At each time step, the class with the largest probability is chosen as the output of the long short-term memory unit at that time step, according to:
Jt = argmaxi zi, where zi = softmax(ht)
where ht denotes the hidden variable of the long short-term memory unit at time t. After recognition finishes, the output of the whole network, as shown in Fig. 1, is a combination of 24 characters, e.g. 'a' 'd' 'o' 'n' 'i' 's' '-' '-' '-' ..., and the final recognition result is 'adonis'.
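Keeping only the characters before the first end mark can be sketched as follows (a small illustrative helper, not part of the patent text):

```python
def decode(chars, end_mark='-'):
    # The final result is the character string before the first end mark.
    out = []
    for c in chars:
        if c == end_mark:
            break
        out.append(c)
    return ''.join(out)

raw = list('adonis') + ['-'] * 18  # the 24 per-step outputs of the example above
print(decode(raw))                 # → adonis
```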
Fig. 4 is a first example in which the invention correctly recognizes English words and digits; both the true value and the predicted value are 'brutalities'. It can be seen that the invention can recognize images with large character deformation, i.e. it is fairly robust.
Fig. 5 is a second example of the invention recognizing English words and digits; the true value is 'recapitaliozes' and the predicted value is 'regapitaliozes', the third letter being misrecognized. It can be seen that the noise of the image is large; the erroneous character can hardly be distinguished even by the human eye.

Claims (3)

1. A method for recognizing English words and digits in a natural scene image, comprising the following steps:
Step (1): feature extraction is performed on the input image using a convolutional neural network in a deep neural network, and the output of the convolutional neural network is taken as the feature extraction result; the convolutional neural network consists, from input to output, of: convolutional layer 1, batch normalization layer 1, pooling layer 1, convolutional layer 2, batch normalization layer 2, pooling layer 2, convolutional layer 3, batch normalization layer 3, convolutional layer 4, batch normalization layer 4, pooling layer 4, convolutional layer 5, batch normalization layer 5, convolutional layer 6, batch normalization layer 6, pooling layer 6, convolutional layer 7, batch normalization layer 7; the parameters of convolutional layers 1-7, in the order (kernel size, number of channels, stride, padding), are: (3,64,1,1), (3,128,1,1), (3,256,1,1), (3,256,1,1), (3,512,1,1), (3,512,1,1) and (2,512,1,0); the purpose of batch normalization layers 1-7 is to adjust the distribution of intermediate result data, and no parameters are listed for them; the parameters of pooling layers 1, 2, 4, 6, in the order (window size, horizontal stride, vertical stride, horizontal padding, vertical padding), are: (2*2,2,2,0,0), (2*2,2,2,0,0), (1*2,1,2,0,0) and (1*2,1,2,0,0); before the image is input to the convolutional neural network its resolution is adjusted to 80 × 32, and the output of the convolutional neural network is a two-dimensional feature matrix of size 512 × 19; serializing the two-dimensional feature matrix yields a feature sequence of 19 vectors of size 1 × 512, denoted S = {s1, s2, ..., sL}, where si ∈ R512, i = 1, 2, ..., L, and L = 19 denotes the length of the sequence;
Step (2): the attention mechanism is used to perform feature focusing on the feature sequence S of 19 vectors of size 1 × 512: the characters in the image are recognized one by one in left-to-right spatial order, the maximum character length in the training dataset is set to 24, 24 feature focusings are performed on the feature sequence S, and each feature focusing is regarded as one time step; the output is the set of feature vectors Vf = {V1, V2, ..., VT}, T = 24, where the feature vector Vt denotes the result of the t-th feature focusing: Vt = Σi=1..L αt,i si, in which αt = (αt,1, ..., αt,L), with Σi αt,i = 1, represents the attention coefficient vector of the t-th feature focusing, whose elements are αt,i = exp(et,i) / Σj exp(et,j) with et,i = wT tanh(Wa si + Ua ht-1 + ba), where ht-1 denotes the hidden variable of the long short-term memory unit at time t-1 in the third step; wT, Wa, Ua and ba are the parameters of the attention model, trained by the back-propagation algorithm based on stochastic gradient descent;
Step (3): the long short-term memory network in the deep neural network is used to recognize the focused feature vectors: the long short-term memory network contains 24 units; the input of the long short-term memory unit at time t is the feature vector Vt of the t-th feature focusing, and its output is the recognized character class Jt; at each time step, the character class with the largest probability is chosen as the output of the long short-term memory unit at that time step: Jt = argmaxi zi, where zi = softmax(ht), and ht denotes the hidden variable of the long short-term memory unit at time t; after recognition finishes, the output of the whole network is a combination of 24 characters, and the character string before the end mark is taken as the final recognition result; Jt has 37 classes, comprising the 26 English letters, the 10 digits 0-9, and the end mark "-", which indicates that recognition of the character string has finished.
2. The method as claimed in claim 1, characterized in that the parameters of the method are trained as follows: let X = {Ii, Li} be the training dataset, with Ii denoting the i-th image and Li the true value of the character string in the i-th image; the objective function of the training process is W* = argminW Σi Σt=1..T -log p(Jt = Li,t | Ii, J1, ..., Jt-1), where W denotes the parameters of the convolutional neural network, the attention mechanism and the long short-term memory network, W* denotes the optimum of these parameters, Li,t denotes the t-th character of the label of the i-th image, and p(Jt = Li,t | Ii, J1, ..., Jt-1) is the probability that the t-th character takes the label value Li,t given the values of the preceding t-1 characters; the network parameters W are trained using the back-propagation algorithm based on stochastic gradient descent.
3. The method as claimed in claim 1, characterized in that the input image is a grayscale image.
CN201710592890.3A 2017-07-19 2017-07-19 English words and digit recognition method in a kind of natural scene image Active CN107368831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710592890.3A CN107368831B (en) 2017-07-19 2017-07-19 English words and digit recognition method in a kind of natural scene image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710592890.3A CN107368831B (en) 2017-07-19 2017-07-19 English words and digit recognition method in a kind of natural scene image

Publications (2)

Publication Number Publication Date
CN107368831A CN107368831A (en) 2017-11-21
CN107368831B true CN107368831B (en) 2019-08-02

Family

ID=60308319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710592890.3A Active CN107368831B (en) 2017-07-19 2017-07-19 English words and digit recognition method in a kind of natural scene image

Country Status (1)

Country Link
CN (1) CN107368831B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229469A (en) * 2017-11-22 2018-06-29 北京市商汤科技开发有限公司 Recognition methods, device, storage medium, program product and the electronic equipment of word
CN108154136B (en) * 2018-01-15 2022-04-05 众安信息技术服务有限公司 Method, apparatus and computer readable medium for recognizing handwriting
CN110321755A (en) * 2018-03-28 2019-10-11 中移(苏州)软件技术有限公司 A kind of recognition methods and device
CN110659641B (en) * 2018-06-28 2023-05-26 杭州海康威视数字技术股份有限公司 Text recognition method and device and electronic equipment
CN109242140A (en) * 2018-07-24 2019-01-18 浙江工业大学 A kind of traffic flow forecasting method based on LSTM_Attention network
CN109117846B (en) * 2018-08-22 2021-11-16 北京旷视科技有限公司 Image processing method and device, electronic equipment and computer readable medium
CN111027555B (en) * 2018-10-09 2023-09-26 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
CN109522600B (en) * 2018-10-16 2020-10-16 浙江大学 Complex equipment residual service life prediction method based on combined deep neural network
CN109446187B (en) * 2018-10-16 2021-01-15 浙江大学 Method for monitoring health state of complex equipment based on attention mechanism and neural network
CN109389091B (en) * 2018-10-22 2022-05-03 重庆邮电大学 Character recognition system and method based on combination of neural network and attention mechanism
CN109726712A (en) * 2018-11-13 2019-05-07 平安科技(深圳)有限公司 Character recognition method, device and storage medium, server
CN111222589B (en) * 2018-11-27 2023-07-18 中国移动通信集团辽宁有限公司 Image text recognition method, device, equipment and computer storage medium
CN111352827A (en) * 2018-12-24 2020-06-30 中移信息技术有限公司 Automatic testing method and device
CN109858420A (en) * 2019-01-24 2019-06-07 国信电子票据平台信息服务有限公司 A kind of bill processing system and processing method
CN109992686A (en) * 2019-02-24 2019-07-09 复旦大学 Based on multi-angle from the image-text retrieval system and method for attention mechanism
CN109977969A (en) * 2019-03-27 2019-07-05 北京经纬恒润科技有限公司 A kind of image-recognizing method and device
CN110135427B (en) * 2019-04-11 2021-07-27 北京百度网讯科技有限公司 Method, apparatus, device and medium for recognizing characters in image
CN110197227B (en) * 2019-05-30 2023-10-27 成都中科艾瑞科技有限公司 Multi-model fusion intelligent instrument reading identification method
CN112101395A (en) * 2019-06-18 2020-12-18 上海高德威智能交通系统有限公司 Image identification method and device
CN110555462A (en) * 2019-08-02 2019-12-10 深圳索信达数据技术有限公司 Non-fixed multi-character verification code recognition method based on convolutional neural network
CN111027562B (en) * 2019-12-06 2023-07-18 中电健康云科技有限公司 Optical character recognition method based on multiscale CNN and RNN combined with attention mechanism
CN113033249A (en) * 2019-12-09 2021-06-25 中兴通讯股份有限公司 Character recognition method, device, terminal and computer storage medium
CN111242113B (en) * 2020-01-08 2022-07-08 重庆邮电大学 Method for recognizing natural scene text in any direction
CN111523539A (en) * 2020-04-15 2020-08-11 北京三快在线科技有限公司 Character detection method and device
CN111553290A (en) * 2020-04-30 2020-08-18 北京市商汤科技开发有限公司 Text recognition method, device, equipment and storage medium
CN113688822A (en) * 2021-09-07 2021-11-23 河南工业大学 Scene image recognition method based on a time-sequence attention mechanism

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654130A (en) * 2015-12-30 2016-06-08 成都数联铭品科技有限公司 Recurrent neural network-based complex image character sequence recognition system
CN106022363A (en) * 2016-05-12 2016-10-12 南京大学 Method for recognizing Chinese characters in natural scene
CN106157319A (en) * 2016-07-28 2016-11-23 哈尔滨工业大学 Saliency detection method fusing region-level and pixel-level features based on convolutional neural networks
CN106650813A (en) * 2016-12-27 2017-05-10 华南理工大学 Image understanding method based on deep residual network and LSTM

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Memory Matters: Convolutional Recurrent Neural Network for Scene Text Recognition; Qiang Guo et al.; https://arxiv.org/abs/1601.01100; 2016-01-06; pp. 1-6
Real-Time Lexicon-Free Scene Text Localization and Recognition; Lukas Neumann et al.; IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE; 2016-09-30; Vol. 38, No. 9; pp. 1872-1885
Large-pattern online handwritten character recognition based on multiple convolutional neural networks; Ge Mingtao et al.; Modern Electronics Technique; 2014-10-15; Vol. 37, No. 20; pp. 19-21, 26

Also Published As

Publication number Publication date
CN107368831A (en) 2017-11-21

Similar Documents

Publication Publication Date Title
CN107368831B (en) Method for recognizing English words and digits in natural scene images
Bheda et al. Using deep convolutional networks for gesture recognition in american sign language
Shivashankara et al. American sign language recognition system: an optimal approach
CN105138998B (en) Pedestrian re-identification method and system based on view-adaptive subspace learning algorithm
Latif et al. An automatic Arabic sign language recognition system based on deep CNN: an assistive system for the deaf and hard of hearing
Hossain et al. Recognition and solution for handwritten equation using convolutional neural network
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
Talukder et al. Real-time bangla sign language detection with sentence and speech generation
Alom et al. Digit recognition in sign language based on convolutional neural network and support vector machine
CN109508640A (en) Crowd sentiment analysis method, apparatus and storage medium
Truong et al. Vietnamese handwritten character recognition using convolutional neural network
Giridharan et al. Identification of Tamil ancient characters and information retrieval from temple epigraphy using image zoning
Aksoy et al. Detection of Turkish sign language using deep learning and image processing methods
Inunganbi et al. Recognition of handwritten Meitei Mayek script based on texture feature
Ismail et al. Static hand gesture recognition of Arabic sign language by using deep CNNs
Rawf et al. A comparative technique using 2D CNN and transfer learning to detect and classify Arabic-script-based sign language
Antony et al. Haar features based handwritten character recognition system for Tulu script
Patel et al. Multiresolution technique to handwritten English character recognition using learning rule and Euclidean distance metric
Jindal et al. Sign Language Detection using Convolutional Neural Network (CNN)
Singh et al. A comprehensive survey on Bangla handwritten numeral recognition
Reddy et al. A three-dimensional neural network model for unconstrained handwritten numeral recognition: a new approach
Srininvas et al. A framework to recognize the sign language system for deaf and dumb using mining techniques
Magrina Convolution Neural Network based Ancient Tamil Character Recognition from Epigraphical Inscriptions
Katti et al. Character and Word Level Gesture Recognition of Indian Sign Language
Nadgeri et al. An Image Texture based approach in understanding and classifying Baby Sign Language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant