CN107368831B - English words and digit recognition method in a kind of natural scene image - Google Patents
- Publication number: CN107368831B (application CN201710592890.3A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06V20/62 — Text, e.g. of license plates, overlay texts or captions on TV images (G06V — Image or video recognition or understanding; G06V20/00 — Scenes; G06V20/60 — Type of objects)
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F — Electric digital data processing; G06F18/00 — Pattern recognition; G06F18/21 — Design or setup of recognition systems)
- G06V30/10 — Character recognition (G06V30/00 — Character recognition; recognising digital ink; document-oriented image-based pattern recognition)
Abstract
The present invention provides a method for recognizing English words and digits in natural scene images. The recognition problem is divided into three steps: feature extraction, feature focusing, and feature recognition. A convolutional neural network extracts features from the input image, an attention mechanism focuses on the useful information in the feature sequence, and a long short-term memory (LSTM) network recognizes the feature vectors. By combining the deep neural network with the attention mechanism in this way, the final recognition result is obtained directly when an image is fed to the network. The method needs neither a sliding-window operation over the input image with recognition of the characters in each window, nor a merging algorithm to integrate the recognized characters: the character string output by the network is the final recognition result.
Description
Technical field
The invention belongs to the technical field of character recognition and relates to a method that uses a deep neural network and an attention mechanism to recognize English words and digits in natural scene images.
Background art
Text in natural scenes often carries very important information and can be used to describe the content of an image. Automatically obtaining the text information in an image can help people understand images more effectively and supports processing such as image storage, compression, and retrieval. In contrast to natural-scene text detection, natural-scene text recognition identifies the characters in regions that have already been detected. English and digits, as a universal written language, appear widely in scenes all over the world, so recognizing English words and digits is of great significance. However, unlike handwritten digit recognition, the position, size, font, illumination, viewing angle, and shape of English text and digits in natural scenes are highly variable, and the background of natural-scene characters is also considerably complex, so recognizing English words and digits in natural scenes presents many technical difficulties.
Existing natural-scene text recognition algorithms are usually bottom-up algorithms; see [Neumann L, Matas J., 'Real-time lexicon-free scene text localization and recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38, (9), pp. 1872-1885]. That is, a sliding-window operation and a traditional classifier first recognize each character of the English words and digits in the image; because a window does not necessarily contain a character, a merging algorithm is then needed to integrate the recognized characters into strings. This approach has two limitations: 1. the accuracy of character recognition with the sliding-window method and a traditional classifier is not high; 2. the character recognizer and the merging algorithm are trained separately, so the error each produces is passed directly into the final recognition result, lowering the precision of text recognition.
Summary of the invention
The object of the invention is to overcome these limitations by combining a deep neural network with an attention mechanism, training and applying the combined model as a whole, and, without any sliding-window operation, directly outputting the recognition result for a given image containing English words and digits.
The principle of the invention is as follows. First, a convolutional neural network, widely used in computer vision, extracts a two-dimensional feature matrix of the input image; under the action of the network, each column of the matrix represents the deep features of the corresponding region of the input image, and the matrix is serialized by column to obtain a feature sequence. Then an attention mechanism extracts the information in the feature sequence that is relevant to the characters and filters out redundant information, yielding a feature vector; the so-called attention mechanism observes things in a focused way, following the observation pattern of human vision and filtering out useless information, and is a common model in deep learning. Finally, a long short-term memory network recognizes the English text and digits in the image one by one in left-to-right spatial order.
The technical scheme of the invention is a method for recognizing English words and digits in a natural scene image. The input image is a grayscale image containing English words and digits. The method combines a deep neural network with an attention mechanism, trains and applies the combined model as a whole, and, given an image containing English words and digits, directly outputs the recognition result without any sliding-window operation. The method comprises the following steps:
Step (1): feature extraction from the input image. A convolutional neural network (CNN) in the deep neural network extracts features from the input image, and the output of the CNN is the feature-extraction result. Unlike a traditional CNN, which outputs a three-dimensional feature matrix, the CNN designed here outputs a two-dimensional feature matrix. From input to output the network consists of: convolutional layer 1, batch normalization layer 1, pooling layer 1, convolutional layer 2, batch normalization layer 2, pooling layer 2, convolutional layer 3, batch normalization layer 3, convolutional layer 4, batch normalization layer 4, pooling layer 4, convolutional layer 5, batch normalization layer 5, convolutional layer 6, batch normalization layer 6, pooling layer 6, convolutional layer 7, and batch normalization layer 7. The parameters of the convolutional layers, in the order (kernel size, number of channels, stride, padding), are: (3, 64, 1, 1), (3, 128, 1, 1), (3, 256, 1, 1), (3, 256, 1, 1), (3, 512, 1, 1), (3, 512, 1, 1), and (2, 512, 1, 0). The batch normalization layers adjust the distribution of intermediate results and have no parameters. The parameters of the pooling layers, in the order (window size, horizontal stride, vertical stride, horizontal padding, vertical padding), are: (2×2, 2, 2, 0, 0), (2×2, 2, 2, 0, 0), (1×2, 1, 2, 0, 0), and (1×2, 1, 2, 0, 0). Before being fed to the network, the image is resized to a resolution of 80 × 32; the two-dimensional matrix output by the network is then 512 × 19. Serializing this matrix yields a feature sequence of 19 vectors of size 1 × 512, denoted S = {s_1, s_2, …, s_L}, where s_i ∈ R^512 (R^512 denotes a 1 × 512 vector), i = 1, 2, …, L, and the sequence length L is 19.
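The dimensions above can be checked with a short arithmetic trace of an 80 × 32 input through the stated convolution and pooling parameters. This is only a sketch of the shape bookkeeping, not the network itself; `conv_out` is the standard output-size formula for a convolution or pooling window.

```python
def conv_out(size, k, s=1, p=0):
    """Output size of a convolution/pooling window of size k, stride s, padding p."""
    return (size + 2 * p - k) // s + 1

# (kernel, channels, stride, padding) for the seven convolutional layers, per the text
convs = [(3, 64, 1, 1), (3, 128, 1, 1), (3, 256, 1, 1), (3, 256, 1, 1),
         (3, 512, 1, 1), (3, 512, 1, 1), (2, 512, 1, 0)]
# pooling after convolutional layers 1, 2, 4 and 6:
# (window width, window height, horizontal stride, vertical stride); padding is 0
pools = {0: (2, 2, 2, 2), 1: (2, 2, 2, 2), 3: (1, 2, 1, 2), 5: (1, 2, 1, 2)}

w, h = 80, 32                      # input image resized to 80 x 32
for i, (k, c, s, p) in enumerate(convs):
    w, h = conv_out(w, k, s, p), conv_out(h, k, s, p)
    if i in pools:
        pw, ph, sw, sh = pools[i]
        w, h = conv_out(w, pw, sw), conv_out(h, ph, sh)

# 512 channels, height 1, width 19 -> a 512 x 19 two-dimensional feature matrix
assert (c, h, w) == (512, 1, 19)
```

The trace confirms that the four pooling layers and the final 2 × 2 convolution collapse the height to 1, leaving 19 columns of 512-dimensional features.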
Step (2): feature focusing. The attention mechanism performs feature focusing on the feature sequence S of 19 vectors of size 1 × 512; the set of feature vectors output by the attention mechanism is the result of feature focusing. The characters in the image are recognized one by one in left-to-right spatial order. In the training dataset Synth [Jaderberg M, Simonyan K, Vedaldi A, et al., 'Reading text in the wild with convolutional neural networks', International Journal of Computer Vision, 2016, 116, (1), pp. 1-20] the character length is at most 24, so the output of the method is a combination of English words and digits of length 24, and the algorithm performs 24 feature focusings, each focusing corresponding to one time step. The final output is the set of 24 focused feature vectors V_f = {V_1, V_2, …, V_T}, T = 24. The feature vector V_t, the result of the t-th focusing, is
V_t = Σ_{i=1}^{L} α_{t,i} s_i
where α_t = (α_{t,1}, …, α_{t,L}) is the attention coefficient of the t-th focusing. Its elements are obtained by
α_{t,i} = exp(e_{t,i}) / Σ_{j=1}^{L} exp(e_{t,j}),  e_{t,i} = w^T tanh(W_a h_{t-1} + U_a s_i + b_a)
where h_{t-1} is the hidden variable of the long short-term memory unit at time t-1 in the third step. w^T, W_a, U_a and b_a are the parameters of the attention model and are trained by back-propagation based on stochastic gradient descent.
Step (3): recognition of the focused feature vectors. The long short-term memory (LSTM) network in the deep neural network recognizes the focused feature vectors. Under the assumption on the maximum character-string length, the LSTM network contains 24 units, and the output of each LSTM unit is a recognized character. Each character belongs to one of 37 classes: 26 English letters, the 10 digits 0-9, and the end mark '-', which indicates that recognition of the character string is finished. The input of the LSTM unit at time t is the feature vector V_t of the t-th focusing, and its output is the recognized character class J_t. At each time step the class with the largest probability is chosen as the output of the LSTM unit:
J_t = arg max_i z_i, with z = softmax(h_t)
where h_t is the hidden variable of the LSTM unit at time t; see the explanation of Fig. 3. After recognition the output of the whole network is a combination of 24 characters, and the character string before the end mark is taken as the final recognition result.
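The per-step selection rule (softmax over h_t, then the class of maximum probability) can be sketched in a few lines; the 37-dimensional input vector in the example is made up for illustration.

```python
import math

def pick(h_t):
    """Choose the class with the largest softmax probability over the
    37-dimensional output h_t (a plain-Python sketch of the selection rule)."""
    z = [math.exp(v - max(h_t)) for v in h_t]        # stable softmax numerators
    total = probs_sum = sum(z)
    probs = [v / total for v in z]
    return max(range(len(probs)), key=probs.__getitem__)

# toy h_t: the last class (index 36, the end mark) has the largest score
assert pick([0.1] * 36 + [5.0]) == 36
```

Since softmax is monotone, the arg max of the probabilities equals the arg max of h_t itself; the softmax only matters for the probability values used in training.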
The input of step (1) is an image containing English words and digits and its output is the feature sequence; step (2) turns the feature sequence into the feature vectors required as input to step (3); and step (3) outputs the recognized character string. After the three steps are integrated into one framework, the parameters of the whole model are trained jointly. Let X = {I_i, L_i} be the training dataset, where I_i is the i-th image and L_i its label, i.e. the true value of the character string in the image. The objective function of training can then be written as
W* = arg min_W Σ_{(I_i, L_i) ∈ X} −log p(J = L_i | I_i; W)
where W denotes the parameters of the whole model, including those of the convolutional neural network, the attention mechanism, and the LSTM network, and W* is their optimum. J = {J_1, …, J_T} is the character string recognized by the model, a string of 24 characters. The probability that the whole string is recognized correctly equals the product of the probabilities that each character in the string is recognized correctly, so −log p(J = L_i | I_i) can be expressed as
−log p(J = L_i | I_i) = −Σ_{t=1}^{T} log p(J_t = L_{i,t} | I_i, J_1, …, J_{t−1})
where L_{i,t} is the t-th character of the label of the i-th image; the objective function therefore becomes
W* = arg min_W −Σ_{(I_i, L_i) ∈ X} Σ_{t=1}^{T} log p(J_t = L_{i,t} | I_i, J_1, …, J_{t−1}).
Given this objective function, the network parameters W are trained by back-propagation based on stochastic gradient descent; see [Shi B, Bai X, Yao C., 'An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition', arXiv preprint arXiv:1507.05717, 2015].
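The decomposition of the negative log-likelihood into per-character terms can be illustrated numerically. The uniform per-step probabilities below are toy values, chosen so the expected loss is easy to verify by hand.

```python
import numpy as np

T, C = 24, 37                        # 24 time steps, 37 character classes
# toy per-step class probabilities p(J_t = c | I_i, J_1, ..., J_{t-1}):
# uniform over the 37 classes at every step
probs = np.full((T, C), 1.0 / C)
labels = np.zeros(T, dtype=int)      # toy label indices L_{i,t}

# -log p(J = L_i | I_i) as a sum of per-character negative log-probabilities
loss = -np.sum(np.log(probs[np.arange(T), labels]))
assert np.isclose(loss, T * np.log(C))   # uniform case: 24 * log(37)
```

In training, `probs` would come from the per-step softmax outputs of the LSTM, and the gradient of this sum with respect to W is what stochastic gradient descent follows.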
If the input image is a color image, it is converted to grayscale before the above steps are executed.
Compared with the prior art, the beneficial effects of the invention are as follows. The invention combines a deep neural network with an attention mechanism, so the final recognition result is obtained directly when an image is fed to the network. The invention therefore does not need a sliding-window operation over the input image with recognition of the characters in each window. At the same time, the character string output by the invention is the final recognition result, so no merging algorithm is needed to integrate the recognized character string.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is the design of the convolutional neural network of the invention;
Fig. 3 is the internal structure of the long short-term memory unit;
Fig. 4 is a first example of the invention recognizing English words and digits;
Fig. 5 is a second example of the invention recognizing English words and digits.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the invention, fall within the protection scope of the invention.
The overall flow chart of the invention, "a method for recognizing English words and digits in a natural scene image", is shown in Fig. 1. The recognition problem is divided into three steps: feature extraction, feature focusing, and feature recognition.
Step (1): feature extraction. A convolutional neural network extracts features from the input image, which is a natural-scene image containing English characters and digits and is resized to 80 × 32 before being fed to the network. As shown in Fig. 2, unlike a traditional convolutional neural network, which can only output a three-dimensional feature matrix, the network designed here outputs a two-dimensional feature matrix. As the figure shows, the network consists, from top to bottom, of convolutional layer 1, batch normalization layer 1, pooling layer 1, convolutional layer 2, batch normalization layer 2, pooling layer 2, convolutional layer 3, batch normalization layer 3, convolutional layer 4, batch normalization layer 4, pooling layer 4, convolutional layer 5, batch normalization layer 5, convolutional layer 6, batch normalization layer 6, pooling layer 6, convolutional layer 7, and batch normalization layer 7. The parameters of the convolutional layers, in the order (kernel size, number of channels, stride, padding), are: (3, 64, 1, 1), (3, 128, 1, 1), (3, 256, 1, 1), (3, 256, 1, 1), (3, 512, 1, 1), (3, 512, 1, 1), and (2, 512, 1, 0). The batch normalization layers adjust the distribution of intermediate results and have no parameters. The parameters of the pooling layers, in the order (window size, horizontal stride, vertical stride, horizontal padding, vertical padding), are: (2×2, 2, 2, 0, 0), (2×2, 2, 2, 0, 0), (1×2, 1, 2, 0, 0), and (1×2, 1, 2, 0, 0). The output is a two-dimensional 512 × 19 feature matrix; serializing it by column yields a feature sequence of 19 vectors of size 1 × 512, denoted S = {s_1, s_2, …, s_L}, where s_i ∈ R^512, i = 1, 2, …, L, and the sequence length L is 19.
Step (2): feature focusing. The attention mechanism focuses the useful information in the feature sequence. Its input is the feature sequence of 19 vectors of size 1 × 512 obtained in the feature-extraction stage, and its output is a feature vector. The algorithm recognizes the characters in the image one by one in left-to-right spatial order; with the maximum character-string length in the image set to 24, the algorithm performs T = 24 feature focusings, and the final output is the set of 24 focused feature vectors V_f = {V_1, V_2, …, V_T}. The feature vector V_t, the result of the t-th focusing, is
V_t = Σ_{i=1}^{L} α_{t,i} s_i
where α_t = (α_{t,1}, …, α_{t,L}) is the attention coefficient of the t-th focusing, with elements
α_{t,i} = exp(e_{t,i}) / Σ_{j=1}^{L} exp(e_{t,j}),  e_{t,i} = w^T tanh(W_a h_{t-1} + U_a s_i + b_a)
where w^T, W_a, U_a and b_a are the parameters of the attention model, trained by back-propagation based on stochastic gradient descent, and h_{t-1} is the hidden variable of the long short-term memory unit at time t-1 in the third step, explained in Fig. 3.
Fig. 3 shows the internal structure of the long short-term memory unit. The long short-term memory network, an improved kind of recurrent neural network, uses gate operations to limit the vanishing-gradient problem that conventional recurrent neural networks suffer during training. As shown in the figure for time t, one LSTM unit consists of a memory cell c_t and three gates i_t, o_t, f_t. Here i_t is the input gate, which controls how much information of the current time step is fed into the unit; o_t is the output gate, which controls how much information the unit outputs at this time step; and f_t is the forget gate, which controls how much of the previous time step's output the unit retains. The computation is as follows:
i_t = σ(W_ix V_t + W_im h_{t−1} + b_i)
f_t = σ(W_fx V_t + W_fm h_{t−1} + b_f)
o_t = σ(W_ox V_t + W_om h_{t−1} + b_o)
g_t = tanh(W_gx V_t + W_gm h_{t−1} + b_g)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)
where h_t is the hidden variable of the LSTM unit at time t, σ is the sigmoid function, and ⊙ denotes element-wise multiplication. W_ix, W_im, W_fx, W_fm, W_ox, W_om, W_gx, W_gm, b_i, b_f, b_o and b_g are the parameters of the LSTM unit; since the parameters of all units in the LSTM network are shared, they also serve as the parameters of the whole LSTM network, and in the training stage the invention trains them by back-propagation based on stochastic gradient descent.
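A minimal NumPy sketch of one LSTM unit under the equations above. The hidden size H = 256, the scaling of the random weights, and the parameter-dictionary layout are illustrative assumptions; the gate equations themselves follow the text.

```python
import numpy as np

def lstm_cell(V_t, h_prev, c_prev, P):
    """One long short-term memory unit, following the equations above.
    P maps parameter names (W_ix, ..., b_g) to arrays of assumed shapes."""
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    i = sigma(P['Wix'] @ V_t + P['Wim'] @ h_prev + P['bi'])    # input gate
    f = sigma(P['Wfx'] @ V_t + P['Wfm'] @ h_prev + P['bf'])    # forget gate
    o = sigma(P['Wox'] @ V_t + P['Wom'] @ h_prev + P['bo'])    # output gate
    g = np.tanh(P['Wgx'] @ V_t + P['Wgm'] @ h_prev + P['bg'])  # candidate memory
    c = f * c_prev + i * g                                     # memory update
    h = o * np.tanh(c)                                         # hidden variable
    return h, c

rng = np.random.default_rng(0)
D, H = 512, 256                    # focused-feature size and hidden size (H assumed)
P = {k: rng.standard_normal((H, D if k.endswith('x') else H)) * 0.01
     for k in ('Wix', 'Wfx', 'Wox', 'Wgx', 'Wim', 'Wfm', 'Wom', 'Wgm')}
P.update({b: np.zeros(H) for b in ('bi', 'bf', 'bo', 'bg')})

h, c = lstm_cell(rng.standard_normal(D), np.zeros(H), np.zeros(H), P)
assert h.shape == (H,) and np.all(np.abs(h) < 1.0)   # h = o * tanh(c) is bounded
```

Because the parameters are shared, the same `P` is reused at all 24 time steps, with `h` and `c` carried forward from one step to the next.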
Step (3): character recognition. The long short-term memory network recognizes the feature vectors: its input is the 24 focused feature vectors and its output is a character string of length 24. In the invention the LSTM network contains 24 LSTM units, i.e. the recognition of the whole string takes 24 time steps. The input of the LSTM unit at time t is the feature vector V_t of the t-th focusing, and its output is the recognized character class J_t, which has 37 classes (26 English letters, the 10 digits 0-9, and the end mark '-'). At each time step the class with the largest probability is chosen as the output of the LSTM unit:
J_t = arg max_i z_i, with z = softmax(h_t)
where h_t is the hidden variable of the LSTM unit at time t. After recognition, as shown in Fig. 1, the output of the whole network is a combination of 24 characters, such as 'a' 'd' 'o' 'n' 'i' 's' '-' '-' '-' …; the final recognition result is then 'adonis'.
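Extracting the final result from the 24 per-step characters, as in the 'adonis' example above, reduces to truncating at the first end mark:

```python
# 37 classes: 26 letters, 10 digits, and the end mark '-'
ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789-'

def decode(step_outputs):
    """Concatenate the 24 per-step characters and keep only the string
    before the first end mark."""
    return ''.join(step_outputs).split('-', 1)[0]

assert decode(list('adonis') + ['-'] * 18) == 'adonis'
```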
Fig. 4 is a first example of the invention correctly recognizing English words and digits; both the true value and the predicted value are 'brutalities'. As can be seen, the invention can recognize images with large character deformation, showing high robustness.
Fig. 5 is a second example of the invention recognizing English words and digits; the true value is 'recapitaliozes' and the predicted value is 'regapitaliozes', the third letter being misrecognized. As can be seen, the image is quite noisy, and the misrecognized character can hardly be distinguished even by the human eye.
Claims (3)
1. A method for recognizing English words and digits in a natural scene image, comprising the following steps:
Step (1): a convolutional neural network in a deep neural network extracts features from the input image, and the output of the convolutional neural network is the feature-extraction result. From input to output the network consists of: convolutional layer 1, batch normalization layer 1, pooling layer 1, convolutional layer 2, batch normalization layer 2, pooling layer 2, convolutional layer 3, batch normalization layer 3, convolutional layer 4, batch normalization layer 4, pooling layer 4, convolutional layer 5, batch normalization layer 5, convolutional layer 6, batch normalization layer 6, pooling layer 6, convolutional layer 7, and batch normalization layer 7. The parameters of convolutional layers 1-7, in the order (kernel size, number of channels, stride, padding), are: (3, 64, 1, 1), (3, 128, 1, 1), (3, 256, 1, 1), (3, 256, 1, 1), (3, 512, 1, 1), (3, 512, 1, 1) and (2, 512, 1, 0). Batch normalization layers 1-7 adjust the distribution of intermediate results and have no parameters. The parameters of pooling layers 1, 2, 4 and 6, in the order (window size, horizontal stride, vertical stride, horizontal padding, vertical padding), are: (2×2, 2, 2, 0, 0), (2×2, 2, 2, 0, 0), (1×2, 1, 2, 0, 0) and (1×2, 1, 2, 0, 0). Before being fed to the network the image is resized to a resolution of 80 × 32; the output of the network is then a two-dimensional 512 × 19 feature matrix. Serializing this matrix yields a feature sequence of 19 vectors of size 1 × 512, denoted S = {s_1, s_2, …, s_L}, where s_i ∈ R^512, i = 1, 2, …, L, and L = 19 is the length of the sequence;
Step (2): the attention mechanism performs feature focusing on the feature sequence S of 19 vectors of size 1 × 512. The characters in the image are recognized one by one in left-to-right spatial order; with the maximum character length in the training dataset set to 24, 24 feature focusings are performed on S, each focusing corresponding to one time step. The output is the set of feature vectors V_f = {V_1, V_2, …, V_T}, T = 24, where the feature vector V_t, the result of the t-th focusing, is V_t = Σ_{i=1}^{L} α_{t,i} s_i; α_t = (α_{t,1}, …, α_{t,L}) is the attention coefficient of the t-th focusing, with α_{t,i} = exp(e_{t,i}) / Σ_{j=1}^{L} exp(e_{t,j}) and e_{t,i} = w^T tanh(W_a h_{t-1} + U_a s_i + b_a), where h_{t-1} is the hidden variable of the long short-term memory unit at time t-1 in the third step; w^T, W_a, U_a and b_a are the parameters of the attention model and are trained by back-propagation based on stochastic gradient descent;
Step (3): the long short-term memory network in the deep neural network recognizes the focused feature vectors. The LSTM network contains 24 units; the input of the LSTM unit at time t is the feature vector V_t of the t-th focusing, and its output is the recognized character class J_t. At each time step the class with the largest probability is chosen as the output of the LSTM unit: J_t = arg max_i z_i, with z = softmax(h_t), where h_t is the hidden variable of the LSTM unit at time t. After recognition the output of the whole network is a combination of 24 characters, and the character string before the end mark is taken as the final recognition result. J_t has 37 classes: 26 English letters, the 10 digits 0-9, and the end mark '-'; the end mark indicates that recognition of the character string is finished.
2. The method of claim 1, wherein the parameters of the method are trained as follows: let X = {I_i, L_i} be the training dataset, where I_i is the i-th image and L_i the true value of the character string in the i-th image; the objective function of training is W* = arg min_W −Σ_{(I_i, L_i) ∈ X} Σ_{t=1}^{T} log p(J_t = L_{i,t} | I_i, J_1, …, J_{t−1}), where W denotes the parameters of the convolutional neural network, the attention mechanism, and the long short-term memory network, W* is the optimum of these parameters, L_{i,t} is the t-th character of the label of the i-th image, and p(J_t = L_{i,t} | I_i, J_1, …, J_{t−1}) is the probability that the t-th character takes the label value L_{i,t} given the values of the first t−1 characters; the network parameters W are trained by back-propagation based on stochastic gradient descent.
3. The method of claim 1, wherein the input image is a grayscale image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710592890.3A CN107368831B (en) | 2017-07-19 | 2017-07-19 | English words and digit recognition method in a kind of natural scene image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107368831A CN107368831A (en) | 2017-11-21 |
CN107368831B true CN107368831B (en) | 2019-08-02 |
Family
ID=60308319
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105654130A (en) * | 2015-12-30 | 2016-06-08 | Chengdu Shulian Mingpin Technology Co., Ltd. | Recurrent-neural-network-based complex image character sequence recognition system |
CN106022363A (en) * | 2016-05-12 | 2016-10-12 | Nanjing University | Method for recognizing Chinese characters in natural scenes |
CN106157319A (en) * | 2016-07-28 | 2016-11-23 | Harbin Institute of Technology | Saliency detection method fusing region-level and pixel-level features based on convolutional neural networks |
CN106650813A (en) * | 2016-12-27 | 2017-05-10 | South China University of Technology | Image understanding method based on deep residual networks and LSTM |
- 2017-07-19: Application CN201710592890.3A filed (CN); granted as CN107368831B; status Active
Non-Patent Citations (3)
Title |
---|
Memory Matters: Convolutional Recurrent Neural Network for Scene Text Recognition; Qiang Guo et al.; arXiv:1601.01100 (https://arxiv.org/abs/1601.01100); 2016-01-06; pp. 1-6 |
Real-Time Lexicon-Free Scene Text Localization and Recognition; Lukas Neumann et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2016-09-30; Vol. 38, No. 9; pp. 1872-1885 |
Large-pattern online handwritten character recognition based on multiple convolutional neural networks; Ge Mingtao et al.; Modern Electronics Technique (现代电子技术); 2014-10-15; Vol. 37, No. 20; pp. 19-21, 26 |
Also Published As
Publication number | Publication date |
---|---|
CN107368831A (en) | 2017-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107368831B (en) | Method for recognizing English words and digits in natural scene images | |
Bheda et al. | Using deep convolutional networks for gesture recognition in american sign language | |
Shivashankara et al. | American sign language recognition system: an optimal approach | |
CN105138998B (en) | Person re-identification method and system based on a view-angle-adaptive subspace learning algorithm | |
Latif et al. | An automatic Arabic sign language recognition system based on deep CNN: an assistive system for the deaf and hard of hearing | |
Hossain et al. | Recognition and solution for handwritten equation using convolutional neural network | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
Talukder et al. | Real-time bangla sign language detection with sentence and speech generation | |
Alom et al. | Digit recognition in sign language based on convolutional neural network and support vector machine | |
CN109508640A (en) | Crowd sentiment analysis method, apparatus, and storage medium | |
Truong et al. | Vietnamese handwritten character recognition using convolutional neural network | |
Giridharan et al. | Identification of Tamil ancient characters and information retrieval from temple epigraphy using image zoning | |
Aksoy et al. | Detection of Turkish sign language using deep learning and image processing methods | |
Inunganbi et al. | Recognition of handwritten Meitei Mayek script based on texture feature | |
Ismail et al. | Static hand gesture recognition of Arabic sign language by using deep CNNs | |
Rawf et al. | A comparative technique using 2D CNN and transfer learning to detect and classify Arabic-script-based sign language | |
Antony et al. | Haar features based handwritten character recognition system for Tulu script | |
Patel et al. | Multiresolution technique to handwritten English character recognition using learning rule and Euclidean distance metric | |
Jindal et al. | Sign Language Detection using Convolutional Neural Network (CNN) | |
Singh et al. | A comprehensive survey on Bangla handwritten numeral recognition | |
Reddy et al. | A three-dimensional neural network model for unconstrained handwritten numeral recognition: a new approach | |
Srininvas et al. | A framework to recognize the sign language system for deaf and dumb using mining techniques | |
Magrina | Convolution Neural Network based Ancient Tamil Character Recognition from Epigraphical Inscriptions | |
Katti et al. | Character and Word Level Gesture Recognition of Indian Sign Language | |
Nadgeri et al. | An Image Texture based approach in understanding and classifying Baby Sign Language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||