CN108647681A - An English text detection method with text orientation correction - Google Patents
An English text detection method with text orientation correction
- Publication number: CN108647681A (application CN201810429149.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- region
- preliminary
- text region
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention belongs to the technical field of image processing, and specifically provides an English text detection method with text orientation correction. The method comprises: performing maximally stable extremal region (MSER) detection on each channel of an English text image to obtain candidate text regions; building a classifier based on a convolutional neural network model to filter out false candidate text regions and obtain preliminary text regions; grouping the preliminary text regions with a two-layer text grouping algorithm; and performing orientation correction on the grouped preliminary text regions to obtain the corrected text. The invention uses an enhanced multi-channel MSER model to obtain finer text regions, and introduces a parallel SPP-CNN classifier to better distinguish text regions from non-text regions; the classifier can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be learned from the multi-layer spatial information of the source image. The invention can handle slightly tilted scene text.
Description
Technical field
The invention belongs to the technical field of image processing, and specifically relates to an English text detection method with text orientation correction.
Background technology
Text in natural scene images carries precise and rich information, which is of great significance for image analysis, image-based translation, image search and the like. Over the past twenty years, researchers have proposed a number of methods for detecting text in natural scene images, and there are many content-based multimedia understanding applications, such as automatic visual classification, image retrieval, assisted navigation, multilingual translation, object recognition and consumer-oriented applications.
Scene text detection faces several key problems. (1) Text in document images has regular fonts, similar colors, and uniform size and distribution, whereas text in natural scenes may have different fonts, colors, scales and orientations even within the same scene. (2) The background of a natural scene image may be extremely complex: signs, fences, bricks and grass are hard to distinguish from real text and easily cause confusion and errors. (3) Other interfering factors exist in scene text images, such as non-uniform illumination, blur and translucency effects.
Researchers have proposed many methods to detect text in natural scene images; there are two main families.
Texture-based methods treat text as a specific type of texture and use texture properties, such as local intensity, filter responses and wavelet coefficients, to distinguish text regions from non-text regions in an image. These methods are usually computationally expensive, because all positions and scales must be scanned; moreover, they mainly handle horizontal text and are very sensitive to rotation and scaling.
Component-based methods treat text as connected components: text is first extracted by various means (such as color clustering or extremal region extraction), and non-text components are then filtered out using hand-designed rules or automatically trained classifiers. Component-based methods are generally more efficient, because the number of components to process is relatively small; in addition, they are insensitive to rotation, scaling and font. A conventional method for detecting candidate text regions (Candidate Text Region, CTR) is Maximally Stable Extremal Regions (MSER), which is highly robust to affine image changes and can efficiently extract the text regions in an image; later work improved the MSER extraction algorithm so that its time complexity is linear.
These methods separate text regions from non-text regions according to distinguishing rules or features. Although they can detect text, they lack any correction of the English text and perform poorly on tilted text: the recognized text can be severely fragmented because of the tilt of the words.
Summary of the invention
In view of this, the present invention proposes an English text detection method with text orientation correction, which can effectively recognize text and correct the tilted text it recognizes. The method specifically comprises the following steps:
S1. Perform maximally stable extremal region detection on each channel of the sharpened English text image, extracting MSERs from the image as text candidates to obtain candidate text regions.
S2. Build a classifier based on a convolutional neural network model and extract the features of the candidate text regions; using a softmax function, divide the candidate text regions into text-class regions and non-text-class regions according to those features; filter out the non-text-class regions to obtain the preliminary text regions, i.e. the detected English text.
S3. Group the preliminary text regions using a two-layer text grouping algorithm.
S4. Perform orientation correction on the grouped preliminary text regions to realize the correction of the English text.
Further, the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value channel and grey channel.
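The seven-channel decomposition above can be sketched in plain Python with the standard `colorsys` module. This is a minimal illustration, not the patent's implementation; the BT.601 grey conversion is an assumption, since the patent does not specify which grey formula it uses.

```python
import colorsys

def decompose_channels(rgb_image):
    """Split an RGB image (rows of (r, g, b) tuples, values 0-255) into
    the seven channels the method runs MSER on: R, G, B, H, S, V, grey."""
    names = ("R", "G", "B", "H", "S", "V", "grey")
    channels = {name: [] for name in names}
    for row in rgb_image:
        new_rows = {name: [] for name in names}
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            new_rows["R"].append(r)
            new_rows["G"].append(g)
            new_rows["B"].append(b)
            new_rows["H"].append(int(h * 255))
            new_rows["S"].append(int(s * 255))
            new_rows["V"].append(int(v * 255))
            # ITU-R BT.601 luma weights (an assumption; the patent does
            # not state its grey conversion).
            new_rows["grey"].append(int(0.299 * r + 0.587 * g + 0.114 * b))
        for name in names:
            channels[name].append(new_rows[name])
    return channels
```

MSER detection would then be run independently on each of the seven single-channel images, and the candidate regions pooled.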
Further, building the classifier based on the convolutional neural network model and extracting the features of the candidate text regions includes: obtaining the first features of the candidate text regions through the five-layer architecture of the classifier, and obtaining their second features through a cross layer. The five-layer architecture consists of, in sequence, a first convolutional layer, a max-pooling layer, a second convolutional layer, a spatial pyramid pooling layer and a fully connected layer; the cross layer connects the first convolutional layer directly to the fully connected layer.
Further: the candidate text regions are filtered a first time by the first convolution kernel in the first layer; the filtered candidates are max-pooled in the second layer; the second convolution kernel in the third layer filters the pooled candidates a second time; the twice-filtered candidates pass through spatial pyramid pooling in the fourth layer; and the pooled candidates are fully connected in the fifth layer, obtaining the first features of the candidate text regions.
Further: using the manually added features, the candidate text regions filtered once by the first convolution kernel are fully connected according to those features, obtaining the second features of the candidate text regions.
Further, the manually added features include: aspect ratio, compactness, stroke-width area ratio, local contrast and boundary key points.
Further, the local contrast is computed from the channel values of the region, where lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; n denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
Further, the boundary key points are obtained as follows: build a binary image; iterate over all pixels of the binary image; compute the contour points; and compress the contour points using the Douglas-Peucker algorithm to obtain the boundary key points. Specifically:
The grey value of every pixel belonging to the maximally stable extremal region is set to 255; the grey value of every pixel outside the region but inside its minimum enclosing rectangle is set to 0. If a pixel (x, y) has value p(x, y) = 255 and one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) has value 0, then pixel (x, y) is a contour point. The contour points are compressed with the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
Further, grouping the preliminary text regions with the two-layer text grouping algorithm includes grouping the preliminary text regions vertically and horizontally.
The vertical grouping specifically comprises:
obtain b_n, the minimum Y coordinate of the pixels with value 255 in the n-th preliminary text region; obtain t_{n+1}, the maximum Y coordinate of the pixels with value 255 in the (n+1)-th preliminary text region; and obtain h_{n+1}, the height of the (n+1)-th preliminary text region.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e. they belong to the same text line. If d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class: the (n+1)-th preliminary text region is regarded as a new class, and a new text line is split off along the Y axis.
The horizontal grouping specifically comprises:
obtain Δd, the distance between two adjacent preliminary text regions of the same text line along the X axis; Δd covers both d_1, the distance between letters within a word, and d_2, the distance between words.
Using a coefficient that represents the mean width of all letters in the text line, words are separated by a width threshold: obtain d_h, the ratio of the letter spacing to the interval. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e. the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e. the two regions do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
Further, performing orientation correction on the grouped preliminary text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees, with initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced grouping boxes through a model matching process, obtaining the i-th candidate corrected text region.
S403. While i < 6, set i = i + 1 and α = α + 10° and return to step S401. When i = 6, superimpose the 1st through 6th candidate corrected texts to obtain the final corrected text.
Further, the coordinate rotation formula is:
x' = x cos θ + y sin θ
y' = y cos θ - x sin θ
where x and y denote the abscissa and ordinate of a pixel, θ denotes the rotation angle threshold, and x' and y' denote the abscissa and ordinate of the pixel after rotation.
The grouping boxes include the tilted grouping box, which contains a single letter, and the long-interval grouping box, whose letters are located at its two ends.
Beneficial effects of the present invention: an enhanced multi-channel MSER model is used, detecting MSERs in the R, G, B, H, S, V and grey channels to obtain finer candidate text regions. A parallel SPP-CNN (Spatial Pyramid Pooling Convolutional Neural Network) classifier is introduced to better distinguish text regions from non-text regions; it can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be learned through the multi-layer spatial information of the source image (the English text image). The erroneously introduced grouping boxes are filtered through a model matching process; slightly tilted scene text can be handled, realizing the correction of English text.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the architecture of the cross-layer SPP-CNN algorithm used by the present invention;
Fig. 3 is a schematic diagram of how SPP works in the prior art;
Fig. 4 is the architecture of the text grouping system of the present invention;
Fig. 5 is a schematic diagram of the text line constraints of the present invention;
Fig. 6 shows the preliminary text regions obtained without orientation correction;
Fig. 7 shows the final corrected text regions after orientation correction;
Fig. 8 is the orientation rotation model of the present invention;
Fig. 9 is the matching model of the grouping boxes of the present invention;
Fig. 10 shows the detection results under different rotations;
Fig. 11 shows example detection results.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings; obviously, the described embodiments are only part of the embodiments of the present invention, not all of them.
The present invention provides an English text detection method with text orientation correction which, as shown in Fig. 1, comprises the following steps:
S1. Perform maximally stable extremal region detection on each channel of the sharpened English text image, extracting MSERs from the image as text candidates to obtain candidate text regions.
S2. Build a classifier based on a convolutional neural network (CNN) model, extract the features of the candidate text regions, and use a softmax function to divide the candidate text regions into text-class regions and non-text-class regions according to those features; filter out the non-text-class regions to obtain the preliminary text regions, i.e. the detected English text.
S3. Group the preliminary text regions using a two-layer text grouping algorithm.
S4. Perform orientation correction on the grouped preliminary text regions to realize the correction of the English text and obtain the corrected text, which is the corrected English text.
Preferably, the five-layer architecture of the CNN model used by the present invention is shown in Fig. 2:
The first layer uses a first convolution kernel of size 7 × 7 × 5, i.e. a kernel of length 7, width 7 and depth 5.
The second layer uses 5 × 5 × 5 max pooling, i.e. pooling of length 5, width 5 and depth 5.
The third layer uses a second convolution kernel of size 5 × 3 × 5, i.e. a kernel of length 5, width 3 and depth 5.
The fourth layer uses SPP pooling. Fig. 3 illustrates how SPP works: the same image is pooled with a 3 × 3 grid (i.e. a pooling of length 3 and width 3) dividing it into 9 blocks, a 2 × 2 grid dividing it into 4 blocks, and a 1 × 1 grid giving 1 block; the maximum of each block is computed to obtain the output neurons, so that an image of arbitrary size is converted into a 14-dimensional feature of fixed size. It can be understood that pyramids of other dimensions can be designed by increasing the number of levels or changing the size of the grid.
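The SPP step above can be sketched as follows: pooling one 2-D map at 3 × 3, 2 × 2 and 1 × 1 grids yields 9 + 4 + 1 = 14 values regardless of the input size. The function name is our own, and the depth dimension of the patent's layers is omitted here for clarity.

```python
def spp_pool(feature_map, levels=(3, 2, 1)):
    """Spatial pyramid pooling over a 2-D feature map (list of lists).
    For each level n the map is split into an n x n grid and the max of
    each cell is taken, giving 14 values for the default levels."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                # Cell bounds; the max() guard keeps each cell non-empty
                # even when the map is smaller than the grid.
                r0, r1 = i * h // n, max((i + 1) * h // n, i * h // n + 1)
                c0, c1 = j * w // n, max((j + 1) * w // n, j * w // n + 1)
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out
```

Because the output length depends only on the grid levels, the fully connected layer that follows always receives a fixed-size vector, which is what lets the classifier accept images of arbitrary size.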
The fifth layer is a fully connected layer. Specifically:
the candidate text regions are filtered a first time by the first convolution kernel in the first layer; the filtered candidates are max-pooled in the second layer; the second convolution kernel in the third layer filters the pooled candidates a second time; the twice-filtered candidates pass through spatial pyramid pooling in the fourth layer; and the pooled candidates are fully connected in the fifth layer, extracting the first features of the candidate text regions.
Using the manually added features, the candidate text regions filtered once by the first convolution kernel are fully connected according to those features, extracting the second features of the candidate text regions.
Preferably, the manually designed features are embedded into the whole CNN as the cross layer. The cross layer operates only in the first and fifth layers; the features it uses, namely the manually added features, include:
the aspect ratio, the compactness, the stroke-width area ratio, the local contrast lc and the boundary key points k,
where w and h are the width and height (in pixels) of the minimum enclosing rectangle of the maximally stable extremal region; a is the area of the minimum enclosing rectangle (the number of pixels in the region); and p is the number of boundary points of the minimum enclosing rectangle, denoted in the present invention by the boundary key points k.
The local contrast can be obtained from the following quantities: lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; n denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
By connecting the boundary key points in sequence, the original region can be approximately restored, that is, the preliminary text region is obtained.
The calculation process of k is: build a binary image; iterate over all pixels of the binary image; compute the contour points; and compress the contour points with the Douglas-Peucker algorithm, the compressed contour points being the boundary key points. Specifically:
the grey value of every pixel belonging to the maximally stable extremal region is set to 255; the grey value of every pixel outside the region but inside its minimum enclosing rectangle is set to 0. If a pixel (x, y) has value p(x, y) = 255 and one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) has value 0, then pixel (x, y) is a contour point. The contour points are compressed using the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
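The contour-compression step can be sketched with a small recursive Douglas-Peucker implementation in plain Python. The tolerance `epsilon` is a free parameter that the patent does not fix.

```python
def douglas_peucker(points, epsilon):
    """Douglas-Peucker polyline simplification: contour points whose
    perpendicular distance to the chord between the endpoints is below
    epsilon are dropped; the survivors are the boundary key points."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0  # guard degenerate chord
    # Perpendicular distance of each interior point to the chord.
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm
             for x, y in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > epsilon:
        # Keep the farthest point and recurse on both halves.
        left = douglas_peucker(points[:idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

Connecting the surviving key points in order approximately restores the region outline, which is the property the description relies on.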
Alternatively, the class of the final features is obtained with the softmax classification function.
After the preliminary text regions are grouped with the two-layer text grouping algorithm, slight-tilt orientation correction is performed, as shown in Fig. 4. The procedure is divided into three parts: vertical grouping, horizontal grouping and orientation correction.
The main steps of the vertical grouping are as follows:
obtain b_n, the minimum Y coordinate of the pixels with value 255 in the n-th preliminary text region; obtain t_{n+1}, the maximum Y coordinate of the pixels with value 255 in the (n+1)-th preliminary text region; and obtain h_{n+1}, the height of the (n+1)-th preliminary text region, as shown in Fig. 5.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e. the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class: the (n+1)-th preliminary text region is regarded as a new text line, which is split off along the Y axis.
In the present invention the height threshold is 0.62.
The main steps of the horizontal grouping are as follows:
obtain Δd, the distance between two adjacent preliminary text regions of the same text line along the X axis; Δd covers both d_1, the distance between letters within a word, and d_2, the distance between words.
Using a coefficient that represents the mean width of all letters in the text line, words are separated by a width threshold: obtain d_h, the ratio of the letter spacing to the interval. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e. the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e. they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
The mean-width coefficient was obtained experimentally from the ICDAR 2013 training set, which contains 229 pictures and 1226 words.
The width threshold of the present invention is 2.33.
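The horizontal (word-level) grouping can be sketched as follows. The patent's exact spacing formula is given only as an image, so the ratio used here (gap between adjacent letter boxes divided by the mean letter width, compared against the 2.33 threshold) is an assumed reading of the description, not the patented formula.

```python
def group_words(letter_boxes, width_threshold=2.33):
    """Split one text line into words. Each box is (x_left, x_right).
    Gaps whose ratio to the mean letter width reaches width_threshold
    start a new word (an assumed reading of the spacing rule)."""
    boxes = sorted(letter_boxes)
    mean_w = sum(r - l for l, r in boxes) / len(boxes)
    words, current = [], [boxes[0]]
    for prev, box in zip(boxes, boxes[1:]):
        gap = box[0] - prev[1]
        if gap / mean_w >= width_threshold:
            words.append(current)  # large gap: close the current word
            current = []
        current.append(box)
    words.append(current)
    return words
```

With the 2.33 threshold, intra-word letter spacing (a fraction of a letter width) stays below the cut while inter-word gaps of two or more letter widths start a new word.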
The steps of the slight-tilt orientation correction are as follows:
Fig. 6 illustrates the preliminary text regions; it can be seen that the words are severely fragmented by the tilt, and "ne1Wor" is taken to be a word of the same line. Experiments show that words tilted within 10 degrees can be grouped correctly, so a coordinate-axis rotation strategy is used, yielding the final corrected text shown in Fig. 7.
Due to the rotation of the coordinate axes, the grouping box "wordline1" is grouped correctly, but the erroneously introduced grouping box "wordline2" is not corrected correctly, so the algorithm is improved with a rotation convergence strategy.
Performing orientation correction on the grouped text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees, with initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced grouping boxes through a model matching process, obtaining the i-th candidate corrected text region.
S403. While i < 6, set i = i + 1 and α = α + 10° and return to step S401. When i = 6, superimpose the 1st through 6th candidate corrected texts to obtain the final corrected text.
Specifically: using the coordinate rotation formula, the grouped preliminary text regions are rotated clockwise or counterclockwise by tens of degrees, with initial value i = 1. In the present invention the rotations are 10, 20 and 30 degrees both clockwise and counterclockwise, as shown in Fig. 8.
The erroneously introduced grouping boxes are filtered through the model matching process to obtain the i-th candidate corrected text region; the i-th candidate corrected text is then rotated by tens of degrees clockwise or counterclockwise, the erroneously introduced grouping boxes are again filtered by model matching, and the (i+1)-th candidate corrected text is obtained.
While i < 6, set i = i + 1 and return to step S401; when i = 6, superimpose the 1st through 6th candidate corrected texts to obtain the final corrected text, as shown in Fig. 9.
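The S401-S403 loop can be sketched as follows. `rotate_points` applies the patent's clockwise rotation formula; the model-matching filter is abstracted as a caller-supplied function, since its details depend on the training set, and the "superposition" of per-angle results is simplified to collecting the surviving regions.

```python
import math

def rotate_points(points, degrees):
    """Clockwise coordinate rotation from the patent:
    x' = x cos(t) + y sin(t),  y' = y cos(t) - x sin(t)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c + y * s, y * c - x * s) for x, y in points]

def correct_orientation(regions, match_filter):
    """Try the six angles -30, -20, ..., +20 degrees (i = 1..6).
    match_filter(rotated_regions, angle) stands in for the model
    matching that removes erroneously introduced grouping boxes;
    the per-angle survivors are collected into the final result."""
    corrected = []
    for i in range(6):
        angle = -30 + 10 * i
        rotated = [rotate_points(r, angle) for r in regions]
        corrected.extend(match_filter(rotated, angle))
    return corrected
```

The 10-degree increment matches the experimentally chosen rotation step, and the ±30-degree range matches the stated maximum rotation angle.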
As another implementation, in step S403 i need not be limited to 6; it may also be any of 5, 7 or 8.
It can be understood that the model matching process matches the rotated preliminary text region image against the corresponding English text in the training set: if the rotated image overlaps a training set image, the overlapping part is retained; otherwise another rotation is applied to the preliminary text regions, and if the rotated image overlaps another training set image the overlapping part is again retained. Finally all the overlapping parts are superimposed to obtain the final corrected text.
Fig. 9(a) shows a model called the "tilted grouping box", in which each box contains only one letter. This erroneously introduced grouping box mainly occurs when the text tilts in a single direction.
Fig. 9(b) shows the "long-interval grouping box"; this erroneously introduced grouping box has its letters located at the two ends of the box with a very large interval between them. This case occurs when text arranged in different directions is too close together.
The rotation increment and the number of rotations are important factors in the detection result. To balance quality and time complexity, the maximum rotation angle is set to 30 degrees. As shown in Fig. 10, rotation increments from 1 to 15 degrees were tried; the smaller the increment, the more rotations are needed to reach the maximum angle. The experimental results show that when the increment reaches 10 degrees, the three indices (precision, recall and f-measure) reach their peak, so this is the final rotation angle threshold of the proposed method.
In the present invention, to verify the correctness and validity of the proposed algorithm, comparative experiments were carried out on the ICDAR 2011 and ICDAR 2013 datasets. The ICDAR 2011 test set contains 255 images and the ICDAR 2013 test set contains 233 images. Each image corresponds to a txt file recording the exact coordinates of the text to be detected.
The evaluation of the detection result mainly computes the overlap between the detected corrected text regions and the ground-truth text regions. For each rectangle to be evaluated, the maximum matching value is used:
m(r; R) = max{ m(r, r') | r' ∈ R }
where r denotes a corrected text region, r' denotes a ground-truth text region, A(r) denotes the rectangular area of the corrected text region r, and R denotes the set of matched regions. After the maximum area matching is obtained, the precision, recall and f-measure are computed, where E denotes the set of corrected text regions to be detected and T the set of rectangles to be evaluated. The f-measure is a combination of precision and recall; their relative weight is controlled by a parameter α, usually set to 0.5 so that precision and recall have equal weight.
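The weighted combination of precision and recall described above is, for α = 0.5, the ordinary harmonic mean. Since the patent's own precision/recall formulas are reproduced only as images, the definition below is the standard ICDAR-style one and should be read as an assumption:

```python
def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic combination of precision and recall.
    With alpha = 0.5 this is the ordinary F1 score; alpha shifts the
    relative weight between the two, as the description states."""
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)
```

For example, with precision 0.5 and recall 1.0 and the default α, the score is 2/3, which penalizes the weaker of the two indices as a harmonic mean should.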
In the present invention, comparative experiments demonstrate that the proposed method can extract more text regions.
Table 1. Extraction results of different MSER methods
According to Table 1 (considering only letter-level performance, not the final word-level result), after the Laplacian and multi-channel preprocessing, more text regions can be extracted (the recall increases), but more non-text regions are also extracted (the precision decreases).
To illustrate the validity of the method used by the present invention, it is compared quantitatively with existing text detection methods. The training set was generated from ICDAR 2011 and ICDAR 2013 using multi-channel MSER and manual labelling; it contains 44908 English text images and 56139 non-English-text images. 25% of the training set was used for validation, and the training process reached an accuracy of 96%. The SPP-CNN was trained with cross validation and stochastic gradient descent (SGD). Five methods were compared on ICDAR 2011 and ICDAR 2013; Fig. 11 shows English text images recognized by the present invention, from which it can be seen that the invention can effectively recognize English text and realize its correction.
As can be seen from Tables 2 and 3, the method of the present invention outperforms the prior art in recall and f-measure.
Table 2. Results on ICDAR 2011 scene text detection
Table 3. Results on ICDAR 2013 scene text detection
The references in Tables 2 and 3 correspond respectively to:
[1] Liu Z, Li Y, Qi X, et al. Method for unconstrained text detection in natural scene image[J]. IET Computer Vision, 2017, 11(7): 596-604.
[2] Wu H, Zou B, Zhao Y Q, et al. Natural scene text detection by multi-scale adaptive color clustering and non-text filtering[J]. Neurocomputing, 2016, 214: 1011-1025.
[3] Yu C, Song Y, Zhang Y. Scene text localization using edge analysis and feature pool[J]. Neurocomputing, 2015, 175: 652-661.
[4] Yao Li, Wenjing Jia, Chunhua Shen, et al. Characterness: An Indicator of Text in the Wild[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1666-1677.
[5] Tian C, Xia Y, Zhang X, et al. Natural Scene Text Detection with MC-MR Candidate Extraction and Coarse-to-Fine Filtering[J]. Neurocomputing, 2017.
[6] Zhu A, Gao R, Uchida S. Could scene context be beneficial for scene text detection[J]. Pattern Recognition, 2016, 58: 204-215.
[7] Neumann L, Matas J. Efficient scene text localization and recognition with local character refinement[C]// International Conference on Document Analysis and Recognition. IEEE, 2015: 746-750.
[8] Gomez L, Karatzas D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction[J]. 2014, 19(4): 1-15.
The present invention can detect slightly tilted text and English text with different fonts or sizes; Fig. 9 shows successful detection cases.
One of ordinary skill in the art will appreciate that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may include ROM, RAM, magnetic disks, optical discs and the like.
The embodiments provided above describe the purpose, technical solution and advantages of the present invention in further detail. It should be understood that the embodiments above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (10)
1. An English text detection method with text orientation correction, characterized by comprising the following steps:
S1. performing maximally stable extremal region (MSER) detection on each channel of a sharpened version of the English text image, and extracting the maximally stable extremal regions from the image to obtain candidate text regions;
S2. building a classifier based on a convolutional neural network model and extracting features of the candidate text regions; dividing the candidate text regions into text-class regions and non-text-class regions according to those features using a softmax function; and filtering out the non-text-class regions to obtain preliminary text regions, i.e. the detected English text;
S3. grouping the preliminary text regions with a two-layer text grouping algorithm;
S4. performing orientation correction on the grouped preliminary text regions, so as to realize the correction of the English text.
2. The English text detection method with text orientation correction according to claim 1, characterized in that the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value (lightness) channel and gray channel.
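As an illustration only (not part of the claims), the seven channel maps named in claim 2 can be derived from an RGB image as follows. The patent does not give explicit formulas, so this sketch assumes the standard HSV conversion and ITU-R BT.601 gray weights:

```python
import numpy as np

def seven_channels(rgb):
    """Split an RGB image (H x W x 3, floats in [0, 1]) into the seven
    single-channel maps of claim 2: R, G, B, H, S, V and gray."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = mx - mn
    v = mx                                            # value = max channel
    s = np.where(mx > 0, delta / np.maximum(mx, 1e-12), 0.0)
    # Hue: standard piecewise definition, scaled to [0, 1).
    h = np.zeros_like(mx)
    nz = delta > 0
    rmax = nz & (mx == r)
    gmax = nz & (mx == g) & ~rmax
    bmax = nz & ~rmax & ~gmax
    h[rmax] = ((g - b)[rmax] / delta[rmax]) % 6
    h[gmax] = (b - r)[gmax] / delta[gmax] + 2
    h[bmax] = (r - g)[bmax] / delta[bmax] + 4
    h = h / 6.0
    # Luminance-weighted gray channel (BT.601 weights, an assumed choice).
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return {'R': r, 'G': g, 'B': b, 'H': h, 'S': s, 'V': v, 'gray': gray}
```

MSER detection (step S1) would then be run independently on each of the seven returned maps.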
3. The English text detection method with text orientation correction according to claim 1, characterized in that building the classifier based on a convolutional neural network model and extracting the features of the candidate text regions comprises: obtaining a first feature of each candidate text region through the five-layer architecture of the classifier, and obtaining a second feature of each candidate text region through a cross layer; wherein the five-layer architecture consists of, connected in order, a first convolutional layer, a max-pooling layer, a second convolutional layer, a spatial pyramid pooling layer and a fully connected layer, and the cross layer connects the first convolutional layer directly to the fully connected layer.
4. The English text detection method with text orientation correction according to claim 3, characterized in that the first feature is obtained as follows: filtering the candidate text region a first time with the first convolution kernel in the first layer; max-pooling the first-time-filtered candidate text region in the second layer; filtering the max-pooled candidate text region a second time with the second convolution kernel in the third layer; applying spatial pyramid pooling to the second-time-filtered candidate text region in the fourth layer; and fully connecting the pyramid-pooled candidate text region in the fifth layer, so as to extract the first feature of the candidate text region.
5. The English text detection method with text orientation correction according to claim 3, characterized in that the second feature is obtained as follows: filtering the candidate text region a first time with the first convolution kernel; and fully connecting the filtered candidate text region according to the manually added features, so as to extract the second feature of the candidate text region.
6. The English text detection method with text orientation correction according to claim 5, characterized in that the manually added features include: aspect ratio, compactness, stroke-width-to-area ratio, local contrast and boundary key points.
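Two of the handcrafted features of claim 6 can be sketched from a binary component mask. The patent does not define them precisely, so these follow common conventions (aspect ratio as height over width, compactness as fill ratio of the bounding box), which may differ from the patent's own definitions:

```python
import numpy as np

def region_features(mask):
    """Aspect ratio and compactness of one candidate region, given a
    binary mask (1 = text pixel), under assumed common definitions."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1          # bounding-box height
    w = xs.max() - xs.min() + 1          # bounding-box width
    area = int(mask.sum())               # number of text pixels
    return {'aspect_ratio': h / w,
            'compactness': area / (h * w)}
```

Stroke-width-to-area ratio, local contrast and boundary key points would be computed analogously per region and concatenated into the manually added feature vector.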
7. The English text detection method with text orientation correction according to claim 1, characterized in that grouping the preliminary text regions with the two-layer text grouping algorithm comprises vertically grouping the preliminary text regions, specifically:
obtaining the minimum Y-axis coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtaining the maximum Y-axis coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtaining the height h_{n+1} of the (n+1)-th preliminary text region;
computing the height difference d_{n,n+1} from b_n, t_{n+1} and h_{n+1}; if the height difference d_{n,n+1} is greater than the height threshold, assigning the two preliminary text regions to the same class, i.e. they belong to the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not of the same class, the (n+1)-th preliminary text region is regarded as a new text line, and the new text line is split off in the Y-axis direction.
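The vertical grouping of claim 7 can be sketched as follows (illustrative only). The machine translation garbles the exact height-difference formula and leaves the threshold direction ambiguous, so this sketch assumes a common convention: regions stay on the same text line when their bottom (baseline) coordinates differ by less than a fraction of the next region's height:

```python
def group_vertically(regions, height_threshold=0.5):
    """Group candidate regions into text lines. Each region is a
    (top_y, bottom_y) pair; regions are assumed pre-sorted. The
    normalised baseline difference used here is an assumption, not
    the patent's (garbled) formula."""
    lines, current = [], [regions[0]]
    for prev, cur in zip(regions, regions[1:]):
        h_cur = cur[1] - cur[0]
        d = abs(prev[1] - cur[1]) / h_cur   # normalised baseline gap
        if d <= height_threshold:
            current.append(cur)             # same text line
        else:
            lines.append(current)           # baseline jump: new line
            current = [cur]
    lines.append(current)
    return lines
```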
8. The English text detection method with text orientation correction according to claim 7, characterized in that grouping the preliminary text regions with the two-layer text grouping algorithm further comprises horizontally grouping the preliminary text regions, specifically:
obtaining the distance difference Δd along the X-axis between two adjacent preliminary text regions in the same text line, where Δd covers both the distance d_1 between letters within a word and the distance d_2 between words;
representing the mean width of all letters in the text line by a coefficient, and separating words according to a width threshold;
obtaining the ratio d_h of the letter spacing to the interval; if d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e. the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e. they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
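A sketch of claim 8's horizontal grouping (illustrative only). The exact ratio d_h is garbled in the translation, so this stand-in uses the gap between adjacent letter boxes divided by the mean letter width, splitting a word where the ratio reaches the threshold:

```python
def split_into_words(boxes, gap_ratio_threshold=1.0):
    """Split one text line into words. Boxes are (x_left, x_right)
    pairs for letters, sorted by x. The gap-to-mean-width ratio is an
    assumed stand-in for the patent's (garbled) d_h formula."""
    mean_width = sum(r - l for l, r in boxes) / len(boxes)
    words, current = [], [boxes[0]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[0] - prev[1]                 # X-axis distance
        if gap / mean_width < gap_ratio_threshold:
            current.append(cur)                # same word
        else:
            words.append(current)              # wide gap: new word begins
            current = [cur]
    words.append(current)
    return words
```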
9. The English text detection method with text orientation correction according to claim 1, characterized in that performing orientation correction on the grouped preliminary text regions to realize the correction of the English text comprises:
S401. rotating each grouped preliminary text region clockwise by α degrees using the coordinate rotation formula, with initial values i = 1 and α = -30°;
S402. filtering the erroneously introduced group boxes through a model matching process, to obtain the i-th candidate corrected text region;
S403. when i < 6, setting i = i + 1 and α = α + 10° and returning to step S401; when i = 6, overlaying the 1st through 6th candidate corrected texts to obtain the final corrected text.
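The sweep of steps S401-S403 tries six orientations, α = -30°, -20°, ..., +20°. A sketch of the coordinate rotation alone (the model-matching filter of S402 and the final overlay are omitted, as the patent does not specify them beyond the claim text):

```python
import math

def rotation_sweep(points):
    """Rotate a point set clockwise by each of the six sweep angles of
    claim 9 and return the candidates; filtering and overlaying of the
    candidates are left out of this sketch."""
    candidates = []
    for i in range(6):
        deg = -30 + 10 * i                     # alpha: -30, -20, ..., +20
        a = math.radians(deg)
        # Standard clockwise 2-D rotation about the origin.
        rotated = [(x * math.cos(a) + y * math.sin(a),
                    -x * math.sin(a) + y * math.cos(a))
                   for x, y in points]
        candidates.append((deg, rotated))
    return candidates
```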
10. The English text detection method with text orientation correction according to claim 9, characterized in that the group boxes include: inclined group boxes and long-interval group boxes; an inclined group box contains a single letter, and the letters contained in a long-interval group box are located at the two ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429149.XA CN108647681B (en) | 2018-05-08 | 2018-05-08 | A kind of English text detection method with text orientation correction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429149.XA CN108647681B (en) | 2018-05-08 | 2018-05-08 | A kind of English text detection method with text orientation correction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108647681A true CN108647681A (en) | 2018-10-12 |
CN108647681B CN108647681B (en) | 2019-06-14 |
Family
ID=63749675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810429149.XA Active CN108647681B (en) | 2018-05-08 | 2018-05-08 | A kind of English text detection method with text orientation correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647681B (en) |
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800735A (en) * | 2019-01-31 | 2019-05-24 | 中国人民解放军国防科技大学 | Accurate detection and segmentation method for ship target |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
CN110298343A (en) * | 2019-07-02 | 2019-10-01 | 哈尔滨理工大学 | A kind of hand-written blackboard writing on the blackboard recognition methods |
CN110674815A (en) * | 2019-09-29 | 2020-01-10 | 四川长虹电器股份有限公司 | Invoice image distortion correction method based on deep learning key point detection |
CN111353493A (en) * | 2020-03-31 | 2020-06-30 | 中国工商银行股份有限公司 | Text image direction correction method and device |
WO2021056255A1 (en) * | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
CN112825141A (en) * | 2019-11-21 | 2021-05-21 | 上海高德威智能交通系统有限公司 | Method and device for recognizing text, recognition equipment and storage medium |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
CN113298079A (en) * | 2021-06-28 | 2021-08-24 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
WO2021196013A1 (en) * | 2020-03-31 | 2021-10-07 | 京东方科技集团股份有限公司 | Word recognition method and device, and storage medium |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
CN113837169A (en) * | 2021-09-29 | 2021-12-24 | 平安科技(深圳)有限公司 | Text data processing method and device, computer equipment and storage medium |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
CN114283431A (en) * | 2022-03-04 | 2022-04-05 | 南京安元科技有限公司 | Text detection method based on differentiable binarization |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325099A (en) * | 2013-07-11 | 2013-09-25 | 北京智诺英特科技有限公司 | Image correcting method and device |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
CN105426887A (en) * | 2015-10-30 | 2016-03-23 | 北京奇艺世纪科技有限公司 | Method and device for text image correction |
CN105740774A (en) * | 2016-01-25 | 2016-07-06 | 浪潮软件股份有限公司 | Text region positioning method and apparatus for image |
CN105868758A (en) * | 2015-01-21 | 2016-08-17 | 阿里巴巴集团控股有限公司 | Method and device for detecting text area in image and electronic device |
CN106650725A (en) * | 2016-11-29 | 2017-05-10 | 华南理工大学 | Full convolutional neural network-based candidate text box generation and text detection method |
CN106778757A (en) * | 2016-12-12 | 2017-05-31 | 哈尔滨工业大学 | Scene text detection method based on text conspicuousness |
CN106997470A (en) * | 2017-02-28 | 2017-08-01 | 信雅达系统工程股份有限公司 | Tilt bearing calibration and the system of text image |
CN107066972A (en) * | 2017-04-17 | 2017-08-18 | 武汉理工大学 | Natural scene Method for text detection based on multichannel extremal region |
CN107992869A (en) * | 2016-10-26 | 2018-05-04 | 深圳超多维科技有限公司 | For tilting the method, apparatus and electronic equipment of word correction |
- 2018-05-08: application CN201810429149.XA filed in CN, patent/CN108647681B/en, status: active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325099A (en) * | 2013-07-11 | 2013-09-25 | 北京智诺英特科技有限公司 | Image correcting method and device |
CN105868758A (en) * | 2015-01-21 | 2016-08-17 | 阿里巴巴集团控股有限公司 | Method and device for detecting text area in image and electronic device |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
CN105426887A (en) * | 2015-10-30 | 2016-03-23 | 北京奇艺世纪科技有限公司 | Method and device for text image correction |
CN105740774A (en) * | 2016-01-25 | 2016-07-06 | 浪潮软件股份有限公司 | Text region positioning method and apparatus for image |
CN107992869A (en) * | 2016-10-26 | 2018-05-04 | 深圳超多维科技有限公司 | For tilting the method, apparatus and electronic equipment of word correction |
CN106650725A (en) * | 2016-11-29 | 2017-05-10 | 华南理工大学 | Full convolutional neural network-based candidate text box generation and text detection method |
CN106778757A (en) * | 2016-12-12 | 2017-05-31 | 哈尔滨工业大学 | Scene text detection method based on text conspicuousness |
CN106997470A (en) * | 2017-02-28 | 2017-08-01 | 信雅达系统工程股份有限公司 | Tilt bearing calibration and the system of text image |
CN107066972A (en) * | 2017-04-17 | 2017-08-18 | 武汉理工大学 | Natural scene Method for text detection based on multichannel extremal region |
Non-Patent Citations (7)
Title |
---|
BAOGUANG SHI et al.: "Detecting Oriented Text in Natural Images by Linking Segments", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
JIN DAI et al.: "Scene Text Detection Based on Enhanced Multi-channels MSER and a Fast Text Grouping Process", 《2018 THE 3RD IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS》 *
KAIMING HE et al.: "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
RUI ZHU et al.: "Text detection based on convolutional neural networks with spatial pyramid pooling", 《2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 *
TONG HE et al.: "Text-Attentional Convolutional Neural Network for Scene Text Detection", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
ZHU QIMENG: "Research and Application of Text Image Orientation Based on Character Structure Features", 《China Masters' Theses Full-text Database, Information Science and Technology》 *
LI YUBING: "Research on Visual Tracking Algorithms Based on Deep Networks", 《China Masters' Theses Full-text Database, Information Science and Technology》 *
Cited By (94)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
CN109800735A (en) * | 2019-01-31 | 2019-05-24 | 中国人民解放军国防科技大学 | Accurate detection and segmentation method for ship target |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
CN109934229B (en) * | 2019-03-28 | 2021-08-03 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and computing equipment |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
CN110298343A (en) * | 2019-07-02 | 2019-10-01 | 哈尔滨理工大学 | A kind of hand-written blackboard writing on the blackboard recognition methods |
WO2021056255A1 (en) * | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
CN110674815A (en) * | 2019-09-29 | 2020-01-10 | 四川长虹电器股份有限公司 | Invoice image distortion correction method based on deep learning key point detection |
US11928872B2 (en) | 2019-11-21 | 2024-03-12 | Shanghai Goldway Intelligent Transportation System Co., Ltd. | Methods and apparatuses for recognizing text, recognition devices and storage media |
CN112825141B (en) * | 2019-11-21 | 2023-02-17 | 上海高德威智能交通系统有限公司 | Method and device for recognizing text, recognition equipment and storage medium |
CN112825141A (en) * | 2019-11-21 | 2021-05-21 | 上海高德威智能交通系统有限公司 | Method and device for recognizing text, recognition equipment and storage medium |
WO2021196013A1 (en) * | 2020-03-31 | 2021-10-07 | 京东方科技集团股份有限公司 | Word recognition method and device, and storage medium |
CN111353493B (en) * | 2020-03-31 | 2023-04-28 | 中国工商银行股份有限公司 | Text image direction correction method and device |
US11651604B2 (en) | 2020-03-31 | 2023-05-16 | Boe Technology Group Co., Ltd. | Word recognition method, apparatus and storage medium |
CN111353493A (en) * | 2020-03-31 | 2020-06-30 | 中国工商银行股份有限公司 | Text image direction correction method and device |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
CN113298079A (en) * | 2021-06-28 | 2021-08-24 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113298079B (en) * | 2021-06-28 | 2023-10-27 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113837169B (en) * | 2021-09-29 | 2023-12-19 | 平安科技(深圳)有限公司 | Text data processing method, device, computer equipment and storage medium |
CN113837169A (en) * | 2021-09-29 | 2021-12-24 | 平安科技(深圳)有限公司 | Text data processing method and device, computer equipment and storage medium |
CN114283431A (en) * | 2022-03-04 | 2022-04-05 | 南京安元科技有限公司 | Text detection method based on differentiable binarization |
Also Published As
Publication number | Publication date |
---|---|
CN108647681B (en) | 2019-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647681B (en) | A kind of English text detection method with text orientation correction | |
CN111325203B (en) | American license plate recognition method and system based on image correction | |
Qureshi et al. | A bibliography of pixel-based blind image forgery detection techniques | |
CN107491730A (en) | A kind of laboratory test report recognition methods based on image processing | |
Yin et al. | Hot region selection based on selective search and modified fuzzy C-means in remote sensing images | |
CN104408449B (en) | Intelligent mobile terminal scene literal processing method | |
CN110232387B (en) | Different-source image matching method based on KAZE-HOG algorithm | |
CN111539409B (en) | Ancient tomb question and character recognition method based on hyperspectral remote sensing technology | |
CN106529532A (en) | License plate identification system based on integral feature channels and gray projection | |
CN116071763B (en) | Teaching book intelligent correction system based on character recognition | |
CN110738216A (en) | Medicine identification method based on improved SURF algorithm | |
Hallale et al. | Twelve directional feature extraction for handwritten English character recognition | |
CN110689003A (en) | Low-illumination imaging license plate recognition method and system, computer equipment and storage medium | |
CN115311746A (en) | Off-line signature authenticity detection method based on multi-feature fusion | |
CN104899551B (en) | A kind of form image sorting technique | |
CN110084229A (en) | A kind of seal detection method, device, equipment and readable storage medium storing program for executing | |
CN109741351A (en) | A kind of classification responsive type edge detection method based on deep learning | |
CN110222660B (en) | Signature authentication method and system based on dynamic and static feature fusion | |
CN101727579A (en) | Method for detecting deformed character, method and device for determining water marking information in deformed character | |
Su et al. | Skew detection for Chinese handwriting by horizontal stroke histogram | |
CN111612045B (en) | Universal method for acquiring target detection data set | |
CN114862883A (en) | Target edge extraction method, image segmentation method and system | |
CN110555792B (en) | Image tampering blind detection method based on normalized histogram comprehensive feature vector | |
Sathisha | Bank automation system for Indian currency-a novel approach | |
Sushma et al. | Text detection in color images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |