CN107545262A - Method and device for detecting text in natural scene images - Google Patents
- Publication number: CN107545262A (application CN201710642311.1A)
- Authority: CN (China)
- Prior art keywords: text, scene image, natural scene, line, region
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
A method and device for detecting text in natural scene images, addressing the prior-art problem of low text-detection accuracy across natural scene images of varying complexity. The method comprises: acquiring a natural scene image; performing convolution operations on the acquired image through a fully convolutional network (FCN) model to obtain the image's convolutional features; determining, from those features, the sequence of text candidate regions the image contains; and performing, for each text candidate region in the sequence: extracting the region's convolutional features through a region-of-interest pooling layer and, by feature transformation, converting them into a feature vector of fixed dimension k; and determining, from a temporal recurrent network model and the fixed-dimension-k feature vector, the positions of the text lines the region contains, where k is a positive integer.
Description
Technical field
The present application relates to the field of text detection, and in particular to a method and device for detecting text in natural scene images.
Background art
A natural scene image is an image captured directly, by any capture device (for example, a camera, or a mobile phone with a shooting function) and under no particular constraints, of a scene that actually exists in daily life. Text in natural scene images can provide rich semantic information: recognizing the text of street signs, license plates, menus and the like helps people understand a scene easily, so accurately detecting text in natural scene images is necessary. However, because of variations in the font, color and layout of the text and the highly cluttered backgrounds of natural scene images, detecting text in them is a challenging task.
At present, methods for detecting text in natural scene images fall into two broad classes: sliding-window-based detection and connected-component-based detection. Specifically:

A sliding-window-based detector scans the original image with windows of different scales to obtain a series of image subregions that may contain text, extracts texture features from those subregions, and trains a classifier on the extracted features to verify whether each subregion contains text. Because the multi-scale window must slide across the whole image at a fixed step to extract subregions, extraction is extremely time-consuming; and because the subregions are verified with low-level texture features, detection quality is poor.
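The cost argument above can be made concrete with a small sketch that simply enumerates the subregions a multi-scale sliding window would have to classify (the image size, window scales and step below are illustrative, not taken from the patent):

```python
def sliding_windows(img_w, img_h, scales, step):
    """Enumerate the (x, y, w, h) subregions a multi-scale sliding-window
    detector would extract and classify one by one."""
    windows = []
    for (w, h) in scales:
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                windows.append((x, y, w, h))
    return windows

# Even a modest 640x480 image at three window scales yields over ten
# thousand subregions, each needing feature extraction and classification.
wins = sliding_windows(640, 480, [(32, 16), (64, 32), (128, 64)], step=8)
```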
A connected-component-based detector extracts connected regions from the image using features such as the color of character pixels and the stroke width of characters, analyzes the features of those regions, merges them into text strings according to character-grouping rules, and then verifies the strings and removes non-text to obtain the final detection result. This method is suitable only for natural scene images with relatively simple backgrounds.
Both classes of methods distinguish text from background in natural scene images using low-level features, such as character stroke width and image texture, so their detection accuracy is low. How to accurately detect text in natural scene images of varying complexity is therefore an urgent problem to be solved.
Summary of the invention
The present application provides a method and device for detecting text in natural scene images, to solve the prior-art problem of low text-detection accuracy across natural scene images of varying complexity.
In a first aspect, the application provides a method for detecting text in natural scene images. In the method, a natural scene image is first acquired; convolution operations are performed on the acquired image through a fully convolutional network (FCN) model to obtain the convolutional features of the image; the sequence of text candidate regions contained in the image is determined from those features; and for each text candidate region in the sequence, the following is performed: the region's convolutional features are extracted through a region-of-interest pooling layer (roi-pooling) and, by feature transformation, converted into a feature vector of fixed dimension k; the positions of the text lines contained in the region are then determined from a temporal recurrent network model and that fixed-dimension-k feature vector, where k is a positive integer. Each text candidate region in the sequence contains at least one text line, a text line being a single line of text contained in the region.
In the embodiments of the present application, the FCN model coarsely detects the sequence of text candidate regions in the natural scene image, and for each candidate region in the sequence the temporal recurrent network model determines the positions of the text lines it contains, thereby achieving accurate detection of text in natural scene images. Compared with prior-art methods that separate text from background using low-level features, the method of detecting text based on the FCN model fused with the temporal recurrent network model no longer relies on low-level features such as character stroke width and image texture to distinguish text from background; instead, through the deep learning capacity of the FCN model and the temporal recurrent network model, it makes full use of the contextual information in the image and the semantic information of the text, and can accurately determine the positions of the text lines the natural scene image contains.
In one possible design, determining the sequence of text candidate regions contained in the natural scene image from its convolutional features comprises: fusing the convolutional features of the image to determine the convolutional features that characterize the positions of text in it; using those features to map the image and mark the text positions and the non-text positions in it; and taking at least one region of the image marked as a text position as the sequence of text candidate regions.
In the above design, the method of determining the candidate-region sequence from the FCN-extracted convolutional features builds on the pixel-level learning ability of the FCN, makes full use of the contextual information in the image and the semantic information of the text, and separates the text in the natural scene image from the background. Compared with prior-art methods that distinguish text from background with low-level features such as character stroke width and image texture, the method of determining the candidate-region sequence provided by the embodiments of the present application locates the text lines contained in the image more accurately.
In the embodiments of the present application, the different text candidate regions determined from the convolutional features of the image differ in size. So that the temporal recurrent network model can subsequently process the candidate-region sequence uniformly, roi-pooling normalizes the sequence, converting each region's convolutional features into a feature vector of fixed dimension k; after that conversion, the positions of the text lines the region contains are determined from the temporal recurrent network model and the fixed-dimension-k feature vector.
In one possible design, the temporal recurrent network model comprises N layers of long short-term memory (LSTM), where N is a positive integer set to be no smaller than the maximum text-line count, the maximum text-line count being the number of text lines in that candidate region of the sequence which contains the most.
Based on the temporal recurrent network model comprising N LSTM layers, determining the positions of the text lines in the candidate region from the model and the fixed-dimension-k feature vector specifically comprises: feeding the fixed-dimension-k feature vector, as the time-frame input of the N LSTM layers, step by step into the LSTMs of the model, where first only the feature vector is fed into the first LSTM layer, and thereafter the output of the previous LSTM layer together with the feature vector is fed into each next layer; training the temporal recurrent network model with the feature vector and pre-calibrated text positions to obtain text-line candidate boxes; regressing, detecting and connecting the top, bottom, left and right edges of each candidate box to determine its tilt angle; and determining the positions of the text lines the region contains from the candidate boxes and their tilt angles.
In the above design, the fixed-dimension-k feature vector is fed step by step into the N LSTM layers of the model; apart from the first LSTM layer, each subsequent layer also receives the detection result of the previous layer. With this N-layer recurrent LSTM design, the model uses the information of the previously determined text-line candidate box when determining the current one, making the current determination more accurate. Further, because the N-layer LSTM temporal recurrent network model determines the tilt angle of each candidate box, tilted text can be detected.
In one possible design, after the positions of the text lines the candidate region contains are determined, the method further comprises: matching, by a matching algorithm, the determined text-line positions against pre-calibrated text positions to determine the text line with the highest matching degree to each pre-calibrated position; determining, by an error algorithm, the error between that best-matching text line and the calibrated text position; and updating the network parameters of the FCN model and the temporal recurrent network model according to the error.
In the above design, the matching algorithm ensures that only the single best-matching detected line is retained for each ground-truth text line, making the text lines detected in the natural scene image more accurate.
In one possible design, N may be set to 5, though other values may of course be used.
In a second aspect, the application provides a device for detecting text in natural scene images. The device has the function of implementing the method of the first aspect; the function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software comprises one or more modules corresponding to the above function, and a module may be software and/or hardware.
In a third aspect, the application provides an apparatus that may comprise a memory and a processor, the memory storing a program and the processor executing the program in the memory so as to perform the method of detecting text in natural scene images of the first aspect or of any possible design of the first aspect.
In a fourth aspect, the application further provides a computer-readable storage medium storing instructions which, when invoked and executed by a computer, cause the computer to complete the method of detecting text in natural scene images of the first aspect or of any possible design of the first aspect.
In a fifth aspect, the application provides a computer program product which, when invoked and executed by a computer, completes the method of detecting text in natural scene images of the first aspect or of any possible design of the first aspect.
Brief description of the drawings
Fig. 1 is a schematic diagram of a network structure for detecting text in natural scene images provided by the application;
Fig. 2 is a flowchart of a method for detecting text in natural scene images provided by the application;
Fig. 3 is a flowchart of a method for determining the sequence of text candidate regions contained in a natural scene image provided by the application;
Fig. 4 is a flowchart of a method for determining the positions of the text lines contained in a text candidate region provided by the application;
Fig. 5 is a schematic diagram of another network structure for detecting text in natural scene images provided by the application;
Fig. 6 is a schematic diagram of text-line matching provided by the application;
Fig. 7 is a schematic diagram of a device for detecting text in natural scene images provided by the application;
Fig. 8 is a schematic diagram of an apparatus for detecting text in natural scene images provided by the application.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the application clearer, the embodiments of the application are described below with reference to the accompanying drawings.

The embodiments of the present application provide a method and device for detecting text in natural scene images, to solve the prior-art problem of low text-detection accuracy across natural scene images of varying complexity. The method and the device are based on the same inventive concept; since the principles by which they solve the problem are similar, their implementations may refer to each other, and repeated parts are not described again.
The method and device for detecting text in natural scene images provided by the embodiments may be applied to any apparatus that detects text in natural scene images, for example, a computer, a tablet, a smartphone or a server. Application fields of the embodiments include, but are not limited to, detecting text in natural scene images, detecting small text-like objects in natural scene images, or detecting objects of other types.
Fig. 1 shows a schematic diagram of a network structure for detecting text in natural scene images provided by the embodiments of the present application. As shown in Fig. 1, the network structure comprises an FCN model, a temporal recurrent network model and roi-pooling. The FCN model acquires a natural scene image and processes it to obtain the sequence of text candidate regions in the image; roi-pooling processes the candidate-region sequence to obtain fixed-dimension feature vectors; and the positions of the text lines each candidate region contains are determined from the temporal recurrent network model and the fixed-dimension feature vectors.

It should be noted that the network structure with which the embodiments detect text in natural scene images includes, but is not limited to, the network structure shown in Fig. 1.
In the embodiments, the FCN model may be reconstructed from an existing convolutional neural network structure, and the embodiments do not limit which structure is used. For example, the FCN model may be constructed from the ResNet-101 structure of deep residual networks (ResNet), specifically by replacing the fully connected layers of the ResNet-101 architecture with deconvolution layers; suitable numbers of convolution and pooling layers may be chosen according to the practical application. An FCN model reconstructed in this way consists of convolution and pooling layers and no longer contains fully connected layers, so the input image may be of arbitrary size, the low-resolution spatial position information is retained, and end-to-end pixel-level prediction becomes possible.
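The resolution-restoring role of the deconvolution layers can be illustrated with a minimal NumPy sketch. A real FCN learns the interpolation kernel of its transposed-convolution layers; the stand-in below uses fixed nearest-neighbour upsampling only to show how a coarse score map is brought back toward input resolution for pixel-level prediction:

```python
import numpy as np

def upsample(score_map, stride):
    """Nearest-neighbour upsampling by `stride` -- the resolution-restoring
    effect a learned deconvolution (transposed convolution) layer provides
    in an FCN. A trained model learns the interpolation kernel instead."""
    return np.kron(score_map, np.ones((stride, stride), dtype=score_map.dtype))

coarse = np.array([[0.9, 0.1],
                   [0.2, 0.8]])      # coarse text/background scores
fine = upsample(coarse, stride=4)    # back to (8, 8): one score per pixel
```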
Fig. 2 shows a flowchart of a method for detecting text in natural scene images provided by the embodiments of the present application. As shown in Fig. 2, it comprises:

S101: Acquire a natural scene image. In the embodiments, a natural scene image is an image captured directly, by any capture device (for example, a camera, or a mobile phone with a shooting function) and under no particular constraints, of a scene that actually exists in daily life.
It should be noted that ways of acquiring a natural scene image include, but are not limited to, capturing it with a sensing device or retrieving it from a database in which natural scene images are stored. The sensing device includes, but is not limited to, optical-fiber sensing devices, cameras and acquisition devices; the database includes, but is not limited to, local databases, cloud databases, USB drives and hard disks.
S102: Perform convolution operations on the natural scene image through the FCN model to obtain the convolutional features of the image. In the embodiments, convolution operations are performed on the acquired image based on the constructed FCN model, and the convolutional features of the last convolution layer are obtained from the deconvolution layers of the FCN model, thereby obtaining the convolutional features of the natural scene image.
S103: Determine, from the convolutional features of the natural scene image, the sequence of text candidate regions the image contains. Each text candidate region in the sequence contains at least one text line, a text line being a single line of text contained in the region.

In the embodiments, each text candidate region determined from the convolutional features of the image contains at least one text line, while the final aim of detecting text in a natural scene image is to output all independent text lines. To accurately determine the text lines each candidate region contains, the following operations S104 and S105 may be performed for each text candidate region in the sequence.
S104: Extract the convolutional features of the text candidate region through roi-pooling and, by feature transformation, convert them into a feature vector of fixed dimension k, k being a positive integer. (The fixed-dimension-k feature vectors appearing below have the same meaning as here.)

In the embodiments, the different text candidate regions determined from the convolutional features of the image differ in size. So that the temporal recurrent network model can subsequently process the candidate-region sequence uniformly, roi-pooling normalizes the sequence, converting each region's convolutional features into a feature vector of fixed dimension k.
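A minimal NumPy sketch of the roi-pooling idea follows (the grid size, feature shapes and region coordinates are illustrative; the patent does not fix them). Each region, whatever its size, is divided into a fixed output grid, max-pooled per cell and flattened, so every candidate region yields a vector of the same dimension k:

```python
import numpy as np

def roi_pool(feature_map, roi, out_h, out_w):
    """Max-pool an arbitrary-size region of a (C, H, W) feature map onto a
    fixed (C, out_h, out_w) grid and flatten, so every text candidate
    region yields a feature vector of the same fixed dimension k."""
    c = feature_map.shape[0]
    x0, y0, x1, y1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    h, w = region.shape[1:]
    out = np.zeros((c, out_h, out_w), dtype=feature_map.dtype)
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                             xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out.reshape(-1)  # fixed dimension k = c * out_h * out_w

fmap = np.random.rand(4, 32, 32)
v_small = roi_pool(fmap, (2, 2, 10, 8), 6, 6)    # an 8x6 candidate region
v_large = roi_pool(fmap, (0, 0, 32, 30), 6, 6)   # a 32x30 candidate region
# Both regions map to the same fixed dimension k = 4 * 6 * 6 = 144.
```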
S105: Determine, from the temporal recurrent network model and the fixed-dimension-k feature vector, the positions of the text lines the candidate region contains.

The embodiments are based on FCN and temporal recurrent network techniques. By designing a network structure in which the FCN model is fused with the temporal recurrent network model, effective feature representations are learned from a large number of natural scene image training samples, and a joint network capable of detecting text lines in natural scene images is trained. Specifically, the FCN model coarsely detects the sequence of text candidate regions in the image, and for each candidate region in the sequence the temporal recurrent network model determines the positions of the text lines it contains, thereby achieving accurate detection of text in natural scene images. Compared with prior-art methods that separate text from background with low-level features, the method based on the FCN model fused with the temporal recurrent network model no longer relies on low-level features such as character stroke width and image texture to distinguish text from background; through the deep learning capacity of the two models, it makes full use of the contextual information in the image and the semantic information of the text, and can accurately determine the positions of the text lines the image contains.
Referring to Fig. 3, the process of determining the sequence of text candidate regions contained in a natural scene image from its convolutional features is described in detail:

S201: Fuse the convolutional features of the natural scene image to determine the convolutional features that characterize the positions of text in it.

In the embodiments, the convolutional features extracted by the FCN model may include features of the image in many dimensions (for example, 1024). To determine the candidate-region sequence, the convolutional features of the image are fused so as to determine those that characterize the positions of text in the image.
S202: Using the convolutional features characterizing the positions of text, map the natural scene image and mark, by a classification function, the text positions and the non-text positions in it. The embodiments do not limit the classification function used to mark text and non-text positions; it may be, for example, a logistic function or a softmax function.
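As an illustration of the softmax option, the following NumPy sketch marks text and non-text pixels from a two-channel score map (the score values are made up; a real model produces them from the fused convolutional features):

```python
import numpy as np

def softmax_text_mask(scores):
    """Turn a (2, H, W) score map (channel 0: background, channel 1: text)
    into per-pixel text probabilities and a binary text/non-text marking."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))  # stable softmax
    prob_text = e[1] / e.sum(axis=0)
    return prob_text, prob_text > 0.5

scores = np.array([[[2.0, -1.0], [0.0, 3.0]],    # background scores
                   [[-1.0, 2.0], [0.5, -2.0]]])  # text scores
prob, mask = softmax_text_mask(scores)
```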
S203: Take at least one region of the natural scene image marked as a text position as the sequence of text candidate regions.
In the embodiments, a natural scene image may contain multiple texts, so the candidate-region sequence determined in it may comprise multiple text candidate regions, each containing at least one text line; yet the final aim of detecting text in the image is to output text regions comprising independent text lines. After the candidate-region sequence has been determined by the FCN model, the convolutional features of each candidate region are extracted through roi-pooling and converted, by feature transformation, into a feature vector of fixed dimension k (see S104 for details); after that conversion, the positions of the text lines the region contains are determined from the temporal recurrent network model and the fixed-dimension-k feature vector.
In the embodiments, the temporal recurrent network model may include N LSTM layers, where N is a positive integer set to be no smaller than the maximum text-line count, the maximum text-line count being the number of text lines in that candidate region of the sequence which contains the most. For example, if four text candidate regions are determined in the image, denoted A, B, C and D, and counting their text lines shows that they contain 2, 3, 1 and 2 lines respectively, then N is set to a positive integer no smaller than 3.
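The rule for choosing N can be written down directly (the region names and line counts mirror the four-region example above):

```python
# Text-line counts per candidate region, as in the A/B/C/D example.
line_counts = {"A": 2, "B": 3, "C": 1, "D": 2}

# N must be no smaller than the line count of the fullest region.
max_text_lines = max(line_counts.values())
N = max_text_lines  # any positive integer >= 3 is valid; a later design uses N = 5
```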
Taking a temporal recurrent network model comprising N LSTM layers as an example, the process of determining the positions of the text lines a candidate region contains from the model and the fixed-dimension-k feature vector is described in detail with reference to Fig. 4:

S301: Feed the fixed-dimension-k feature vector, as the time-frame input of the N LSTM layers, step by step into the LSTMs of the temporal recurrent network model.

First, only the fixed-dimension-k feature vector is fed into the first LSTM layer of the model; thereafter, the output of the previous LSTM layer together with the feature vector is fed into each next LSTM layer. The temporal recurrent network model is trained with the fixed-dimension-k feature vector and pre-calibrated text positions to obtain text-line candidate boxes.
In the embodiments, the fixed-dimension-k feature vector is fed step by step into the N LSTM layers of the model; apart from the first LSTM layer, each subsequent layer also receives the detection result of the previous layer. With this N-layer LSTM network design, the model uses the information of the previously determined text-line candidate box when determining the current one, making the current determination more accurate.
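The wiring of this stepwise design can be sketched in plain Python, with an invented stand-in cell in place of a trained LSTM (the toy cell and its geometry are purely illustrative): step 1 sees only the region's feature vector, while every later step also sees the box predicted at the previous step.

```python
def run_decoder(cell, features, n_layers):
    """Wiring of the N-step decoding: step 1 receives only the region's
    fixed-dimension feature vector; every later step also receives the box
    predicted at the previous step. `cell` stands in for a trained LSTM."""
    outputs, prev = [], None
    for _ in range(n_layers):
        prev = cell(prev, features)
        outputs.append(prev)
    return outputs

def toy_cell(prev_box, features):
    # Invented stand-in: stacks fixed-height line boxes, each placed
    # directly below the previously predicted one -- exactly the kind of
    # dependence on the previous box that the N-layer design exploits.
    top = 0 if prev_box is None else prev_box[3]
    return (0, top, features["width"], top + features["line_height"])

boxes = run_decoder(toy_cell, {"width": 100, "line_height": 20}, n_layers=3)
```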
S302: Regress, detect and connect the top, bottom, left and right edges of each text-line candidate box to determine the tilt angle of the box.

In the embodiments, the temporal recurrent network model regresses, detects and connects the top, bottom, left and right edges of each candidate box and can thus determine its tilt angle, so the method for detecting text in natural scene images provided by the embodiments supports the detection of tilted text. Compared with prior-art methods that detect tilted text with rectangular candidate boxes, the method improves the localization accuracy for tilted text; and because the tilt angle of each candidate box is determined by the N-layer LSTM temporal recurrent network model, tilted text can be detected.
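How a candidate box plus its regressed tilt angle pins down a tilted line can be shown with elementary geometry (a sketch; the patent does not specify this parameterization): a box of width w and height h centred at (cx, cy), rotated by the tilt angle, yields four corner points.

```python
import math

def rotated_corners(cx, cy, w, h, angle_deg):
    """Corners of a text-line box of width w and height h, centred at
    (cx, cy) and tilted by angle_deg: an axis-aligned rectangle plus the
    regressed tilt angle fully determines the tilted line's position."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners

# A tilt of 0 degrees reduces to the ordinary axis-aligned rectangle.
flat = rotated_corners(50, 10, 100, 20, 0.0)
```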
S303: Determine, from the text-line candidate boxes and their tilt angles, the positions of the text lines the candidate region contains.

In the embodiments, the N LSTMs of the temporal recurrent network model determine, one by one, the positions of the single text lines in the candidate region and, combined with the candidate-region features extracted by the FCN, achieve accurate detection of the text lines and their tilt angles.
In actual tests, the number of text lines included in a text candidate region determined in a natural scene image generally does not exceed 4. Therefore, in one possible design of the embodiment of the present application, the number of LSTM layers included in the time-recursive network model is set to 5, i.e. the above N is set to 5, which ensures that the time-recursive network model designed in the embodiment of the present application can determine the text line positions in all text candidate regions. The network structure in which the number of LSTM layers N in the time-recursive network model is set to 5 is shown in Fig. 5.
It should be noted that, if the number of text lines included in the text candidate region is less than N, the positions of all text lines included in the text candidate region are determined by the first M LSTM layers (M being a positive integer less than N), and the remaining N minus M LSTM layers output null values.
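The behavior just described — the first M layer outputs carry detected line positions, the remaining N − M layers output null — can be sketched as follows (the tuple box format and helper name are assumptions for illustration):

```python
def pad_line_outputs(detected_lines, n_layers=5):
    # The first M layer outputs are the detected line positions (M <= N);
    # the remaining N - M layers output a null value (None here).
    padded = list(detected_lines)[:n_layers]
    padded += [None] * (n_layers - len(padded))
    return padded
```

With N = 5 and three detected lines, layers 4 and 5 emit None.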
In the embodiment of the present application, the positions of the text lines determined by the N LSTM layers are not necessarily output in order. For example, if the text candidate region includes 3 single-line texts, the order in which the N LSTM layers output the text line positions may be: second-line position, first-line position, third-line position, whereas the actually desired output order is: first-line position, second-line position, third-line position. There may also be false detections among the text line positions determined by the N LSTM layers; for example, the text candidate region actually includes three text lines, but the N LSTM layers determine four text line positions. Because of the above problems, after determining the positions of the text lines included in the text candidate region, the embodiment of the present application matches, through a matching algorithm, the determined text line positions of the text candidate region against the text positions calibrated in advance, determines the text lines with the highest matching degree to the pre-calibrated text positions, determines through an error algorithm the error between the highest-matching text lines and the calibrated text positions, and updates the network parameters of the whole fused network according to that error.
In the embodiment of the present application, during the process of matching the determined text line positions of the text candidate region against the pre-calibrated text positions and determining the text lines with the highest matching degree to the pre-calibrated text positions, it is ensured that only one highest-matching text line position is retained for each text line. Specifically, the above matching process may set a matching score to measure the degree of matching between a text line position included in the text candidate region and a pre-calibrated text position: the higher the matching score, the higher the matching degree. By filtering out text line positions whose matching scores fall below a preset threshold, the text lines with the highest matching degree to the pre-calibrated text positions are obtained.
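The patent leaves the matching score unspecified; a common choice for box similarity is intersection-over-union (IoU), which the sketch below uses as a stand-in. Each pre-calibrated line keeps its single highest-scoring prediction, and predictions whose score falls below the threshold are filtered out (function names and the IoU choice are illustrative assumptions):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x0, y0, x1, y1).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def best_matches(predicted, calibrated, threshold=0.5):
    # For each pre-calibrated line, keep the index of the single
    # predicted box with the highest matching score; discard matches
    # whose score is below the preset threshold (false detections).
    kept = []
    for gt in calibrated:
        score, idx = max((iou(p, gt), i) for i, p in enumerate(predicted))
        if score >= threshold:
            kept.append(idx)
    return kept
```

In the four-predictions-for-three-lines case above, the spurious fourth box scores below the threshold against every calibrated line and is dropped.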
The following example illustrates the process of matching the determined text line positions of a text candidate region against the pre-calibrated text positions. As shown in Fig. 6, assume the currently detected text candidate region includes two text lines, represented by solid boxes in Fig. 6, and the text line positions determined by the N LSTM layers for the currently determined text candidate region are represented by dashed boxes, i.e. dashed boxes 1, 2, 3, and 4 in Fig. 6. Through the matching algorithm, dashed boxes 1, 2, 3, and 4 in Fig. 6 are matched against the solid boxes to determine the dashed boxes with the highest matching degree to the solid boxes; finally, dashed boxes 2 and 4 in Fig. 6 are determined to have the highest matching degree to the solid boxes, so the text lines in the natural scene image can be determined according to the text line positions corresponding to dashed boxes 2 and 4 in Fig. 6.
In the embodiment of the present application, the matching algorithm used to match the determined text line positions of the text candidate region against the pre-calibrated text positions is not limited; for example, it may be the Hungarian algorithm (hungary-loss). The Hungarian algorithm is an algorithm that finds a maximum bipartite matching via augmenting paths, and can effectively determine the text line positions with the highest matching degree to the pre-calibrated text positions.
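The text characterizes the Hungarian approach as finding a maximum bipartite matching via augmenting paths. A compact pure-Python version of that augmenting-path search (Kuhn's algorithm over an unweighted bipartite graph; the adjacency-list interface is an illustrative assumption, not the patent's implementation):

```python
def max_bipartite_matching(adj, n_right):
    # adj[u] lists the right-side vertices that left vertex u may match
    # (e.g. calibrated lines a predicted box overlaps). Returns the
    # matching size and, per right vertex, its matched left vertex.
    match_r = [-1] * n_right

    def augment(u, seen):
        # Try to find an augmenting path starting from left vertex u.
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_r[v] == -1 or augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    size = sum(augment(u, [False] * n_right) for u in range(len(adj)))
    return size, match_r
```

Here the left side would hold the predicted line positions and the right side the pre-calibrated ones, so each calibrated line ends up paired with at most one prediction.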
In the embodiment of the present application, the network parameters of the whole fused network are adjusted by determining the error between the highest-matching text lines and the calibrated text positions, so as to improve the performance of the fused FCN-plus-time-recursive network designed in the embodiment of the present application.
In the embodiment of the present application, the error algorithm used to determine the error between the highest-matching text lines and the calibrated text positions is not limited; for example, it may be a cross-entropy error algorithm.
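As a concrete instance of such an error algorithm, the cross-entropy between a calibrated (e.g. one-hot) target distribution p and a predicted distribution q can be computed as follows (a generic textbook formulation, not code from the patent):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0).
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))
```

A perfect prediction gives an error of 0, while predicting 0.5/0.5 against a one-hot target gives log 2 ≈ 0.693; gradients of this error are what drive the network-parameter update.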
Based on the same idea as the above method embodiment, the embodiment of the present application further provides a device for detecting text in a natural scene image. In the case of integrated units, Fig. 7 shows a logical structure diagram of the device for detecting text in a natural scene image; the device can be applied to equipment that detects text in natural scene images. As shown in Fig. 7, the device 100 for detecting text in a natural scene image includes an acquiring unit 101 and a processing unit 102. The acquiring unit 101 is configured to acquire a natural scene image. The acquiring unit 101 may be a communication interface or transceiver possessed by the device itself, for example a transceiver or communication interface to which a remote device transmits the natural scene image in a wired or wireless manner; it may also be an input interface possessed by the device itself (such as a keyboard, a USB interface, or a touch screen), through which a user inputs the natural scene image into the device. The processing unit 102 is configured to: perform a convolution operation, through an FCN model, on the natural scene image acquired by the acquiring unit 101 to obtain convolution features of the natural scene image; determine, according to the convolution features of the natural scene image, the text candidate region sequence included in the natural scene image; and, for each text candidate region in the text candidate region sequence, perform: extracting the convolution features of the text candidate region through a region-of-interest pooling layer (roi-pooling), converting, through a feature transformation, the convolution features of the text candidate region into a feature vector of fixed dimension k, k being a positive integer, and determining, according to the time-recursive network model and the feature vector of fixed dimension k, the positions of the text lines included in the text candidate region.
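The key property of the roi-pooling step is that a candidate region of any size yields a feature vector of the same fixed dimension k. A toy single-channel 2-D max-pooling version (the bin layout and k = out_h * out_w are illustrative assumptions; the actual layer operates on multi-channel FCN feature maps):

```python
def roi_max_pool(fmap, roi, out_h=2, out_w=2):
    # Max-pool an arbitrary-size ROI of a 2-D feature map into a fixed
    # out_h x out_w grid, flattened into a vector of k = out_h * out_w.
    x0, y0, x1, y1 = roi  # half-open pixel bounds of the region
    h, w = y1 - y0, x1 - x0
    pooled = []
    for by in range(out_h):
        ys = y0 + by * h // out_h
        ye = max(y0 + (by + 1) * h // out_h, ys + 1)
        for bx in range(out_w):
            xs = x0 + bx * w // out_w
            xe = max(x0 + (bx + 1) * w // out_w, xs + 1)
            pooled.append(max(fmap[y][x]
                              for y in range(ys, ye)
                              for x in range(xs, xe)))
    return pooled
```

Whether the ROI covers the whole map or a small corner, the output vector always has length k, which is what lets a fixed-input LSTM consume regions of varying size.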
Each text candidate region in the text candidate region sequence includes at least one text line, a text line being a single line of text included in the text candidate region.
In one possible design, the processing unit 102 may specifically: fuse the convolution features of the natural scene image to determine convolution features characterizing the text positions in the natural scene image; map the natural scene image using the convolution features characterizing the text positions in the natural scene image, and mark, through a classification function, the text positions and the non-text positions in the natural scene image; and determine at least one region marked as a text position in the natural scene image as the text candidate region sequence.
In another possible design, the time-recursive network model includes N layers of long short-term memory (LSTM), where N is a positive integer set greater than or equal to a maximum number of text lines, the maximum number of text lines being the number of text lines in the text candidate region, among those in the text candidate region sequence, that includes the most text lines. The processing unit 102 may specifically: take the feature vector of fixed dimension k as the time-frame input of the N LSTM layers and input it step by step into the LSTM layers included in the time-recursive network model, wherein the feature vector of fixed dimension k is first input only into the first LSTM layer of the time-recursive network model, and thereafter, each time, the result output by the previous LSTM layer and the feature vector of fixed dimension k are input into the next LSTM layer; train the time-recursive network model using the feature vector of fixed dimension k and the pre-calibrated text positions to obtain text line candidate boxes; regress, detect, and connect the top, bottom, left, and right edges of the text line candidate boxes to determine the tilt angles of the text line candidate boxes; and determine, according to the text line candidate boxes and the tilt angles of the text line candidate boxes, the positions of the text lines included in the text candidate region.
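The input scheme just described — layer 1 receives only the fixed-dimension feature vector, and each subsequent layer receives the previous layer's output together with the same feature vector — can be sketched with stand-in callables in place of real LSTM cells (the list concatenation and the callable interface are illustrative assumptions):

```python
def stacked_forward(feature_vec, layers):
    # layers[0] sees only the fixed-dimension-k feature vector;
    # layers[i] for i > 0 sees the previous layer's output combined
    # with the same feature vector. Each output stands in for one
    # predicted text line position.
    out = layers[0](feature_vec)
    outputs = [out]
    for layer in layers[1:]:
        out = layer(out + feature_vec)  # list concatenation here
        outputs.append(out)
    return outputs
```

With N = 5 such layers, the model can emit up to five line positions per candidate region, matching the design above.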
In another possible design, the processing unit 102 may further, after determining the positions of the text lines included in the text candidate region: match, through a matching algorithm, the determined positions of the text lines included in the text candidate region against the pre-calibrated text positions, and determine the text lines with the highest matching degree to the pre-calibrated text positions; determine, through an error algorithm, the error between the highest-matching text lines and the calibrated text positions; and update the network parameters according to the error.
The value of N referred to in the above embodiments may be, but is not limited to being, set to 5.
The division into modules in the embodiment of the present application is schematic and is merely a division by logical function; other division manners are possible in actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated in one processor or may exist separately and physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
When the integrated module is implemented in the form of hardware, as shown in Fig. 8, Fig. 8 shows a schematic diagram of the equipment 1000 for detecting text in a natural scene image provided by the embodiment of the present application. The equipment 1000 can be used to perform the methods involved in Fig. 2 to Fig. 4. As shown in Fig. 8, the equipment 1000 includes a processor 1001 and a memory 1002. The memory 1002 stores computer programs, instructions, or code. The processor 1001 can call and execute the programs, instructions, or code stored in the memory 1002 to implement the steps and functions in the above embodiments, which are not repeated here. For the implementation of the processor 1001, reference may be made to the corresponding descriptions of the acquiring unit 101 and the processing unit 102 in the above Fig. 7 embodiment, which are not repeated here.
It can be understood that Fig. 8 shows only a simplified design of the equipment for detecting text in a natural scene image. In practical applications, the equipment for detecting text in a natural scene image is not limited to the above structure and may contain any number of interfaces, processors, memories, and the like; all equipment that can implement the detection of text in a natural scene image according to the embodiments of the present application falls within the protection scope of the embodiments of the present application.
It should be further understood that the device 100 for detecting text in a natural scene image and the equipment 1000 for detecting text in a natural scene image involved in the embodiments of the present application can be used to implement the corresponding functions in the above method embodiments of the present application; therefore, where the description of the embodiments of the present application is not sufficiently detailed, reference may be made to the description of the related method embodiments, which will not be repeated here.
It should be further understood that the processor involved in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include a read-only memory and a random access memory, and provides instructions and data to the processor. A part of the memory may also include a non-volatile random access memory. For example, the memory may also store information on the device type.
In addition to a data bus, the bus system may further include a power bus, a control bus, a status signal bus, and the like. For clarity of description, however, the various buses are all designated as the bus system in the figure.
In the implementation process, the steps involved in the above method embodiments can be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The steps of the method for detecting text in a natural scene image disclosed in combination with the embodiments of the present application may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps involved in the above method embodiments in combination with its hardware. To avoid repetition, detailed descriptions are omitted here.
Based on the same idea as the above method embodiment, the embodiment of the present application further provides a computer-readable storage medium on which instructions are stored; when these instructions are called and executed by a computer, the computer can be caused to complete the methods involved in the method embodiments and in any possible design of the method embodiments.
Based on the same idea as the above method embodiment, the present application also provides a computer program product which, when called and executed by a computer, causes the computer to complete the methods involved in the method embodiments and in any possible design of the above method embodiments.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the methods, equipment (systems), and computer program products according to the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufacture including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these changes and modifications.
Claims (13)
- 1. A method for detecting text in a natural scene image, characterized by comprising: acquiring a natural scene image; performing a convolution operation on the natural scene image through a fully convolutional network (FCN) model to obtain convolution features of the natural scene image; determining, according to the convolution features of the natural scene image, a text candidate region sequence included in the natural scene image, wherein each text candidate region in the text candidate region sequence includes at least one text line, a text line being a single line of text included in the text candidate region; and, for each text candidate region in the text candidate region sequence, performing: extracting the convolution features of the text candidate region through a region-of-interest pooling layer (roi-pooling), and converting, through a feature transformation, the convolution features of the text candidate region into a feature vector of fixed dimension k, k being a positive integer; and determining, according to a time-recursive network model and the feature vector of fixed dimension k, the positions of the text lines included in the text candidate region.
- 2. The method according to claim 1, characterized in that determining, according to the convolution features of the natural scene image, the text candidate region sequence included in the natural scene image comprises: fusing the convolution features of the natural scene image to determine convolution features characterizing text positions in the natural scene image; mapping the natural scene image using the convolution features characterizing the text positions in the natural scene image, and marking, through a classification function, the text positions and the non-text positions in the natural scene image; and determining at least one region marked as a text position in the natural scene image as the text candidate region sequence.
- 3. The method according to claim 2, characterized in that the time-recursive network model includes N layers of long short-term memory (LSTM), where N is a positive integer set greater than or equal to a maximum number of text lines, the maximum number of text lines being the number of text lines in the text candidate region, among those in the text candidate region sequence, that includes the most text lines; and determining, according to the time-recursive network model and the feature vector of fixed dimension k, the positions of the text lines included in the text candidate region comprises: taking the feature vector of fixed dimension k as the time-frame input of the N LSTM layers and inputting it step by step into the LSTM layers included in the time-recursive network model, wherein the feature vector of fixed dimension k is first input only into the first LSTM layer of the time-recursive network model, and thereafter, each time, the result output by the previous LSTM layer and the feature vector of fixed dimension k are input into the next LSTM layer; training the time-recursive network model using the feature vector of fixed dimension k and pre-calibrated text positions to obtain text line candidate boxes; regressing, detecting, and connecting the top, bottom, left, and right edges of the text line candidate boxes to determine the tilt angles of the text line candidate boxes; and determining, according to the text line candidate boxes and the tilt angles of the text line candidate boxes, the positions of the text lines included in the text candidate region.
- 4. The method according to claim 3, characterized in that after determining the positions of the text lines included in the text candidate region, the method further comprises: matching, through a matching algorithm, the determined positions of the text lines included in the text candidate region against pre-calibrated text positions, and determining the text lines with the highest matching degree to the pre-calibrated text positions; and determining, through an error algorithm, the error between the highest-matching text lines and the calibrated text positions, and updating network parameters according to the error.
- 5. The method according to claim 3 or 4, characterized in that N is set to 5.
- 6. A device for detecting text in a natural scene image, characterized by comprising: an acquiring unit, configured to acquire a natural scene image; and a processing unit, configured to perform a convolution operation on the natural scene image through a fully convolutional network (FCN) model to obtain convolution features of the natural scene image, and to determine, according to the convolution features of the natural scene image, a text candidate region sequence included in the natural scene image, wherein each text candidate region in the text candidate region sequence includes at least one text line, a text line being a single line of text included in the text candidate region, and, for each text candidate region in the text candidate region sequence, to perform: extracting the convolution features of the text candidate region through a region-of-interest pooling layer (roi-pooling), converting, through a feature transformation, the convolution features of the text candidate region into a feature vector of fixed dimension k, k being a positive integer, and determining, according to a time-recursive network model and the feature vector of fixed dimension k, the positions of the text lines included in the text candidate region.
- 7. The device according to claim 6, characterized in that, when determining, according to the convolution features of the natural scene image, the text candidate region sequence included in the natural scene image, the processing unit is specifically configured to: fuse the convolution features of the natural scene image to determine convolution features characterizing text positions in the natural scene image; map the natural scene image using the convolution features characterizing the text positions in the natural scene image, and mark, through a classification function, the text positions and the non-text positions in the natural scene image; and determine at least one region marked as a text position in the natural scene image as the text candidate region sequence.
- 8. The device according to claim 7, characterized in that the time-recursive network model includes N layers of long short-term memory (LSTM), where N is a positive integer set greater than or equal to a maximum number of text lines, the maximum number of text lines being the number of text lines in the text candidate region, among those in the text candidate region sequence, that includes the most text lines; and, when determining, according to the time-recursive network model and the feature vector of fixed dimension k, the positions of the text lines included in the text candidate region, the processing unit is specifically configured to: take the feature vector of fixed dimension k as the time-frame input of the N LSTM layers and input it step by step into the LSTM layers included in the time-recursive network model, wherein the feature vector of fixed dimension k is first input only into the first LSTM layer of the time-recursive network model, and thereafter, each time, the result output by the previous LSTM layer and the feature vector of fixed dimension k are input into the next LSTM layer; train the time-recursive network model using the feature vector of fixed dimension k and pre-calibrated text positions to obtain text line candidate boxes; regress, detect, and connect the top, bottom, left, and right edges of the text line candidate boxes to determine the tilt angles of the text line candidate boxes; and determine, according to the text line candidate boxes and the tilt angles of the text line candidate boxes, the positions of the text lines included in the text candidate region.
- 9. The device according to claim 8, characterized in that the processing unit is further configured to: after determining the positions of the text lines included in the text candidate region, match, through a matching algorithm, the determined positions of the text lines included in the text candidate region against pre-calibrated text positions, and determine the text lines with the highest matching degree to the pre-calibrated text positions; and determine, through an error algorithm, the error between the highest-matching text lines and the calibrated text positions, and update network parameters according to the error.
- 10. The device according to claim 8 or 9, characterized in that N is set to 5.
- 11. Equipment, characterized by comprising the device for detecting text in a natural scene image according to any one of claims 6 to 10.
- 12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.
- 13. A computer program product, characterized in that, when called by a computer, the computer program product causes the computer to perform the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710642311.1A CN107545262B (en) | 2017-07-31 | 2017-07-31 | Method and device for detecting text in natural scene image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107545262A true CN107545262A (en) | 2018-01-05 |
CN107545262B CN107545262B (en) | 2020-11-06 |
Family
ID=60970281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710642311.1A Active CN107545262B (en) | 2017-07-31 | 2017-07-31 | Method and device for detecting text in natural scene image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107545262B (en) |
Cited By (66)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564035A (en) * | 2018-04-13 | 2018-09-21 | 杭州睿琪软件有限公司 | The method and system for the information recorded on identification document |
CN108595409A (en) * | 2018-03-16 | 2018-09-28 | 上海大学 | A kind of requirement documents based on neural network and service document matches method |
CN108805131A (en) * | 2018-05-22 | 2018-11-13 | 北京旷视科技有限公司 | Text line detection method, apparatus and system |
CN109299274A (en) * | 2018-11-07 | 2019-02-01 | 南京大学 | A kind of natural scene Method for text detection based on full convolutional neural networks |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150055857A1 (en) * | 2013-08-20 | 2015-02-26 | Adobe Systems Incorporated | Text detection in natural images |
CN105426846A (en) * | 2015-11-20 | 2016-03-23 | 江南大学 | Method for positioning text in scene image based on image segmentation model |
CN105469047A (en) * | 2015-11-23 | 2016-04-06 | 上海交通大学 | Chinese text detection method and system based on unsupervised learning and deep learning networks |
CN105608456A (en) * | 2015-12-22 | 2016-05-25 | 华中科技大学 | Multi-directional text detection method based on fully convolutional networks |
CN105631426A (en) * | 2015-12-29 | 2016-06-01 | 中国科学院深圳先进技术研究院 | Image text detection method and device |
CN106296682A (en) * | 2016-08-09 | 2017-01-04 | 北京好运到信息科技有限公司 | Method and device for detecting text regions in medical images |
CN106570497A (en) * | 2016-10-08 | 2017-04-19 | 中国科学院深圳先进技术研究院 | Text detection method and device for scene image |
- 2017-07-31 CN CN201710642311.1A patent/CN107545262B/en active Active
Cited By (96)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
CN110210490A (en) * | 2018-02-28 | 2019-09-06 | 深圳市腾讯计算机系统有限公司 | Image processing method, device, computer equipment and storage medium |
CN108595409A (en) * | 2018-03-16 | 2018-09-28 | 上海大学 | Neural network-based method for matching requirement documents and service documents |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
CN108564035B (en) * | 2018-04-13 | 2020-09-25 | 杭州睿琪软件有限公司 | Method and system for identifying information recorded on document |
CN108564035A (en) * | 2018-04-13 | 2018-09-21 | 杭州睿琪软件有限公司 | Method and system for identifying information recorded on a document |
US10977513B2 (en) | 2018-04-13 | 2021-04-13 | Hangzhou Glorify Software Limited | Method, system and computer readable storage medium for identifying information carried on sheet |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
CN108805131A (en) * | 2018-05-22 | 2018-11-13 | 北京旷视科技有限公司 | Text line detection method, apparatus and system |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
CN110619255A (en) * | 2018-06-19 | 2019-12-27 | 杭州海康威视数字技术股份有限公司 | Target detection method and device |
CN110619255B (en) * | 2018-06-19 | 2022-08-26 | 杭州海康威视数字技术股份有限公司 | Target detection method and device |
CN110634159A (en) * | 2018-06-21 | 2019-12-31 | 北京京东尚科信息技术有限公司 | Target detection method and device |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
CN111127593B (en) * | 2018-10-30 | 2023-10-31 | 珠海金山办公软件有限公司 | Document content erasing method and device, electronic equipment and readable storage medium |
CN111127593A (en) * | 2018-10-30 | 2020-05-08 | 珠海金山办公软件有限公司 | Document content erasing method and device, electronic equipment and readable storage medium |
CN109299274A (en) * | 2018-11-07 | 2019-02-01 | 南京大学 | Natural scene text detection method based on fully convolutional neural networks |
CN109783094A (en) * | 2018-12-15 | 2019-05-21 | 深圳壹账通智能科技有限公司 | Front end page generation method, device, computer equipment and storage medium |
CN109886264A (en) * | 2019-01-08 | 2019-06-14 | 深圳禾思众成科技有限公司 | Character detection method, device and computer-readable storage medium |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
CN110119681B (en) * | 2019-04-04 | 2023-11-24 | 平安科技(深圳)有限公司 | Text line extraction method and device and electronic equipment |
CN110119681A (en) * | 2019-04-04 | 2019-08-13 | 平安科技(深圳)有限公司 | Text line extraction method and device, and electronic equipment |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
CN112395422A (en) * | 2019-08-12 | 2021-02-23 | 北京国双科技有限公司 | Text information extraction method and device |
WO2021027283A1 (en) * | 2019-08-12 | 2021-02-18 | 北京国双科技有限公司 | Text information extraction method and apparatus |
WO2021056255A1 (en) * | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
CN112149645A (en) * | 2020-11-10 | 2020-12-29 | 西北工业大学 | Human pose keypoint recognition method based on generative adversarial learning and graph neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN107545262B (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107545262A (en) | Method and device for detecting text in natural scene images | |
CN107704857B (en) | End-to-end lightweight license plate recognition method and device | |
CN108876792B (en) | Semantic segmentation method, device and system and storage medium | |
CN112434721B (en) | Image classification method, system, storage medium and terminal based on small sample learning | |
CN109447990B (en) | Image semantic segmentation method and device, electronic equipment and computer readable medium | |
CN112052787B (en) | Target detection method and device based on artificial intelligence and electronic equipment | |
CN106097353B (en) | Object segmentation method and device based on multi-level local region fusion, and computing device |
JP2020533654A (en) | Holographic anti-counterfeit code inspection method and equipment | |
CN106462572A (en) | Techniques for distributed optical character recognition and distributed machine language translation | |
CN110349138B (en) | Target object detection method and device based on example segmentation framework | |
CN111127468A (en) | Road crack detection method and device | |
CN107958230A (en) | Facial expression recognition method and device |
CN111353580B (en) | Training method of target detection network, electronic equipment and storage medium | |
CN109165654B (en) | Training method of target positioning model and target positioning method and device | |
JP2023501820A (en) | Face parsing methods and related devices | |
CN113901972A (en) | Method, device and equipment for detecting remote sensing image building and storage medium | |
CN111862124A (en) | Image processing method, device, equipment and computer readable storage medium | |
CN112801888A (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN116682130A (en) | Method, device and equipment for extracting icon information and readable storage medium | |
CN112749576A (en) | Image recognition method and device, computing equipment and computer storage medium | |
CN112418345A (en) | Method and device for quickly identifying fine-grained small target | |
CN111476291A (en) | Data processing method, device and storage medium | |
CN115909347A (en) | Instrument reading identification method, device, equipment and medium | |
CN112241736A (en) | Text detection method and device | |
CN111127327B (en) | Picture inclination detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||