CN109271967A - Method and apparatus for recognizing text in an image, electronic device, and storage medium - Google Patents
- Publication number
- CN109271967A CN109271967A CN201811202558.2A CN201811202558A CN109271967A CN 109271967 A CN109271967 A CN 109271967A CN 201811202558 A CN201811202558 A CN 201811202558A CN 109271967 A CN109271967 A CN 109271967A
- Authority
- CN
- China
- Prior art keywords
- text
- identification
- layer
- image
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Character Discrimination (AREA)
- Image Analysis (AREA)
Abstract
The present invention discloses a method and apparatus for recognizing text in an image, an electronic device, and a computer-readable storage medium. The scheme performs end-to-end recognition of text in an image through a multi-layer stacked network model, and includes: performing spatially separable convolution operations on the image layer by layer, and fusing the convolution features extracted by each spatially separable convolution operation into the lower layer to which that layer maps, the lower layer mapping to the higher layer that outputs the convolution features; obtaining global features from the bottom layer that performs the spatially separable convolution operations; performing candidate-region detection and region screening-parameter prediction for text in the image on the global features, to obtain pooled features corresponding to the detected text regions; and propagating the pooled features to the recognition branch layers that perform character recognition, the recognition branch layers outputting the character sequence labelling each text region. The scheme saves model-training time and improves recognition accuracy.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a method and apparatus for recognizing text in an image, an electronic device, and a computer-readable storage medium.
Background art
In the field of computer image processing, text recognition refers to having a computer automatically determine which word in a character library a character in an image belongs to. The character library is established in advance and generally contains the characters most common in everyday life.
Text in an image is usually recognized by building two models. One model is used to find text positions in a natural-scene image containing text, after which the text regions are cropped from the image. The other model is used to recognize the specific character content of the text regions. Specifically, a large number of sample images containing various characters are first obtained as a training set, and these sample images are used to train a character classifier and a text locator separately. After training is complete, the text locator first locates text regions in the test image; the text regions are then cropped out, and the character classifier recognizes their character content.
The above scheme requires training the character classifier and the text locator separately on the sample images, so the model-training workload is large. Moreover, the final character-recognition accuracy is affected by the accuracy of both models, which limits the improvement of text-recognition accuracy in images.
Summary of the invention
In order to solve the problems in the related art that a character classifier and a text locator must be trained separately, that the model-training workload is large, and that recognition accuracy is not high, the present invention provides a method for recognizing text in an image.
The present invention provides a method for recognizing text in an image. The method performs end-to-end recognition of text in an image through a multi-layer stacked network model, and comprises:
performing spatially separable convolution operations on the image layer by layer, and fusing the convolution features extracted by each spatially separable convolution operation into the lower layer to which that layer maps, the lower layer mapping to the higher layer that outputs the convolution features;
obtaining global features from the bottom layer that performs the spatially separable convolution operations;
performing candidate-region detection and region screening-parameter prediction for text in the image on the global features, to obtain pooled features corresponding to the detected text regions;
propagating the pooled features to the recognition branch layers that perform character recognition, and outputting, through the recognition branch layers, the character sequence labelling the text regions.
In another aspect, the present invention provides an apparatus for recognizing text in an image. The apparatus performs end-to-end recognition of text in an image through a multi-layer stacked network model, and comprises:
a spatial convolution module, configured to perform spatially separable convolution operations on the image layer by layer, fusing the convolution features extracted by each spatially separable convolution operation into the lower layer to which that layer maps, the lower layer mapping to the higher layer that outputs the convolution features;
a global-feature extraction module, configured to obtain global features from the bottom layer that performs the spatially separable convolution operations;
a pooled-feature acquisition module, configured to perform candidate-region detection and region screening-parameter prediction for text in the image on the global features, to obtain pooled features corresponding to the detected text regions;
a character-sequence output module, configured to propagate the pooled features to the recognition branch layers that perform character recognition, and to output, through the recognition branch layers, the character sequence labelling the text regions.
In another aspect, the present invention also provides an electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above method for recognizing text in an image.
In addition, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the above method for recognizing text in an image.
The technical solutions provided by the embodiments of the present invention may include the following beneficial effects:
The technical solution provided by the present invention performs end-to-end recognition of text in an image through a multi-layer stacked network model. Only one network model needs to be trained to recognize text in an image, with no need to train a text locator and a character classifier separately. This reduces the model-training workload, and the final recognition accuracy is affected only by the accuracy of the one network model, so the mutual limitation of two models on accuracy improvement is avoided, which is conducive to improving recognition accuracy.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present invention.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of an implementation environment according to the present invention;
Fig. 2 is a block diagram of a device according to an exemplary embodiment;
Fig. 3 is a flowchart of a method for recognizing text in an image according to an exemplary embodiment;
Fig. 4 is a schematic diagram of the architecture of the spatially separable convolutional network layers;
Fig. 5 is a schematic diagram of the architecture of an image text-recognition scheme compared against the present invention;
Fig. 6 is a detailed flowchart of step 350 in the embodiment of Fig. 3;
Fig. 7 is a schematic diagram of the pooling layer extracting pixel-level region screening parameters from the global features;
Fig. 8 is a detailed flowchart of step 353 in the embodiment of Fig. 6;
Fig. 9 is a detailed flowchart of step 370 in the embodiment of Fig. 3;
Fig. 10 is a schematic diagram of the architecture of the recognition branch layers;
Fig. 11 is a schematic diagram of the network architecture of the method for recognizing text in an image provided by the present invention;
Fig. 12 is a flowchart of a method for recognizing text in an image provided by another embodiment on the basis of the embodiment of Fig. 3;
Fig. 13 is a detailed flowchart of step 1230 in the embodiment of Fig. 12;
Fig. 14 is a detailed flowchart of step 1231 in the embodiment of Fig. 13;
Fig. 15 is a schematic diagram of a practical application effect of the present invention;
Fig. 16 is a block diagram of an apparatus for recognizing text in an image according to an exemplary embodiment;
Fig. 17 is a detailed block diagram of the pooled-feature acquisition module in the embodiment of Fig. 16;
Fig. 18 is a detailed block diagram of the screening and rotating unit in the embodiment of Fig. 17.
Detailed description of the embodiments
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.
Fig. 1 is a schematic diagram of an implementation environment according to the present invention. The implementation environment includes a user equipment 110, which can recognize text in an image by running an application. The user equipment may be a server, a desktop computer, a mobile terminal, a smart appliance, or the like.
The user equipment 110 may have an image acquisition device 111 such as a camera, and text recognition is then performed on the images acquired by the image acquisition device 111 using the method provided by the present invention.
As needed, the implementation environment may also include, in addition to the user equipment 110, a server 130 connected to the user equipment 110 through a wired or wireless network. The server 130 sends images to be recognized to the user equipment 110, which then recognizes the text in the images.
In practical applications, the text content recognized from an image can be further used for text translation, text-content editing, storage, and so on. The method for recognizing text in an image provided by the present invention can be applied to text-recognition tasks in any scene to understand the text content in an image, for example text recognition in natural-scene text pictures, advertising pictures, videos, identity cards, driver's licenses, business cards, and license plates.
Fig. 2 is a block diagram of a device 200 according to an exemplary embodiment. For example, the device 200 may be the user equipment 110 in the implementation environment shown in Fig. 1.
Referring to Fig. 2, the device 200 may include one or more of the following components: a processing component 202, a memory 204, a power component 206, a multimedia component 208, an audio component 210, a sensor component 214, and a communication component 216.
The processing component 202 generally controls the overall operation of the device 200, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 202 may include one or more processors 218 to execute instructions to complete all or part of the steps of the methods below. In addition, the processing component 202 may include one or more modules to facilitate interaction between the processing component 202 and other components; for example, the processing component 202 may include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.
The memory 204 is configured to store various types of data to support operation on the device 200. Examples of such data include instructions for any application or method operated on the device 200. The memory 204 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc. The memory 204 also stores one or more modules configured to be executed by the one or more processors 218 to complete all or part of the steps of any of the methods shown in Fig. 3, Fig. 6, Fig. 8, Fig. 9, and Figs. 12-14.
The power component 206 provides power to the various components of the device 200. The power component 206 may include a power-management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 200.
The multimedia component 208 includes a screen providing an output interface between the device 200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel. If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. The screen may also include an organic light-emitting display (OLED).
The audio component 210 is configured to output and/or input audio signals. For example, the audio component 210 includes a microphone (MIC), which is configured to receive external audio signals when the device 200 is in an operating mode, such as a call mode, a recording mode, or a speech-recognition mode. The received audio signal may be further stored in the memory 204 or transmitted via the communication component 216. In some embodiments, the audio component 210 further includes a speaker for outputting audio signals.
The sensor component 214 includes one or more sensors for providing state assessments of various aspects of the device 200. For example, the sensor component 214 can detect the open/closed state of the device 200 and the relative positioning of components; the sensor component 214 can also detect a change in position of the device 200 or one of its components, and a change in temperature of the device 200. In some embodiments, the sensor component 214 may also include a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 216 is configured to facilitate wired or wireless communication between the device 200 and other devices. The device 200 can access a wireless network based on a communication standard, such as WiFi (Wireless Fidelity). In one exemplary embodiment, the communication component 216 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 216 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth technology, or other technologies.
In an exemplary embodiment, the device 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors, digital signal processing devices, programmable logic devices, field-programmable gate arrays, controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods below.
Fig. 3 is a flowchart of a method for recognizing text in an image according to an exemplary embodiment. The method may be executed by a user equipment, which may be the user equipment 110 of the implementation environment shown in Fig. 1. The method performs end-to-end recognition of text in an image through a multi-layer stacked network model, where end-to-end recognition means that the input of the network model is the raw image data and the output is the final character sequence. As shown in Fig. 3, the method specifically includes the following steps.
In step 310, spatially separable convolution operations are performed on the image layer by layer, and the convolution features extracted by each spatially separable convolution operation are fused into the lower layer to which that layer maps, the lower layer mapping to the higher layer that outputs the convolution features.
It should be noted that the multi-layer stacked network model may include spatially separable convolutional network layers, a region regression network layer, a pooling layer, temporal convolutional network layers, and a character classification layer. The spatially separable convolutional network layers, the region regression network layer, and the pooling layer serve as the detection branch, which extracts the pooled features of the text regions in the image from the raw image data; the temporal convolutional network layers and the character classification layer serve as the recognition branch, which outputs the character sequence of a text region from its pooled features.
Specifically, the spatially separable convolution operations refer to layer-by-layer convolution calculations performed on the image to be recognized by spatially separable convolution (EffNet) layers. The spatially separable convolutional layers include mutually mapped higher and lower layers; "higher" and "lower" are relative concepts, where a layer whose result is calculated first is called a higher layer and one calculated later is called a lower layer. Fusing the convolution features extracted by a higher layer into the lower layer to which it maps means that the lower layer's convolution result is combined with the higher layer's convolution result. Because more details are lost as the number of convolution layers grows, fusing the features extracted by higher layers into the lower layers retains more detail and avoids information loss.
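By way of illustration only (not part of the disclosed embodiments), a spatially separable convolution replaces one KxK kernel with a Kx1 pass followed by a 1xK pass. The sketch below, in plain Python, checks the basic identity that for a rank-1 kernel (outer product of a column and a row vector) the factorized result equals the full 2-D convolution; the image values and kernel weights are arbitrary illustrative choices.

```python
def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D convolution (cross-correlation) on nested lists."""
    ih, iw = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            s = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    s += img[y + dy][x + dx] * kernel[dy][dx]
            row.append(s)
        out.append(row)
    return out

def separable_conv(img, col, row):
    """Apply a Kx1 column kernel, then a 1xK row kernel."""
    vert = conv2d_valid(img, [[c] for c in col])  # Kx1 pass
    return conv2d_valid(vert, [row])              # 1xK pass

if __name__ == "__main__":
    img = [[float((x + y) % 5) for x in range(6)] for y in range(6)]
    col, row = [1.0, 2.0, 1.0], [0.5, 1.0, 0.5]
    full = [[c * r for r in row] for c in col]    # rank-1 3x3 kernel
    a = conv2d_valid(img, full)
    b = separable_conv(img, col, row)
    assert all(abs(a[i][j] - b[i][j]) < 1e-9
               for i in range(len(a)) for j in range(len(a[0])))
```

The factorization trades one KxK pass for two cheaper one-dimensional passes, which is the source of the acceleration discussed later in the description.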
In step 330, global features are obtained from the bottom layer that performs the spatially separable convolution operations.
Here, the bottom layer refers to the last output layer of the spatially separable convolutional layers. The spatially separable convolutional layers perform spatially separable convolution operations on the original image to be recognized layer by layer, and the feature matrix finally output is called the global features. The global features characterize the feature information of the original input image.
Fig. 4 is a schematic diagram of the architecture of the spatially separable convolutional layers. As shown in Fig. 4, the original image to be recognized is the input of the spatially separable convolutional layers, after which convolution calculations are performed layer by layer, the features extracted by each higher layer are fused into the lower layer to which it maps, and the global features are output at the bottom layer. Each parallelogram represents the convolution features extracted by one layer.
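For concreteness, one way the higher-to-lower feature fusion could be realized is nearest-neighbour upsampling of the higher-level map followed by elementwise addition into the lower-level map. The patent does not specify the fusion operator, so both the upsampling method and the addition are assumptions in this sketch.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def fuse(low, high):
    """Add the upsampled higher-level map into the lower-level map."""
    up = upsample2x(high)
    return [[low[y][x] + up[y][x] for x in range(len(low[0]))]
            for y in range(len(low))]

if __name__ == "__main__":
    high = [[1.0, 2.0], [3.0, 4.0]]          # coarse higher-level feature
    low = [[0.0] * 4 for _ in range(4)]      # finer lower-level feature
    fused = fuse(low, high)                  # detail from 'high' is retained
```

Whatever the exact operator, the effect described in the text is the same: detail extracted early in the stack is carried down so it is not lost by repeated convolution.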
In step 350, candidate-region detection and region screening-parameter prediction for text in the image are performed on the global features, to obtain pooled features corresponding to the detected text regions.
Candidate-region detection means detecting, from the global features, candidate regions in the image where text may be located; there can be multiple candidate regions. Region screening-parameter prediction means obtaining predicted values of the region screening parameters from the global features; with these predicted values the candidate regions can be screened, improving the detection accuracy of text regions in the image. The pooled features of a text region refer to the feature data of that region output by the pooling layer. In one embodiment, the pooled features can be the image data of the text region after flattening, where flattening means rotating an inclined text region to the horizontal position.
Specifically, the global features output by the spatially separable convolutional network layers can be input into the region regression network layer and the pooling layer respectively. The region regression network layer performs candidate-region detection of text in the image and outputs candidate regions of the text borders, called border candidate regions for short. The pooling layer performs a convolution transform on the global features to predict the region screening parameters, then screens the border candidate regions according to those parameters, so that the text regions in the image can be detected; inclined text regions are then rotated, yielding the flattened image data of the text regions as their pooled features.
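The flattening step above amounts to rotating an inclined quadrilateral about its centre until its top edge is horizontal. The patent gives no formula for this, so the following is a generic geometric sketch: the rotation angle is taken from the first edge of the box, an assumption about how the corners are ordered.

```python
import math

def rotate_box_to_horizontal(corners):
    """Rotate an inclined quadrilateral about its centroid so that the
    edge from corners[0] to corners[1] becomes horizontal."""
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    (x0, y0), (x1, y1) = corners[0], corners[1]
    theta = math.atan2(y1 - y0, x1 - x0)   # current inclination of top edge
    c, s = math.cos(-theta), math.sin(-theta)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in corners]
```

Applying this to a 4x2 box tilted by 30 degrees returns an axis-aligned box of the same size, i.e. the two corners of the top edge end up with equal y coordinates.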
In step 370, the pooled features are propagated to the recognition branch layers that perform character recognition, and the recognition branch layers output the character sequence labelling the text regions.
The recognition branch layers are the last several layers of the multi-layer stacked network model, used to recognize the characters contained in a text region from its pooled features. Specifically, the recognition branch layers include the temporal convolutional network layers and the character classification layer of the network model. The pooling layer propagates the pooled features of a text region to the temporal convolutional network layers, which perform convolution calculations on the pooled features to extract character-sequence features; the character-sequence features are then passed to the character classification layer, which outputs, for each character, the probability that it is each character in the dictionary.
As an example, assume the dictionary contains 7439 characters. The character classification layer can then output, for each character in a text region, the probability that it is each of the characters in the dictionary; the dictionary character with the highest probability is the recognition result for that character. The recognition result of each character can be output accordingly, and for the multiple characters in a text region this yields the character sequence labelling the text region.
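The per-character decoding just described — a softmax over the dictionary followed by an argmax at each position — can be sketched as follows. The three-entry toy dictionary stands in for the 7439-character one mentioned above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one character position."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def decode(per_char_logits, dictionary):
    """Pick the highest-probability dictionary entry at each character
    position and join the picks into the labelled character sequence."""
    out = []
    for logits in per_char_logits:
        probs = softmax(logits)
        out.append(dictionary[probs.index(max(probs))])
    return "".join(out)

if __name__ == "__main__":
    dictionary = ["a", "b", "c"]                    # toy 3-entry dictionary
    logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]     # two character positions
    assert decode(logits, dictionary) == "ba"
```

This greedy argmax is the simplest decoding consistent with the description; sequence-level decoders would be a refinement the patent does not discuss at this point.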
The technical solution provided by the above exemplary embodiments of the present invention performs end-to-end recognition of text in an image through a multi-layer stacked network model. Only one network model needs to be trained to recognize text in an image, with no need to train a text locator and a character classifier separately. This reduces the model-training workload, and the final recognition accuracy is affected only by the accuracy of one network model, avoiding the mutual limitation of two models on accuracy improvement and thus benefiting recognition accuracy.
In contrast to the technical solution provided by the above exemplary embodiments, Fig. 5 is a flowchart of another text-recognition scheme. As shown in Fig. 5, that scheme divides text detection and recognition into two tasks, and the recognition task can only proceed after the detection task is complete. Specifically, during detection the original image is first input into a feature-extraction convolutional network, and the extracted features are passed to a region regression network, which outputs detected border candidate regions. These regions are still relatively rough and require a further border regression to improve border accuracy so as to fit closer to the text edges. The secondary border regression and classification give the coordinates of the text borders in the image and the corresponding confidence, i.e., the possibility that text is contained. These two predictions can be compared with the labelled text positions in the image, the prediction loss is calculated by a loss function, and the model parameters are adjusted and updated according to this loss.
When detecting inclined text, the border candidate regions detected by the region regression network contain large blank areas, which reduces the precision of the detection boxes. Therefore, the border candidate regions output by the region regression network and the global features extracted by the feature-extraction convolutional network are input together into a rotated region-of-interest pooling layer to obtain the detected inclined text regions. As shown in Fig. 5, the inclined text regions are marked in the original image as text boxes, and the corresponding regions are then cropped from the original image according to the text-box coordinates, thereby first completing the localization of the text regions. It should be noted that at this stage there is already a text-region localization error.
Afterwards, the cropped text-region images are input into the recognition network. The recognition network first performs convolution feature extraction on each input region image, and the extracted convolution features are then supplied to a character classification layer, which recognizes the character sequence expressed by the input. After all character regions in the original image have been recognized, the text-recognition task for that image is complete. It should be noted that at this stage another loss function is also required to calculate the difference between the character sequence output by the character classification layer and the actual character sequence, and the parameters of the recognition network and the character classification layer are adjusted and updated according to this difference. That is, there is also a character-recognition error at this stage, so the final overall recognition error includes both the text-region localization error and the character-recognition error.
It should further be noted that if the character-region localization error is large, the improvement of overall recognition accuracy is limited even if character-recognition accuracy improves. Training region detection and text recognition separately is unfavorable to performance: the error produced in the recognition stage cannot be propagated back to the detection part to correct the parameters of the detection model, so performance on some training sets is bottlenecked by either detection or recognition. Moreover, training the detection model and the recognition model separately increases the model-training workload. The feature-extraction convolutional network is also slow at extracting features, which limits the number of tasks the whole system can process per unit time and makes the model hard to deploy on mobile terminals.
The present invention, by contrast, realizes end-to-end recognition of text in an image through a multi-layer stacked network model: the input is the original image, the output is the character sequence, and the accuracy of the final character recognition is determined by the error of a single model. Merging the two tasks into one model avoids the performance bottleneck brought by separate training, which is conducive to improving recognition accuracy. Training only one network model suffices to recognize text, greatly saving model-training time: compared with training two models separately, at least half the time is saved, and in practice, because the parameter settings of two models differ, four to five times the hyper-parameter tuning time can be saved. In addition, the present invention uses the EffNet architecture for the spatially separable convolution operations on the image. Fusing the convolution features extracted by the spatially separable convolution operations into the mapped lower layers both accelerates the global-feature extraction stage and makes up for the defect of existing acceleration network structures that sacrifice model accuracy while accelerating, while also reducing the storage space required to run the model, facilitating deployment on mobile terminals.
In an exemplary embodiment, as shown in Fig. 6, step 350 above specifically includes:
In step 351, the global features are input to a region regression network layer that performs candidate-region detection, and the frame candidate regions of the text in the image are output by the region regression network layer;
It is to be understood that the present invention performs end-to-end identification of text in an image through a network model of multiple stacked layers, and the region regression network layer comprises several layers of that network model, used to detect the regions where text may be located, that is, to perform candidate-region detection.
Specifically, the spatially separable convolutional network layer of the network model extracts global features from the original image, and the global features are input to the region regression network layer, which outputs the frame candidate regions of the text in the image. A frame candidate region is a region that text edges may enclose. In the training stage, the region regression network layer can output the frame candidate regions of text; a secondary frame regression and classification is performed on the frame candidate regions to obtain the detected candidate frames and their confidence (the possibility of containing text), a multi-task loss is calculated according to the position coordinates of the actual text frames, and the parameters of the region regression network layer are adjusted to minimize the loss. The region regression network layer may be Faster R-CNN (a fast object-detection convolutional neural network); the main contribution of Faster R-CNN is a network architecture for extracting candidate regions that replaces the time-consuming selective search, so that the detection speed is greatly improved.
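As a rough illustration of what a learned candidate-region stage replaces selective search with, the sketch below enumerates anchor boxes over a feature map, in the style of a region-proposal network; the function name, sizes, and aspect ratios are illustrative assumptions, not parameters from this patent:

```python
def make_anchors(feat_h, feat_w, stride, sizes, ratios):
    # One anchor box (x1, y1, x2, y2) per (cell, size, ratio) combination,
    # centered on each feature-map cell projected back to image coordinates.
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    w, h = s * r ** 0.5, s / r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# 4 x 4 feature map, 2 sizes x 2 ratios -> 64 candidate boxes to score and regress
anchors = make_anchors(feat_h=4, feat_w=4, stride=16, sizes=(32, 64), ratios=(0.5, 1.0))
```

The regression and classification heads described above would then score each such box and refine its coordinates.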
In step 352, the frame candidate regions are input to a pooling layer that performs region screening and region rotation;
The pooling layer is connected to the spatially separable convolutional network layer, and performs region screening and region rotation on the frame candidate regions according to the global features output by the spatially separable convolutional network layer. Region screening refers to filtering out the accurate text regions from the multiple frame candidate regions, and region rotation refers to rotating inclined text regions to a horizontal position. Thus, the frame candidate regions output by the region regression network layer and the global features output by the spatially separable convolutional network layer are input to the pooling layer together.
In step 353, the text regions are filtered out from the frame candidate regions and rotated to a horizontal position according to the pixel-level region screening parameters obtained by the pooling layer through region-screening-parameter prediction on the global features, and the pooled features of the text regions are obtained.
The pixel-level region screening parameters are the parameters, predicted from the global features, by which the frame candidate regions are screened and rotated. The pixel-level region screening parameters may include pixel-level classification confidence, pixel-level rotation angle, and pixel-level frame distance. A text region refers to a region where text is located. The pooling layer can perform convolution transforms on the global features with a variety of convolution kernels to obtain the pixel-level region screening parameters, then filter out the text regions from the multiple frame candidate regions according to the pixel-level region screening parameters, and then rotate the inclined text regions to a horizontal position, obtaining the pooled features of the text regions.
As shown in Fig. 7, the global features are transformed by a first convolution kernel to output the pixel-level classification confidence, that is, the probability that each pixel in the original image belongs to text. The global features are transformed by a second convolution kernel to output the pixel-level frame distance, that is, each pixel's predicted distance to the top, bottom, left, and right borders of the text where it is located. The global features are transformed by a third convolution kernel to output the pixel-level rotation angle, that is, the angle by which each pixel needs to be rotated to reach a horizontal position.
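A minimal numpy sketch of these three per-pixel prediction heads, assuming a 1 × 1 convolution is implemented as a channel-wise matrix multiply; the channel counts and random weights are illustrative, not the patent's:

```python
import numpy as np

def conv1x1(feat, weight, bias):
    # feat: (C, H, W); weight: (C_out, C); bias: (C_out,)
    c, h, w = feat.shape
    out = weight @ feat.reshape(c, h * w) + bias[:, None]
    return out.reshape(-1, h, w)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 8, 8))   # hypothetical global feature map

# first kernel: per-pixel text confidence in (0, 1)
score = sigmoid(conv1x1(feat, rng.standard_normal((1, 32)), np.zeros(1)))
# second kernel: distances to the top, bottom, left, right text borders
dists = conv1x1(feat, rng.standard_normal((4, 32)), np.zeros(4))
# third kernel: per-pixel rotation angle
angle = conv1x1(feat, rng.standard_normal((1, 32)), np.zeros(1))
```

Each head preserves the spatial resolution of the feature map, so every pixel receives its own confidence, border distances, and rotation angle.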
In an exemplary embodiment, as shown in Fig. 8, step 353 above specifically includes:
In step 3531, the pixel-level classification confidence generated by the pooling layer through convolutional calculation on the global features is obtained; the pixel-level classification confidence refers to the probability that each pixel in the image belongs to a text region;
Specifically, the pooling layer can perform a convolutional calculation on the global features (the feature image) with a convolution kernel of size 1 × 1 and stride 1, and output, for each pixel, the predicted confidence that the pixel belongs to text, obtaining the pixel-level classification confidence. A pixel with high confidence has a larger probability of belonging to a text region; likewise, a pixel with low confidence has a smaller probability of belonging to a text region.
In step 3532, the text regions are filtered out from the frame candidate regions according to the pixel-level classification confidence and the intersection-over-union of the frame candidate regions;
The intersection-over-union of frame candidate regions refers to the overlap proportion between different frame candidate regions. Since noise frames exist among the frame candidate regions, the present invention performs non-maximum suppression on the detection results of the frame candidate regions according to the pixel-level classification confidence and the intersection-over-union of the frame candidate regions, so as to filter out the text regions from the frame candidate regions and improve the accuracy of text-region detection.
Specifically, through a non-maximum suppression algorithm, the frame candidate regions with high confidence are retained according to the pixel-level classification confidence, non-overlapping frame candidate regions are retained, and frame candidate regions whose intersection-over-union with the retained regions is low are retained, so that the text regions are obtained by screening from all the frame candidate regions.
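The screening described above can be sketched as confidence-guided non-maximum suppression over axis-aligned candidate boxes; the overlap threshold of 0.5 is an illustrative assumption:

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # visit boxes in descending confidence; keep a box only if it does not
    # overlap an already-kept box by more than iou_thresh
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # kept == [0, 2]: the second box overlaps the first
```

The noise frames that heavily overlap a higher-confidence region are suppressed, while non-overlapping candidates survive.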
In step 3533, the text regions are rotated to a horizontal position through an interpolation algorithm according to the pixel-level rotation angle and pixel-level frame distance generated by the pooling layer through convolutional calculation on the global features, obtaining the pooled features of the text regions.
It should be noted that when obtaining the pixel-level classification confidence, the pooling layer can simultaneously perform convolutional calculations on the global features to obtain the pixel-level rotation angle and the pixel-level frame distance. As explained above, the pixel-level rotation angle refers to the angle by which a pixel needs to be rotated to reach a horizontal position, and the pixel-level frame distance refers to each pixel's predicted distance to the top, bottom, left, and right borders of the text where it is located. Specifically, the pooling layer can perform a convolutional calculation on the global features with a convolution kernel of size 1 × 1 and stride 4, outputting each pixel's distance to the top, bottom, left, and right borders of the text where it is located; and with another convolution kernel of size 1 × 1 and stride 4, outputting the angle by which each pixel needs to be rotated to reach a horizontal position.
Thus, according to the pixel-level rotation angle and the pixel-level frame distance, the pooling layer can rotate the inclined text regions to the horizontal direction; the pooled feature of a text region can be the image data of the text region after rotation to the horizontal direction.
Specifically, the pooling layer rotates the detected text regions to a horizontal position, which requires interpolation: the originally angled text regions are transformed to a horizontal position for the identification model to recognize. The interpolation needs a transformation matrix T to determine the correspondence between original points and target points. The calculation formula of the transformation matrix T is as follows:

T = v_ratio · [[cos π_i, −sin π_i, dx], [sin π_i, cos π_i, dy], [0, 0, 1/v_ratio]]

where v_ratio = roi_h / (t + b) denotes the ratio of the height roi_h of the transformed text-region mapping to the sum of the current point's distances to the top and bottom borders of the predicted text region; roi_h is a preset known quantity. Further, roi_w = v_ratio × (l + r), where roi_w denotes the width of the transformed text-region mapping, and

dx = l × cos π_i − t × sin π_i − x,
dy = t × cos π_i + l × sin π_i − y,

where r, l, t, b are respectively the current pixel's distances to the right, left, top, and bottom borders of the text frame predicted by the detection branch (i.e. the pixel-level frame distance), π_i denotes the tilt angle of the current pixel predicted by the detection branch (i.e. the pixel-level rotation angle), and (x, y) is the coordinate position of the current pixel in the original image. Assuming a point Psrc(xs, ys) before transformation and Pdst(xd, yd) after transformation, then Pdst = T · Psrc: the position of the feature mapping before transformation, multiplied by the transformation matrix T, gives the position of the transformed feature mapping, completing the coordinate interpolation and realizing the horizontal rotation of the text regions.
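The transform can be sketched numerically as follows. This follows the commonly used RoIRotate-style formulation (scaled rotation plus translation); the exact matrix layout and the dy sign convention are reconstructions and should be read as assumptions, and roi_h = 8 is an arbitrary preset:

```python
import numpy as np

def roi_transform(x, y, t, b, l, r, angle, roi_h=8):
    # v scales the region so its height (t + b) becomes roi_h
    v = roi_h / (t + b)
    dx = l * np.cos(angle) - t * np.sin(angle) - x
    dy = t * np.cos(angle) + l * np.sin(angle) - y
    T = v * np.array([[np.cos(angle), -np.sin(angle), dx],
                      [np.sin(angle),  np.cos(angle), dy],
                      [0.0,            0.0,           1.0 / v]])
    return T

# axis-aligned case (angle = 0): the pixel at (50, 40), sitting 16 px from the
# left border and 4 px from the top border, should land at (l, t) in the ROI
T = roi_transform(x=50, y=40, t=4, b=4, l=16, r=16, angle=0.0)
p = T @ np.array([50.0, 40.0, 1.0])   # p ≈ (16, 4, 1)
```

Applying T to every source coordinate and sampling (with interpolation for non-integer targets) yields the horizontal, fixed-height text-region feature map of width roi_w = v_ratio × (l + r).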
It should be emphasized that, unlike existing character recognition methods, which pass the detection results output by a detection model to an identification model to complete the identification of text, the present invention treats detection as one learning branch of the model, responsible for optimizing the feature map (i.e. the pooled features) that is ultimately input to the identification branch. Within the same model, the detection results (the detected text regions) are converted from numerical samples into a feature map directly usable by the identification branch, realizing joint learning and training of the detection and identification tasks.
In an exemplary embodiment, the identification branch network layers in step 370 above include a temporal convolutional network layer and a character classification layer, and as shown in Fig. 9, step 370 specifically includes:
In step 371, the pooled features are propagated backward to the temporal convolutional network layer for character-feature extraction;
Here, backward propagation refers to passing the pooled features output by the pooling layer to the temporal convolutional network layer, which performs convolution transforms on the pooled features to extract character-sequence features. Unlike existing CTC (connectionist temporal classification, a neural-network-based sequence classification) or attention network structures, the present invention uses a TCN (temporal convolutional network) as part of the identification branch network layers, which has the following advantages: because a TCN can be massively parallelized, both the training and the testing time of the network are greatly reduced; because a TCN can flexibly adjust its receptive-field size by choosing how many convolutional layers to stack, the model's long- and short-term memory length can be better controlled explicitly, whereas CTC or attention identification models cannot predict the number of recurrence cycles inside the model and thus have no way to control the memory length explicitly; the propagation direction of a TCN differs from the time direction of the input sequence, which avoids the gradient explosion or vanishing problems that often occur when training RNN models; and a TCN consumes less memory, which is especially evident on long input sequences, reducing the deployment overhead of the model.
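The dilated causal convolution at the heart of a TCN can be sketched in a few lines of numpy; this is a minimal single-channel illustration with made-up unit weights, not the patent's implementation:

```python
import numpy as np

def causal_dilated_conv1d(x, w, d):
    # x: (T,) input sequence; w: (k,) kernel; d: dilation factor
    # output[t] depends only on x[t], x[t-d], ..., x[t-(k-1)*d]  (causal)
    k = len(w)
    pad = (k - 1) * d
    xp = np.concatenate([np.zeros(pad), x])   # left-pad so length is preserved
    return np.array([sum(w[j] * xp[t + pad - j * d] for j in range(k))
                     for t in range(len(x))])

x = np.arange(1.0, 6.0)                              # [1, 2, 3, 4, 5]
y = causal_dilated_conv1d(x, np.array([1.0, 1.0]), d=2)
# y[t] = x[t] + x[t-2], e.g. y[4] = 5 + 3 = 8
```

Because every output position is computed independently, all positions can be evaluated in parallel, which is the source of the training-speed advantage claimed above.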
In step 372, the extracted character features are input to the character classification layer, and the character sequence labeled in the text region is output by the character classification layer.
The character features are the character-sequence features; inputting the extracted character-sequence features into the character classification layer can output, for each character in the text region, the probability of belonging to each character in the dictionary. The character with the highest probability in the dictionary is taken as the recognition result for that character in the text region, thereby obtaining the character sequence labeled in the text region.
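Taking the maximum-probability dictionary character at each step can be sketched as a CTC-style greedy decode; the tiny dictionary, the probabilities, and the convention that index 0 is the blank are made up for illustration:

```python
import numpy as np

def greedy_decode(probs, dictionary, blank=0):
    # probs: (T, V) per-step probabilities over the dictionary (plus a blank)
    ids = probs.argmax(axis=1)
    out, prev = [], blank
    for i in ids:
        # collapse repeats and drop blanks, as in CTC decoding
        if i != blank and i != prev:
            out.append(dictionary[i])
        prev = i
    return "".join(out)

dictionary = ["-", "a", "b"]            # index 0 is the blank symbol
probs = np.array([[0.1, 0.8,  0.1 ],
                  [0.1, 0.8,  0.1 ],
                  [0.9, 0.05, 0.05],
                  [0.1, 0.2,  0.7 ]])
text = greedy_decode(probs, dictionary)  # -> "ab"
```

With a 7439-character dictionary as described below, V would simply be 7439 (plus the blank) instead of 3.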
Fig. 10 is an architecture diagram of the identification branch network layers. As shown in Fig. 10, the pooled features output by the pooling layer go through four temporal convolution operations; the input of each convolutional layer is transformed in turn by dilated causal convolution, weight normalization, an activation function, and random dropout to obtain the output of the current convolutional layer. The filter size k of the first convolution operation is 3 and the dilation factor d of its convolution kernel is 1; the filter size k of the second convolution operation is 3 and the dilation factor d is 1; the filter size k of the third convolution operation is 3 and the dilation factor d is 2; and the filter size k of the fourth convolution operation is 1 and the dilation factor d is 4. Afterwards, character-sequence features, i.e. the features of each character, are extracted by a bidirectional LSTM (long short-term memory network). A bidirectional LSTM is superior to a unidirectional LSTM in that it can simultaneously use information from both the past and the future, making the final prediction more accurate. The output of the bidirectional LSTM can be a 512-dimensional feature vector, after which the CTC decoder of the character classification layer classifies the output features into 7439 categories. The 7439 categories indicate that 7439 characters exist in the dictionary, so the output features can be classified to one of those 7439 characters.
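Under the simplifying assumption that each of the four operations above is a single stride-1 convolution (a full TCN block may contain more than one convolution, so the real receptive field can differ), the receptive field grows by (k − 1)·d per layer, which can be checked numerically:

```python
def receptive_field(layers):
    # layers: list of (kernel_size, dilation); stride 1 throughout
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# the four temporal convolutions described for the recognition branch:
# k=3 d=1, k=3 d=1, k=3 d=2, k=1 d=4
rf = receptive_field([(3, 1), (3, 1), (3, 2), (1, 4)])   # rf == 9
```

This is the sense in which stacking more (or more dilated) convolutional layers lets the model explicitly control how much context each output position sees.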
Fig. 11 is an architecture diagram of the network model for identifying text in an image provided by the present invention. As shown in Fig. 11, the original image is first input to the spatially separable convolutional network layer, which extracts global features from the original image; the global features are then input to the region regression network layer and the pooling layer respectively, and the region regression network layer detects the frame candidate regions according to the global features. In the training stage, secondary frame regression and frame classification can be performed to obtain the detected candidate frames and their confidence; a multi-task loss is calculated according to the positions of the text frames, and the parameters of the region regression network layer are adjusted to minimize the multi-task loss. The frame candidate regions output by the region regression network layer are input to the pooling layer, which, according to the global features input from the spatially separable convolutional network layer and the frame candidate regions input from the region regression network layer, performs screening and leveling of the frame candidate regions, obtaining the leveled text-region features, i.e. the pooled features. The leveled text-region features are in turn input to the temporal convolutional network layer to extract character-sequence features, and the character-sequence features are input to the character classifier, which outputs the character recognition result of the text in the image.
In an exemplary embodiment, as shown in Fig. 12, the method provided by the present invention further includes:
In step 1210, a sample image set in which text information is recorded on the images is obtained, the content of the text information being known;
The sample image set includes a large number of image samples on which text information is labeled, and the specific content of this text information is known. The sample image set can be stored in the local storage medium of the user equipment 110, or stored in the server 130.
In step 1230, the network model is trained using the sample image set, and the parameters of the network model are adjusted so that the difference between the character sequence output by the network model for each sample image and the corresponding text information is minimized.
Specifically, the sample image set can be used as the training set to train the network model required by the present invention for identifying text in images. Specifically, the sample image set can be used as the input of the network model, and according to the output of the network model, the parameters of the network model are adjusted so that the difference between the character-sequence recognition results output by the network model for the sample image set and the known text information is minimized. For example, the similarity between the character-sequence recognition results and the known text information can be calculated and maximized.
In an exemplary embodiment, as shown in Fig. 13, step 1230 above specifically includes:
In step 1231, the text identification error of the network model is obtained according to the error generated by the network model in performing text-region detection and the error generated in performing the character recognition operation;
The network model is divided into two tasks: text-region detection and the character recognition operation. The text identification error of the network model refers to the identification error of the overall framework of the network model. This error can be the sum of the error generated by text-region detection and the error generated by character recognition. The error generated by text-region detection can be the error in detecting the text regions before the pooled features are output, and the error generated by the character recognition operation can be the error in the classification and identification of the characters in the text regions after the pooled features are output.
In step 1232, according to the text identification error, the network layer parameters with which the network model performs the text-region detection and the network layer parameters with which it performs the character recognition operation are adjusted by back-propagation, minimizing the text identification error.
Back-propagation refers to adjusting the parameters of the preceding network layers according to the subsequent recognition results. Specifically, according to the identification error of the overall framework of the network model, i.e. the error of the finally output character sequence, the network layer parameters of the preceding text-region detection task and the network layer parameters of the character recognition operation are adjusted so that the error between the finally output character sequence and the true character sequence is minimized. Thus, the error generated in the recognition stage can be propagated to the detection part to correct the parameters of the detection stage.
In an exemplary embodiment, as shown in Fig. 14, step 1231 above specifically includes:
In step 1401, the error generated by the network model in performing text-region detection is determined according to the error generated by pixel-level classification prediction, the error generated by pixel-level frame-distance prediction, and the error generated by pixel-level rotation-angle prediction;
The error generated by pixel-level classification prediction refers to the error between the pixel-level classification confidence and the actual classification result of whether a pixel belongs to a text region. The error generated by pixel-level frame-distance prediction refers to the error between each pixel's predicted distance and actual distance to the top, bottom, left, and right borders of the text where it is located, and the error of pixel-level rotation-angle prediction is the error between the predicted rotation angle by which a pixel is rotated to a horizontal position and the actual rotation angle.
Specifically, the error generated by the network model in performing text-region detection is expressed as L_Detection:

L_Detection = L_cls + α·L_geo_reg

L_Detection is the total loss function of the detection branch (text-region detection); L_cls is the loss function of the pixel-level classification confidence in the detection branch, that is, the error generated by pixel-level classification prediction; L_geo_reg is the loss function of the pixel-level frame distance (the distances to the top, bottom, left, and right of the frame where the pixel is located), that is, the error between each pixel's predicted distance and actual distance to the top, bottom, left, and right borders of the text where it is located; α is the proportion of L_geo_reg in the total detection-branch loss.

L_cls = (1/N) Σ_i |u_i − u*_i|

where N is the number of positive elements in the confidence-map prediction matrix, u*_i is the label of whether the current pixel is text (value 0 or 1), and u_i is the predicted value of whether the current pixel is text (value 0 or 1).

L_geo_reg = (1/N) Σ_i [ L_IOU(B_i, B*_i) + β·(1 − cos(π_i − π*_i)) ]

where N is the number of positive elements in the confidence-map prediction matrix, π_i denotes the predicted pixel-level rotation angle, π*_i denotes the labeled pixel-level rotation angle, and β denotes the proportion of the angle loss within L_geo_reg. L_IOU(B_i, B*_i) denotes the IOU loss between the four geometric quantities B_i of the predicted frame (the distances to the top, bottom, left, and right borders of the text box where the pixel is located) and the four geometric quantities B*_i of the label (the distances to the top, bottom, left, and right borders of the text box where the pixel is located), and the IOU loss function is defined as follows:

L_IOU(B_i, B*_i) = −log( |B_i ∩ B*_i| / |B_i ∪ B*_i| )

where |B_i ∩ B*_i| denotes the intersection of the two text boxes and |B_i ∪ B*_i| denotes the union of the two text boxes.
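With boxes parameterized by a pixel's distances (t, b, l, r) to the borders, the intersection and union have the closed form sketched below; the numeric example is made up:

```python
import math

def iou_loss(pred, gt):
    # pred, gt: (t, b, l, r) distances from the same pixel to the box borders
    area_p = (pred[0] + pred[1]) * (pred[2] + pred[3])
    area_g = (gt[0] + gt[1]) * (gt[2] + gt[3])
    ih = min(pred[0], gt[0]) + min(pred[1], gt[1])   # intersection height
    iw = min(pred[2], gt[2]) + min(pred[3], gt[3])   # intersection width
    inter = ih * iw
    union = area_p + area_g - inter
    return -math.log(inter / union)

perfect = iou_loss((1, 1, 2, 2), (1, 1, 2, 2))   # identical boxes -> loss 0
halved  = iou_loss((1, 1, 2, 2), (1, 1, 1, 1))   # IOU = 0.5 -> loss = log 2
```

Because all four distances enter the loss jointly through the area terms, the box is optimized as a whole rather than coordinate by coordinate.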
In step 1402, the error generated by the network model in performing text-region detection and the error generated in performing the character recognition operation are added with weighting, obtaining the text identification error of the network model.
Specifically, the loss function of the whole network model, i.e. the text identification error of the network model, is expressed as follows:

L_total = L_Detection + ε_recognition·L_recognition

L_Detection is the loss generated by the detection branch, and L_recognition is the loss generated by the identification branch, that is, the error generated by performing the character recognition operation; ε_recognition is the proportion of the identification-branch loss in the total model loss, which controls the contribution of the identification branch to the optimization of the whole model. The loss generated by the detection branch has already been calculated in step 1401, and the loss generated by the identification branch is expressed as follows:

L_recognition = −(1/R) Σ_r log p(y*_r | ρ_r)

where R is the number of regions to be identified, y*_r is the identification label of a region, and ρ_r is the input currently being identified. p(y*_r | ρ_r) is calculated from c*, the character-level annotated sequence, c* = {c_0, ..., c_{L−1}}, where L is the length of the annotated sequence, L ≤ 7439; 7439 is the number of characters in the dictionary, and only characters present in the dictionary can be identified.
It should be noted that the loss function of frame regression in the detection task uses the IOU (intersection over union) loss function, which has the following advantages over the L2 loss: optimizing the four coordinates of a frame as a whole reduces the training difficulty of the model, can improve the detection accuracy and the learning speed of the model, and also strengthens adaptability to the diversity of samples.
The scheme provided by the present invention can support web API (web application interface) service calls and mobile-terminal deployment. As shown in Fig. 15, by using the technical solution provided by the present invention, the specific text content can be recognized directly from the original image and output.
The following are apparatus embodiments of the present invention, which can be used to execute the embodiments of the method for identifying text in an image executed by the above user equipment 110 of the present invention. For details not disclosed in the apparatus embodiments of the present invention, please refer to the embodiments of the method for identifying text in an image of the present invention.
Fig. 16 is a block diagram of an apparatus for identifying text in an image according to an exemplary embodiment. The apparatus for identifying text in an image can be used in the user equipment 110 of the implementation environment shown in Fig. 1 to execute all or part of the steps of the method for identifying text in an image shown in any of Fig. 3, Fig. 6, Fig. 8, Fig. 9, and Figs. 12-14. The apparatus performs end-to-end identification of text in an image through a network model of multiple stacked layers. As shown in Fig. 16, the apparatus includes but is not limited to: a spatial convolution operation module 1610, a global feature extraction module 1630, a pooled feature obtaining module 1650, and a character sequence output module 1670.
The spatial convolution operation module 1610 is used to successively perform the spatially separable convolution operations on the image in a multi-layer mode, fusing the convolution features extracted by the spatially separable convolution operations into the low layers that are mapped layer by layer, the low layers being mapped to the high layers that output the convolution features;
The global feature extraction module 1630 is used to obtain the global features from the bottom layer that performs the spatially separable convolution operations;
The pooled feature obtaining module 1650 is used to perform candidate-region detection of the text in the image and region-screening-parameter prediction through the global features, obtaining the pooled features corresponding to the detected text regions;
The character sequence output module 1670 is used to propagate the pooled features backward to the identification branch network layers that perform the character recognition operation, and output the character sequence labeled in the text region through the identification branch network layers.
The functions of each module in the above apparatus and the implementation process of their effects are detailed in the implementation process of the corresponding steps in the above method for identifying text in an image, and are not described here again.
The spatial convolution operation module 1610 can be, for example, the processor 218 of some physical structure in Fig. 2.
The global feature extraction module 1630, the pooled feature obtaining module 1650, and the character sequence output module 1670 can also be functional modules for executing the corresponding steps in the above method for identifying text in an image. It is understood that these modules can be realized by hardware, software, or a combination of the two. When realized in hardware, these modules may be embodied as one or more hardware modules, such as one or more application-specific integrated circuits. When realized in software, these modules may be embodied as one or more computer programs executed on one or more processors, such as a program stored in the memory 204 and executed by the processor 218 of Fig. 2.
Optionally, as shown in Fig. 17, the pooled feature obtaining module 1650 includes but is not limited to:
a candidate region output unit 1651, used to input the global features to the region regression network layer that performs candidate-region detection, and output the frame candidate regions of the text in the image through the region regression network layer;
a pooling input unit 1652, used to input the frame candidate regions to the pooling layer that performs region screening and region rotation;
a screening rotation unit 1653, used to filter out the text regions from the frame candidate regions and rotate the text regions to a horizontal position according to the pixel-level region screening parameters obtained by the pooling layer through region-screening-parameter prediction on the global features, obtaining the pooled features of the text regions.
Optionally, as shown in Fig. 18, the screening rotation unit 1653 includes but is not limited to:
a confidence obtaining subunit 1801, used to obtain the pixel-level classification confidence generated by the pooling layer through convolutional calculation on the global features, the pixel-level classification confidence referring to the probability that each pixel in the image belongs to a text region;
a candidate region screening subunit 1802, used to filter out the text regions from the frame candidate regions according to the pixel-level classification confidence and the intersection-over-union of the frame candidate regions;
a text region rotation subunit 1803, used to rotate the text regions to a horizontal position through an interpolation algorithm according to the pixel-level rotation angle and pixel-level frame distance generated by the pooling layer through convolutional calculation on the global features, obtaining the pooled features of the text regions.
Optionally, the identification branch network layers include a temporal convolutional network layer and a character classification layer, and the character sequence output module 1670 includes but is not limited to:
a character feature extraction unit, used to propagate the pooled features backward to the temporal convolutional network layer for character-feature extraction;
a character classification unit, used to input the extracted character features to the character classification layer and output the character sequence labeled in the text region through the character classification layer.
Optionally, the apparatus further includes but is not limited to:
a sample set obtaining module, used to obtain the sample image set in which text information is recorded on the images, the content of the text information being known;
a model training module, used to train the network model using the sample image set, adjusting the parameters of the network model so that the difference between the character sequence output by the network model for each sample image and the corresponding text information is minimized.
Optionally, the model training module includes but is not limited to:
a model error obtaining unit, used to obtain the text identification error of the network model according to the error generated by the network model in performing text-region detection and the error generated in performing the character recognition operation;
a model parameter adjustment unit, used to adjust, by back-propagation according to the text identification error, the network layer parameters with which the network model performs the text-region detection and the network layer parameters of the character recognition operation, minimizing the text identification error.
Optionally, the model error obtaining unit includes but is not limited to:
a detection error determination subunit, used to determine the error generated by the network model in performing text-region detection according to the error generated by pixel-level classification prediction, the error generated by pixel-level frame-distance prediction, and the error generated by pixel-level rotation-angle prediction;
an error fusion subunit, used to weight and add the error generated by the network model in performing text-region detection and the error generated by the character recognition operation, obtaining the text identification error of the network model.
Optionally, the present invention further provides an electronic device, which can be used as the user device 110 in the implementation environment shown in Fig. 1 to perform all or part of the steps of the method for recognizing text in an image shown in any of Fig. 3, Fig. 6, Fig. 8, Fig. 9, and Fig. 12 to Fig. 14. The electronic device includes:
A processor;
A memory for storing processor-executable instructions;
wherein the processor is configured to perform the method for recognizing text in an image described in the above exemplary embodiments.
The specific manner in which the processor of the electronic device performs operations in this embodiment has been described in detail in the embodiments of the method for recognizing text in an image, and will not be elaborated here.
In an exemplary embodiment, a storage medium is further provided. The storage medium is a computer-readable storage medium, for example, a transitory or non-transitory computer-readable storage medium including instructions. The storage medium stores a computer program, and the computer program can be executed by the processor 218 of the device 200 to complete the above method for recognizing text in an image.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (15)
1. A method for recognizing text in an image, characterized in that the method performs end-to-end recognition of the text in the image through a network model of stacked layers, the method comprising:
performing spatially separable convolution operations on the image layer by layer, and fusing the convolution features extracted by the spatially separable convolution operations into the lower layers mapped layer by layer, a lower layer being mapped to the higher layer that outputs the convolution features;
obtaining global features from the bottom layer performing the spatially separable convolution operations;
performing candidate region detection and region screening parameter prediction for the text in the image by means of the global features, and correspondingly obtaining pooled features of the detected text region;
propagating the pooled features to a recognition branch network layer that performs a character recognition operation, and outputting, through the recognition branch network layer, the character sequence labeled for the text region.
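The spatially separable convolution in claim 1 factorizes a k×k kernel into a k×1 pass followed by a 1×k pass (the structure of the EffNet reference cited among the non-patent citations). A minimal numpy sketch of that factorization, with illustrative kernel values that are not taken from the patent:

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2-D cross-correlation of a single-channel image x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# A spatially separable 3x3 convolution replaces one 3x3 kernel with a
# 3x1 kernel followed by a 1x3 kernel, cutting multiplications from 9 to 6
# per output pixel.
x = np.random.rand(8, 8)
col = np.array([[1.0], [2.0], [1.0]])     # 3x1 vertical pass (illustrative)
row = np.array([[1.0, 0.0, -1.0]])        # 1x3 horizontal pass (illustrative)
separable = conv2d(conv2d(x, col), row)

# Same result as convolving once with the rank-1 outer-product 3x3 kernel.
full = conv2d(x, col @ row)
```

The two passes reproduce the full 3×3 convolution exactly because the 3×3 kernel here is the rank-1 outer product `col @ row`; the patent's network stacks such layers and fuses their features across layers.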
2. The method according to claim 1, characterized in that performing candidate region detection and region screening parameter prediction for the text in the image by means of the global features, and correspondingly obtaining pooled features of the detected text region, comprises:
inputting the global features into a region regression network layer that performs candidate region detection, and outputting, through the region regression network layer, bounding-box candidate regions of the text in the image;
inputting the bounding-box candidate regions into a pooling layer that performs region screening and region rotation;
screening the text region out of the bounding-box candidate regions according to the pixel-level region screening parameters obtained by the pooling layer through region screening parameter prediction on the global features, rotating the text region to a horizontal position, and obtaining the pooled features of the text region.
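Claim 2's region regression network layer can be read as an anchor-free, per-pixel box regressor over the global features. The sketch below is one plausible realization under stated assumptions: the `score_map`, the four-channel `geo_map` of per-pixel edge distances, and the 0.5 threshold are all illustrative and not fixed by the claim.

```python
import numpy as np

def candidate_boxes(score_map, geo_map, thresh=0.5):
    """Anchor-free candidate generation: every pixel whose text score exceeds
    `thresh` proposes a box given by its four predicted distances
    (top, bottom, left, right) to the box edges."""
    ys, xs = np.where(score_map > thresh)
    boxes = []
    for y, x in zip(ys, xs):
        top, bottom, left, right = geo_map[:, y, x]
        boxes.append((x - left, y - top, x + right, y + bottom, score_map[y, x]))
    return boxes

score = np.zeros((4, 4))
score[1, 2] = 0.9            # one confident "text" pixel
geo = np.ones((4, 4, 4))     # (channels, H, W): unit distance to each edge
boxes = candidate_boxes(score, geo)
```

In the toy run, the single confident pixel at (row 1, column 2) proposes one box; in the patent these candidates are then handed to the pooling layer for screening and rotation.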
3. The method according to claim 2, characterized in that screening the text region out of the bounding-box candidate regions according to the pixel-level region screening parameters obtained by the pooling layer through region screening parameter prediction on the global features, rotating the text region to a horizontal position, and obtaining the pooled features of the text region, comprises:
obtaining the pixel-level classification confidences generated by the pooling layer through convolutional computation on the global features, wherein a pixel-level classification confidence refers to the probability that a pixel in the image belongs to a text region;
screening the text region out of the bounding-box candidate regions according to the pixel-level classification confidences and the intersection-over-union with the bounding-box candidate regions;
rotating the text region to a horizontal position through an interpolation algorithm, according to the pixel-level rotation angles and pixel-level bounding-box distances generated by the pooling layer through convolutional computation on the global features, obtaining the pooled features of the text region.
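Claim 3's two mechanisms, screening by intersection-over-union and rotating the region to horizontal through interpolation, can be sketched as follows. The nearest-neighbour sampling and the (x1, y1, x2, y2) box format are assumptions for illustration; the claim does not fix the interpolation algorithm.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def rotate_crop(feat, center, size, angle):
    """Sample a size[0] x size[1] window around `center`, rotated by `angle`
    (radians), with nearest-neighbour interpolation, so the returned patch
    is axis-aligned (the text region is rotated to horizontal)."""
    h, w = size
    cy, cx = center
    out = np.zeros((h, w))
    cos, sin = np.cos(angle), np.sin(angle)
    for i in range(h):
        for j in range(w):
            dy, dx = i - h / 2.0, j - w / 2.0
            sy = int(round(cy + dy * cos - dx * sin))
            sx = int(round(cx + dy * sin + dx * cos))
            if 0 <= sy < feat.shape[0] and 0 <= sx < feat.shape[1]:
                out[i, j] = feat[sy, sx]
    return out

feat = np.arange(16, dtype=float).reshape(4, 4)
patch = rotate_crop(feat, center=(2, 2), size=(2, 2), angle=0.0)  # angle 0: plain crop
```

A real implementation would typically use bilinear rather than nearest-neighbour sampling, but the structure is the same: predicted per-pixel rotation angles and box distances parameterize the sampling grid.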
4. The method according to claim 1, characterized in that the recognition branch network layer comprises a temporal convolutional network layer and a character classification layer, and propagating the pooled features to the recognition branch network layer that performs the character recognition operation and outputting, through the recognition branch network layer, the character sequence labeled for the text region comprises:
propagating the pooled features to the temporal convolutional network layer to extract character features;
inputting the extracted character features into the character classification layer, and outputting, through the character classification layer, the character sequence labeled for the text region.
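A hedged sketch of claim 4's recognition branch: a temporal (1-D) convolution over the pooled feature sequence, followed by a per-timestep softmax classifier whose argmax yields the character sequence. The shapes, the toy alphabet, and the identity classifier weights are illustrative assumptions.

```python
import numpy as np

def temporal_conv(seq, kernel):
    """Valid-mode 1-D (temporal) convolution along the sequence axis.
    seq: (T, C) pooled feature sequence; kernel: (K, C, C_out)."""
    K, C, C_out = kernel.shape
    T_out = seq.shape[0] - K + 1
    out = np.zeros((T_out, C_out))
    for t in range(T_out):
        # sum over the K taps and C input channels for each output channel
        out[t] = np.einsum('kc,kco->o', seq[t:t + K], kernel)
    return out

def classify(features, weights, alphabet):
    """Character classification layer: per-timestep softmax over the
    alphabet; the argmax at each timestep yields the character sequence."""
    logits = features @ weights                    # (T, len(alphabet))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return ''.join(alphabet[i] for i in probs.argmax(axis=1))

# Toy run: identity weights over a two-character alphabet.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
sequence = classify(feats, np.eye(2), "ab")  # → "aba"
```

A production system would decode with CTC or a similar alignment-free loss rather than a raw per-timestep argmax; the sketch only shows the two layers the claim names.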
5. The method according to claim 1, characterized by further comprising:
obtaining a sample image set of images on which text information is recorded, the content of the text information being known;
training the network model using the sample image set, and adjusting the parameters of the network model so that the difference between the character sequence output by the network model for each sample image and the corresponding text information is minimized.
6. The method according to claim 5, characterized in that training the network model using the sample image set, and adjusting the parameters of the network model so that the difference between the character sequence output by the network model for each sample image and the corresponding text information is minimized, comprises:
obtaining the text recognition error of the network model from the error generated by the network model in performing text region detection and the error generated in performing the character recognition operation;
adjusting, through back-propagation according to the text recognition error, the parameters of the network layers of the network model that perform the text region detection and the parameters of the network layers that perform the character recognition operation, so that the text recognition error is minimized.
7. The method according to claim 6, characterized in that obtaining the text recognition error of the network model from the error generated by the network model in performing text region detection and the error generated in performing the character recognition operation comprises:
determining the error generated by the network model in text region detection from the error of pixel-level classification prediction, the error of pixel-level bounding-box range prediction, and the error of pixel-level rotation angle prediction;
computing a weighted sum of the error generated by the network model in text region detection and the error generated in performing the character recognition operation, obtaining the text recognition error of the network model.
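In its simplest reading, the "weighted addition" of claim 7 reduces to a scalar training objective like the sketch below. The weight values are assumptions for illustration and are not specified by the claims.

```python
def detection_error(cls_err, box_err, angle_err, w_box=1.0, w_angle=10.0):
    """Text-region detection error combining claim 7's three pixel-level
    terms: classification, bounding-box range, and rotation angle.
    The weights w_box and w_angle are illustrative assumptions."""
    return cls_err + w_box * box_err + w_angle * angle_err

def text_recognition_error(det_err, rec_err, w_rec=1.0):
    """Weighted sum of the detection error and the character recognition
    error; this is the scalar that back-propagation then minimizes."""
    return det_err + w_rec * rec_err

total = text_recognition_error(detection_error(0.3, 0.2, 0.01), 0.5)  # ≈ 1.1
```

Training both branches against this single scalar is what makes the model end-to-end: one backward pass updates the detection layers and the recognition layers together.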
8. An apparatus for recognizing text in an image, characterized in that the apparatus performs end-to-end recognition of the text in the image through a network model of stacked layers, the apparatus comprising:
a spatial convolution operation module, configured to perform spatially separable convolution operations on the image layer by layer, and fuse the convolution features extracted by the spatially separable convolution operations into the lower layers mapped layer by layer, a lower layer being mapped to the higher layer that outputs the convolution features;
a global feature extraction module, configured to obtain global features from the bottom layer performing the spatially separable convolution operations;
a pooled feature acquisition module, configured to perform candidate region detection and region screening parameter prediction for the text in the image by means of the global features, correspondingly obtaining pooled features of the detected text region;
a character sequence output module, configured to propagate the pooled features to a recognition branch network layer that performs a character recognition operation, and output, through the recognition branch network layer, the character sequence labeled for the text region.
9. The apparatus according to claim 8, characterized in that the pooled feature acquisition module comprises:
a candidate region output unit, configured to input the global features into a region regression network layer that performs candidate region detection, and output, through the region regression network layer, bounding-box candidate regions of the text in the image;
a pooling input unit, configured to input the bounding-box candidate regions into a pooling layer that performs region screening and region rotation;
a screening and rotation unit, configured to screen the text region out of the bounding-box candidate regions according to the pixel-level region screening parameters obtained by the pooling layer through region screening parameter prediction on the global features, rotate the text region to a horizontal position, and obtain the pooled features of the text region.
10. The apparatus according to claim 9, characterized in that the screening and rotation unit comprises:
a confidence acquisition subunit, configured to obtain the pixel-level classification confidences generated by the pooling layer through convolutional computation on the global features, wherein a pixel-level classification confidence refers to the probability that a pixel in the image belongs to a text region;
a candidate region screening subunit, configured to screen the text region out of the bounding-box candidate regions according to the pixel-level classification confidences and the intersection-over-union with the bounding-box candidate regions;
a text region rotation subunit, configured to rotate the text region to a horizontal position through an interpolation algorithm, according to the pixel-level rotation angles and pixel-level bounding-box distances generated by the pooling layer through convolutional computation on the global features, and obtain the pooled features of the text region.
11. The apparatus according to claim 8, characterized in that the recognition branch network layer comprises a temporal convolutional network layer and a character classification layer, and the character sequence output module comprises:
a character feature extraction unit, configured to propagate the pooled features to the temporal convolutional network layer to extract character features;
a character classification unit, configured to input the extracted character features into the character classification layer and output, through the character classification layer, the character sequence labeled for the text region.
12. The apparatus according to claim 8, characterized in that the apparatus further comprises:
a sample set acquisition module, configured to obtain a sample image set of images on which text information is recorded, the content of the text information being known;
a model training module, configured to train the network model using the sample image set, and adjust the parameters of the network model so that the difference between the character sequence output by the network model for each sample image and the corresponding text information is minimized.
13. The apparatus according to claim 12, characterized in that the model training module comprises:
a model error acquisition unit, configured to obtain the text recognition error of the network model from the error generated by the network model in performing text region detection and the error generated in performing the character recognition operation;
a model parameter adjustment unit, configured to adjust, through back-propagation according to the text recognition error, the parameters of the network layers of the network model that perform the text region detection and the parameters of the network layers that perform the character recognition operation, so that the text recognition error is minimized.
14. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method for recognizing text in an image according to any one of claims 1-7.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program can be executed by a processor to complete the method for recognizing text in an image according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811202558.2A CN109271967B (en) | 2018-10-16 | 2018-10-16 | Method and device for recognizing text in image, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811202558.2A CN109271967B (en) | 2018-10-16 | 2018-10-16 | Method and device for recognizing text in image, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109271967A true CN109271967A (en) | 2019-01-25 |
CN109271967B CN109271967B (en) | 2022-08-26 |
Family
ID=65196737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811202558.2A Active CN109271967B (en) | 2018-10-16 | 2018-10-16 | Method and device for recognizing text in image, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271967B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180107892A1 (en) * | 2015-04-20 | 2018-04-19 | 3M Innovative Properties Company | Dual embedded optical character recognition (ocr) engines |
CN107305630A (en) * | 2016-04-25 | 2017-10-31 | 腾讯科技(深圳)有限公司 | Text sequence recognition methods and device |
CN108345850A (en) * | 2018-01-23 | 2018-07-31 | 哈尔滨工业大学 | The scene text detection method of the territorial classification of stroke feature transformation and deep learning based on super-pixel |
Non-Patent Citations (2)
Title |
---|
HUI LI ET AL.: "Towards end-to-end text spotting with convolutional recurrent neural networks", 2017 IEEE International Conference on Computer Vision (ICCV) |
IDO FREEMAN ET AL.: "EffNet: an efficient structure for convolutional neural networks", Computer Vision and Pattern Recognition |
Cited By (66)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919014B (en) * | 2019-01-28 | 2023-11-03 | 平安科技(深圳)有限公司 | OCR (optical character recognition) method and electronic equipment thereof |
CN109919014A (en) * | 2019-01-28 | 2019-06-21 | 平安科技(深圳)有限公司 | OCR recognition methods and its electronic equipment |
CN109948469A (en) * | 2019-03-01 | 2019-06-28 | 吉林大学 | The automatic detection recognition method of crusing robot instrument based on deep learning |
CN111723627A (en) * | 2019-03-22 | 2020-09-29 | 北京搜狗科技发展有限公司 | Image processing method and device and electronic equipment |
CN110119681B (en) * | 2019-04-04 | 2023-11-24 | 平安科技(深圳)有限公司 | Text line extraction method and device and electronic equipment |
CN110119681A (en) * | 2019-04-04 | 2019-08-13 | 平安科技(深圳)有限公司 | A kind of line of text extracting method and device, electronic equipment |
CN110059188A (en) * | 2019-04-11 | 2019-07-26 | 四川黑马数码科技有限公司 | A kind of Chinese sentiment analysis method based on two-way time convolutional network |
CN110059188B (en) * | 2019-04-11 | 2022-06-21 | 四川黑马数码科技有限公司 | Chinese emotion analysis method based on bidirectional time convolution network |
CN110210581A (en) * | 2019-04-28 | 2019-09-06 | 平安科技(深圳)有限公司 | A kind of handwritten text recognition methods and device, electronic equipment |
CN110210581B (en) * | 2019-04-28 | 2023-11-24 | 平安科技(深圳)有限公司 | Handwriting text recognition method and device and electronic equipment |
CN110135411B (en) * | 2019-04-30 | 2021-09-10 | 北京邮电大学 | Business card recognition method and device |
CN110135411A (en) * | 2019-04-30 | 2019-08-16 | 北京邮电大学 | Business card identification method and device |
CN110110652A (en) * | 2019-05-05 | 2019-08-09 | 达闼科技(北京)有限公司 | A kind of object detection method, electronic equipment and storage medium |
CN110175610A (en) * | 2019-05-23 | 2019-08-27 | 上海交通大学 | A kind of bill images text recognition method for supporting secret protection |
CN110135424B (en) * | 2019-05-23 | 2021-06-11 | 阳光保险集团股份有限公司 | Inclined text detection model training method and ticket image text detection method |
CN110175610B (en) * | 2019-05-23 | 2023-09-05 | 上海交通大学 | Bill image text recognition method supporting privacy protection |
CN110135424A (en) * | 2019-05-23 | 2019-08-16 | 阳光保险集团股份有限公司 | Tilt text detection model training method and ticket image Method for text detection |
CN110276345A (en) * | 2019-06-05 | 2019-09-24 | 北京字节跳动网络技术有限公司 | Convolutional neural networks model training method, device and computer readable storage medium |
CN110276345B (en) * | 2019-06-05 | 2021-09-17 | 北京字节跳动网络技术有限公司 | Convolutional neural network model training method and device and computer readable storage medium |
CN110232713B (en) * | 2019-06-13 | 2022-09-20 | 腾讯数码(天津)有限公司 | Image target positioning correction method and related equipment |
CN110232713A (en) * | 2019-06-13 | 2019-09-13 | 腾讯数码(天津)有限公司 | A kind of image object positioning correction method and relevant device |
CN110414520A (en) * | 2019-06-28 | 2019-11-05 | 平安科技(深圳)有限公司 | Universal character recognition methods, device, computer equipment and storage medium |
US11210546B2 (en) | 2019-07-05 | 2021-12-28 | Beijing Baidu Netcom Science And Technology Co., Ltd. | End-to-end text recognition method and apparatus, computer device and readable medium |
CN110458011A (en) * | 2019-07-05 | 2019-11-15 | 北京百度网讯科技有限公司 | Character recognition method and device, computer equipment and readable medium end to end |
CN110442860A (en) * | 2019-07-05 | 2019-11-12 | 大连大学 | Name entity recognition method based on time convolutional network |
CN110610175A (en) * | 2019-08-06 | 2019-12-24 | 深圳市华付信息技术有限公司 | OCR data mislabeling cleaning method |
CN112258259A (en) * | 2019-08-14 | 2021-01-22 | 北京京东尚科信息技术有限公司 | Data processing method, device and computer readable storage medium |
CN110533041B (en) * | 2019-09-05 | 2022-07-01 | 重庆邮电大学 | Regression-based multi-scale scene text detection method |
CN110533041A (en) * | 2019-09-05 | 2019-12-03 | 重庆邮电大学 | Multiple dimensioned scene text detection method based on recurrence |
CN110705547A (en) * | 2019-09-06 | 2020-01-17 | 中国平安财产保险股份有限公司 | Method and device for recognizing characters in image and computer readable storage medium |
CN110738203B (en) * | 2019-09-06 | 2024-04-05 | 中国平安财产保险股份有限公司 | Field structured output method, device and computer readable storage medium |
CN110738203A (en) * | 2019-09-06 | 2020-01-31 | 中国平安财产保险股份有限公司 | Method and device for outputting field structuralization and computer readable storage medium |
CN110705547B (en) * | 2019-09-06 | 2023-08-18 | 中国平安财产保险股份有限公司 | Method and device for recognizing text in image and computer readable storage medium |
CN110610166B (en) * | 2019-09-18 | 2022-06-07 | 北京猎户星空科技有限公司 | Text region detection model training method and device, electronic equipment and storage medium |
CN110610166A (en) * | 2019-09-18 | 2019-12-24 | 北京猎户星空科技有限公司 | Text region detection model training method and device, electronic equipment and storage medium |
CN110751146A (en) * | 2019-10-23 | 2020-02-04 | 北京印刷学院 | Text region detection method, text region detection device, electronic terminal and computer-readable storage medium |
CN110807459A (en) * | 2019-10-31 | 2020-02-18 | 深圳市捷顺科技实业股份有限公司 | License plate correction method and device and readable storage medium |
CN110807459B (en) * | 2019-10-31 | 2022-06-17 | 深圳市捷顺科技实业股份有限公司 | License plate correction method and device and readable storage medium |
CN111104941A (en) * | 2019-11-14 | 2020-05-05 | 腾讯科技(深圳)有限公司 | Image direction correcting method and device and electronic equipment |
CN111091123A (en) * | 2019-12-02 | 2020-05-01 | 上海眼控科技股份有限公司 | Text region detection method and equipment |
CN111104934A (en) * | 2019-12-22 | 2020-05-05 | 上海眼控科技股份有限公司 | Engine label detection method, electronic device and computer readable storage medium |
CN113128306A (en) * | 2020-01-10 | 2021-07-16 | 北京字节跳动网络技术有限公司 | Vertical text line recognition method, device, equipment and computer readable storage medium |
CN111259773A (en) * | 2020-01-13 | 2020-06-09 | 中国科学院重庆绿色智能技术研究院 | Irregular text line identification method and system based on bidirectional decoding |
CN111462095A (en) * | 2020-04-03 | 2020-07-28 | 上海帆声图像科技有限公司 | Parameter automatic adjusting method for industrial flaw image detection |
CN111462095B (en) * | 2020-04-03 | 2024-04-09 | 上海帆声图像科技有限公司 | Automatic parameter adjusting method for industrial flaw image detection |
CN111488883A (en) * | 2020-04-14 | 2020-08-04 | 上海眼控科技股份有限公司 | Vehicle frame number identification method and device, computer equipment and storage medium |
CN111598087A (en) * | 2020-05-15 | 2020-08-28 | 润联软件系统(深圳)有限公司 | Irregular character recognition method and device, computer equipment and storage medium |
CN111598087B (en) * | 2020-05-15 | 2023-05-23 | 华润数字科技有限公司 | Irregular character recognition method, device, computer equipment and storage medium |
CN113762259A (en) * | 2020-09-02 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Text positioning method, text positioning device, computer system and readable storage medium |
CN112798949A (en) * | 2020-10-22 | 2021-05-14 | 国家电网有限公司 | Pumped storage unit generator temperature early warning method and system |
CN112101360A (en) * | 2020-11-17 | 2020-12-18 | 浙江大华技术股份有限公司 | Target detection method and device and computer readable storage medium |
CN112508015A (en) * | 2020-12-15 | 2021-03-16 | 山东大学 | Nameplate identification method, computer equipment and storage medium |
CN112580637B (en) * | 2020-12-31 | 2023-05-12 | 苏宁金融科技(南京)有限公司 | Text information identification method, text information extraction method, text information identification device, text information extraction device and text information extraction system |
CN112580637A (en) * | 2020-12-31 | 2021-03-30 | 苏宁金融科技(南京)有限公司 | Text information identification method, text information extraction method, text information identification device, text information extraction device and text information identification system |
CN113076815B (en) * | 2021-03-16 | 2022-09-27 | 西南交通大学 | Automatic driving direction prediction method based on lightweight neural network |
CN113076815A (en) * | 2021-03-16 | 2021-07-06 | 西南交通大学 | Automatic driving direction prediction method based on lightweight neural network |
CN113052159A (en) * | 2021-04-14 | 2021-06-29 | 中国移动通信集团陕西有限公司 | Image identification method, device, equipment and computer storage medium |
CN113052159B (en) * | 2021-04-14 | 2024-06-07 | 中国移动通信集团陕西有限公司 | Image recognition method, device, equipment and computer storage medium |
CN113537189A (en) * | 2021-06-03 | 2021-10-22 | 深圳市雄帝科技股份有限公司 | Handwritten character recognition method, device, equipment and storage medium |
WO2023005253A1 (en) * | 2021-07-28 | 2023-02-02 | 北京百度网讯科技有限公司 | Method, apparatus and system for training text recognition model framework |
CN113591864A (en) * | 2021-07-28 | 2021-11-02 | 北京百度网讯科技有限公司 | Training method, device and system for text recognition model framework |
CN114049648A (en) * | 2021-11-25 | 2022-02-15 | 清华大学 | Engineering drawing text detection and identification method, device and system |
CN114049648B (en) * | 2021-11-25 | 2024-06-11 | 清华大学 | Engineering drawing text detection and recognition method, device and system |
CN114842464A (en) * | 2022-05-13 | 2022-08-02 | 北京百度网讯科技有限公司 | Image direction recognition method, device, equipment, storage medium and program product |
CN115205861B (en) * | 2022-08-17 | 2023-03-31 | 北京睿企信息科技有限公司 | Method for acquiring abnormal character recognition area, electronic equipment and storage medium |
CN115205861A (en) * | 2022-08-17 | 2022-10-18 | 北京睿企信息科技有限公司 | Method for acquiring abnormal character recognition area, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109271967B (en) | 2022-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271967A (en) | The recognition methods of text and device, electronic equipment, storage medium in image | |
US10636169B2 (en) | Synthesizing training data for broad area geospatial object detection | |
CN106127204B (en) | A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks | |
CN103959330B (en) | System and method for matching visual object component | |
CN107273502B (en) | Image geographic labeling method based on spatial cognitive learning | |
CN103578119B (en) | Target detection method in Codebook dynamic scene based on superpixels | |
EP1418509B1 (en) | Method using image recomposition to improve scene classification | |
CN110503154A (en) | Method, system, electronic equipment and the storage medium of image classification | |
CN109359696A (en) | A kind of vehicle money recognition methods, system and storage medium | |
CN109271991A (en) | A kind of detection method of license plate based on deep learning | |
CN110516671A (en) | Training method, image detecting method and the device of neural network model | |
CN106815604A (en) | Method for viewing points detecting based on fusion of multi-layer information | |
CN109711399A (en) | Shop recognition methods based on image, device, electronic equipment | |
CN110210581A (en) | A kind of handwritten text recognition methods and device, electronic equipment | |
CN109858547A (en) | A kind of object detection method and device based on BSSD | |
CN109002752A (en) | A kind of complicated common scene rapid pedestrian detection method based on deep learning | |
CN106296734B (en) | Method for tracking target based on extreme learning machine and boosting Multiple Kernel Learnings | |
Zhao et al. | Multiscale object detection in high-resolution remote sensing images via rotation invariant deep features driven by channel attention | |
CN109214245A (en) | A kind of method for tracking target, device, equipment and computer readable storage medium | |
CN108509567B (en) | Method and device for building digital culture content library | |
CN106991397A (en) | View-based access control model conspicuousness constrains the remote sensing images detection method of depth confidence network | |
CN114332586A (en) | Small target detection method and device, equipment, medium and product thereof | |
CN110517270A (en) | A kind of indoor scene semantic segmentation method based on super-pixel depth network | |
US11200650B1 (en) | Dynamic image re-timing | |
CN115984226A (en) | Insulator defect detection method, device, medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||