CN108229303A - Detection and recognition method, training method for a detection and recognition network, and apparatus, device and medium - Google Patents
- Publication number
- CN108229303A CN108229303A CN201711126372.9A CN201711126372A CN108229303A CN 108229303 A CN108229303 A CN 108229303A CN 201711126372 A CN201711126372 A CN 201711126372A CN 108229303 A CN108229303 A CN 108229303A
- Authority
- CN
- China
- Prior art keywords
- text box
- information
- detection
- network layer
- feature
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The embodiments of the present invention disclose a detection and recognition method, a training method for a detection and recognition network, and corresponding apparatus, device and medium. The detection and recognition method includes: inputting an image to be processed into a detection and recognition network, where the detection and recognition network includes a shared network layer, a detection network layer and a recognition network layer; outputting a shared-layer feature of the image through the shared network layer; inputting the shared-layer feature into the detection network layer, outputting a detection-layer feature of the image through the detection network layer, and obtaining, based on the detection-layer feature, text box information for the text contained in the image; and inputting the shared-layer feature and the text box information into the recognition network layer, which outputs the character content in the text boxes. The embodiments of the present invention avoid repeated feature extraction on the same image, improve processing efficiency, and increase the efficiency and speed of text detection and recognition.
Description
Technical field
The present invention relates to computer vision, and in particular to a detection and recognition method, a training method for a detection and recognition network, and a corresponding apparatus, device and medium.
Background
Text detection and recognition in natural scenes is an important problem in image understanding and image restoration. Accurate text detection and recognition can serve many applications, such as image search over large datasets, automatic translation, guidance for the blind, and robot navigation.
However, text detection and recognition in natural scenes is very challenging: varying background scenes, low resolution, different fonts, different illumination conditions, different scales, different tilt directions, blur and other factors all make the problem complex and difficult.
Summary of the invention
The embodiments of the present invention provide a technical solution for text detection and recognition.
According to one aspect of the embodiments of the present invention, a detection and recognition method is provided, including:
inputting an image to be processed into a detection and recognition network, where the detection and recognition network includes a shared network layer, a detection network layer and a recognition network layer;
outputting a shared-layer feature of the image through the shared network layer, where the shared feature embodies at least one of the following characteristics of the image: small-object texture features, edge features and detail features;
inputting the shared-layer feature into the detection network layer, outputting a detection-layer feature of the image through the detection network layer, and obtaining, based on the detection-layer feature, text box information for the text contained in the image;
inputting the shared-layer feature and the text box information into the recognition network layer, which outputs the character content in the text box.
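The data flow above — one shared feature computation feeding both a detection branch and a recognition branch — can be sketched in Python. This is a structural illustration only: the layer functions below are toy stand-ins, not the patented network, and all names are assumptions.

```python
# Minimal structural sketch of the claimed pipeline: one shared backbone
# feeds both branches, so features are extracted from the image only once.

def shared_layer(image):
    # Stand-in for the shared backbone: the "feature" is just the image;
    # a real network would output low-level texture/edge/detail maps.
    return image

def detection_layer(feature):
    # Stand-in detector: returns (row, col, height, width) boxes around
    # runs of non-zero "text" pixels, one box per row for simplicity.
    boxes = []
    for r, row in enumerate(feature):
        cols = [c for c, v in enumerate(row) if v]
        if cols:
            boxes.append((r, cols[0], 1, cols[-1] - cols[0] + 1))
    return boxes

def recognition_layer(feature, boxes):
    # Stand-in recognizer: "reads" the pixel values inside each box.
    return ["".join(str(feature[r][c + i]) for i in range(w))
            for r, c, h, w in boxes]

def detect_and_recognize(image):
    feature = shared_layer(image)     # computed once, used by both branches
    boxes = detection_layer(feature)  # text box information
    return boxes, recognition_layer(feature, boxes)

image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 0, 0, 0]]
boxes, texts = detect_and_recognize(image)
# boxes → [(1, 1, 1, 2)], texts → ["12"]
```

The point of the structure is that `shared_layer` runs once per image, whereas a separate detector and recognizer would each extract their own features.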
In another embodiment based on the above method, the detection-layer feature includes classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
Obtaining, based on the detection-layer feature, the text box information for the text contained in the image includes:
obtaining the text box information through the classification information of each pixel in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box, and the rotation angle of the text box.
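The location encoding described in this embodiment — per-pixel distances to the four box boundaries plus a rotation angle — can be decoded into box corners. The sketch below assumes a single reference pixel and rotation about that pixel; this convention is an illustrative assumption, not taken from the patent.

```python
import math

# Decode a text box from one pixel's geometry: distances to the box's
# top/bottom/left/right edges, plus the box rotation angle.

def decode_box(x, y, d_top, d_bottom, d_left, d_right, angle):
    # Corners of the unrotated box implied by the four distances.
    corners = [(x - d_left, y - d_top), (x + d_right, y - d_top),
               (x + d_right, y + d_bottom), (x - d_left, y + d_bottom)]
    c, s = math.cos(angle), math.sin(angle)
    # Rotate each corner around the reference pixel (x, y).
    return [(x + (cx - x) * c - (cy - y) * s,
             y + (cx - x) * s + (cy - y) * c) for cx, cy in corners]

# A pixel at (5, 5) that is 2 px from the top/left edges and 3 px from the
# bottom/right edges of an unrotated box:
box = decode_box(5, 5, 2, 3, 2, 3, 0.0)
# box → [(3.0, 3.0), (8.0, 3.0), (8.0, 8.0), (3.0, 8.0)]
```

Because every text pixel carries this encoding, any single text pixel suffices to reconstruct its box, which is why the detection branch can be a dense per-pixel prediction.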
In another embodiment based on the above method, obtaining the text box information through the classification information of each pixel in the image includes:
shrinking the length and width of the image to a set ratio based on the classification information, and dividing the image into multiple rectangular boxes according to pixel positions; obtaining text boxes from the rectangular boxes whose internal pixels are classified as text;
obtaining the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box, and the rotation angle of the text box;
obtaining the text box information from the obtained text box location information and text box classification information.
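The grid step above — shrink, partition into rectangles, keep the rectangles whose pixels are classified as text — can be illustrated on a toy classification map. The all-pixels-text criterion and the fixed rectangle size are assumptions for the sketch; the shrinking is mimicked by working on a small map directly.

```python
# Extract candidate text boxes from a per-pixel classification map
# (1 = text, 0 = background) by partitioning it into fixed rectangles.

def boxes_from_class_map(class_map, box_h, box_w):
    rows, cols = len(class_map), len(class_map[0])
    boxes = []
    for r in range(0, rows, box_h):
        for c in range(0, cols, box_w):
            cells = [class_map[r + i][c + j]
                     for i in range(box_h) for j in range(box_w)]
            if all(cells):                 # every pixel inside is text
                boxes.append((r, c, box_h, box_w))
    return boxes

class_map = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
print(boxes_from_class_map(class_map, 2, 2))
# → [(0, 0, 2, 2), (2, 2, 2, 2)]
```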
In another embodiment based on the above method, inputting the shared-layer feature and the text box information into the recognition network layer and predicting the character information in the text box through the recognition network layer includes:
obtaining a corresponding text box feature based on the output text box information, and fusing the text box feature with the shared-layer feature output by the shared network layer;
predicting, by the recognition network layer, the character information in the text box based on the fused feature.
In another embodiment based on the above method, obtaining the corresponding text box feature based on the output text box information includes:
performing a perspective transform on the text box information, segmenting the text box out of the image to be processed, and generating the corresponding text box feature from the segmented text box.
In another embodiment based on the above method, segmenting the text box out of the image includes:
obtaining the top-left coordinates of the text box from the text box location information;
scaling the text box while keeping the ratio of its height to its width constant, so that all text boxes have the same height;
constructing a perspective transformation matrix from the rotation angle of the text box, the top-left coordinates and the scaling ratio;
segmenting the text box out of the image based on the perspective transformation matrix.
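The matrix construction above can be sketched as the composition of a translation (moving the box's top-left corner to the origin), a rotation undoing the box angle, and a scaling. In practice the crop would be produced by an image-warping routine such as cv2.warpPerspective; here the 3x3 homogeneous matrix is only applied to points to show the geometry, and all conventions are assumptions.

```python
import math

def matmul3(a, b):
    # 3x3 matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rectify_matrix(top_left, angle, scale):
    # Compose: translate top-left to origin, undo the box rotation, scale.
    tx, ty = top_left
    c, s = math.cos(-angle), math.sin(-angle)
    translate = [[1, 0, -tx], [0, 1, -ty], [0, 0, 1]]
    rotate = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    scale_m = [[scale, 0, 0], [0, scale, 0], [0, 0, 1]]
    return matmul3(scale_m, matmul3(rotate, translate))

def apply(m, x, y):
    # Apply a homogeneous transform to a 2D point.
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)

m = rectify_matrix(top_left=(4, 3), angle=0.0, scale=2.0)
print(apply(m, 4, 3))   # the box's top-left corner lands at the origin
# → (0.0, 0.0)
```

Choosing the scale so that every rectified box has the same height is what lets a single recognition layer consume boxes of arbitrary size.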
In another embodiment based on the above method, segmenting the text box out of the image based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the image to be processed to obtain a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
According to another aspect of the embodiments of the present invention, a training method for a detection and recognition network is provided, including:
inputting an image to be processed into the detection and recognition network, where the detection and recognition network includes a shared network layer, a detection network layer and a recognition network layer, and the image is annotated with text box information and the character information contained in the text boxes;
outputting a first shared-layer feature through the shared network layer; inputting the first shared-layer feature and the annotated text box information into the recognition network layer; predicting the character information contained in the text box through the recognition network layer; and training the shared network layer and the recognition network layer based on the predicted and annotated character information until a first training completion condition is met, where the shared feature embodies at least one of the following characteristics of the image: small-object texture features, edge features and detail features;
inputting the image into the trained shared network layer, which outputs a second shared-layer feature; inputting the second shared-layer feature into the detection network layer; predicting the detection-layer feature of the image through the detection network layer; obtaining, based on the detection-layer feature, the text box information for the text contained in the image; and training the detection network layer based on the predicted and annotated text box information until a second training completion condition is met.
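The two-stage schedule above — first train the shared and recognition layers on annotated text, then train the detection layer on annotated boxes with the shared layer fixed — can be shown with a scalar stand-in for each stage. The "loss", learning rate and thresholds are illustrative assumptions, not values from the patent.

```python
# Scalar stand-in for one training stage: adjust a parameter from the
# prediction error, repeating until the error is small enough or the
# iteration budget is spent (the "completion condition").

def train_stage(param, target, lr=0.5, eps=1e-3, max_iters=100):
    for i in range(max_iters):
        error = param - target          # prediction vs. annotation
        if abs(error) < eps:            # completion condition met
            return param, i
        param -= lr * error             # gradient-style update
    return param, max_iters

# Stage 1: shared + recognition layers, supervised by annotated characters.
shared_and_rec, iters1 = train_stage(param=10.0, target=2.0)
# Stage 2: detection layer, supervised by annotated boxes, with the
# already-trained shared layer held fixed.
detect, iters2 = train_stage(param=-4.0, target=1.0)
print(abs(shared_and_rec - 2.0) < 1e-3)   # → True
```

Training the recognition branch against ground-truth boxes first means the shared layer is shaped by the harder recognition signal before the detection branch is fitted on top of it.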
In another embodiment based on the above method, training the shared network layer and the recognition network layer based on the predicted and annotated character information until the first training completion condition is met includes:
adjusting the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted and annotated character information;
iteratively recognizing the image to be processed with the adjusted shared network layer and recognition network layer to obtain predicted character information, until the first training completion condition is met.
In another embodiment based on the above method, the first training completion condition includes: the error between the predicted and annotated character information is less than a first preset value; or the number of prediction iterations is greater than or equal to a first preset number.
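The first and second training completion conditions share one shape — error below a preset value, or iteration count at a preset number — which a small helper makes explicit. The threshold values below are illustrative.

```python
# Disjunctive stopping rule used by both training stages.

def training_done(error, iterations, preset_error, preset_iters):
    # Stop when the prediction error drops below the preset value OR the
    # number of prediction iterations reaches the preset count.
    return error < preset_error or iterations >= preset_iters

print(training_done(error=0.05, iterations=3, preset_error=0.1, preset_iters=100))   # → True
print(training_done(error=0.5, iterations=100, preset_error=0.1, preset_iters=100))  # → True
print(training_done(error=0.5, iterations=3, preset_error=0.1, preset_iters=100))    # → False
```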
In another embodiment based on the above method, training the detection network layer based on the predicted and annotated text box information until the second training completion condition is met includes:
adjusting the parameters of the detection network layer based on the error between the predicted and annotated text box information;
iteratively detecting the image to be processed with the adjusted detection network layer to obtain predicted text box information, until the second training completion condition is met.
In another embodiment based on the above method, the second training completion condition includes: the error between the predicted and annotated text box information is less than a second preset value; or the number of prediction iterations is greater than or equal to a second preset number.
In another embodiment based on the above method, the detection-layer feature includes classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
Obtaining, based on the detection-layer feature, the text box information for the text contained in the image includes:
obtaining the text box information through the classification information of each pixel in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box, and the rotation angle of the text box.
In another embodiment based on the above method, obtaining the text box information through the classification information of each pixel in the image includes:
shrinking the length and width of the image to a set ratio based on the classification information, and dividing the image into multiple rectangular boxes according to pixel positions; obtaining text boxes from the rectangular boxes whose internal pixels are classified as text;
obtaining the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box, and the rotation angle of the text box;
obtaining the text box information from the obtained text box location information and text box classification information.
In another embodiment based on the above method, predicting the character information contained in the text box through the recognition network layer includes:
obtaining a corresponding text box feature based on the annotated text box information, and fusing the text box feature with the first shared-layer feature output by the shared network layer;
predicting, by the recognition network layer, the character information in the text box based on the fused feature.
In another embodiment based on the above method, obtaining the corresponding text box feature based on the annotated text box information includes:
performing a perspective transform on the annotated text box information, segmenting the text box out of the image, and generating the corresponding text box feature from the segmented text box.
In another embodiment based on the above method, segmenting the text box out of the image includes:
obtaining the top-left coordinates of the text box from the text box location information;
scaling the text box while keeping the ratio of its height to its width constant, so that all text boxes have the same height;
constructing a perspective transformation matrix from the rotation angle of the text box, the top-left coordinates and the scaling ratio;
segmenting the text box out of the image based on the perspective transformation matrix.
In another embodiment based on the above method, segmenting the text box out of the image based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the image to be processed to obtain a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
According to another aspect of the embodiments of the present invention, a detection and recognition apparatus is provided, including:
an input unit, configured to input an image to be processed into a detection and recognition network, where the detection and recognition network includes a shared network layer, a detection network layer and a recognition network layer;
a low-layer extraction unit, configured to output the shared-layer feature of the image through the shared network layer, where the shared feature embodies at least one of the following characteristics of the image: small-object texture features, edge features and detail features;
a text box detection unit, configured to input the shared-layer feature into the detection network layer, output the detection-layer feature of the image through the detection network layer, and obtain, based on the detection-layer feature, the text box information for the text contained in the image;
a character recognition unit, configured to input the shared-layer feature and the text box information into the recognition network layer, which outputs the character content in the text box.
In another embodiment based on the above apparatus, the detection-layer feature includes classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
The text box detection unit is specifically configured to obtain, through the classification information of each pixel in the image, the text box information for the text contained in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box, and the rotation angle of the text box.
In another embodiment based on the above apparatus, the text box detection unit includes:
a text box acquisition module, configured to shrink the length and width of the image to a set ratio based on the classification information, divide the image into multiple rectangular boxes according to pixel positions, and obtain text boxes from the rectangular boxes whose internal pixels are classified as text;
an information acquisition module, configured to obtain the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box and the rotation angle of the text box, and obtain the text box information from the obtained text box location information and text box classification information.
In another embodiment based on the above apparatus, the character recognition unit includes:
a feature extraction module, configured to obtain a corresponding text box feature based on the output text box information, and fuse the text box feature with the shared-layer feature output by the shared network layer;
a character prediction module, configured to predict, by the recognition network layer, the character information in the text box based on the fused feature.
In another embodiment based on the above apparatus, the feature extraction module is specifically configured to perform a perspective transform on the text box information, segment the text box out of the image, and generate the corresponding text box feature from the segmented text box.
In another embodiment based on the above apparatus, the feature extraction module includes:
a scaling module, configured to obtain the top-left coordinates of the text box from the text box location information, and scale the text box while keeping the ratio of its height to its width constant, so that all text boxes have the same height;
a transformation module, configured to construct a perspective transformation matrix from the rotation angle of the text box, the top-left coordinates and the scaling ratio;
a text box segmentation module, configured to segment the text box out of the image based on the perspective transformation matrix.
In another embodiment based on the above apparatus, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the image, obtaining a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
According to another aspect of the embodiments of the present invention, a training apparatus for a detection and recognition network is provided, including:
an image input unit, configured to input an image to be processed into the detection and recognition network, where the detection and recognition network includes a shared network layer, a detection network layer and a recognition network layer, and the image is annotated with text box information and the character information contained in the text boxes;
a first training unit, configured to output a first shared-layer feature through the shared network layer; input the first shared-layer feature and the annotated text box information into the recognition network layer; predict the character information contained in the text box through the recognition network layer; and train the shared network layer and the recognition network layer based on the predicted and annotated character information until a first training completion condition is met, where the shared feature embodies at least one of the following characteristics of the image: small-object texture features, edge features and detail features;
a second training unit, configured to input the image into the trained shared network layer, which outputs a second shared-layer feature; input the second shared-layer feature into the detection network layer; predict the detection-layer feature of the image through the detection network layer; obtain, based on the detection-layer feature, the text box information for the text contained in the image; and train the detection network layer based on the predicted and annotated text box information until a second training completion condition is met.
In another embodiment based on the above apparatus, the first training unit is specifically configured to adjust the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted and annotated character information, and to iteratively recognize the image with the adjusted shared network layer and recognition network layer to obtain predicted character information, until the first training completion condition is met.
In another embodiment based on the above apparatus, the first training completion condition includes: the error between the predicted and annotated character information is less than a first preset value; or the number of prediction iterations is greater than or equal to a first preset number.
In another embodiment based on the above apparatus, the second training unit is specifically configured to adjust the parameters of the detection network layer based on the error between the predicted and annotated text box information, and to iteratively detect the image with the adjusted detection network layer to obtain predicted text box information, until the second training completion condition is met.
In another embodiment based on the above apparatus, the second training completion condition includes: the error between the predicted and annotated text box information is less than a second preset value; or the number of prediction iterations is greater than or equal to a second preset number.
In another embodiment based on the above apparatus, the detection-layer feature includes classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
The second training unit is specifically configured to obtain, through the classification information of each pixel in the image, the text box information for the text contained in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box, and the rotation angle of the text box.
In another embodiment based on the above apparatus, the second training unit includes:
a text box acquisition module, configured to shrink the length and width of the image to a set ratio based on the classification information, divide the image into multiple rectangular boxes according to pixel positions, and obtain text boxes from the rectangular boxes whose internal pixels are classified as text;
an information acquisition module, configured to obtain the distances from any pixel in the image to the top, bottom, left and right boundaries of the text box and the rotation angle of the text box, and obtain the text box information from the obtained text box location information and text box classification information.
In another embodiment based on the above apparatus, the first training unit includes:
a feature extraction module, configured to obtain a corresponding text box feature based on the annotated text box information, and fuse the text box feature with the first shared-layer feature output by the shared network layer;
a character prediction module, configured to predict, by the recognition network layer, the character information in the text box based on the fused feature.
In another embodiment based on the above apparatus, the feature extraction module is specifically configured to perform a perspective transform on the annotated text box information, segment the text box out of the image, and generate the corresponding text box feature from the segmented text box.
In another embodiment based on the above apparatus, the feature extraction module includes:
a scaling module, configured to obtain the top-left coordinates of the text box from the text box location information, and scale the text box while keeping the ratio of its height to its width constant, so that all text boxes have the same height;
a transformation module, configured to construct a perspective transformation matrix from the rotation angle of the text box, the top-left coordinates and the scaling ratio;
a text box segmentation module, configured to segment the text box out of the image based on the perspective transformation matrix.
In another embodiment based on the above apparatus, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the image, obtaining a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
According to another aspect of the embodiments of the present invention, an electronic device is provided, including a processor, where the processor includes the detection and recognition apparatus described above or the training apparatus for the detection and recognition network described above.
According to another aspect of the embodiments of the present invention, an electronic device is provided, including: a memory for storing executable instructions; and a processor for communicating with the memory to execute the executable instructions, so as to complete the operations of the detection and recognition method described above or of the training method for the detection and recognition network described above.
According to another aspect of the embodiments of the present invention, a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, perform the operations of the detection and recognition method described above or of the training method for the detection and recognition network described above.
Based on the detection and recognition method, the training method for the detection and recognition network, and the corresponding apparatus, device and medium provided by the above embodiments of the present invention, an image to be processed is input into a detection and recognition network; a shared-layer feature of the image is output through the shared network layer, which avoids repeated feature extraction on the same image and improves processing efficiency; the shared-layer feature is input into the detection network layer, which outputs the text box information for the text contained in the image; and the shared-layer feature and the text box information are input into the recognition network layer, which outputs the character content in the text box. A single detection and recognition network thus performs both the detection of text box information and the recognition of the characters inside the text boxes, improving the efficiency and speed of text recognition.
The technical solution of the present invention is described in further detail below through the drawings and embodiments.
Description of the drawings
The accompanying drawings, which constitute a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of an embodiment of the detection-recognition method of the present invention.
Fig. 2 is a structural diagram of an embodiment of the detection-recognition apparatus of the present invention.
Fig. 3 is a flowchart of an embodiment of the training method of the detection-recognition network of the present invention.
Fig. 4 is a structural diagram of an embodiment of the training apparatus of the detection-recognition network of the present invention.
Fig. 5 is a structural diagram of an electronic device for implementing a terminal device or a server of an embodiment of the present application.
Specific Embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
At the same time, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and apparatus should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be further discussed in subsequent drawings.
Embodiments of the present invention may be applied to a computer system/server, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communications network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media that include storage devices.
In the prior art, most of the best-performing methods use deep learning and divide text processing into two separately handled parts, detection and recognition: text detection is first performed on the whole picture to obtain the location information of the different pieces of text, and the detected text is then cropped out according to the location information and recognized.
In implementing the present invention, the inventors found that the prior art has at least the following problems: 1. in methods that divide text detection and recognition into two separately handled parts, the overall accuracy is limited by the respective accuracies of detection and recognition; 2. such methods need to store the intermediate detection result as the input of recognition, and, because the two network models for detection and recognition are relatively complex, computation and storage efficiency is low.
Fig. 1 is a flowchart of an embodiment of the detection-recognition method of the present invention. As shown in Fig. 1, the method of this embodiment includes:
Step 101: input an image to be processed into a detection-recognition network.
Here, the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer.
Step 102: output shared-layer features of the image to be processed through the shared network layer.
The shared features are used to embody at least one of the following characteristics of the image: small-object texture features, edge features, and detail features. When text box detection and text recognition are handled as separate tasks, each requires its own neural network; the two can be regarded as a text box detection network and a text recognition network, whose processing object is in both cases an image. A neural network is essentially composed of network layers such as a certain number of convolutional layers, pooling layers, and fully connected layers. Since the text box detection network and the text recognition network both process the text information in an image, they can share parameters in the early network layers that produce the shared-layer features. In the detection-recognition network, the shared-layer features are used to obtain features such as the texture features of smaller objects in the image, the edge features of the image, and the detail features of the image, so that the detection and recognition of small objects can be better handled. The network layers jointly involved in the text box detection network and the text recognition network are taken as a separate shared network layer that performs feature extraction on the image to be processed, which avoids repeated processing of the image; the obtained shared-layer features subsequently need only be input into the corresponding text box detection and/or text recognition network layers. Illustratively, a multi-scale feature cascade (fusing the shared-layer feature maps output by the shared network layer with the known text box information, i.e., fusing features of different levels) and CTC (Connectionist Temporal Classification, a method in deep neural networks for decoding one sequence into another sequence, which performs well in text recognition) are used to improve the accuracy of text detection and recognition and to better handle text in pictures that is harder to distinguish; by sharing part of the network, repeated feature extraction over the image is reduced.
Step 103: input the shared-layer features into the detection network layer; the detection network layer outputs detection-layer features of the image to be processed, and text box information for the text contained in the image is obtained based on the detection-layer features.
Step 104: input the shared-layer features and the text box information into the recognition network layer; the recognition network layer outputs the text content inside the text boxes.
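As a sketch, steps 101-104 amount to one forward pass in which the backbone features are computed once and then reused by both heads. The function names and toy return values below are illustrative placeholders, not the patent's actual network layers.

```python
def shared_network_layer(image):
    # Placeholder for the convolutional backbone: the "shared-layer
    # features" are extracted exactly once per image (Step 102).
    return {"source": image, "extracted": True}

def detection_network_layer(shared_features):
    # Placeholder detection head: returns text box information
    # (class + position + angle) derived from the shared-layer features.
    return [{"is_text": True, "box": (10, 10, 80, 30), "angle": 0.0}]

def recognition_network_layer(shared_features, text_boxes):
    # Placeholder recognition head: reuses the SAME shared-layer features,
    # so the backbone is never run a second time.
    return ["hello" for _ in text_boxes]

def detect_and_recognize(image):
    feats = shared_network_layer(image)              # Step 102
    boxes = detection_network_layer(feats)           # Step 103
    texts = recognition_network_layer(feats, boxes)  # Step 104
    return boxes, texts

boxes, texts = detect_and_recognize("pending_image.png")
```

The design point the patent makes is visible in `detect_and_recognize`: both heads consume the single `feats` object, so feature extraction is not repeated.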
According to the detection-recognition method provided by the above embodiments of the present invention, an image to be processed is input into the detection-recognition network; the shared network layer outputs the shared-layer features of the image, and these shared-layer features reduce repeated feature extraction over the image and improve processing efficiency; the shared-layer features are input into the detection network layer, which outputs the text box information for the text contained in the image; the shared-layer features and the text box information are input into the recognition network layer, which outputs the text content inside the text boxes. A single detection-recognition network thus accomplishes both the detection of the text box information and the recognition of the text information inside the text boxes, improving the efficiency and speed of text recognition.
The detection-recognition method provided by the present invention is applicable to all languages: for a different language, it suffices to train the detection-recognition network with text of the language to be processed, and the resulting detection-recognition network can detect and recognize text in that language.
In a specific example of the above embodiments of the detection-recognition method of the present invention, the detection-layer features include the classification information of each pixel in the image to be processed, where the classification information indicates, through different values, whether the corresponding pixel belongs to the text class; optionally, the classification information may use 0 to denote the non-text class and 1 the text class, or 1 to denote the non-text class and 0 the text class.
Operation 103 includes:
obtaining, from the classification information of each pixel in the image to be processed, the text box information for the text contained in the image.
Here, the text box information includes text box classification information and text box location information. The text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the image to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box. In this embodiment, before the detection-recognition network is trained with the sample images to be processed, those sample images need to be annotated: the class of each pixel in each sample image is labeled in order to determine the positions of the text boxes. The classes usually labeled are text and non-text (which may be marked with 1 and 0); once text and non-text have been labeled, the text box information corresponding to each text box that contains text can be determined.
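For illustration, the text box information described above can be held in a small data structure; the field names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class TextBox:
    contains_text: bool  # text box classification information
    top: float           # distance from a reference pixel to the top side
    bottom: float        # distance to the bottom side
    left: float          # distance to the left side
    right: float         # distance to the right side
    angle: float         # rotation angle of the text box, in radians

    def height(self):
        # The four side distances implicitly encode the box size:
        # top + bottom spans the height, left + right spans the width.
        return self.top + self.bottom

    def width(self):
        return self.left + self.right

box = TextBox(True, 8, 8, 20, 20, 0.0)
```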
In a specific example of the above embodiments of the detection-recognition method of the present invention, obtaining the text box information for the text contained in the image from the classification information of each pixel includes:
based on the classification information of the image to be processed, shrinking the length and the width to a set ratio respectively, and dividing the image to be processed into multiple rectangular frames according to pixel position relationships, a rectangular frame whose interior pixels have classification information marked as text yielding a text box;
obtaining the distance information from any pixel in the image to the top, bottom, left, and right sides of the text box, and the rotation angle information of the text box;
obtaining the text box information based on the obtained text box location information and text box classification information.
With the setting of this embodiment, the image to be processed is annotated as an image containing only 1s and 0s (the classification information uses 1 for the text class and 0 for the non-text class, or 1 for the non-text class and 0 for the text class). Since position inaccuracies may arise in the course of network classification, the length and the width of each text box are shrunk to a set ratio respectively (for example, reduced to 0.6 times their original size); shrinking the size of the text box reduces the influence of inaccurate text positions on the algorithm. The location information of a text box is determined by finding its minimum bounding rectangle: through this bounding rectangle, the distances from each pixel in the text box to the top, bottom, left, and right sides are obtained, and the angle information of the text box is the rotation angle between this minimum bounding rectangle and an upright rectangle.
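In the axis-aligned (zero rotation angle) case, the shrinking and the per-pixel distances to the four sides can be sketched as follows; the 0.6 ratio matches the example above, while the function names are illustrative.

```python
def shrink_box(x0, y0, x1, y1, ratio=0.6):
    # Shrink a box around its center to `ratio` of its original width
    # and height, reducing the impact of imprecise text positions.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    w, h = (x1 - x0) * ratio, (y1 - y0) * ratio
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def side_distances(px, py, x0, y0, x1, y1):
    # Distances from pixel (px, py) to the top, bottom, left, and
    # right sides of an axis-aligned bounding rectangle.
    return py - y0, y1 - py, px - x0, x1 - px

sx0, sy0, sx1, sy1 = shrink_box(0, 0, 100, 50)
t, b, l, r = side_distances(50, 25, sx0, sy0, sx1, sy1)
```

For a rotated box the distances would be measured against the sides of the minimum bounding rectangle after accounting for its rotation angle, as the text describes.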
In another embodiment of the detection-recognition method of the present invention, on the basis of the above embodiments, operation 104 includes:
obtaining the corresponding text box features based on the output text box information, and performing feature fusion between the text box features and the shared-layer features output by the shared network layer;
the recognition network layer predicting the text information inside the text box based on the fused features.
The feature fusion referred to in this embodiment means connecting the obtained shared-layer features and the detection-layer features together; the fused features thus include both the shared-layer features of the image and the semantic features of the detection layer, and can be better used for text detection and recognition.
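Fusion by connection can be sketched as joining the two feature maps along the channel dimension; the tiny channel-first feature maps below are illustrative.

```python
def fuse_features(shared_feat, detect_feat):
    # Each feature map is a list of channel planes (channel-first layout);
    # concatenating the lists joins the maps along the channel dimension,
    # so the fused features carry both low-level shared-layer information
    # and the semantic information of the detection layer.
    return shared_feat + detect_feat

shared = [[[0.1, 0.2], [0.3, 0.4]]]  # 1 channel, 2x2: shared-layer features
detect = [[[1.0, 0.0], [0.0, 1.0]]]  # 1 channel, 2x2: detection-layer features
fused = fuse_features(shared, detect)  # 2 channels, 2x2
```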
In a specific example of the above embodiments of the detection-recognition method of the present invention, obtaining the corresponding text box features based on the output text box information includes:
performing a perspective transform according to the text box information, segmenting the text box out of the image to be processed, and generating the corresponding text box features based on the segmented text box.
In this embodiment, the text box may be cropped from the original image according to the manually labeled location information and perspective-transformed: the labeled arbitrary quadrilateral is cropped out and transformed into a rectangle, which serves as the input of the recognition network layer. The formulas are as follows:
t_x = l - x_0
t_y = t - y_0
scale = dst_h / (t + b)
dst_w = scale × (l + r)
where the inputs t, b, l, r are the distances from a certain point inside the arbitrary quadrilateral to its top, bottom, left, and right sides, θ is the rotation angle of the arbitrary quadrilateral, dst_h and dst_w are respectively the height and the width of the set output rectangular picture, and (x_0, y_0) is the coordinate position of the point in the picture before the transform. Output: multiplying the original image by the perspective transformation matrix M directly yields the output picture, i.e., the cropped-out rectangular picture, for use by the recognition network layer. The text box features referred to in this embodiment are text box feature maps, which can be obtained from the pixel values corresponding to the obtained text box.
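The crop parameters can be computed directly from the formulas above. How θ, the translation (t_x, t_y), and the scale are composed into the 3×3 matrix M is not spelled out in the text, so the `perspective_matrix` composition below is an assumption for illustration only.

```python
import math

def crop_params(t, b, l, r, x0, y0, dst_h):
    # Direct transcription of the formulas:
    #   t_x = l - x_0, t_y = t - y_0,
    #   scale = dst_h / (t + b), dst_w = scale * (l + r)
    tx = l - x0
    ty = t - y0
    scale = dst_h / (t + b)
    dst_w = scale * (l + r)
    return tx, ty, scale, dst_w

def perspective_matrix(theta, tx, ty, scale):
    # ASSUMED composition: scaled rotation by -theta followed by the
    # scaled translation, written as a 3x3 homogeneous matrix M.
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c,  scale * s, scale * tx],
            [-scale * s, scale * c, scale * ty],
            [0.0,        0.0,       1.0]]

tx, ty, scale, dst_w = crop_params(t=16, b=16, l=40, r=40, x0=10, y0=6, dst_h=32)
M = perspective_matrix(0.0, tx, ty, scale)
```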
In a specific example of the above embodiments of the detection-recognition method of the present invention, segmenting the text box out of the image to be processed includes:
obtaining the top-left coordinates of the text box according to the text box location information;
scaling the text box while keeping the ratio of its height to its width constant, so that the heights of all the text boxes are consistent;
constructing a perspective transformation matrix based on the rotation angle, the top-left coordinates, and the scaling of the text box;
segmenting the text box out of the image to be processed based on the perspective transformation matrix.
In this embodiment, in order to construct the perspective transformation matrix, the top-left coordinates of the text box are obtained first; to facilitate obtaining all the text boxes, the heights of all the text boxes are adjusted to be consistent, and an adjusted text box can then be segmented out based on one perspective transformation matrix.
In a specific example of the above embodiments of the detection-recognition method of the present invention, segmenting the text box out of the image to be processed based on the perspective transformation matrix includes:
performing a matrix multiplication operation on the perspective transformation matrix and the image to be processed to obtain a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
In this embodiment, only one text box can be segmented out at a time based on one perspective transformation matrix; all the text boxes are obtained by moving the perspective transformation matrix and performing the matrix multiplication with the image to be processed for each one.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Fig. 2 is a structural diagram of an embodiment of the detection-recognition apparatus of the present invention. The apparatus of this embodiment can be used to implement the above method embodiments of the present invention. As shown in Fig. 2, the apparatus of this embodiment includes:
an input unit 21, configured to input an image to be processed into a detection-recognition network, where the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer;
a low-layer extraction unit 22, configured to output the shared-layer features of the image to be processed through the shared network layer, where the shared features are used to embody at least one of the following characteristics of the image: small-object texture features, edge features, and detail features;
a text box detection unit 23, configured to input the shared-layer features into the detection network layer, where the detection network layer outputs the detection-layer features of the image to be processed, and the text box information for the text contained in the image is obtained based on the detection-layer features; and
a text recognition unit 24, configured to input the shared-layer features and the text box information into the recognition network layer, where the recognition network layer outputs the text content inside the text boxes.
According to the detection-recognition apparatus provided by the above embodiments of the present invention, an image to be processed is input into the detection-recognition network; the shared network layer outputs the shared-layer features of the image, and these shared-layer features reduce repeated feature extraction over the image and improve processing efficiency; the shared-layer features are input into the detection network layer, which outputs the text box information for the text contained in the image; the shared-layer features and the text box information are input into the recognition network layer, which outputs the text content inside the text boxes. A single detection-recognition network thus accomplishes both the detection of the text box information and the recognition of the text information inside the text boxes, improving the efficiency and speed of text recognition.
In a specific example of the above embodiment of the detection-recognition apparatus of the present invention, the detection-layer features include the classification information of each pixel in the image to be processed, where the classification information indicates, through different values, whether the corresponding pixel belongs to the text class.
The text box detection unit 23 is specifically configured to obtain, from the classification information of each pixel in the image to be processed, the text box information for the text contained in the image.
Here, the text box information includes text box classification information and text box location information: the text box classification information indicates whether the text box contains text, and the text box location information includes the distances from any pixel in the image to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the text box detection unit 23 includes:
a text box obtaining module, configured to shrink the length and the width to a set ratio respectively based on the classification information of the image to be processed, to divide the image to be processed into multiple rectangular frames according to pixel position relationships, and to obtain a text box from a rectangular frame whose interior pixels have classification information marked as text; and
an information obtaining module, configured to obtain the distance information from any pixel in the image to the top, bottom, left, and right sides of the text box and the rotation angle information of the text box, and to obtain the text box information based on the obtained text box location information and text box classification information.
In another embodiment of the detection-recognition apparatus of the present invention, on the basis of the above embodiments, the text recognition unit 24 includes:
a feature extraction module, configured to obtain the corresponding text box features based on the output text box information, and to perform feature fusion between the text box features and the shared-layer features output by the shared network layer; and
a text prediction module, configured for the recognition network layer to predict the text information inside the text box based on the fused features.
The feature fusion referred to in this embodiment means connecting the obtained shared-layer features and the detection-layer features together; the fused features thus include both the shared-layer features of the image and the semantic features of the detection layer, and can be better used for text detection and recognition.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the feature extraction module is specifically configured to perform a perspective transform according to the text box information, to segment the text box out of the image to be processed, and to generate the corresponding text box features based on the segmented text box.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the feature extraction module includes:
a zoom module, configured to obtain the top-left coordinates of the text box according to the text box location information, and to scale the text box while keeping the ratio of its height to its width constant, so that the heights of all the text boxes are consistent;
a conversion module, configured to construct a perspective transformation matrix based on the rotation angle, the top-left coordinates, and the scaling of the text box; and
a text box segmentation module, configured to segment the text box out of the image to be processed based on the perspective transformation matrix; specifically, the text box segmentation module performs a matrix multiplication operation on the perspective transformation matrix and the image to be processed to obtain a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
Fig. 3 is a flowchart of an embodiment of the training method of the detection-recognition network of the present invention. As shown in Fig. 3, the method of this embodiment includes:
Step 301: input an image to be processed into the detection-recognition network.
Here, the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer, and the image to be processed is annotated with text box information and with the text information contained in all the text boxes. Inputting the image to be processed into the detection-recognition network can accomplish the two training tasks of text detection and text recognition at the same time. Compared with training a text detection network and a text recognition network separately, this is equivalent to making use of more labeled data and information, which effectively alleviates overfitting and promotes the accuracy of the final result; and performing text recognition at the same time no longer requires two networks for text detection and text recognition, which improves the efficiency and speed of text recognition.
Step 302: output first shared-layer features through the shared network layer; input the first shared-layer features and the text box information annotated on the image to be processed into the recognition network layer, and let the recognition network layer predict the text information contained in the text box; train the shared network layer and the recognition network layer based on the predicted text information and the annotated text information, until a first training completion condition is met.
For the detection-recognition network, the shared network layer and the recognition network layer therein are trained first; at this point, the shared network layer and the recognition network layer are regarded as one network, where the inputs to the recognition network layer are the shared-layer features output by the shared network layer and the text box information annotated on the image to be processed. The shared features are used to embody at least one of the following characteristics of the image: small-object texture features, edge features, and detail features.
Step 303: input the image to be processed into the trained shared network layer, and output second shared-layer features through the trained shared network layer; input the second shared-layer features into the detection network layer, let the detection network layer predict the detection-layer features of the image to be processed, and obtain the text box information for the text contained in the image based on the detection-layer features; train the detection network layer based on the predicted text box information and the annotated text box information, until a second training completion condition is met.
According to the training method of the detection-recognition network provided by the above embodiments of the present invention, the shared network layer and the recognition network layer are first trained with images to be processed; the images to be processed are then input into the trained shared network layer and the untrained detection network layer to obtain predicted text box information, and the detection network layer is trained based on the predicted text box information and the annotated text box information. When training the detection network layer, the shared network layer and the detection network layer are treated as one network, and this network is trained; since the shared network layer has already been trained, this process in effect trains only the detection network layer. The trained shared network layer, recognition network layer, and detection network layer make up the trained detection-recognition network, which can realize the detection and the recognition of text at the same time; moreover, owing to the presence of the shared network layer, repeated feature extraction over the image is reduced, the network structure is lightened, the complexity in time and space is reduced, and the model size is decreased.
In a specific example of the above embodiment of the training method of the detection-recognition network of the present invention, training the shared network layer and the recognition network layer in operation 302 based on the predicted text information and the annotated text information includes:
adjusting the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted text information and the annotated text information;
iteratively recognizing the image to be processed through the shared network layer and the recognition network layer with the adjusted parameters to obtain predicted text information, until the first training completion condition is met.
In this embodiment, the process of updating the parameters according to the error may specifically include: taking the error between the predicted text information and the known text information as the final error; back-propagating the final error through gradients to calculate the error of each layer in the shared network layer and the recognition network layer; calculating the gradient of each layer's parameters from that layer's error, and correcting the parameters of the corresponding layers in the shared network layer and the recognition network layer according to the gradients; calculating the error between the predicted text information output by the shared network layer and the recognition network layer with the optimized parameters and the known text information, and taking this error as the final error; and iteratively back-propagating the final error through gradients, calculating the error of each layer in the shared network layer and the recognition network layer, calculating the gradient of each layer's parameters from that layer's error, and correcting the parameters of the corresponding layers according to the gradients, until the preset first training completion condition is met.
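The error-driven update loop described above, with its two stopping conditions (error below a preset value, or a preset iteration count reached), can be sketched with a toy one-parameter model standing in for the shared and recognition layers; the model and all numbers are illustrative.

```python
def train(w, samples, lr=0.1, max_iters=1000, tol=1e-6):
    # Toy stand-in for the shared + recognition layers: a single weight w,
    # prediction w * x, mean squared error as the "final error". The
    # gradient is computed analytically (playing the role of gradient
    # back-propagation) and the parameter is corrected each iteration.
    err = float("inf")
    for _ in range(max_iters):  # second stopping condition: iteration count
        err = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if err < tol:           # first stopping condition: error below preset value
            break
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad          # correct the parameter along the gradient
    return w, err

# Fit w so that w * x matches y (true w is 2.0 here).
w, err = train(0.0, [(1.0, 2.0), (2.0, 4.0)])
```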
The first training completion condition in the above embodiment includes:
the error between the predicted text information and the annotated text information being less than a first preset value; or the number of iterative predictions being greater than or equal to a first preset number.
In network training, the stop condition of the training may be judged according to the error value, or according to the number of training iterations, or may be any other condition under which those skilled in the art consider that training can be stopped; this embodiment is only intended to facilitate the implementation of the present method by those skilled in the art and is not intended to limit the present method.
In another embodiment of the training method of the detection-recognition network of the present invention, on the basis of the above embodiments, training the detection network layer in operation 303 based on the predicted text box information and the annotated text box information includes:
adjusting the parameters of the detection network layer based on the error between the predicted text box information and the annotated text box information;
iteratively detecting the image to be processed through the detection network layer with the adjusted parameters to obtain predicted text box information, until the second training completion condition is met.
In this embodiment, the parameters of the detection network layer may also be trained by backward gradient propagation. The specific training process may include: taking the error between the predicted text box information and the known text box information as the maximum error; back-propagating the maximum error by gradient to compute the error of each layer in the detection network layer (since the shared network layer has already been trained, its parameters do not need to be retrained at this point); computing the gradient of each layer's parameters from each layer's error, and modifying the parameters of the corresponding layers of the detection network layer according to the gradients; then computing the error between the text box information predicted by the detection network layer with the optimized parameters and the known text box information, and taking this error as the new maximum error;
iteratively back-propagating the maximum error by gradient, computing the error of each layer in the detection network layer, computing the gradient of each layer's parameters from each layer's error, and modifying the parameters of the corresponding layers of the detection network layer according to the gradients, until a preset second training completion condition is met.
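The frozen-shared-layer gradient loop described here can be sketched with a toy linear model. The layer shapes, the squared-error loss, and the learning rate below are illustrative assumptions, not the patent's actual network; the point is only that gradients update the detection layer while the shared layer stays fixed:

```python
import numpy as np

# Frozen "shared layer": a fixed, well-conditioned linear map (illustrative).
W_shared = np.array([[1.0, 0.2, 0.0, 0.0],
                     [0.0, 1.0, 0.2, 0.0],
                     [0.0, 0.0, 1.0, 0.2],
                     [0.0, 0.0, 0.0, 1.0]])
W_det = np.zeros((4, 1))                 # detection-layer parameters to train

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))             # a batch standing in for images
target = x @ W_shared @ np.ones((4, 1))  # "annotated" text-box values

lr, max_err, max_iters = 0.1, 1e-4, 2000
for it in range(max_iters):
    feat = x @ W_shared                  # shared feature (shared layer frozen)
    pred = feat @ W_det                  # predicted text-box information
    err = pred - target
    loss = float((err ** 2).mean())
    if loss < max_err:                   # second training completion condition
        break
    grad = feat.T @ err / len(x)         # gradient direction for the detection layer
    W_det -= lr * grad                   # W_shared is deliberately never updated
```

Only `W_det` is modified each step, mirroring the statement that the already-trained shared network layer needs no retraining.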
The second training completion condition in the above embodiment includes: the error between the predicted text box information and the annotated text box information is less than a second preset value; or the number of prediction iterations is greater than or equal to a second preset number.
In network training, the stop condition may be judged according to the error value, according to the number of training iterations, or according to any other stop condition that a person skilled in the art considers suitable for stopping training. The present embodiment is intended only to help those skilled in the art implement the method of this embodiment, and is not intended to limit it.
In another embodiment of the training method for the detection and recognition network of the present invention, on the basis of the above embodiments, the detection layer feature includes classification information of each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category. Optionally, the classification information may represent the non-text category by 0 and the text category by 1, or represent the non-text category by 1 and the text category by 0.
Operation 303 includes:
obtaining, from the classification information of each pixel in the image to be processed, the text box information of the text contained in the image.
The text box information includes: text box classification information and text box position information. The text box classification information indicates whether the text box contains text; the text box position information includes the distances from any pixel in the image to be processed to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box. In this embodiment, before training the detection and recognition network with the images to be processed of the sample images, these images need to be annotated: the category of each pixel is labeled so as to determine the positions of the text boxes. The labeled categories usually include text and non-text (which may be marked with 1 and 0); once text and non-text have been labeled, the text box information corresponding to each text box containing text can be determined.
In a specific example of the above embodiments of the training method for the detection and recognition network of the present invention, obtaining the text box information of the text contained in the image to be processed from the classification information of each pixel includes:
shrinking the length and width of the image to be processed to a set ratio based on its classification information, and dividing the image into multiple rectangular boxes according to pixel position relationships; obtaining a text box from each rectangular box whose internal pixels' classification information is labeled as text;
obtaining the distances from any pixel in the image to be processed to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box;
obtaining the text box information based on the obtained text box position information and text box classification information.
Through the setting of this embodiment, the image to be processed is annotated as an image containing only 1s and 0s (the classification information represents the text category by 1 and the non-text category by 0, or the non-text category by 1 and the text category by 0). Since positions may be inaccurate during network classification, the length and width of the text box are each shrunk to a set ratio (for example, reduced to 0.6 times the original); shrinking the text box reduces the influence of inaccurate text positions on the algorithm. The position information of the text box is determined by finding the minimum enclosing rectangle of the text box: the distances from each pixel in the text box to its top, bottom, left, and right sides are obtained through this enclosing rectangle, and the angle information of the text box is the rotation angle between this minimum enclosing rectangle and an axis-aligned rectangle.
In still another embodiment of the training method for the detection and recognition network of the present invention, on the basis of the above embodiments, operation 302 includes:
obtaining a corresponding text box feature based on the text box information annotated on the image to be processed, and fusing the text box feature with the first shared layer feature output by the shared network layer;
predicting, by the recognition network layer, the text information in the text box based on the fused feature.
In this embodiment, feature fusion means concatenating the obtained shared layer feature with the detection layer feature. The fused feature thus contains both the shared layer feature of the image and the semantic feature of the detection layer, and can be better used for text detection and recognition.
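The fusion described here is a channel-wise concatenation. A minimal sketch, in which the feature-map shapes and fill values are illustrative assumptions:

```python
import numpy as np

# Illustrative (channels, height, width) feature maps.
shared_feat = np.ones((32, 8, 8))    # first shared-layer (appearance) feature
det_feat = np.zeros((16, 8, 8))      # detection-layer (semantic) feature

# "Connecting together" the two features = concatenating on the channel
# axis, so the fused map carries both appearance and semantic information.
fused = np.concatenate([shared_feat, det_feat], axis=0)
```

The spatial dimensions must match; only the channel count grows, which is why the fused feature remains usable by the recognition network layer.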
In a specific example of the above embodiments of the training method for the detection and recognition network of the present invention, obtaining the corresponding text box feature based on the text box information annotated on the image to be processed includes:
performing a perspective transform on the annotated text box information, segmenting the text box from the image to be processed, and generating the corresponding text box feature based on the segmented text box.
In this embodiment, a perspective transform may be used to crop the text box out of the original image according to the manually annotated position information: the arbitrary quadrilateral obtained from the annotation is cropped and transformed into a rectangle that serves as the input to the recognition network layer. The formulas are as follows:
tx = l - x0
ty = t - y0
scale = dsth / (t + b)
dstw = scale × (l + r)
where the inputs t, b, l, and r are the vertical distances from a point in the arbitrary quadrilateral to its top, bottom, left, and right sides; θ is the rotation angle of the arbitrary quadrilateral; dsth and dstw are the height and width of the preset output rectangular image; and (x0, y0) is the coordinate position of the point in the image before the transform. Output: multiplying the original image by the perspective transformation matrix M directly yields the output image, that is, the cropped rectangular image used by the recognition network layer. The text box feature in this embodiment is a text box feature map, which can be obtained from the pixel values corresponding to the obtained text box.
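The four formulas above compute the translation and scale that map an annotated quadrilateral onto the fixed-height output rectangle. A direct transcription (argument names follow the text; the sample values in the usage are illustrative):

```python
def crop_params(t, b, l, r, x0, y0, dst_h):
    """Translation (tx, ty), isotropic scale, and output width dst_w for
    mapping a point at (x0, y0) with side distances t, b, l, r onto a
    rectangle of preset height dst_h, per the formulas in the text."""
    tx = l - x0               # tx = l - x0
    ty = t - y0               # ty = t - y0
    scale = dst_h / (t + b)   # scale = dsth / (t + b)
    dst_w = scale * (l + r)   # dstw = scale * (l + r)
    return tx, ty, scale, dst_w
```

Note that the output width dst_w follows from the preset height, so the crop preserves the quadrilateral's aspect ratio.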
In a specific example of the above embodiments of the training method for the detection and recognition network of the present invention, segmenting the text box from the image to be processed includes:
obtaining the top-left coordinates of the text box according to the text box position information;
scaling the text box while keeping the ratio of its height and width constant, so that all text boxes have the same height;
constructing a perspective transformation matrix based on the rotation angle, top-left coordinates, and scaling ratio of the text box;
segmenting the text box from the image to be processed based on the perspective transformation matrix.
In this embodiment, in order to construct the perspective transformation matrix, the top-left coordinates of the text box are obtained first. To make it easy to obtain all text boxes, the heights of all text boxes are adjusted to be the same, so that the adjusted text boxes can all be segmented based on a single form of perspective transformation matrix.
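One plausible construction of such a matrix, composed as translate-then-rotate-then-scale in homogeneous coordinates; the composition order is an assumption, since the text does not fix it:

```python
import math
import numpy as np

def build_matrix(angle, x0, y0, scale):
    """Homogeneous 3x3 transform: translate the box's top-left corner
    (x0, y0) to the origin, rotate by -angle to make the box upright,
    then scale so every box reaches the common output height."""
    T = np.array([[1, 0, -x0], [0, 1, -y0], [0, 0, 1]], dtype=float)
    c, s = math.cos(-angle), math.sin(-angle)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    S = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], dtype=float)
    return S @ R @ T
```

With angle 0 this reduces to a shift-and-zoom that pins the box's top-left corner to the output origin, which is the behavior the segmentation step relies on.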
In a specific example of the above embodiments of the training method for the detection and recognition network of the present invention, segmenting the text box from the image to be processed based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the image to be processed, obtaining a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
In this embodiment, only one text box can be segmented per perspective transformation matrix; all text boxes are obtained by shifting the perspective transformation matrix and performing the matrix multiplication with the image to be processed for each box.
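The described crop can be sketched as a nearest-neighbour warp: each output pixel is mapped back through the inverse matrix into the source image, which places the transformed text box in the output's upper-left corner. This sampling formulation is an assumption on my part — the text only states that the matrix and the image are multiplied:

```python
import numpy as np

def warp(img, M):
    """Nearest-neighbour perspective warp producing an output the same
    size as img, with the mapped text box landing in the upper-left."""
    h, w = img.shape
    Minv = np.linalg.inv(M)
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            sx, sy, sw = Minv @ np.array([x, y, 1.0])
            sx, sy = int(round(sx / sw)), int(round(sy / sw))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = img[sy, sx]
    return out

img = np.zeros((6, 6))
img[2:4, 3:6] = 7                                    # a "text box" off-corner
M = np.array([[1., 0, -3], [0, 1, -2], [0, 0, 1]])   # pure shift, for clarity
crop = warp(img, M)
```

Shifting M for each annotated box, as the embodiment describes, yields one such output per text box.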
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The detection and recognition method provided by the present invention is applicable to all languages. It is only necessary, when training the detection and recognition network for a given language, to train with text of the language to be processed; the resulting detection and recognition network can then detect and recognize text in that language.
Fig. 4 is a structural diagram of an embodiment of the training apparatus for the detection and recognition network of the present invention. The apparatus of this embodiment can be used to implement the above method embodiments of the present invention. As shown in Fig. 4, the apparatus of this embodiment includes:
an image input unit 41, configured to input the image to be processed into the detection and recognition network,
where the detection and recognition network includes a shared network layer, a detection network layer, and a recognition network layer, and the image to be processed is annotated with text box information and the text information contained in the text box;
a first training unit 42, configured to output a first shared layer feature through the shared network layer; input the first shared layer feature and the text box information annotated on the image to be processed into the recognition network layer; predict, by the recognition network layer, the text information contained in the text box; and train the shared network layer and the recognition network layer based on the predicted text information and the annotated text information, until a first training completion condition is met.
The shared feature is used to represent at least one of the following features in the image: small-object texture features, edge features, and detail features.
a second training unit 43, configured to input the image to be processed into the trained shared network layer, which outputs a second shared layer feature; input the second shared layer feature into the detection network layer, which predicts the detection layer feature of the image to be processed; obtain, based on the detection layer feature, the text box information of the text contained in the image; and train the detection network layer based on the predicted text box information and the annotated text box information, until a second training completion condition is met.
The training apparatus for the detection and recognition network provided by the above embodiment of the present invention first trains the shared network layer and the recognition network layer with the image to be processed, then inputs the image into the trained shared network layer and the untrained detection network layer to obtain predicted text box information, and trains the detection network layer based on the predicted and annotated text box information. When training the detection network layer, the shared network layer and the detection network layer are treated as a single network and trained together; since the shared network layer has already been trained, this in effect trains only the detection network layer. The trained shared network layer, recognition network layer, and detection network layer together form the trained detection and recognition network, which can detect and recognize text simultaneously. Because of the shared network layer, repeated feature extraction from the image is reduced, the network structure is lightened, the time and space complexity is reduced, and the model size is reduced.
In a specific example of the above embodiments of the training apparatus for the detection and recognition network of the present invention, the first training unit is specifically configured to adjust the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted text information and the annotated text information, and to iteratively recognize the image to be processed with the adjusted shared network layer and recognition network layer to obtain the predicted text information, until the first training completion condition is met.
The preset first training completion condition met in the above embodiment includes: the error between the predicted text information and the annotated text information is less than a first preset value; or the number of prediction iterations is greater than or equal to a first preset number.
In another embodiment of the training apparatus for the detection and recognition network of the present invention, on the basis of the above embodiments, the second training unit is specifically configured to adjust the parameters of the detection network layer based on the error between the predicted text box information and the annotated text box information, and to iteratively detect the image to be processed with the adjusted detection network layer to obtain the predicted text box information, until the preset second training completion condition is met.
In this embodiment, the parameters of the detection network layer may also be trained by backward gradient propagation. The specific training process may include: taking the error between the predicted text box information and the known text box information as the maximum error; back-propagating the maximum error by gradient to compute the error of each layer in the detection network layer (since the shared network layer has already been trained, its parameters do not need to be retrained at this point); computing the gradient of each layer's parameters from each layer's error, and modifying the parameters of the corresponding layers of the detection network layer according to the gradients; then computing the error between the text box information predicted by the detection network layer with the optimized parameters and the known text box information, and taking this error as the new maximum error;
iteratively back-propagating the maximum error by gradient, computing the error of each layer in the detection network layer, computing the gradient of each layer's parameters from each layer's error, and modifying the parameters of the corresponding layers of the detection network layer according to the gradients, until the second training completion condition is met.
The preset second training completion condition met in the above embodiment includes: the error between the predicted text box information and the annotated text box information is less than a second preset value; or the number of prediction iterations is greater than or equal to a second preset number.
In yet another embodiment of the training apparatus for the detection and recognition network of the present invention, on the basis of the above embodiments, the detection layer feature includes the classification information of each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
The second training unit 43 is specifically configured to obtain, from the classification information of each pixel in the image to be processed, the text box information of the text contained in the image.
The text box information includes: text box classification information and text box position information. The text box classification information indicates whether the text box contains text; the text box position information includes the distances from any pixel in the image to be processed to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box. In this embodiment, before training the detection and recognition network with the image to be processed, the image needs to be annotated: the category of each pixel is labeled so as to determine the positions of the text boxes. The labeled categories usually include text and non-text (which may be marked with 1 and 0); once text and non-text have been labeled, the text box information corresponding to each text box containing text can be determined.
In a specific example of the above embodiments of the training apparatus for the detection and recognition network of the present invention, the second training unit includes:
a text box acquisition module, configured to shrink the length and width of the image to be processed to a set ratio based on its classification information, divide the image into multiple rectangular boxes according to pixel position relationships, and obtain a text box from each rectangular box whose internal pixels' classification information is labeled as text;
an information acquisition module, configured to obtain the distances from any pixel in the image to the top, bottom, left, and right sides of the text box and the rotation angle of the text box, and to obtain the text box information based on the obtained text box position information and text box classification information.
In still another embodiment of the training apparatus for the detection and recognition network of the present invention, on the basis of the above embodiments, the first training unit 42 includes:
a feature extraction module, configured to obtain a corresponding text box feature based on the text box information annotated on the image to be processed, and to fuse the text box feature with the first shared layer feature output by the shared network layer;
a text prediction module, configured to predict, by the recognition network layer, the text information in the text box based on the fused feature.
In this embodiment, feature fusion means concatenating the obtained shared layer feature with the detection layer feature. The fused feature thus contains both the shared layer feature of the image and the semantic feature of the detection layer, and can be better used for text detection and recognition.
In a specific example of the above embodiments of the training apparatus for the detection and recognition network of the present invention, the feature extraction module is specifically configured to perform a perspective transform on the annotated text box information, segment the text box from the image to be processed, and generate the corresponding text box feature based on the segmented text box.
In a specific example of the above embodiments of the training apparatus for the detection and recognition network of the present invention, the feature extraction module includes:
a scaling module, configured to obtain the top-left coordinates of the text box according to the text box position information, and to scale the text box while keeping the ratio of its height and width constant, so that all text boxes have the same height;
a transform module, configured to construct a perspective transformation matrix based on the rotation angle, top-left coordinates, and scaling ratio of the text box;
a text box segmentation module, configured to segment the text box from the image to be processed based on the perspective transformation matrix.
In a specific example of the above embodiments of the training apparatus for the detection and recognition network of the present invention, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the image to be processed, obtaining a segmented image of the same size as the image to be processed, where each segmented image contains only one text box, located in its upper-left corner.
According to one aspect of the embodiments of the present invention, an electronic device is provided, including a processor, where the processor includes the detection and recognition apparatus of any of the above embodiments of the present invention or the training apparatus for the detection and recognition network.
According to one aspect of the embodiments of the present invention, an electronic device is provided, including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions, thereby completing the operations of the detection and recognition method or the training method for the detection and recognition network of any of the above embodiments of the present invention.
According to one aspect of the embodiments of the present invention, a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, perform the operations of the detection and recognition method or the training method for the detection and recognition network of any of the above embodiments of the present invention.
An embodiment of the present invention further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring now to Fig. 5, which shows a structural diagram of an electronic device 500 suitable for implementing a terminal device or a server of the embodiments of the present application: as shown in Fig. 5, the computer system 500 includes one or more processors, a communication unit, and the like. The one or more processors are, for example, one or more central processing units (CPUs) 501 and/or one or more graphics processors (GPUs) 513, etc. The processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 502 or executable instructions loaded from a storage section 508 into a random access memory (RAM) 503. The communication unit 512 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
The processor may communicate with the read-only memory 502 and/or the random access memory 503 to execute the executable instructions, connect with the communication unit 512 through a bus 504, and communicate with other target devices through the communication unit 512, thereby completing the operation corresponding to any method provided by the embodiments of the present application, for example: inputting an image to be processed into the detection and recognition network; outputting the shared layer feature of the image to be processed through the shared network layer; inputting the shared layer feature into the detection network layer; outputting the detection layer feature of the image to be processed through the detection network layer; obtaining, based on the detection layer feature, the text box information of the text contained in the image; inputting the shared layer feature and the text box information into the recognition network layer; and outputting the text content in the text box through the recognition network layer.
In addition, various programs and data required for the operation of the apparatus may also be stored in the RAM 503. The CPU 501, the ROM 502, and the RAM 503 are connected to each other through the bus 504. Where the RAM 503 is present, the ROM 502 is an optional module: the RAM 503 stores the executable instructions, or the executable instructions are written into the ROM 502 at runtime, and the executable instructions cause the processor 501 to perform the operations corresponding to the above method. An input/output (I/O) interface 505 is also connected to the bus 504. The communication unit 512 may be integrated, or may be provided with multiple sub-modules (for example, multiple IB network cards) linked on the bus.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communications section 509 including a network interface card such as a LAN card, a modem, and the like. The communications section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom can be installed into the storage section 508 as needed.
It should be noted that the architecture shown in Fig. 5 is only an optional implementation. In concrete practice, the number and types of the components in Fig. 5 may be selected, deleted, added, or replaced according to actual needs; for different functional components, separate or integrated implementations may also be used. For example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU; the communication unit may be provided separately, or may be integrated on the CPU or GPU; and so on. These alternative embodiments all fall within the protection scope of the present disclosure.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: inputting an image to be processed into the detection and recognition network; outputting the shared layer feature of the image to be processed through the shared network layer; inputting the shared layer feature into the detection network layer; outputting the detection layer feature of the image to be processed through the detection network layer; obtaining, based on the detection layer feature, the text box information of the text contained in the image; inputting the shared layer feature and the text box information into the recognition network layer; and outputting the text content in the text box through the recognition network layer. In such embodiments, the computer program may be downloaded and installed from a network through the communications section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the method of the present application are performed.
The methods, apparatuses, and devices of the present invention may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is merely for illustration; the steps of the method of the present invention are not limited to the order specifically described above, unless otherwise specified. Furthermore, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, where these programs include machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers the recording medium storing the programs for executing the method according to the present invention.
The description of the present invention is provided for the sake of example and description, and is not intended to be exhaustive or to limit the present invention to the disclosed form. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were selected and described in order to better illustrate the principles of the present invention and its practical application, and to enable those of ordinary skill in the art to understand the present invention so as to design various embodiments, with various modifications, suited to particular uses.
Claims (10)
1. A detection and recognition method, characterized by comprising:
inputting an image to be processed into a detection and recognition network, the detection and recognition network comprising a shared network layer, a detection network layer, and a recognition network layer;
outputting a shared layer feature of the image to be processed through the shared network layer, the shared feature being used to represent at least one of the following features in the image: small-object texture features, edge features, detail features;
inputting the shared layer feature into the detection network layer, outputting a detection layer feature of the image to be processed through the detection network layer, and obtaining, based on the detection layer feature, text box information of text contained in the image to be processed;
inputting the shared layer feature and the text box information into the recognition network layer, and outputting the text content in the text box through the recognition network layer.
2. The method according to claim 1, characterized in that the detection layer feature comprises classification information of each pixel in the image to be processed, the classification information indicating, through different values, whether the corresponding pixel belongs to a text category;
obtaining, based on the detection layer feature, the text box information of the text contained in the image to be processed comprises:
obtaining, from the classification information of each pixel in the image to be processed, the text box information of the text contained in the image, the text box information comprising: text box classification information and text box position information; the text box classification information indicating whether the text box contains text; the text box position information comprising the distances from any pixel in the image to be processed to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box.
3. The method according to claim 2, wherein obtaining the text box information of text contained in the image to be processed from the classification information of each pixel in the image to be processed comprises:
reducing the length and the width of the image to be processed to a set ratio based on the classification information of the image to be processed, and dividing the image to be processed into multiple rectangular boxes according to pixel position relationships; obtaining a text box from the rectangular boxes whose internal pixels' classification information marks them as text;
obtaining, for any pixel in the image to be processed, the distance information to the top, bottom, left, and right sides of the text box and the rotation angle information of the text box;
obtaining the text box information based on the obtained text box location information and text box classification information.
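The procedure in claim 3 — downscale the per-pixel classification map, partition it into rectangular cells, and keep the cells whose pixels are classified as text — might be sketched as below. The striding-based downscale, cell size, and threshold are illustrative assumptions; the patent does not fix them:

```python
import numpy as np

def text_cells(score_map, scale=2, cell=2, thresh=0.5):
    """Downscale a per-pixel text score map, split it into square cells,
    and return the cells whose mean score marks them as text."""
    # Reduce length and width by the set ratio (simple striding here).
    small = score_map[::scale, ::scale]
    sh, sw = small.shape
    boxes = []
    for r in range(0, sh - cell + 1, cell):
        for c in range(0, sw - cell + 1, cell):
            block = small[r:r + cell, c:c + cell]
            if block.mean() > thresh:  # cell is labelled as text
                # Map the cell back to original-image coordinates.
                boxes.append((c * scale, r * scale,
                              (c + cell) * scale, (r + cell) * scale))
    return boxes

score = np.zeros((8, 8))
score[0:4, 0:4] = 1.0          # a text region in the top-left corner
cells = text_cells(score)
```

Working on the downscaled map keeps the number of candidate rectangles small; the per-pixel distances and angle from claim 2 then refine each surviving cell into a precise text box.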
4. The method according to any one of claims 1-3, wherein inputting the shared-layer feature and the text box information into the recognition network layer and predicting the text information in the text box via the recognition network layer comprises:
obtaining a corresponding text box feature based on the output text box information, and fusing the text box feature with the shared-layer feature output by the shared network layer;
predicting, by the recognition network layer, the text information in the text box based on the fused feature.
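The fusion step in claim 4 — deriving a text-box feature and combining it with the shared-layer feature — might look like this NumPy sketch. Channel-wise concatenation is one plausible fusion operator; the patent does not specify which is used, and the crude zero-padded "resize" is purely illustrative:

```python
import numpy as np

def fuse_features(shared_feat, box):
    """Crop the shared feature map to the text box and concatenate the
    crop (placed on a zero canvas of full spatial size) channel-wise
    with the full shared-layer feature."""
    x0, y0, x1, y1 = box
    crop = shared_feat[:, y0:y1, x0:x1]          # text box feature
    canvas = np.zeros_like(shared_feat)          # crude spatial alignment
    canvas[:, :crop.shape[1], :crop.shape[2]] = crop
    return np.concatenate([shared_feat, canvas], axis=0)  # 2C channels

feat = np.random.rand(4, 8, 8)                   # C=4 shared-layer feature
fused = fuse_features(feat, (1, 1, 5, 5))
```

A production system would more likely use an ROI-pooling-style crop-and-resize, but the structural point is the same: the recognition head sees both the box-local feature and the global shared feature.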
5. A training method for a detection and recognition network, comprising:
inputting an image to be processed into the detection and recognition network, the detection and recognition network comprising a shared network layer, a detection network layer, and a recognition network layer, and the image to be processed being annotated with text box information and the text information contained in the text boxes;
outputting a first shared-layer feature via the shared network layer; inputting the first shared-layer feature and the annotated text box information into the recognition network layer, and predicting, via the recognition network layer, the text information contained in the text box; training the shared network layer and the recognition network layer based on the predicted text information and the annotated text information until a first training completion condition is met, the shared feature representing at least one of the following in the image: texture features of small objects, edge features, and detail features;
inputting the image to be processed into the trained shared network layer, the trained shared network layer outputting a second shared-layer feature; inputting the second shared-layer feature into the detection network layer, predicting the detection-layer feature of the image to be processed via the detection network layer, and obtaining, based on the detection-layer feature, the text box information of text contained in the image to be processed; training the detection network layer based on the predicted text box information and the annotated text box information until a second training completion condition is met.
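Claim 5 describes a two-stage schedule: first train the shared and recognition layers together using the annotated (ground-truth) boxes, then train the detection layer on top of the resulting shared features. A framework-free sketch of that schedule; the `_Layer` stub, loss functions, and stopping conditions are placeholders, not the patent's training recipe:

```python
class _Layer:
    """Minimal stand-in for a trainable network layer."""
    def __init__(self):
        self.updates = 0
    def forward(self, *inputs):
        return inputs
    def loss(self, pred, target):
        return 0.0
    def update(self, loss):
        self.updates += 1

def train_two_stage(batches, shared, recog, detect):
    # Stage 1: train shared + recognition layers with annotated boxes,
    # so recognition never depends on an untrained detector.
    for image, gt_boxes, gt_text in batches:
        feat = shared.forward(image)
        pred = recog.forward(feat, gt_boxes)   # ground-truth boxes, not detections
        loss = recog.loss(pred, gt_text)
        shared.update(loss)
        recog.update(loss)
    # Stage 2: train the detection layer on the trained shared features.
    for image, gt_boxes, _ in batches:
        feat = shared.forward(image)           # shared layer reused, not updated
        pred = detect.forward(feat)
        loss = detect.loss(pred, gt_boxes)
        detect.update(loss)

shared, recog, detect = _Layer(), _Layer(), _Layer()
batches = [("img", "boxes", "text")] * 3
train_two_stage(batches, shared, recog, detect)
```

Feeding the annotated boxes to the recognition head in stage 1 decouples the two heads during training, which is the point of the claimed two-condition schedule.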
6. A detection and recognition apparatus, comprising:
an input unit, configured to input an image to be processed into a detection and recognition network, the detection and recognition network comprising a shared network layer, a detection network layer, and a recognition network layer;
a low-layer extraction unit, configured to output a shared-layer feature of the image to be processed via the shared network layer, the shared feature representing at least one of the following in the image: texture features of small objects, edge features, and detail features;
a text box detection unit, configured to input the shared-layer feature into the detection network layer, output a detection-layer feature of the image to be processed via the detection network layer, and obtain, based on the detection-layer feature, text box information of text contained in the image to be processed;
a text recognition unit, configured to input the shared-layer feature and the text box information into the recognition network layer, and output the text content in the text box via the recognition network layer.
7. A training apparatus for a detection and recognition network, comprising:
an image input unit, configured to input an image to be processed into the detection and recognition network, the detection and recognition network comprising a shared network layer, a detection network layer, and a recognition network layer, and the image to be processed being annotated with text box information and the text information contained in the text boxes;
a first training unit, configured to output a first shared-layer feature via the shared network layer; input the first shared-layer feature and the annotated text box information into the recognition network layer; predict, via the recognition network layer, the text information contained in the text box; and train the shared network layer and the recognition network layer based on the predicted text information and the annotated text information until a first training completion condition is met, the shared feature representing at least one of the following in the image: texture features of small objects, edge features, and detail features;
a second training unit, configured to input the image to be processed into the trained shared network layer, the trained shared network layer outputting a second shared-layer feature; input the second shared-layer feature into the detection network layer; predict the detection-layer feature of the image to be processed via the detection network layer; obtain, based on the detection-layer feature, the text box information of text contained in the image to be processed; and train the detection network layer based on the predicted text box information and the annotated text box information until a second training completion condition is met.
8. An electronic device, comprising a processor, the processor including the detection and recognition apparatus according to claim 6 or the training apparatus for a detection and recognition network according to claim 7.
9. An electronic device, comprising: a memory for storing executable instructions; and a processor for communicating with the memory to execute the executable instructions so as to perform the operations of the detection and recognition method according to any one of claims 1 to 4 or of the training method for a detection and recognition network according to claim 5.
10. A computer storage medium for storing computer-readable instructions, wherein, when executed, the instructions perform the operations of the detection and recognition method according to any one of claims 1 to 4 or of the training method for a detection and recognition network according to claim 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711126372.9A CN108229303B (en) | 2017-11-14 | 2017-11-14 | Detection recognition and training method, device, equipment and medium for detection recognition network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108229303A true CN108229303A (en) | 2018-06-29 |
CN108229303B CN108229303B (en) | 2021-05-04 |
Family
ID=62655785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711126372.9A Active CN108229303B (en) | 2017-11-14 | 2017-11-14 | Detection recognition and training method, device, equipment and medium for detection recognition network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229303B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130044945A1 (en) * | 2011-08-15 | 2013-02-21 | Vistaprint Technologies Limited | Method and system for detecting text in raster images |
US20130136359A1 (en) * | 2010-03-31 | 2013-05-30 | Microsoft Corporation | Segmentation of textual lines in an image that include western characters and hieroglyphic characters |
US20140169683A1 (en) * | 2012-12-18 | 2014-06-19 | Samsung Electronics Co., Ltd. | Image retrieval method, real-time drawing prompting method, and devices thereof |
CN106407971A (en) * | 2016-09-14 | 2017-02-15 | 北京小米移动软件有限公司 | Text recognition method and device |
CN106446899A (en) * | 2016-09-22 | 2017-02-22 | 北京市商汤科技开发有限公司 | Text detection method and device and text detection training method and device |
CN106778852A (en) * | 2016-12-07 | 2017-05-31 | 中国科学院信息工程研究所 | A kind of picture material recognition methods for correcting erroneous judgement |
2017-11-14 CN CN201711126372.9A patent/CN108229303B/en active Active
Non-Patent Citations (2)
Title |
---|
XINYU ZHOU ET AL.: "EAST: An Efficient and Accurate Scene Text Detector", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
FU, Zetian et al.: "Intelligent Acquisition of Agricultural Information for Mobile Terminals" (《面向移动终端的农业信息智能获取》), 30 September 2015 *
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020024939A1 (en) * | 2018-08-01 | 2020-02-06 | 北京京东尚科信息技术有限公司 | Text region identification method and device |
CN110796133B (en) * | 2018-08-01 | 2024-05-24 | 北京京东尚科信息技术有限公司 | Text region identification method and device |
US11763167B2 (en) | 2018-08-01 | 2023-09-19 | Bejing Jingdong Shangke Information Technology Co, Ltd. | Copy area identification method and device |
EP3812965A4 (en) * | 2018-08-01 | 2022-03-30 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Text region identification method and device |
CN110796133A (en) * | 2018-08-01 | 2020-02-14 | 北京京东尚科信息技术有限公司 | Method and device for identifying file area |
CN109325494A (en) * | 2018-08-27 | 2019-02-12 | 腾讯科技(深圳)有限公司 | Image processing method, task data treating method and apparatus |
WO2020043057A1 (en) * | 2018-08-27 | 2020-03-05 | 腾讯科技(深圳)有限公司 | Image processing method, and task data processing method and device |
CN109635805B (en) * | 2018-12-11 | 2022-01-11 | 上海智臻智能网络科技股份有限公司 | Image text positioning method and device and image text identification method and device |
CN109635805A (en) * | 2018-12-11 | 2019-04-16 | 上海智臻智能网络科技股份有限公司 | Image text location method and device, image text recognition methods and device |
CN109670458A (en) * | 2018-12-21 | 2019-04-23 | 北京市商汤科技开发有限公司 | A kind of licence plate recognition method and device |
CN109858420A (en) * | 2019-01-24 | 2019-06-07 | 国信电子票据平台信息服务有限公司 | A kind of bill processing system and processing method |
CN111639639B (en) * | 2019-03-01 | 2023-05-02 | 杭州海康威视数字技术股份有限公司 | Method, device, equipment and storage medium for detecting text area |
CN111639639A (en) * | 2019-03-01 | 2020-09-08 | 杭州海康威视数字技术股份有限公司 | Method, device, equipment and storage medium for detecting text area |
CN110135427A (en) * | 2019-04-11 | 2019-08-16 | 北京百度网讯科技有限公司 | The method, apparatus, equipment and medium of character in image for identification |
CN111985469B (en) * | 2019-05-22 | 2024-03-19 | 珠海金山办公软件有限公司 | Method and device for recognizing characters in image and electronic equipment |
CN111985469A (en) * | 2019-05-22 | 2020-11-24 | 珠海金山办公软件有限公司 | Method and device for recognizing characters in image and electronic equipment |
US11210546B2 (en) | 2019-07-05 | 2021-12-28 | Beijing Baidu Netcom Science And Technology Co., Ltd. | End-to-end text recognition method and apparatus, computer device and readable medium |
CN110458011A (en) * | 2019-07-05 | 2019-11-15 | 北京百度网讯科技有限公司 | Character recognition method and device, computer equipment and readable medium end to end |
CN110458164A (en) * | 2019-08-07 | 2019-11-15 | 深圳市商汤科技有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN112446262A (en) * | 2019-09-02 | 2021-03-05 | 深圳中兴网信科技有限公司 | Text analysis method, text analysis device, text analysis terminal and computer-readable storage medium |
CN110704619A (en) * | 2019-09-24 | 2020-01-17 | 支付宝(杭州)信息技术有限公司 | Text classification method and device and electronic equipment |
CN110781925B (en) * | 2019-09-29 | 2023-03-10 | 支付宝(杭州)信息技术有限公司 | Software page classification method and device, electronic equipment and storage medium |
CN110781925A (en) * | 2019-09-29 | 2020-02-11 | 支付宝(杭州)信息技术有限公司 | Software page classification method and device, electronic equipment and storage medium |
CN111259846A (en) * | 2020-01-21 | 2020-06-09 | 第四范式(北京)技术有限公司 | Text positioning method and system and text positioning model training method and system |
WO2021147817A1 (en) * | 2020-01-21 | 2021-07-29 | 第四范式(北京)技术有限公司 | Text positioning method and system, and text positioning model training method and system |
CN111259846B (en) * | 2020-01-21 | 2024-04-02 | 第四范式(北京)技术有限公司 | Text positioning method and system and text positioning model training method and system |
CN113449559A (en) * | 2020-03-26 | 2021-09-28 | 顺丰科技有限公司 | Table identification method and device, computer equipment and storage medium |
CN113762292A (en) * | 2020-06-03 | 2021-12-07 | 杭州海康威视数字技术股份有限公司 | Training data acquisition method and device and model training method and device |
CN113762292B (en) * | 2020-06-03 | 2024-02-02 | 杭州海康威视数字技术股份有限公司 | Training data acquisition method and device and model training method and device |
CN112101165B (en) * | 2020-09-07 | 2022-07-15 | 腾讯科技(深圳)有限公司 | Interest point identification method and device, computer equipment and storage medium |
CN112101165A (en) * | 2020-09-07 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Interest point identification method and device, computer equipment and storage medium |
CN112101477A (en) * | 2020-09-23 | 2020-12-18 | 创新奇智(西安)科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN112308051B (en) * | 2020-12-29 | 2021-10-29 | 北京易真学思教育科技有限公司 | Text box detection method and device, electronic equipment and computer storage medium |
CN112308051A (en) * | 2020-12-29 | 2021-02-02 | 北京易真学思教育科技有限公司 | Text box detection method and device, electronic equipment and computer storage medium |
TWI807467B (en) * | 2021-11-02 | 2023-07-01 | 中國信託商業銀行股份有限公司 | Key-item detection model building method, business-oriented key-value identification system and method |
Also Published As
Publication number | Publication date |
---|---|
CN108229303B (en) | 2021-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229303A (en) | Detection identification and the detection identification training method of network and device, equipment, medium | |
US9349076B1 (en) | Template-based target object detection in an image | |
CN109658455A (en) | Image processing method and processing equipment | |
WO2020078119A1 (en) | Method, device and system for simulating user wearing clothing and accessories | |
CN108229280A (en) | Time domain motion detection method and system, electronic equipment, computer storage media | |
TWI821671B (en) | A method and device for positioning text areas | |
CN108229341A (en) | Sorting technique and device, electronic equipment, computer storage media, program | |
CN111680678B (en) | Target area identification method, device, equipment and readable storage medium | |
CN109934173A (en) | Expression recognition method, device and electronic equipment | |
US11164306B2 (en) | Visualization of inspection results | |
CN108805222A (en) | A kind of deep learning digital handwriting body recognition methods based on ARM platforms | |
US20220343683A1 (en) | Expression Recognition Method and Apparatus, Computer Device, and Readable Storage Medium | |
CN110413816A (en) | Colored sketches picture search | |
CN115861462B (en) | Training method and device for image generation model, electronic equipment and storage medium | |
Montserrat et al. | Logo detection and recognition with synthetic images | |
CN111522979A (en) | Picture sorting recommendation method and device, electronic equipment and storage medium | |
CN108268629A (en) | Image Description Methods and device, equipment, medium, program based on keyword | |
CN117094362B (en) | Task processing method and related device | |
CN113869371A (en) | Model training method, clothing fine-grained segmentation method and related device | |
CN116361502B (en) | Image retrieval method, device, computer equipment and storage medium | |
CN113537187A (en) | Text recognition method and device, electronic equipment and readable storage medium | |
Amador et al. | Benchmarking head pose estimation in-the-wild | |
CN108230332A (en) | The treating method and apparatus of character image, electronic equipment, computer storage media | |
CN111753736A (en) | Human body posture recognition method, device, equipment and medium based on packet convolution | |
Wang et al. | Self-attention deep saliency network for fabric defect detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||