CN108229303A - Detection-recognition method, training method for a detection-recognition network, and apparatus, device, and medium - Google Patents

Detection-recognition method, training method for a detection-recognition network, and apparatus, device, and medium (Download PDF)

Info

Publication number
CN108229303A
CN108229303A (application CN201711126372.9A / CN201711126372A; granted publication CN108229303B)
Authority
CN
China
Prior art keywords
text box
information
detection
network layer
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711126372.9A
Other languages
Chinese (zh)
Other versions
CN108229303B (en)
Inventor
刘学博
梁鼎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN201711126372.9A
Publication of CN108229303A
Application granted
Publication of CN108229303B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

Embodiments of the invention disclose a detection-recognition method, a training method for a detection-recognition network, and corresponding apparatus, device, and medium. The detection-recognition method includes: inputting an image to be processed into a detection-recognition network, where the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer; outputting shared-layer features of the image through the shared network layer; inputting the shared-layer features into the detection network layer, outputting detection-layer features of the image through the detection network layer, and obtaining, based on the detection-layer features, text box information for the text contained in the image; and inputting the shared-layer features and the text box information into the recognition network layer, which outputs the text content inside the text boxes. By reusing the shared features, the embodiments reduce repeated feature extraction on the same image, improving processing efficiency and increasing the efficiency and speed of text detection and recognition.

Description

Detection-recognition method, training method for a detection-recognition network, and apparatus, device, and medium
Technical field
The present invention relates to computer vision technology, and in particular to a detection-recognition method, a training method for a detection-recognition network, and corresponding apparatus, device, and medium.
Background art
Text detection and recognition in natural scenes is an important problem in the fields of image understanding and image restoration. Accurate text detection and recognition can be applied to many tasks, such as image search over large datasets, automatic translation, guidance for the blind, and robot navigation.
However, text detection and recognition in natural scenes is highly challenging: varying background scenes, low resolution, different fonts, different illumination conditions, different size scales, different tilt directions, blur, and other factors all make the problem extremely complex and difficult.
Summary of the invention
Embodiments of the present invention provide a technical solution for character recognition.
According to one aspect of the embodiments of the present invention, a detection-recognition method is provided, including:
inputting an image to be processed into a detection-recognition network, where the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer;
outputting shared-layer features of the image to be processed through the shared network layer, where the shared features represent at least one of the following in the image: small-object texture features, edge features, and detail features;
inputting the shared-layer features into the detection network layer, outputting detection-layer features of the image to be processed through the detection network layer, and obtaining, based on the detection-layer features, text box information for the text contained in the image;
inputting the shared-layer features and the text box information into the recognition network layer, and outputting the text content inside the text boxes through the recognition network layer.
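The flow above (shared features computed once, then consumed by both the detection and recognition branches) can be sketched as a minimal pipeline. This is an illustrative Python skeleton, not the patent's actual network: the functions `shared_layer`, `detection_layer`, and `recognition_layer` are hypothetical stand-ins for trained neural network components.

```python
# Hypothetical sketch of the detection-recognition pipeline described above.
# Each "layer" is a stand-in for a trained neural network component.

def shared_layer(image):
    # Would extract low-level shared features (small-object texture,
    # edges, details); here it just tags the input.
    return {"source": image, "features": "shared"}

def detection_layer(shared_features):
    # Would output detection-layer features and derive text boxes;
    # here it returns one dummy box with location and rotation angle.
    return [{"x": 4, "y": 2, "w": 10, "h": 3, "angle": 0.0, "is_text": True}]

def recognition_layer(shared_features, text_boxes):
    # Would fuse box features with the shared features and decode the text;
    # here it returns a placeholder string per box.
    return ["<decoded text>" for _ in text_boxes]

def detect_and_recognize(image):
    feats = shared_layer(image)      # computed once, reused by both branches
    boxes = detection_layer(feats)   # text box information
    texts = recognition_layer(feats, boxes)
    return boxes, texts

boxes, texts = detect_and_recognize("input.jpg")
```

The key design point the sketch illustrates is that `shared_layer` runs only once per image, which is what avoids the repeated feature extraction mentioned in the abstract.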
In another embodiment based on the above method, the detection-layer features include classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
Obtaining, based on the detection-layer features, the text box information for the text contained in the image to be processed includes:
obtaining the text box information from the classification information of each pixel in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether a text box contains text; and the text box location information includes, for any pixel in the image, the distances from that pixel to the top, bottom, left, and right edges of its text box, together with the rotation angle of the text box.
In another embodiment based on the above method, obtaining the text box information from the classification information of each pixel includes:
shrinking the length and width of the image to be processed to a set ratio based on its per-pixel classification information, and dividing the image into multiple rectangular boxes according to pixel position relationships; taking the rectangular boxes whose internal pixels are classified as text as text boxes;
obtaining, for any pixel in the image, its distances to the top, bottom, left, and right edges of the text box and the rotation angle of the text box;
obtaining the text box information from the text box location information and text box classification information thus acquired.
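One way to read the box-extraction step above is that the per-pixel text/non-text map is partitioned into rectangular cells, and cells whose pixels are classified as text become candidate text boxes. The pure-Python sketch below illustrates that reading under simplifying assumptions; the cell size and the majority-vote rule are illustrative choices, not specified by the patent.

```python
def cells_to_boxes(mask, cell=2):
    """Split a binary text mask into cell x cell rectangles and keep
    the rectangles whose pixels are predominantly classified as text."""
    h, w = len(mask), len(mask[0])
    boxes = []
    for top in range(0, h, cell):
        for left in range(0, w, cell):
            pixels = [mask[r][c]
                      for r in range(top, min(top + cell, h))
                      for c in range(left, min(left + cell, w))]
            if sum(pixels) * 2 > len(pixels):      # majority are text pixels
                boxes.append((left, top, cell, cell))  # (x, y, w, h)
    return boxes

# 4x4 mask: text pixels (1) fill the top-left 2x2 block only.
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(cells_to_boxes(mask))  # [(0, 0, 2, 2)]
```

In the patent's method the resulting boxes would then be refined with the per-pixel edge distances and rotation angle; the sketch stops at the rectangular candidates.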
In another embodiment based on the above method, inputting the shared-layer features and the text box information into the recognition network layer and predicting the text inside the text boxes includes:
obtaining corresponding text box features based on the output text box information, and fusing the text box features with the shared-layer features output by the shared network layer;
the recognition network layer predicting the text in the text boxes based on the fused features.
In another embodiment based on the above method, obtaining the corresponding text box features based on the output text box information includes:
applying a perspective transform to the text box information, segmenting the text boxes out of the image to be processed, and generating the corresponding text box features from the segmented text boxes.
In another embodiment based on the above method, segmenting a text box out of the image to be processed includes:
obtaining the top-left coordinate of the text box from the text box location information;
scaling the text box while keeping its height-to-width ratio unchanged, so that all text boxes have the same height;
constructing a perspective transformation matrix from the rotation angle of the text box, the top-left coordinate, and the scaling ratio;
segmenting the text box out of the image to be processed based on the perspective transformation matrix.
In another embodiment based on the above method, segmenting the text box out of the image based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the image to be processed to obtain a segmented image of the same size as the image to be processed, where each segmented image contains exactly one text box in its upper-left corner.
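The matrix construction above can be illustrated in homogeneous coordinates: a translation that moves the box's top-left corner to the origin, a rotation that undoes the box's tilt, and a uniform scale toward the target height compose into one 3x3 matrix. The sketch below is a geometric illustration under those assumptions; the patent's actual matrix layout may differ.

```python
import math

def box_transform_matrix(x0, y0, angle, scale):
    """3x3 homogeneous matrix: translate the box's top-left corner (x0, y0)
    to the origin, rotate by -angle to undo the box rotation, then scale."""
    t = [[1, 0, -x0],
         [0, 1, -y0],
         [0, 0, 1]]
    c, s = math.cos(-angle), math.sin(-angle)
    r = [[c, -s, 0],
         [s,  c, 0],
         [0,  0, 1]]
    k = [[scale, 0, 0],
         [0, scale, 0],
         [0, 0, 1]]
    def matmul(a, b):
        return [[sum(a[i][j] * b[j][n] for j in range(3)) for n in range(3)]
                for i in range(3)]
    return matmul(k, matmul(r, t))

def apply(m, x, y):
    # Apply the homogeneous matrix to a point and de-homogenize.
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    wh = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / wh, yh / wh

# An axis-aligned box at (4, 2) with no rotation, scaled by 2:
m = box_transform_matrix(4, 2, 0.0, 2.0)
print(apply(m, 4, 2))  # the top-left corner maps to (0.0, 0.0)
```

This matches the claim's description that, after the transform, the text box sits in the upper-left corner of the output image.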
According to another aspect of the embodiments of the present invention, a training method for a detection-recognition network is provided, including:
inputting an image to be processed into the detection-recognition network, where the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer, and the image is annotated with text box information and the text contained in each text box;
outputting first shared-layer features through the shared network layer; inputting the first shared-layer features and the annotated text box information into the recognition network layer; predicting, through the recognition network layer, the text contained in the text boxes; and training the shared network layer and the recognition network layer based on the predicted text and the annotated text until a first training completion condition is met, where the shared features represent at least one of the following in the image: small-object texture features, edge features, and detail features;
inputting the image into the trained shared network layer, which outputs second shared-layer features; inputting the second shared-layer features into the detection network layer; predicting the detection-layer features of the image through the detection network layer; obtaining, based on the detection-layer features, the text box information for the text contained in the image; and training the detection network layer based on the predicted and annotated text box information until a second training completion condition is met.
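The two training phases above can be sketched as sequential loops, each with its own stop condition. The sketch below is schematic, not the patent's optimizer: `recognition_step` and `detection_step` are hypothetical callables that perform one parameter update and return a scalar error.

```python
def train_two_phase(recognition_step, detection_step,
                    eps1, max_iter1, eps2, max_iter2):
    """Phase 1 trains the shared + recognition layers with ground-truth
    boxes; phase 2 then trains the detection layer on the shared features."""
    # Phase 1: shared network layer + recognition network layer.
    for i in range(1, max_iter1 + 1):
        err = recognition_step()        # one update, returns text error
        if err < eps1:
            break
    # Phase 2: detection network layer on top of the trained shared layer.
    for j in range(1, max_iter2 + 1):
        err = detection_step()          # one update, returns box error
        if err < eps2:
            break
    return i, j                         # iterations used per phase

# Dummy steps whose error halves on each call, starting from 1.0:
def make_step():
    state = {"err": 1.0}
    def step():
        state["err"] /= 2
        return state["err"]
    return step

iters = train_two_phase(make_step(), make_step(), 0.1, 100, 0.1, 100)
print(iters)  # (4, 4): 1.0 -> 0.5 -> 0.25 -> 0.125 -> 0.0625 < 0.1
```

Note the ordering the claims describe: the recognition branch is trained first with annotated boxes, and only then is the detection branch fitted against the already-trained shared features.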
In another embodiment based on the above method, training the shared network layer and the recognition network layer based on the predicted and annotated text until the first training completion condition is met includes:
adjusting the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted text and the annotated text;
iteratively recognizing the image to be processed through the adjusted shared network layer and recognition network layer to obtain new predictions, until the first training completion condition is met.
In another embodiment based on the above method, the first training completion condition includes:
the error between the predicted text and the annotated text being less than a first preset value; or the number of prediction iterations being greater than or equal to a first preset count.
In another embodiment based on the above method, training the detection network layer based on the predicted and annotated text box information until the second training completion condition is met includes:
adjusting the parameters of the detection network layer based on the error between the predicted and annotated text box information;
iteratively detecting the image to be processed through the adjusted detection network layer to obtain new text box predictions, until the second training completion condition is met.
In another embodiment based on the above method, the second training completion condition includes:
the error between the predicted and annotated text box information being less than a second preset value; or the number of prediction iterations being greater than or equal to a second preset count.
In another embodiment based on the above method, the detection-layer features include classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
Obtaining, based on the detection-layer features, the text box information for the text contained in the image includes:
obtaining the text box information from the classification information of each pixel in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether a text box contains text; and the text box location information includes, for any pixel in the image, the distances from that pixel to the top, bottom, left, and right edges of its text box, together with the rotation angle of the text box.
In another embodiment based on the above method, obtaining the text box information from the classification information of each pixel includes:
shrinking the length and width of the image to a set ratio based on its per-pixel classification information, and dividing the image into multiple rectangular boxes according to pixel position relationships; taking the rectangular boxes whose internal pixels are classified as text as text boxes;
obtaining, for any pixel in the image, its distances to the top, bottom, left, and right edges of the text box and the rotation angle of the text box;
obtaining the text box information from the text box location information and text box classification information thus acquired.
In another embodiment based on the above method, predicting the text contained in the text boxes through the recognition network layer includes:
obtaining corresponding text box features based on the annotated text box information of the image, and fusing the text box features with the first shared-layer features output by the shared network layer;
the recognition network layer predicting the text in the text boxes based on the fused features.
In another embodiment based on the above method, obtaining the corresponding text box features based on the annotated text box information includes:
applying a perspective transform to the annotated text box information, segmenting the text boxes out of the image to be processed, and generating the corresponding text box features from the segmented text boxes.
In another embodiment based on the above method, segmenting a text box out of the image to be processed includes:
obtaining the top-left coordinate of the text box from the text box location information;
scaling the text box while keeping its height-to-width ratio unchanged, so that all text boxes have the same height;
constructing a perspective transformation matrix from the rotation angle of the text box, the top-left coordinate, and the scaling ratio;
segmenting the text box out of the image based on the perspective transformation matrix.
In another embodiment based on the above method, segmenting the text box out of the image based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the image to obtain a segmented image of the same size as the image, where each segmented image contains exactly one text box in its upper-left corner.
According to another aspect of the embodiments of the present invention, a detection-recognition apparatus is provided, including:
an input unit, configured to input an image to be processed into a detection-recognition network, where the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer;
a low-level extraction unit, configured to output shared-layer features of the image through the shared network layer, where the shared features represent at least one of the following in the image: small-object texture features, edge features, and detail features;
a text box detection unit, configured to input the shared-layer features into the detection network layer, output detection-layer features of the image through the detection network layer, and obtain, based on the detection-layer features, text box information for the text contained in the image;
a text recognition unit, configured to input the shared-layer features and the text box information into the recognition network layer, and output the text content inside the text boxes through the recognition network layer.
In another embodiment based on the above apparatus, the detection-layer features include classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
The text box detection unit is specifically configured to obtain, from the classification information of each pixel in the image, the text box information for the text contained in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether a text box contains text; and the text box location information includes, for any pixel in the image, the distances from that pixel to the top, bottom, left, and right edges of its text box, together with the rotation angle of the text box.
In another embodiment based on the above apparatus, the text box detection unit includes:
a text box acquisition module, configured to shrink the length and width of the image to a set ratio based on its per-pixel classification information, divide the image into multiple rectangular boxes according to pixel position relationships, and take the rectangular boxes whose internal pixels are classified as text as text boxes;
an information acquisition module, configured to obtain, for any pixel in the image, its distances to the top, bottom, left, and right edges of the text box and the rotation angle of the text box, and to obtain the text box information from the acquired text box location information and text box classification information.
In another embodiment based on the above apparatus, the text recognition unit includes:
a feature extraction module, configured to obtain corresponding text box features based on the output text box information, and to fuse the text box features with the shared-layer features output by the shared network layer;
a text prediction module, configured such that the recognition network layer predicts the text in the text boxes based on the fused features.
In another embodiment based on the above apparatus, the feature extraction module is specifically configured to apply a perspective transform to the text box information, segment the text boxes out of the image to be processed, and generate the corresponding text box features from the segmented text boxes.
In another embodiment based on the above apparatus, the feature extraction module includes:
a scaling module, configured to obtain the top-left coordinate of a text box from the text box location information, and to scale the text box while keeping its height-to-width ratio unchanged so that all text boxes have the same height;
a transformation module, configured to construct a perspective transformation matrix from the rotation angle of the text box, the top-left coordinate, and the scaling ratio;
a text box segmentation module, configured to segment the text box out of the image based on the perspective transformation matrix.
In another embodiment based on the above apparatus, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the image to obtain a segmented image of the same size as the image, where each segmented image contains exactly one text box in its upper-left corner.
According to another aspect of the embodiments of the present invention, a training apparatus for a detection-recognition network is provided, including:
an image input unit, configured to input an image to be processed into the detection-recognition network, where the detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer, and the image is annotated with text box information and the text contained in each text box;
a first training unit, configured to output first shared-layer features through the shared network layer; input the first shared-layer features and the annotated text box information into the recognition network layer; predict, through the recognition network layer, the text contained in the text boxes; and train the shared network layer and the recognition network layer based on the predicted and annotated text until a first training completion condition is met, where the shared features represent at least one of the following in the image: small-object texture features, edge features, and detail features;
a second training unit, configured to input the image into the trained shared network layer, which outputs second shared-layer features; input the second shared-layer features into the detection network layer; predict the detection-layer features of the image through the detection network layer; obtain, based on the detection-layer features, the text box information for the text contained in the image; and train the detection network layer based on the predicted and annotated text box information until a second training completion condition is met.
In another embodiment based on the above apparatus, the first training unit is specifically configured to adjust the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted and annotated text, and to iteratively recognize the image through the adjusted shared network layer and recognition network layer to obtain new predictions, until the first training completion condition is met.
In another embodiment based on the above apparatus, the first training completion condition includes:
the error between the predicted and annotated text being less than a first preset value; or the number of prediction iterations being greater than or equal to a first preset count.
In another embodiment based on the above apparatus, the second training unit is specifically configured to adjust the parameters of the detection network layer based on the error between the predicted and annotated text box information, and to iteratively detect the image through the adjusted detection network layer to obtain new text box predictions, until the second training completion condition is met.
In another embodiment based on the above apparatus, the second training completion condition includes:
the error between the predicted and annotated text box information being less than a second preset value; or the number of prediction iterations being greater than or equal to a second preset count.
In another embodiment based on the above apparatus, the detection-layer features include classification information for each pixel in the image to be processed; the classification information indicates, through different values, whether the corresponding pixel belongs to the text category.
The second training unit is specifically configured to obtain, from the classification information of each pixel in the image, the text box information for the text contained in the image, where the text box information includes text box classification information and text box location information; the text box classification information indicates whether a text box contains text; and the text box location information includes, for any pixel in the image, the distances from that pixel to the top, bottom, left, and right edges of its text box, together with the rotation angle of the text box.
In another embodiment based on the above apparatus, the second training unit includes:
a text box acquisition module, configured to shrink the length and width of the image to a set ratio based on its per-pixel classification information, divide the image into multiple rectangular boxes according to pixel position relationships, and take the rectangular boxes whose internal pixels are classified as text as text boxes;
an information acquisition module, configured to obtain, for any pixel in the image, its distances to the top, bottom, left, and right edges of the text box and the rotation angle of the text box, and to obtain the text box information from the acquired text box location information and text box classification information.
In another embodiment based on the above apparatus, the first training unit includes:
a feature extraction module, configured to obtain corresponding text box features based on the annotated text box information of the image, and to fuse the text box features with the first shared-layer features output by the shared network layer;
a text prediction module, configured such that the recognition network layer predicts the text in the text boxes based on the fused features.
In another embodiment based on the above apparatus, the feature extraction module is specifically configured to apply a perspective transform to the annotated text box information, segment the text boxes out of the image, and generate the corresponding text box features from the segmented text boxes.
In another embodiment based on the above apparatus, the feature extraction module includes:
a scaling module, configured to obtain the top-left coordinate of a text box from the text box location information, and to scale the text box while keeping its height-to-width ratio unchanged so that all text boxes have the same height;
a transformation module, configured to construct a perspective transformation matrix from the rotation angle of the text box, the top-left coordinate, and the scaling ratio;
a text box segmentation module, configured to segment the text box out of the image based on the perspective transformation matrix.
In another embodiment based on the above apparatus, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the image to obtain a segmented image of the same size as the image, where each segmented image contains exactly one text box in its upper-left corner.
According to another aspect of the embodiments of the present invention, an electronic device is provided, including a processor, where the processor includes the detection-recognition apparatus or the training apparatus for the detection-recognition network described above.
According to one aspect of the embodiments of the present invention, an electronic device is provided, including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions, thereby completing the operations of the detection-recognition method or the training method for the detection-recognition network described above.
According to one aspect of the embodiments of the present invention, a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, perform the operations of the detection-recognition method or the training method for the detection-recognition network described above.
The training method and device of a kind of detection identification and detection identification network based on the above embodiment of the present invention offer, Pending image is inputted detection identification network by equipment, medium;The inclusion layer spy of pending image is exported through sharing network layer Sign;The inclusion layer feature exported by sharing network layer reduces repetition and carries out feature extraction to image, improves treatment effeciency; Inclusion layer feature is inputted into detection network layer, network exports the text box information that pending image includes word after testing;It will Inclusion layer feature and text box information input identification network layer, identified network layer export the word content in text box;Pass through One detection identifies the identification of the text information in the real-time performance detection and text box of text box information;Improve word knowledge Other efficiency and speed.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Description of the drawings
The drawings, which form a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a flowchart of an embodiment of the detection-recognition method of the present invention.
Fig. 2 is a structural diagram of an embodiment of the detection-recognition apparatus of the present invention.
Fig. 3 is a flowchart of an embodiment of the training method of the detection-recognition network of the present invention.
Fig. 4 is a structural diagram of an embodiment of the training apparatus of the detection-recognition network of the present invention.
Fig. 5 is a structural diagram of an electronic device suitable for implementing the terminal device or server of the embodiments of the present application.
Specific embodiment
Various exemplary embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It should also be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or use.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and apparatus should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The embodiments of the present invention may be applied to a computer system/server, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and the like.
The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
In the prior art, most methods with good performance use deep learning and split text processing into two parts, detection and recognition: text detection is first performed on the whole picture to obtain the location information of the different texts, and then the detected text is cropped out according to the location information and recognized.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problems:
1. When the detection and recognition of text are split into two parts, the overall accuracy is limited by the accuracies of both detection and recognition. 2. The two-part approach needs to store the intermediate detection result as the input of recognition; moreover, because detection and recognition use two rather complex network models, operation and storage efficiency are low.
Fig. 1 is a flowchart of an embodiment of the detection-recognition method of the present invention. As shown in Fig. 1, the method of this embodiment includes:
Step 101: input the to-be-processed image into the detection-recognition network.
The detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer.
Step 102: output the shared-layer features of the to-be-processed image through the shared network layer.
The shared features are used to embody at least one of the following characteristics in the image: small-object texture features, edge features, and detail features. When text box detection and text recognition are handled separately as two tasks, each requires its own neural network; the text box detection network and the text recognition network can be regarded as two networks whose processing object is the image. A neural network is essentially composed of a certain number of network layers such as convolutional layers, pooling layers, and fully connected layers. Since both the text box detection network and the text recognition network process the text information in the image, they can share the network parameters of the earlier layers that produce the shared-layer features. In detection and recognition, the shared-layer features capture the texture features of smaller objects, the edge features of the image, and the detail features of the image, and thus better support the detection and recognition of small objects. The network layers common to the text box detection network and the text recognition network are separated out as the shared network layer that extracts features from the to-be-processed image, avoiding repeated processing of that image; the resulting shared-layer features need only be input into the corresponding text box detection and/or text recognition network layers. Illustratively, multi-scale feature cascading (fusing the shared-layer feature maps output by the shared network layer with the known text box information, i.e., fusing features of different levels) and CTC (Connectionist Temporal Classification, a method in deep neural networks for decoding one sequence into another, which works well in text recognition) are used to improve the accuracy of text detection and recognition and to better handle hard-to-distinguish text in the picture; by sharing part of the network, repeated feature extraction on the image is reduced.
Step 103: input the shared-layer features into the detection network layer; the detection network layer outputs the detection-layer features of the to-be-processed image, and the text box information of the text contained in the to-be-processed image is obtained based on the detection-layer features.
Step 104: input the shared-layer features and the text box information into the recognition network layer; the recognition network layer outputs the text content within the text boxes.
Based on the detection-recognition method provided by the above embodiment of the present invention, the to-be-processed image is input into the detection-recognition network; the shared network layer outputs the shared-layer features of the to-be-processed image, reducing repeated feature extraction on the image and improving processing efficiency; the shared-layer features are input into the detection network layer, which outputs the text box information of the text contained in the to-be-processed image; the shared-layer features and the text box information are input into the recognition network layer, which outputs the text content within the text boxes. A single detection-recognition network thus performs both the detection of text box information and the recognition of the text within the boxes, improving the efficiency and speed of text recognition.
The detection-recognition method provided by the present invention applies to different languages: for each language, it is only necessary to train the detection-recognition network with text of the language to be processed, and the resulting detection-recognition network can detect and recognize text in that language.
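As a concrete illustration, the shared-backbone pipeline of steps 101-104 can be sketched as below. This is a minimal toy model, not the patented network: the "shared layer" is a gradient filter, the "detection head" a threshold on the shared features, and the "recognition head" a masked read-out; every function name and number here is an illustrative assumption.

```python
import numpy as np

def shared_layer(image):
    # stand-in for the shared convolutional stack: an edge-like response map,
    # computed once and reused by both heads
    return np.abs(np.gradient(image.astype(float))[0])

def detection_head(features, threshold=0.4):
    # per-pixel text/non-text classification (1 = text class, 0 = non-text)
    return (features > threshold).astype(int)

def recognition_head(features, mask):
    # stand-in for CTC recognition: reads shared features inside detected boxes
    return features[mask == 1]

image = np.zeros((8, 8))
image[3:5, 2:6] = 1.0            # a fake "text" region
feat = shared_layer(image)       # shared-layer features, extracted only once
mask = detection_head(feat)      # detection branch
recognized = recognition_head(feat, mask)  # recognition branch reuses feat
print(mask.sum() > 0)            # the detection head fired on the region
```

The point of the sketch is structural: `feat` is computed once and consumed by both heads, which is what removes the repeated feature extraction the two-network prior art incurs.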
In a specific example of the above embodiment of the detection-recognition method of the present invention, the detection-layer features include the classification information of each pixel in the to-be-processed image, where the classification information indicates, through different values, whether the corresponding pixel belongs to the text class. Optionally, the classification information may use 0 to denote the non-text class and 1 to denote the text class, or 1 to denote the non-text class and 0 to denote the text class.
Operation 103 includes:
obtaining the text box information of the text contained in the to-be-processed image through the classification information of each pixel in the to-be-processed image.
The text box information includes text box classification information and text box location information. The text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the to-be-processed image to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box. In this embodiment, before the to-be-processed sample images are used to train the detection-recognition network, they need to be annotated: the class of each pixel in the sample image is labeled (the classes usually labeled are text and non-text, which may be marked with 1 and 0) to determine the position of the text box. With text and non-text labeled, the text box information of the text boxes containing text can be determined.
In a specific example of the above embodiments of the detection-recognition method of the present invention, obtaining the text box information of the text contained in the to-be-processed image through the classification information of each pixel includes:
shrinking the length and width to a set ratio based on the classification information of the to-be-processed image, dividing the to-be-processed image into multiple rectangular boxes according to the positional relationships of the pixels, and obtaining text boxes from the rectangular boxes whose internal pixels are classified as text information;
obtaining the distances from any pixel in the to-be-processed image to the top, bottom, left, and right sides of the text box, and the rotation angle information of the text box;
obtaining the text box information based on the obtained text box location information and text box classification information.
With the setting of this embodiment, the to-be-processed image is labeled as an image containing only 1s and 0s (the classification information denotes the text class by 1 and the non-text class by 0, or the non-text class by 1 and the text class by 0). Because positions may be inaccurate in the course of network classification, the length and width of the text box are shrunk to a set ratio (for example, the length and width are reduced to 0.6 times the original); reducing the size of the text box reduces the influence of inaccurate text positions on the algorithm. The location information of the text box is determined through the minimum enclosing rectangle of the text box: the distances from each pixel in the text box to the top, bottom, left, and right sides are obtained from this enclosing rectangle, and the angle information of the text box is the rotation angle between this minimum enclosing rectangle and an upright rectangle.
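The labeling step above can be sketched as follows: find the pixels marked 1 (text), take their enclosing rectangle, shrink it to 0.6 of its size to soften boundary noise, and compute the (top, bottom, left, right) distances from a given pixel to the shrunk box. For simplicity this sketch assumes an axis-aligned box (rotation angle 0); the embodiment uses a minimum, possibly rotated, enclosing rectangle.

```python
import numpy as np

def shrink_box(mask, ratio=0.6):
    # enclosing rectangle of the text pixels, shrunk about its center
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    half_h = (ys.max() - ys.min() + 1) * ratio / 2
    half_w = (xs.max() - xs.min() + 1) * ratio / 2
    return (cy - half_h, cy + half_h, cx - half_w, cx + half_w)  # top, bottom, left, right

def pixel_distances(y, x, box):
    # distances from pixel (y, x) to the four sides of the shrunk box
    top, bottom, left, right = box
    return (y - top, bottom - y, x - left, right - x)  # t, b, l, r

mask = np.zeros((10, 20), dtype=int)
mask[2:8, 4:16] = 1                       # a 6x12 "text" region labeled with 1s
box = shrink_box(mask)
t, b, l, r = pixel_distances(5, 10, box)  # a pixel near the region's center
print(t > 0 and b > 0 and l > 0 and r > 0)
```

A pixel inside the shrunk box gets four positive distances, which together with the rotation angle form the per-pixel text box location annotation described above.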
In another embodiment of the detection-recognition method of the present invention, on the basis of the above embodiments, operation 104 includes:
obtaining the corresponding text box features based on the output text box information, and fusing the text box features with the shared-layer features output by the shared network layer;
predicting, by the recognition network layer, the text information in the text box based on the fused features.
The feature fusion in this embodiment connects the obtained shared-layer features with the detection-layer features; the fused features thus include both the shared-layer features of the image and the semantic features of the detection layer, and can be better used for text detection and recognition.
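The fusion just described amounts to concatenating the shared low-level feature map and the detection-layer (semantic) feature map along the channel axis, so that the recognition layer sees both. The channel counts and spatial sizes below are illustrative assumptions.

```python
import numpy as np

shared_feat = np.random.rand(64, 32, 100)  # (channels, height, width), low-level
detect_feat = np.random.rand(16, 32, 100)  # detection-layer semantic features
# concatenate along the channel axis: both feature types reach the recognizer
fused = np.concatenate([shared_feat, detect_feat], axis=0)
print(fused.shape)  # (80, 32, 100)
```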
In a specific example of the above embodiments of the detection-recognition method of the present invention, obtaining the corresponding text box features based on the output text box information includes:
performing a perspective transform on the text box information, segmenting the text box out of the to-be-processed image, and generating the corresponding text box features based on the segmented text box.
In this embodiment, the text box may be cropped out of the original image according to the manually annotated location information, and a perspective transform may be applied so that the annotated arbitrary quadrilateral is cropped and transformed into a rectangle to serve as the input of the recognition network layer. The formulas are as follows:
t_x = l - x_0
t_y = t - y_0
scale = dst_h / (t + b)
dst_w = scale × (l + r)
Here the inputs are: t, b, l, r, the vertical distances from a point in the arbitrary quadrilateral to its top, bottom, left, and right sides; θ, the rotation angle of the arbitrary quadrilateral; dst_h and dst_w, the set height and width of the output rectangular picture; and x_0, y_0, the coordinates of the point in the picture before the transform. Output: multiplying the original image by the perspective transformation matrix M directly yields the output picture, i.e., the cropped rectangular picture, for the recognition network layer. The text box feature in this embodiment is a text box feature map, which can be obtained from the pixel values corresponding to the obtained text box.
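One way to assemble a transform from these quantities is sketched below: translate so the box's top-left corner sits at the origin, rotate by -θ, and scale so the output height is dst_h. The patent text does not spell out the composition order of the three homogeneous 3x3 matrices, so the S·R·T order here is an assumption.

```python
import numpy as np

def build_transform(t, b, l, r, theta, x0, y0, dst_h=32):
    scale = dst_h / (t + b)      # scale = dst_h / (t + b)
    tx, ty = l - x0, t - y0      # t_x = l - x_0,  t_y = t - y_0
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    S = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], dtype=float)
    dst_w = scale * (l + r)      # dst_w = scale * (l + r)
    return S @ R @ T, dst_w

# an unrotated box of height t+b=32 and width l+r=64 around point (40, 50)
M, dst_w = build_transform(t=10, b=22, l=5, r=59, theta=0.0, x0=40, y0=50)
p = M @ np.array([40.0, 50.0, 1.0])  # the reference point maps to (l, t) in the output
print(round(dst_w), p[:2])
```

With θ = 0 and scale 1, the reference point lands at (5, 10), i.e., at distances l from the left edge and t from the top edge of the output rectangle, consistent with the formulas above.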
In a specific example of the above embodiments of the detection-recognition method of the present invention, segmenting the text box out of the to-be-processed image includes:
obtaining the top-left coordinates of the text box according to the text box location information;
scaling the text box while keeping the ratio of its height to its width constant, so that all text boxes have the same height;
building the perspective transformation matrix based on the rotation angle of the text box, the top-left coordinates, and the scaling ratio;
segmenting the text box out of the to-be-processed image based on the perspective transformation matrix.
In this embodiment, to build the perspective transformation matrix, the top-left coordinates of the text box are obtained first; to facilitate obtaining all text boxes, the heights of all text boxes are adjusted to be consistent, so that the adjusted text boxes can be segmented based on a single perspective transformation matrix.
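The height-normalization step above can be sketched in a few lines: each box keeps its own height-to-width ratio, but all boxes are scaled to one common height. The box sizes and the target height of 32 are illustrative assumptions.

```python
def normalize_heights(boxes, target_h=32):
    # each box is (height, width); a uniform scale preserves the aspect ratio
    out = []
    for h, w in boxes:
        s = target_h / h
        out.append((target_h, w * s))
    return out

boxes = [(16, 100), (64, 200)]   # two detected boxes of different heights
scaled = normalize_heights(boxes)
print(scaled)                    # [(32, 200.0), (32, 100.0)]
```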
In a specific example of the above embodiments of the detection-recognition method of the present invention, segmenting the text box out of the to-be-processed image based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the to-be-processed image, obtaining a segmented image of the same size as the to-be-processed image, in which a single text box sits in the upper-left corner.
In this embodiment, only one text box can be segmented out per application of the perspective transformation matrix; all text boxes are obtained by shifting the perspective transformation matrix and performing the matrix multiplication with the to-be-processed image for each box.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Fig. 2 is a structural diagram of an embodiment of the detection-recognition apparatus of the present invention. The apparatus of this embodiment can be used to implement the above method embodiments of the present invention. As shown in Fig. 2, the apparatus of this embodiment includes:
an input unit 21, configured to input the to-be-processed image into the detection-recognition network.
The detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer.
a low-layer extraction unit 22, configured to output the shared-layer features of the to-be-processed image through the shared network layer.
The shared features are used to embody at least one of the following characteristics in the image: small-object texture features, edge features, and detail features.
a text box detection unit 23, configured to input the shared-layer features into the detection network layer; the detection network layer outputs the detection-layer features of the to-be-processed image, and the text box information of the text contained in the to-be-processed image is obtained based on the detection-layer features.
a text recognition unit 24, configured to input the shared-layer features and the text box information into the recognition network layer; the recognition network layer outputs the text content within the text boxes.
Based on the detection-recognition apparatus provided by the above embodiment of the present invention, the to-be-processed image is input into the detection-recognition network; the shared network layer outputs the shared-layer features of the to-be-processed image, reducing repeated feature extraction on the image and improving processing efficiency; the shared-layer features are input into the detection network layer, which outputs the text box information of the text contained in the to-be-processed image; the shared-layer features and the text box information are input into the recognition network layer, which outputs the text content within the text boxes. A single detection-recognition network thus performs both the detection of text box information and the recognition of the text within the boxes, improving the efficiency and speed of text recognition.
In a specific example of the above embodiment of the detection-recognition apparatus of the present invention, the detection-layer features include the classification information of each pixel in the to-be-processed image; the classification information indicates, through different values, whether the corresponding pixel belongs to the text class.
The text box detection unit 23 is specifically configured to obtain the text box information of the text contained in the to-be-processed image through the classification information of each pixel in the to-be-processed image.
The text box information includes text box classification information and text box location information. The text box classification information indicates whether the text box contains text; the text box location information includes the distances from any pixel in the to-be-processed image to the top, bottom, left, and right sides of the text box, and the rotation angle of the text box.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the text box detection unit 23 includes:
a text box obtaining module, configured to shrink the length and width to a set ratio based on the classification information of the to-be-processed image, divide the to-be-processed image into multiple rectangular boxes according to the positional relationships of the pixels, and obtain text boxes from the rectangular boxes whose internal pixels are classified as text information;
an information obtaining module, configured to obtain the distances from any pixel in the to-be-processed image to the top, bottom, left, and right sides of the text box and the rotation angle information of the text box, and to obtain the text box information based on the obtained text box location information and text box classification information.
In another embodiment of the detection-recognition apparatus of the present invention, on the basis of the above embodiments, the text recognition unit 24 includes:
a feature extraction module, configured to obtain the corresponding text box features based on the output text box information, and to fuse the text box features with the shared-layer features output by the shared network layer;
a text prediction module, configured such that the recognition network layer predicts the text information in the text box based on the fused features.
The feature fusion in this embodiment connects the obtained shared-layer features with the detection-layer features; the fused features thus include both the shared-layer features of the image and the semantic features of the detection layer, and can be better used for text detection and recognition.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the feature extraction module is specifically configured to perform a perspective transform on the text box information, segment the text box out of the to-be-processed image, and generate the corresponding text box features based on the segmented text box.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the feature extraction module includes:
a zoom module, configured to obtain the top-left coordinates of the text box according to the text box location information, and to scale the text box while keeping the ratio of its height to its width constant, so that all text boxes have the same height;
a transform module, configured to build the perspective transformation matrix based on the rotation angle of the text box, the top-left coordinates, and the scaling ratio;
a text box segmentation module, configured to segment the text box out of the to-be-processed image based on the perspective transformation matrix.
In a specific example of the above embodiments of the detection-recognition apparatus of the present invention, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the to-be-processed image, obtaining a segmented image of the same size as the to-be-processed image, in which a single text box sits in the upper-left corner.
Fig. 3 is a flowchart of an embodiment of the training method of the detection-recognition network of the present invention. As shown in Fig. 3, the method of this embodiment includes:
Step 301: input the to-be-processed image into the detection-recognition network.
The detection-recognition network includes a shared network layer, a detection network layer, and a recognition network layer; the to-be-processed image is annotated with text box information and the text information contained in all text boxes. Inputting the to-be-processed image into the detection-recognition network accomplishes the two training tasks of text detection and text recognition simultaneously. Compared with training a text detection network and a text recognition network separately, this is equivalent to using more labeled data and information, which effectively alleviates overfitting and improves the accuracy of the final result; performing text recognition at the same time no longer requires two separate networks for text detection and text recognition, improving the efficiency and speed of text recognition.
Step 302: output the first shared-layer features through the shared network layer; input the first shared-layer features and the text box information annotated on the to-be-processed image into the recognition network layer; the recognition network layer predicts the text information contained in the text boxes; train the shared network layer and the recognition network layer based on the predicted text information and the annotated text information, until the first training completion condition is met.
For the detection-recognition network, the shared network layer and the recognition network layer therein are trained first; at this point, the shared network layer and the recognition network layer are regarded as one network, whose inputs to the recognition network layer are the shared-layer features output by the shared network layer and the text box information annotated on the to-be-processed image. The shared features are used to embody at least one of the following characteristics in the image: small-object texture features, edge features, and detail features.
Step 303: input the to-be-processed image into the trained shared network layer; the trained shared network layer outputs the second shared-layer features; input the second shared-layer features into the detection network layer; the detection network layer predicts the detection-layer features of the to-be-processed image, and the text box information of the text contained in the to-be-processed image is obtained based on the detection-layer features; train the detection network layer based on the predicted text box information and the annotated text box information, until the second training completion condition is met.
Based on the training method of the detection-recognition network provided by the above embodiment of the present invention, the shared network layer and the recognition network layer are first trained with the to-be-processed image; the to-be-processed image is then input into the trained shared network layer and the untrained detection network layer to obtain the predicted text box information, and the detection network layer is trained based on the predicted and annotated text box information. When training the detection network layer, the shared network layer and the detection network layer are treated as one network; since the shared network layer has already been trained, training this network effectively trains only the detection network layer. The trained shared network layer, recognition network layer, and detection network layer constitute the trained detection-recognition network, which can realize the detection and recognition of text simultaneously; moreover, owing to the shared network layer, repeated feature extraction on the image is reduced, the network structure is made lighter, the complexity in time and space is reduced, and the model volume is reduced.
In a specific example of the above embodiment of the training method of the detection-recognition network of the present invention, in operation 302, training the shared network layer and the recognition network layer based on the predicted text information and the annotated text information includes:
adjusting the network parameter values in the shared network layer and the recognition network layer based on the error between the predicted text information and the annotated text information;
iteratively recognizing the to-be-processed image through the shared network layer and the recognition network layer with adjusted parameters to obtain the predicted text information, until the first training completion condition is met.
In this embodiment, the process of updating the parameters according to the error may specifically include: taking the error between the predicted text information and the known text information as the maximum error; back-propagating the maximum error through gradients to compute the error of each layer in the shared network layer and the recognition network layer; computing the gradient of each layer's parameters from its error, and correcting the parameters of the corresponding layers in the shared network layer and the recognition network layer according to the gradients; and computing the error between the known text information and the text information predicted by the shared network layer and the recognition network layer with optimized parameters, taking that error as the new maximum error;
iteratively back-propagating the maximum error through gradients, computing the error of each layer in the shared network layer and the recognition network layer, computing the gradient of each layer's parameters from its error, and correcting the parameters of the corresponding layers in the shared network layer and the recognition network layer according to the gradients, until the preset first training completion condition is met.
The first training completion condition in the above embodiment includes:
the error between the predicted text information and the annotated text information is less than a first preset value; or the number of iterative predictions is greater than or equal to a first preset number of times.
In network training, the stop condition may be judged according to the error value, according to the number of training iterations, or according to any other stop condition under which those skilled in the art consider training may be stopped; this embodiment is only intended to facilitate the implementation of the method of this embodiment by those skilled in the art, and is not intended to limit the method of this embodiment.
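The iterate-until-stop scheme described above can be sketched with a deliberately tiny stand-in for the network: a single scalar weight fit to a target by gradient steps, stopping when either the error drops below a preset value or a preset iteration count is reached. The learning rate, thresholds, and "network" are all illustrative assumptions.

```python
def train(target, lr=0.1, max_error=1e-3, max_iters=1000):
    w = 0.0
    for i in range(max_iters):          # second stop condition: iteration count
        error = w - target              # prediction error vs. the annotation
        if abs(error) < max_error:      # first stop condition: error threshold
            break
        w -= lr * error                 # gradient step: d(error^2)/dw is 2*error
    return w, i

w, iters = train(target=3.0)
print(abs(w - 3.0) < 1e-3, iters < 1000)
```

In the full network the same loop back-propagates the error through every layer and corrects each layer's parameters from its gradient, but the stopping logic is exactly the two-branch condition shown here.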
In another embodiment of the training method of the detection and recognition network of the present invention, on the basis of the above embodiments, training the detection network layer in operation 303 based on the predicted text box information and the annotated text box information includes:
adjusting the parameters of the detection network layer based on the error between the predicted text box information and the annotated text box information; and
iteratively detecting the image to be processed with the detection network layer whose parameters have been adjusted, to obtain predicted text box information, until a second training completion condition is met.
In the present embodiment, the parameters of the detection network layer may likewise be trained by gradient backpropagation. The specific training process may include: taking the error between the predicted text box information and the annotated text box information as the maximum error; backpropagating the maximum error by gradient to compute the error of each layer in the detection network layer (since the shared network layer has already been trained, its parameters need not be retrained); computing the gradient of each layer's parameters from that layer's error and correcting the parameters of the corresponding layers in the detection network layer according to the gradients; and then computing the error between the annotated text box information and the text box information predicted by the detection network layer with the optimized parameters, taking this error as the new maximum error.
This cycle — backpropagating the maximum error, computing the error of each layer in the detection network layer, computing each layer's parameter gradients from those errors, and correcting the parameters of the corresponding layers according to the gradients — is performed iteratively until the preset second training completion condition is met.
The second training completion condition in the above embodiment includes:
the error between the predicted text box information and the annotated text box information being less than a second preset value; or the number of prediction iterations being greater than or equal to a second preset number.
During network training, the stopping condition may be judged from the error value, from the number of training iterations, or from any other condition that a person skilled in the art considers sufficient for stopping training; the conditions given in this embodiment are only intended to help those skilled in the art implement the method of this embodiment and are not intended to limit it.
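Because the shared network layer is already trained at this stage, only the detection layer's parameters are updated, even though the forward pass still runs through the shared layer. A minimal sketch, again with toy linear layers and illustrative names:

```python
import numpy as np

def train_detection_layer(W_shared, W_det, X, Y, lr=0.1,
                          second_preset_value=1e-3, second_preset_times=1000):
    """Adjust only the detection-layer parameters; the shared layer's
    weights are frozen (used in the forward pass but never updated),
    mirroring the two-stage training described above."""
    loss = float("inf")
    for i in range(second_preset_times):
        feat = X @ W_shared                     # frozen shared network layer
        pred = feat @ W_det                     # detection-layer prediction
        err = pred - Y
        loss = float(np.mean(err ** 2))
        if loss < second_preset_value:
            return W_det, loss, i
        grad_det = 2.0 * feat.T @ err / len(X)  # gradient w.r.t. detection params only
        W_det = W_det - lr * grad_det           # W_shared is deliberately untouched
    return W_det, loss, second_preset_times
```

In a framework like PyTorch the same effect would be achieved by setting `requires_grad=False` on the shared layer's parameters, but the principle is as above.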
In still another embodiment of the training method of the detection and recognition network of the present invention, on the basis of the above embodiments, the detection layer feature includes the classification information of each pixel in the image to be processed, where the classification information indicates, through different values, whether the corresponding pixel belongs to the text class; optionally, the classification information may use 0 for the non-text class and 1 for the text class, or 1 for the non-text class and 0 for the text class.
Operation 303 includes:
obtaining the text box information of the text contained in the image to be processed from the classification information of each pixel in the image to be processed.
The text box information includes text box classification information and text box location information. The text box classification information indicates whether the text box contains text; the text box location information includes, for any pixel in the image to be processed, the distances from that pixel to the top, bottom, left and right sides of the text box, together with the rotation angle of the text box. In this embodiment, before the detection and recognition network is trained with the sample images, the sample images need to be annotated: the class of each pixel in the sample image is labeled so that the position of the text box can be determined. The labeled classes are usually text and non-text (e.g. 1 and 0); once text and non-text pixels are labeled, the text box information of the text boxes containing text can be determined.
In a specific example of the above embodiments of the training method of the detection and recognition network of the present invention, obtaining the text box information of the text contained in the image to be processed from the classification information of each pixel includes:
shrinking the length and width of the text regions in the image to be processed to a set ratio based on the classification information, and dividing the image to be processed into multiple rectangular boxes according to the positional relationship of the pixels; obtaining a text box from each rectangular box whose interior pixels carry text classification information;
obtaining the distances from any pixel in the image to be processed to the top, bottom, left and right sides of the text box, and the rotation angle of the text box; and
obtaining the text box information based on the obtained text box location information and text box classification information.
With the setting of this embodiment, the image to be processed is annotated using only 1 and 0 (1 denoting the text class and 0 the non-text class, or 1 the non-text class and 0 the text class). Since the classification may be positionally inaccurate, the length and width of each text box are shrunk to a set ratio (e.g. both reduced to 0.6 times the original); shrinking the box reduces the influence of inaccurate text positions on the algorithm. The location information of a text box is determined by finding its minimum enclosing rectangle: the distances from each pixel in the text box to the top, bottom, left and right sides are obtained from this enclosing rectangle, and the angle of the text box is the rotation angle of the minimum enclosing rectangle relative to an upright rectangle.
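A simplified version of the per-pixel geometry described above can be computed directly from a binary classification map. The sketch below assumes a single, axis-aligned text region (so the minimum enclosing rectangle is the bounding box and the rotation angle is 0); the shrinking step and arbitrary rotation are omitted, and all names are illustrative.

```python
import numpy as np

def box_regression_targets(mask):
    """For each text pixel (value 1) in a binary classification map,
    compute the distances to the top/bottom/left/right edges of the
    bounding box of the text region, plus a rotation angle (left at 0
    for this axis-aligned toy case)."""
    ys, xs = np.nonzero(mask)
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    h, w = mask.shape
    # channels: dist to top, bottom, left, right edge, and rotation angle
    targets = np.zeros((5, h, w))
    targets[0, ys, xs] = ys - top
    targets[1, ys, xs] = bottom - ys
    targets[2, ys, xs] = xs - left
    targets[3, ys, xs] = right - xs
    return targets
```

For tilted text the bounding box would be replaced by a minimum-area rotated rectangle, with channel 4 holding its angle.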
In yet another embodiment of the training method of the detection and recognition network of the present invention, on the basis of the above embodiments, operation 302 includes:
obtaining the corresponding text box feature based on the text box information annotated on the image to be processed, and fusing the text box feature with the first shared-layer feature output by the shared network layer; and
the recognition network layer predicting the text information in the text box based on the fused feature.
The feature fusion referred to in this embodiment concatenates the obtained shared-layer feature with the detection-layer feature; the fused feature thus contains both the shared-layer feature of the image and the semantic feature of the detection layer, and is better suited for text detection and recognition.
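The fusion described here — concatenating the shared-layer feature map with the detection-layer (text box) feature map along the channel axis — can be sketched as follows; the array shapes are illustrative assumptions.

```python
import numpy as np

def fuse_features(shared_feat, text_box_feat):
    """Concatenate channel-wise so the fused feature carries both the
    shared-layer image features and the detection-layer semantics.
    Expects (channels, height, width) arrays with matching spatial dims."""
    assert shared_feat.shape[1:] == text_box_feat.shape[1:], "spatial dims must match"
    return np.concatenate([shared_feat, text_box_feat], axis=0)
```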
In a specific example of the above embodiments of the training method of the detection and recognition network of the present invention, obtaining the corresponding text box feature based on the annotated text box information of the image to be processed includes:
performing a perspective transform on the annotated text box information, segmenting the text box out of the image to be processed, and generating the corresponding text box feature from the segmented text box.
In this embodiment, the text box may be cut out of the original image according to the manually annotated location information and a perspective transform applied, so that the annotated arbitrary quadrilateral is extracted and transformed into a rectangle for input to the recognition network layer. The formulas are as follows:
t_x = l - x_0
t_y = t - y_0
scale = dst_h / (t + b)
dst_w = scale × (l + r)
where the inputs are: t, b, l, r, the distances from a point in the arbitrary quadrilateral to its top, bottom, left and right sides; θ, the rotation angle of the arbitrary quadrilateral; dst_h and dst_w, the height and width of the output rectangular picture (dst_h being set, dst_w following from the formula above); and x_0, y_0, the coordinates of the point in the picture before the transform. Output: multiplying the original image by the perspective transformation matrix M directly yields the output picture, i.e. the extracted rectangular picture, which is used as input to the recognition network layer. The text box feature in this embodiment is a text box feature map, which can be obtained from the pixel values of the obtained text box.
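A worked numeric instance of the four formulas above (all values illustrative): a text-box point with distances t, b, l, r to the quadrilateral's sides, mapped for an output rectangle of set height dst_h.

```python
# Inputs (illustrative values)
t, b, l, r = 4.0, 4.0, 10.0, 30.0   # distances to top/bottom/left/right sides
x0, y0 = 12.0, 7.0                  # coordinates of the point before the transform
dst_h = 16.0                        # set height of the output rectangle

# The four crop parameters from the formulas above
t_x = l - x0                        # horizontal translation: -2.0
t_y = t - y0                        # vertical translation: -3.0
scale = dst_h / (t + b)             # box height (t + b) scaled to dst_h: 2.0
dst_w = scale * (l + r)             # resulting output width: 80.0
```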
In a specific example of the above embodiments of the training method of the detection and recognition network of the present invention, segmenting the text box out of the image to be processed includes:
obtaining the top-left coordinates of the text box according to the text box location information;
scaling the text box while keeping the ratio of its height to its width constant, so that every text box has the same height;
constructing the perspective transformation matrix from the rotation angle, top-left coordinates and scale of the text box; and
segmenting the text box out of the image to be processed based on the perspective transformation matrix.
In this embodiment, to construct the perspective transformation matrix, the top-left coordinates of the text box are obtained first; for convenience of processing all text boxes, the heights of all text boxes are adjusted to be equal, after which each adjusted text box can be segmented out with one perspective transformation matrix.
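One way to build the 3×3 homogeneous matrix from the three quantities named above — rotation angle, top-left translation, and scale — is the composition below; the composition order (translate, then rotate, then scale) is an assumption for illustration, since the text does not fix it.

```python
import numpy as np

def build_transform(theta, tx, ty, scale):
    """Compose the crop transform from the text box's rotation angle,
    top-left translation, and scale, as a 3x3 homogeneous matrix
    (assumed order: translate, then rotate, then scale)."""
    T = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    S = np.array([[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]])
    return S @ R @ T
```

With θ = 0 this reduces to a translation followed by a uniform scale, matching the t_x, t_y and scale parameters computed earlier.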
In a specific example of the above embodiments of the training method of the detection and recognition network of the present invention, segmenting the text box out of the image to be processed based on the perspective transformation matrix includes:
performing a matrix multiplication between the perspective transformation matrix and the image to be processed, obtaining a segmented image of the same size as the image to be processed, each segmented image containing a single text box in its top-left corner.
In this embodiment, only one text box can be segmented out per perspective transformation matrix; all text boxes are obtained by moving the perspective transformation matrix and performing the matrix multiplication with the image to be processed for each box.
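In practice, "multiplying the image by the matrix" is an inverse warp: each output pixel pulls its value from the source location given by the inverse transform, so the text box lands in the top-left of the output canvas. A nearest-neighbour grayscale sketch (illustrative, not the patent's implementation):

```python
import numpy as np

def warp_crop(image, M, out_h, out_w):
    """Sample each output pixel from the source image through the
    inverse of transform M (nearest-neighbour), leaving the cropped
    text box in the top-left of the output canvas."""
    Minv = np.linalg.inv(M)
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for y in range(out_h):
        for x in range(out_w):
            sx, sy, sw = Minv @ np.array([x, y, 1.0])   # back-project (x, y)
            sx, sy = int(round(sx / sw)), int(round(sy / sw))
            if 0 <= sy < image.shape[0] and 0 <= sx < image.shape[1]:
                out[y, x] = image[sy, sx]
    return out
```

A library routine such as OpenCV's `warpPerspective` performs the same operation with interpolation and without the Python-level loops.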
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, RAM, magnetic disk or optical disc.
The detection and recognition method provided by the present invention is applicable to different languages: for a different language, it is only necessary to train the detection and recognition network with text of the language to be processed, and the resulting detection and recognition network can detect and recognize text of that language.
Fig. 4 is a structural diagram of an embodiment of the training apparatus of the detection and recognition network of the present invention. The apparatus of this embodiment may be used to implement the above method embodiments of the present invention. As shown in Fig. 4, the apparatus of this embodiment includes:
an image input unit 41, configured to input an image to be processed into the detection and recognition network,
where the detection and recognition network includes a shared network layer, a detection network layer and a recognition network layer, and the image to be processed is annotated with text box information and the text information contained in each text box;
a first training unit 42, configured to output a first shared-layer feature through the shared network layer; input the first shared-layer feature and the annotated text box information of the image to be processed into the recognition network layer, which predicts the text information contained in the text box; and train the shared network layer and the recognition network layer based on the predicted text information and the annotated text information, until a first training completion condition is met,
where the shared feature is used to embody at least one of the following features of the image: small-object texture features, edge features and detail features; and
a second training unit 43, configured to input the image to be processed into the trained shared network layer, which outputs a second shared-layer feature; input the second shared-layer feature into the detection network layer, which predicts the detection layer feature of the image to be processed, and obtain from the detection layer feature the text box information of the text contained in the image; and train the detection network layer based on the predicted text box information and the annotated text box information, until a second training completion condition is met.
With the training apparatus of the detection and recognition network provided by the above embodiment of the present invention, the shared network layer and the recognition network layer are first trained with the image to be processed; the image is then input into the trained shared network layer and the untrained detection network layer to obtain predicted text box information, and the detection network layer is trained based on the predicted and annotated text box information. When the detection network layer is trained, the shared network layer and the detection network layer are treated as one network; since the shared network layer has already been trained, this training in fact trains only the detection network layer. The trained shared network layer, recognition network layer and detection network layer together form the trained detection and recognition network, which can perform detection and recognition of text simultaneously; and because of the shared network layer, repeated feature extraction from the image is avoided, the network structure is lightened, the complexity in time and space is reduced, and the model size is reduced.
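The shared-layer advantage described above — one feature extraction feeding both a detection head and a recognition head — can be sketched functionally. The layer sizes and contents below are illustrative placeholders, not the patent's architecture.

```python
import numpy as np

def make_network(rng):
    """Toy functional sketch of the three-part architecture: one shared
    layer whose output feeds both a detection head and a recognition
    head, so the image features are extracted only once."""
    W_shared = rng.normal(size=(3, 8))
    W_detect = rng.normal(size=(8, 5))    # e.g. per-pixel class + box geometry
    W_recog = rng.normal(size=(8, 10))    # e.g. character logits

    def forward(x):
        shared = np.tanh(x @ W_shared)    # computed once, reused by both heads
        return shared @ W_detect, shared @ W_recog

    return forward
```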
In a specific example of the above embodiment of the training apparatus of the detection and recognition network of the present invention, the first training unit is specifically configured to adjust the parameter values of the shared network layer and the recognition network layer based on the error between the predicted text information and the annotated text information, and to iteratively recognize the image to be processed with the adjusted shared network layer and recognition network layer to obtain predicted text information, until the first training completion condition is met.
The preset first training completion condition met in the above embodiment includes:
the error between the predicted text information and the annotated text information being less than a first preset value; or the number of prediction iterations being greater than or equal to a first preset number.
In another embodiment of the training apparatus of the detection and recognition network of the present invention, on the basis of the above embodiments, the second training unit is specifically configured to adjust the parameters of the detection network layer based on the error between the predicted text box information and the annotated text box information, and to iteratively detect the image to be processed with the adjusted detection network layer to obtain predicted text box information, until the preset second training completion condition is met.
In the present embodiment, the parameters of the detection network layer may likewise be trained by gradient backpropagation. The specific training process may include: taking the error between the predicted text box information and the annotated text box information as the maximum error; backpropagating the maximum error by gradient to compute the error of each layer in the detection network layer (since the shared network layer has already been trained, its parameters need not be retrained); computing the gradient of each layer's parameters from that layer's error and correcting the parameters of the corresponding layers in the detection network layer according to the gradients; and then computing the error between the annotated text box information and the text box information predicted by the detection network layer with the optimized parameters, taking this error as the new maximum error.
This cycle — backpropagating the maximum error, computing the error of each layer in the detection network layer, computing each layer's parameter gradients from those errors, and correcting the parameters of the corresponding layers according to the gradients — is performed iteratively until the second training completion condition is met.
The preset second training completion condition met in the above embodiment includes:
the error between the predicted text box information and the annotated text box information being less than a second preset value; or the number of prediction iterations being greater than or equal to a second preset number.
In still another embodiment of the training apparatus of the detection and recognition network of the present invention, on the basis of the above embodiments,
the detection layer feature includes the classification information of each pixel in the image to be processed, the classification information indicating through different values whether the corresponding pixel belongs to the text class; and
the second training unit 43 is specifically configured to obtain the text box information of the text contained in the image to be processed from the classification information of each pixel in the image.
The text box information includes text box classification information and text box location information. The text box classification information indicates whether the text box contains text; the text box location information includes, for any pixel in the image to be processed, the distances from that pixel to the top, bottom, left and right sides of the text box, together with the rotation angle of the text box. In this embodiment, before the detection and recognition network is trained with the image to be processed, the image needs to be annotated: the class of each pixel is labeled so that the position of the text box can be determined. The labeled classes are usually text and non-text (e.g. 1 and 0); once text and non-text pixels are labeled, the text box information of the text boxes containing text can be determined.
In a specific example of the above embodiments of the training apparatus of the detection and recognition network of the present invention,
the second training unit includes:
a text box obtaining module, configured to shrink the length and width of the text regions to a set ratio based on the classification information of the image to be processed, divide the image into multiple rectangular boxes according to the positional relationship of the pixels, and obtain a text box from each rectangular box whose interior pixels carry text classification information; and
an information obtaining module, configured to obtain the distances from any pixel in the image to be processed to the top, bottom, left and right sides of the text box and the rotation angle of the text box, and to obtain the text box information based on the obtained text box location information and text box classification information.
In yet another embodiment of the training apparatus of the detection and recognition network of the present invention, on the basis of the above embodiments, the first training unit 42 includes:
a feature extraction module, configured to obtain the corresponding text box feature based on the annotated text box information of the image to be processed, and to fuse the text box feature with the first shared-layer feature output by the shared network layer; and
a text prediction module, configured for the recognition network layer to predict the text information in the text box based on the fused feature.
The feature fusion referred to in this embodiment concatenates the obtained shared-layer feature with the detection-layer feature; the fused feature thus contains both the shared-layer feature of the image and the semantic feature of the detection layer, and is better suited for text detection and recognition.
In a specific example of the above embodiments of the training apparatus of the detection and recognition network of the present invention, the feature extraction module is specifically configured to perform a perspective transform on the annotated text box information, segment the text box out of the image to be processed, and generate the corresponding text box feature from the segmented text box.
In a specific example of the above embodiments of the training apparatus of the detection and recognition network of the present invention, the feature extraction module includes:
a scaling module, configured to obtain the top-left coordinates of the text box according to the text box location information, and to scale the text box while keeping the ratio of its height to its width constant so that every text box has the same height;
a transformation module, configured to construct the perspective transformation matrix from the rotation angle, top-left coordinates and scale of the text box; and
a text box segmentation module, configured to segment the text box out of the image to be processed based on the perspective transformation matrix.
In a specific example of the above embodiments of the training apparatus of the detection and recognition network of the present invention, the text box segmentation module is specifically configured to perform a matrix multiplication between the perspective transformation matrix and the image to be processed, obtaining a segmented image of the same size as the image to be processed, each segmented image containing a single text box in its top-left corner.
According to one aspect of the embodiments of the present invention, an electronic device is provided, including a processor, where the processor includes the detection and recognition apparatus of any of the above embodiments of the present invention or the training apparatus of the detection and recognition network.
According to one aspect of the embodiments of the present invention, an electronic device is provided, including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions so as to complete the operations of the detection and recognition method of any of the above embodiments of the present invention or of the training method of the detection and recognition network.
According to one aspect of the embodiments of the present invention, a computer storage medium is provided for storing computer-readable instructions which, when executed, perform the operations of the detection and recognition method of any of the above embodiments of the present invention or of the training method of the detection and recognition network.
An embodiment of the present invention further provides an electronic device, which may for example be a mobile terminal, a personal computer (PC), a tablet computer, a server, etc. Referring now to Fig. 5, which shows a structural diagram of an electronic device 500 suitable for implementing a terminal device or server of the embodiments of the present application: as shown in Fig. 5, the computer system 500 includes one or more processors, a communication part, etc., the one or more processors being, for example, one or more central processing units (CPUs) 501 and/or one or more graphics processors (GPUs) 513; the processor can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 502 or loaded from a storage section 508 into a random access memory (RAM) 503. The communication part 512 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
The processor can communicate with the read-only memory 502 and/or the random access memory 503 to execute the executable instructions; it is connected to the communication part 512 through a bus 504 and communicates with other target devices through the communication part 512, thereby completing operations corresponding to any method provided by the embodiments of the present application, for example: inputting an image to be processed into the detection and recognition network; outputting the shared-layer feature of the image to be processed through the shared network layer; inputting the shared-layer feature into the detection network layer, which outputs the detection layer feature of the image, and obtaining from the detection layer feature the text box information of the text contained in the image; and inputting the shared-layer feature and the text box information into the recognition network layer, which outputs the text content in the text box.
In addition, the RAM 503 may also store various programs and data needed for the operation of the apparatus. The CPU 501, the ROM 502 and the RAM 503 are connected to each other through the bus 504. Where a RAM 503 is present, the ROM 502 is an optional module: the RAM 503 stores executable instructions, or executable instructions are written into the ROM 502 at runtime, and the executable instructions cause the processor 501 to perform the operations corresponding to the above method. An input/output (I/O) interface 505 is also connected to the bus 504. The communication part 512 may be integrated, or may be provided as multiple sub-modules (e.g. multiple IB network cards) linked on the bus.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, etc.; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; the storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom can be installed into the storage section 508 as needed.
It should be noted that the architecture shown in Fig. 5 is only one optional implementation; in concrete practice, the number and types of the components of Fig. 5 may be selected, deleted, added or replaced according to actual needs. Different functional components may also be provided separately or integrated: for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU; the communication part may be provided separately, or integrated on the CPU or the GPU; and so on. All of these alternative implementations fall within the protection scope disclosed by the present invention.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart; the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: inputting an image to be processed into the detection and recognition network; outputting the shared-layer feature of the image to be processed through the shared network layer; inputting the shared-layer feature into the detection network layer, which outputs the detection layer feature of the image, and obtaining from the detection layer feature the text box information of the text contained in the image; and inputting the shared-layer feature and the text box information into the recognition network layer, which outputs the text content in the text box. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509 and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, it performs the above functions defined in the method of the present application.
The methods, apparatuses and devices of the present invention may be implemented in many ways, for example by software, hardware, firmware or any combination of software, hardware and firmware. The above order of the steps of the method is for illustration only; the steps of the method of the present invention are not limited to the order described above unless otherwise specified. In addition, in some embodiments, the present invention may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present invention; the present invention thus also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention is provided for the sake of example and description; it is not exhaustive, nor does it limit the present invention to the disclosed forms. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were selected and described to better illustrate the principles and practical applications of the present invention, and to enable those of ordinary skill in the art to understand the present invention so as to design various embodiments with various modifications suited to particular uses.

Claims (10)

1. A detection and recognition method, characterized by comprising:
inputting an image to be processed into a detection and recognition network, the detection and recognition network comprising a shared network layer, a detection network layer and a recognition network layer;
outputting a shared-layer feature of the image to be processed through the shared network layer, the shared feature being used to embody at least one of the following features of the image: small-object texture features, edge features and detail features;
inputting the shared-layer feature into the detection network layer, outputting a detection layer feature of the image to be processed through the detection network layer, and obtaining, based on the detection layer feature, text box information of text contained in the image to be processed; and
inputting the shared-layer feature and the text box information into the recognition network layer, and outputting the text content in the text box through the recognition network layer.
2. The method according to claim 1, characterized in that the detection-layer features comprise classification information for each pixel of the image to be processed, the classification information indicating, by different values, whether the corresponding pixel belongs to the text class;
obtaining, based on the detection-layer features, the text box information of the text contained in the image to be processed comprises:
obtaining the text box information of the text contained in the image to be processed from the classification information of each pixel of the image, the text box information comprising text box classification information and text box location information, wherein the text box classification information indicates whether a text box contains text, and the text box location information comprises the distances from any pixel of the image to the top, bottom, left and right edges of the text box, and the rotation angle of the text box.
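Claim 2's text box location information (per-pixel distances to the four edges of the box plus a rotation angle) resembles an EAST-style geometry parameterisation. A minimal decoder for that parameterisation might look as follows; the function name and argument layout are assumptions:

```python
import math

def decode_box(px, py, d_top, d_bottom, d_left, d_right, angle):
    """Recover the four corners of a rotated text box from one pixel's
    geometry: its distances to the box's top/bottom/left/right edges
    plus the box's rotation angle (claim 2). Names and the exact
    parameterisation are assumptions."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Corners in the box's own axis-aligned frame, relative to the pixel.
    local = [(-d_left, -d_top), (d_right, -d_top),
             (d_right, d_bottom), (-d_left, d_bottom)]
    # Rotate by the box angle, then translate back to image coordinates.
    return [(px + x * cos_a - y * sin_a, py + x * sin_a + y * cos_a)
            for x, y in local]

corners = decode_box(10.0, 5.0, d_top=2.0, d_bottom=2.0,
                     d_left=4.0, d_right=4.0, angle=0.0)
# With zero rotation this is the axis-aligned box (6, 3)-(14, 7).
```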
3. The method according to claim 2, characterized in that obtaining the text box information of the text contained in the image to be processed from the classification information of each pixel comprises:
scaling the length and width of the image to be processed down by a set ratio based on the classification information, and dividing the image into a plurality of rectangular frames according to pixel position relationships; obtaining text boxes from the rectangular frames whose inner pixels' classification information marks them as text;
obtaining, for any pixel of the image to be processed, the distances from that pixel to the top, bottom, left and right edges of the text box, and the rotation angle of the text box;
obtaining the text box information based on the obtained text box location information and text box classification information.
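The grouping step of claim 3 — tiling the down-scaled classification map into rectangular frames and keeping the frames whose pixels are classified as text — can be sketched as below. The cell size and text-pixel ratio are assumed parameters, not values from the patent:

```python
import numpy as np

def boxes_from_class_map(class_map, cell=2, text_ratio=0.5):
    """Sketch of claim 3: tile the down-scaled per-pixel classification
    map into cell x cell rectangular frames and keep the frames whose
    text-pixel ratio passes a threshold. `cell` and `text_ratio` are
    assumed parameters, not values from the patent."""
    h, w = class_map.shape
    text_cells = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            if class_map[i:i + cell, j:j + cell].mean() >= text_ratio:
                text_cells.append((i, j, i + cell, j + cell))  # y0, x0, y1, x1
    return text_cells

cmap = np.zeros((6, 6))
cmap[0:2, 0:4] = 1.0  # a horizontal run of text-class pixels
cells = boxes_from_class_map(cmap)
# Adjacent text cells would then be merged into a single text box.
```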
4. The method according to any one of claims 1 to 3, characterized in that inputting the shared-layer features and the text box information into the recognition network layer and predicting the text content within the text boxes through the recognition network layer comprises:
obtaining corresponding text box features based on the output text box information, and fusing the text box features with the shared-layer features output by the shared network layer;
predicting, by the recognition network layer, the text content within the text boxes based on the fused features.
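Claim 4 fuses a text-box feature with the shared-layer features before recognition. One plausible reading (an assumption — the claim does not fix the fusion operator) is to crop the shared feature map to the box and concatenate a box-derived feature along the channel axis:

```python
import numpy as np

def fuse_features(shared_feat, box, geom_dim=4):
    """Sketch of claim 4: derive a text-box feature from the detected
    box and fuse it with the shared-layer feature map. Cropping plus
    channel concatenation is one plausible fusion; the claim does not
    fix the operator."""
    y0, x0, y1, x1 = box
    crop = shared_feat[:, y0:y1, x0:x1]            # C x h x w crop
    # A trivial box-level feature: normalised box geometry broadcast
    # over the crop's spatial extent.
    geom = np.array([y0, x0, y1, x1], dtype=float) / max(shared_feat.shape[1:])
    geom_map = np.broadcast_to(geom[:, None, None],
                               (geom_dim,) + crop.shape[1:])
    return np.concatenate([crop, geom_map], axis=0)

feat = np.random.rand(8, 16, 16)   # shared-layer features, C=8
fused = fuse_features(feat, box=(2, 3, 6, 11))
# fused has 8 + 4 channels over the 4 x 8 crop.
```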
5. A training method for a detection-recognition network, characterized by comprising:
inputting an image to be processed into the detection-recognition network, wherein the detection-recognition network comprises a shared network layer, a detection network layer and a recognition network layer, and the image to be processed is annotated with text box information and the text content contained in the text boxes;
outputting first shared-layer features through the shared network layer; inputting the first shared-layer features and the annotated text box information into the recognition network layer, and predicting, through the recognition network layer, the text content contained in the text boxes; training the shared network layer and the recognition network layer based on the predicted text content and the annotated text content until a first training completion condition is met, the shared-layer features embodying at least one of the following characteristics of the image: small-object texture features, edge features, and fine-detail features;
inputting the image to be processed into the trained shared network layer, the trained shared network layer outputting second shared-layer features; inputting the second shared-layer features into the detection network layer, predicting detection-layer features of the image through the detection network layer, and obtaining, based on the detection-layer features, text box information of the text contained in the image; training the detection network layer based on the predicted text box information and the annotated text box information until a second training completion condition is met.
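The two-stage schedule of claim 5 — first train the shared and recognition layers against the annotated text using ground-truth boxes, then train the detection layer with the shared layer fixed — can be illustrated with toy linear layers standing in for the three sub-networks. Everything below is a hedged sketch; the patent's layers are not linear and its losses and completion conditions are unspecified here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "layers" standing in for the shared, recognition and
# detection sub-networks (assumptions; the real layers are CNNs).
W_shared = rng.normal(size=(4, 4))
W_recog = rng.normal(size=(4, 2))
W_detect = rng.normal(size=(4, 3))

x = rng.normal(size=(16, 4))        # stand-in "images"
y_text = rng.normal(size=(16, 2))   # annotated text targets (toy)
y_box = rng.normal(size=(16, 3))    # annotated text box targets (toy)

def mse(pred, target):
    return float(((pred - target) ** 2).mean())

lr, steps = 0.002, 300

# Stage 1: with ground-truth boxes available from the annotation,
# train the shared layer and recognition layer jointly.
init_recog = mse(x @ W_shared @ W_recog, y_text)
for _ in range(steps):
    h = x @ W_shared
    g = 2 * (h @ W_recog - y_text) / y_text.size
    g_recog = h.T @ g
    g_shared = x.T @ (g @ W_recog.T)
    W_recog -= lr * g_recog
    W_shared -= lr * g_shared
loss_recog = mse(x @ W_shared @ W_recog, y_text)

# Stage 2: keep the trained shared layer fixed and train only the
# detection layer until its training-completion condition is met
# (here: a fixed step budget).
init_det = mse(x @ W_shared @ W_detect, y_box)
for _ in range(steps):
    h = x @ W_shared                # shared layer no longer updated
    g = 2 * (h @ W_detect - y_box) / y_box.size
    W_detect -= lr * (h.T @ g)
loss_det = mse(x @ W_shared @ W_detect, y_box)
```

The design point is the ordering: the shared layer is shaped by the recognition task first, so the detection layer later trains on features already useful for reading text.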
6. A detection and recognition apparatus, characterized by comprising:
an input unit, configured to input an image to be processed into a detection-recognition network, the detection-recognition network comprising a shared network layer, a detection network layer and a recognition network layer;
a low-level extraction unit, configured to output shared-layer features of the image to be processed through the shared network layer, the shared-layer features embodying at least one of the following characteristics of the image: small-object texture features, edge features, and fine-detail features;
a text box detection unit, configured to input the shared-layer features into the detection network layer, output detection-layer features of the image through the detection network layer, and obtain, based on the detection-layer features, text box information of the text contained in the image;
a text recognition unit, configured to input the shared-layer features and the text box information into the recognition network layer, and output the text content within the text boxes through the recognition network layer.
7. A training apparatus for a detection-recognition network, characterized by comprising:
an image input unit, configured to input an image to be processed into the detection-recognition network, wherein the detection-recognition network comprises a shared network layer, a detection network layer and a recognition network layer, and the image to be processed is annotated with text box information and the text content contained in the text boxes;
a first training unit, configured to output first shared-layer features through the shared network layer; input the first shared-layer features and the annotated text box information into the recognition network layer; predict, through the recognition network layer, the text content contained in the text boxes; and train the shared network layer and the recognition network layer based on the predicted text content and the annotated text content until a first training completion condition is met, the shared-layer features embodying at least one of the following characteristics of the image: small-object texture features, edge features, and fine-detail features;
a second training unit, configured to input the image to be processed into the trained shared network layer, the trained shared network layer outputting second shared-layer features; input the second shared-layer features into the detection network layer; predict detection-layer features of the image through the detection network layer; obtain, based on the detection-layer features, text box information of the text contained in the image; and train the detection network layer based on the predicted text box information and the annotated text box information until a second training completion condition is met.
8. An electronic device, characterized by comprising a processor, the processor including the detection and recognition apparatus of claim 6 or the training apparatus for a detection-recognition network of claim 7.
9. An electronic device, characterized by comprising: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions, so as to perform the operations of the detection and recognition method of any one of claims 1 to 4 or of the training method for a detection-recognition network of claim 5.
10. A computer storage medium for storing computer-readable instructions, characterized in that the instructions, when executed, perform the operations of the detection and recognition method of any one of claims 1 to 4 or of the training method for a detection-recognition network of claim 5.
CN201711126372.9A 2017-11-14 2017-11-14 Detection recognition and training method, device, equipment and medium for detection recognition network Active CN108229303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711126372.9A CN108229303B (en) 2017-11-14 2017-11-14 Detection recognition and training method, device, equipment and medium for detection recognition network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711126372.9A CN108229303B (en) 2017-11-14 2017-11-14 Detection recognition and training method, device, equipment and medium for detection recognition network

Publications (2)

Publication Number Publication Date
CN108229303A true CN108229303A (en) 2018-06-29
CN108229303B CN108229303B (en) 2021-05-04

Family

ID=62655785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711126372.9A Active CN108229303B (en) 2017-11-14 2017-11-14 Detection recognition and training method, device, equipment and medium for detection recognition network

Country Status (1)

Country Link
CN (1) CN108229303B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325494A (en) * 2018-08-27 2019-02-12 腾讯科技(深圳)有限公司 Image processing method, task data treating method and apparatus
CN109635805A (en) * 2018-12-11 2019-04-16 上海智臻智能网络科技股份有限公司 Image text location method and device, image text recognition methods and device
CN109670458A (en) * 2018-12-21 2019-04-23 北京市商汤科技开发有限公司 A kind of licence plate recognition method and device
CN109858420A (en) * 2019-01-24 2019-06-07 国信电子票据平台信息服务有限公司 A kind of bill processing system and processing method
CN110135427A (en) * 2019-04-11 2019-08-16 北京百度网讯科技有限公司 The method, apparatus, equipment and medium of character in image for identification
CN110458164A (en) * 2019-08-07 2019-11-15 深圳市商汤科技有限公司 Image processing method, device, equipment and computer readable storage medium
CN110458011A (en) * 2019-07-05 2019-11-15 北京百度网讯科技有限公司 Character recognition method and device, computer equipment and readable medium end to end
CN110704619A (en) * 2019-09-24 2020-01-17 支付宝(杭州)信息技术有限公司 Text classification method and device and electronic equipment
WO2020024939A1 (en) * 2018-08-01 2020-02-06 北京京东尚科信息技术有限公司 Text region identification method and device
CN110781925A (en) * 2019-09-29 2020-02-11 支付宝(杭州)信息技术有限公司 Software page classification method and device, electronic equipment and storage medium
CN111259846A (en) * 2020-01-21 2020-06-09 第四范式(北京)技术有限公司 Text positioning method and system and text positioning model training method and system
CN111639639A (en) * 2019-03-01 2020-09-08 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for detecting text area
CN111985469A (en) * 2019-05-22 2020-11-24 珠海金山办公软件有限公司 Method and device for recognizing characters in image and electronic equipment
CN112101165A (en) * 2020-09-07 2020-12-18 腾讯科技(深圳)有限公司 Interest point identification method and device, computer equipment and storage medium
CN112101477A (en) * 2020-09-23 2020-12-18 创新奇智(西安)科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112308051A (en) * 2020-12-29 2021-02-02 北京易真学思教育科技有限公司 Text box detection method and device, electronic equipment and computer storage medium
CN112446262A (en) * 2019-09-02 2021-03-05 深圳中兴网信科技有限公司 Text analysis method, text analysis device, text analysis terminal and computer-readable storage medium
CN113449559A (en) * 2020-03-26 2021-09-28 顺丰科技有限公司 Table identification method and device, computer equipment and storage medium
CN113762292A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
TWI807467B (en) * 2021-11-02 2023-07-01 中國信託商業銀行股份有限公司 Key-item detection model building method, business-oriented key-value identification system and method
CN110796133B (en) * 2018-08-01 2024-05-24 北京京东尚科信息技术有限公司 Text region identification method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130044945A1 (en) * 2011-08-15 2013-02-21 Vistaprint Technologies Limited Method and system for detecting text in raster images
US20130136359A1 (en) * 2010-03-31 2013-05-30 Microsoft Corporation Segmentation of textual lines in an image that include western characters and hieroglyphic characters
US20140169683A1 (en) * 2012-12-18 2014-06-19 Samsung Electronics Co., Ltd. Image retrieval method, real-time drawing prompting method, and devices thereof
CN106407971A (en) * 2016-09-14 2017-02-15 北京小米移动软件有限公司 Text recognition method and device
CN106446899A (en) * 2016-09-22 2017-02-22 北京市商汤科技开发有限公司 Text detection method and device and text detection training method and device
CN106778852A (en) * 2016-12-07 2017-05-31 中国科学院信息工程研究所 A kind of picture material recognition methods for correcting erroneous judgement

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130136359A1 (en) * 2010-03-31 2013-05-30 Microsoft Corporation Segmentation of textual lines in an image that include western characters and hieroglyphic characters
US20130044945A1 (en) * 2011-08-15 2013-02-21 Vistaprint Technologies Limited Method and system for detecting text in raster images
US20140169683A1 (en) * 2012-12-18 2014-06-19 Samsung Electronics Co., Ltd. Image retrieval method, real-time drawing prompting method, and devices thereof
CN106407971A (en) * 2016-09-14 2017-02-15 北京小米移动软件有限公司 Text recognition method and device
CN106446899A (en) * 2016-09-22 2017-02-22 北京市商汤科技开发有限公司 Text detection method and device and text detection training method and device
CN106778852A (en) * 2016-12-07 2017-05-31 中国科学院信息工程研究所 A kind of picture material recognition methods for correcting erroneous judgement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XINYU ZHOU ET AL.: "EAST: An Efficient and Accurate Scene Text Detector", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
FU, ZETIAN ET AL.: "Intelligent Acquisition of Agricultural Information for Mobile Terminals", 30 September 2015 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020024939A1 (en) * 2018-08-01 2020-02-06 北京京东尚科信息技术有限公司 Text region identification method and device
CN110796133B (en) * 2018-08-01 2024-05-24 北京京东尚科信息技术有限公司 Text region identification method and device
US11763167B2 (en) 2018-08-01 2023-09-19 Bejing Jingdong Shangke Information Technology Co, Ltd. Copy area identification method and device
EP3812965A4 (en) * 2018-08-01 2022-03-30 Beijing Jingdong Shangke Information Technology Co., Ltd. Text region identification method and device
CN110796133A (en) * 2018-08-01 2020-02-14 北京京东尚科信息技术有限公司 Method and device for identifying file area
CN109325494A (en) * 2018-08-27 2019-02-12 腾讯科技(深圳)有限公司 Image processing method, task data treating method and apparatus
WO2020043057A1 (en) * 2018-08-27 2020-03-05 腾讯科技(深圳)有限公司 Image processing method, and task data processing method and device
CN109635805B (en) * 2018-12-11 2022-01-11 上海智臻智能网络科技股份有限公司 Image text positioning method and device and image text identification method and device
CN109635805A (en) * 2018-12-11 2019-04-16 上海智臻智能网络科技股份有限公司 Image text location method and device, image text recognition methods and device
CN109670458A (en) * 2018-12-21 2019-04-23 北京市商汤科技开发有限公司 A kind of licence plate recognition method and device
CN109858420A (en) * 2019-01-24 2019-06-07 国信电子票据平台信息服务有限公司 A kind of bill processing system and processing method
CN111639639B (en) * 2019-03-01 2023-05-02 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for detecting text area
CN111639639A (en) * 2019-03-01 2020-09-08 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for detecting text area
CN110135427A (en) * 2019-04-11 2019-08-16 北京百度网讯科技有限公司 The method, apparatus, equipment and medium of character in image for identification
CN111985469B (en) * 2019-05-22 2024-03-19 珠海金山办公软件有限公司 Method and device for recognizing characters in image and electronic equipment
CN111985469A (en) * 2019-05-22 2020-11-24 珠海金山办公软件有限公司 Method and device for recognizing characters in image and electronic equipment
US11210546B2 (en) 2019-07-05 2021-12-28 Beijing Baidu Netcom Science And Technology Co., Ltd. End-to-end text recognition method and apparatus, computer device and readable medium
CN110458011A (en) * 2019-07-05 2019-11-15 北京百度网讯科技有限公司 Character recognition method and device, computer equipment and readable medium end to end
CN110458164A (en) * 2019-08-07 2019-11-15 深圳市商汤科技有限公司 Image processing method, device, equipment and computer readable storage medium
CN112446262A (en) * 2019-09-02 2021-03-05 深圳中兴网信科技有限公司 Text analysis method, text analysis device, text analysis terminal and computer-readable storage medium
CN110704619A (en) * 2019-09-24 2020-01-17 支付宝(杭州)信息技术有限公司 Text classification method and device and electronic equipment
CN110781925B (en) * 2019-09-29 2023-03-10 支付宝(杭州)信息技术有限公司 Software page classification method and device, electronic equipment and storage medium
CN110781925A (en) * 2019-09-29 2020-02-11 支付宝(杭州)信息技术有限公司 Software page classification method and device, electronic equipment and storage medium
CN111259846A (en) * 2020-01-21 2020-06-09 第四范式(北京)技术有限公司 Text positioning method and system and text positioning model training method and system
WO2021147817A1 (en) * 2020-01-21 2021-07-29 第四范式(北京)技术有限公司 Text positioning method and system, and text positioning model training method and system
CN111259846B (en) * 2020-01-21 2024-04-02 第四范式(北京)技术有限公司 Text positioning method and system and text positioning model training method and system
CN113449559A (en) * 2020-03-26 2021-09-28 顺丰科技有限公司 Table identification method and device, computer equipment and storage medium
CN113762292A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN113762292B (en) * 2020-06-03 2024-02-02 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN112101165B (en) * 2020-09-07 2022-07-15 腾讯科技(深圳)有限公司 Interest point identification method and device, computer equipment and storage medium
CN112101165A (en) * 2020-09-07 2020-12-18 腾讯科技(深圳)有限公司 Interest point identification method and device, computer equipment and storage medium
CN112101477A (en) * 2020-09-23 2020-12-18 创新奇智(西安)科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112308051B (en) * 2020-12-29 2021-10-29 北京易真学思教育科技有限公司 Text box detection method and device, electronic equipment and computer storage medium
CN112308051A (en) * 2020-12-29 2021-02-02 北京易真学思教育科技有限公司 Text box detection method and device, electronic equipment and computer storage medium
TWI807467B (en) * 2021-11-02 2023-07-01 中國信託商業銀行股份有限公司 Key-item detection model building method, business-oriented key-value identification system and method

Also Published As

Publication number Publication date
CN108229303B (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN108229303A (en) Detection identification and the detection identification training method of network and device, equipment, medium
US9349076B1 (en) Template-based target object detection in an image
CN109658455A (en) Image processing method and processing equipment
WO2020078119A1 (en) Method, device and system for simulating user wearing clothing and accessories
CN108229280A (en) Time domain motion detection method and system, electronic equipment, computer storage media
TWI821671B (en) A method and device for positioning text areas
CN108229341A (en) Sorting technique and device, electronic equipment, computer storage media, program
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN109934173A (en) Expression recognition method, device and electronic equipment
US11164306B2 (en) Visualization of inspection results
CN108805222A (en) A kind of deep learning digital handwriting body recognition methods based on ARM platforms
US20220343683A1 (en) Expression Recognition Method and Apparatus, Computer Device, and Readable Storage Medium
CN110413816A (en) Colored sketches picture search
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
Montserrat et al. Logo detection and recognition with synthetic images
CN111522979A (en) Picture sorting recommendation method and device, electronic equipment and storage medium
CN108268629A (en) Image Description Methods and device, equipment, medium, program based on keyword
CN117094362B (en) Task processing method and related device
CN113869371A (en) Model training method, clothing fine-grained segmentation method and related device
CN116361502B (en) Image retrieval method, device, computer equipment and storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
Amador et al. Benchmarking head pose estimation in-the-wild
CN108230332A (en) The treating method and apparatus of character image, electronic equipment, computer storage media
CN111753736A (en) Human body posture recognition method, device, equipment and medium based on packet convolution
Wang et al. Self-attention deep saliency network for fabric defect detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant