CN110147785A - Image recognition method, related apparatus and device - Google Patents
- Publication number: CN110147785A
- Authority
- CN
- China
- Prior art keywords
- stroke
- information
- pixel
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration by the use of local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/293—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of characters other than Kanji, Hiragana or Katakana
Abstract
The invention discloses an image recognition method, comprising: performing binarization on an image to obtain a binary image, the image including multiple characters; performing skeleton extraction on the binary image to extract the skeleton information of the multiple characters; extracting stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points; and analyzing the stroke information with a sequence recognition engine based on a deep learning network to recognize the multiple characters and the positional relationship information between them. The invention also discloses an image recognition apparatus and device. No hand-crafted features are required, and no character segmentation is needed, which solves the technical problem in the prior art that segmentation algorithms cannot handle touching characters well, resulting in low recognition accuracy.
Description
Technical field
The present invention relates to the field of computers, and more particularly to an image recognition method, related apparatus and device.
Background technique
Optical Character Recognition (OCR) refers to the process by which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and bright, and then translates those shapes into computer text using a character recognition method. The error rate, or recognition accuracy, is an important indicator for measuring OCR performance.
At present, OCR recognition of mathematical characters is applied very widely; in many situations it can replace keyboard input and complete high-speed text entry. For example, using OCR to recognize and enter printed manuscripts is one of the common practices in many offices; complex mixed layouts containing figures, images and text can be automatically segmented and the printed text recognized; mail sorting systems are realized through the recognition of handwritten digits; and automatic entry of handwritten form data can be widely applied to the input and processing of form data such as insurance policies and application forms in industries such as government, taxation, insurance, commerce, medical care, finance, and mining.
In the prior art, to recognize the characters in an image, and especially to recognize a mathematical formula, the image is usually first binarized, then character segmentation is performed to cut out individual mathematical characters, the features of the mathematical characters are extracted, and finally a mathematical expression is derived from the positional relationships between characters using a stochastic context-free grammar to generate the mathematical formula. As a result, for touching characters the segmentation algorithms of the prior art cannot work well, leading to low recognition accuracy.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide an image recognition method, an image recognition apparatus, an image recognition device and a computer-readable storage medium, so as to solve the technical problem in the prior art that segmentation algorithms cannot handle touching characters well, resulting in low recognition accuracy.
In order to solve the above technical problem, one aspect of the embodiments of the present invention discloses an image recognition method, comprising:
performing binarization on an image to obtain a binary image, the image including multiple characters;
performing skeleton extraction on the binary image to extract the skeleton information of the multiple characters;
extracting stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points;
analyzing the stroke information with a sequence recognition engine based on a deep learning network to recognize the multiple characters and the positional relationship information between them.
With reference to the above image recognition method, performing skeleton extraction on the binary image comprises:
iteratively performing erosion on the binary image until no new pixel is eroded compared with the binary image after the previous erosion, wherein each erosion iteration comprises traversing the pixels in the binary image in turn and eroding the pixels that satisfy a specified condition.
With reference to the above image recognition method, the pixels satisfying the specified condition include target pixels satisfying any of the following conditions:
the number of pixels whose binary value is 1 among the 8 neighboring pixels around the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 neighboring pixels around the target pixel in clockwise order, the number of adjacent pixel pairs whose binary sequence is 01 is equal to a third threshold;
among the 4 nearest neighboring pixels, there is at least one pixel whose binary value is 0, where the distance is the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
With reference to the above image recognition method, analyzing the stroke information with the sequence recognition engine based on the deep learning network to recognize the multiple characters and the positional relationship information between them comprises:
performing feature extraction on the stroke information through a Convolutional Neural Network (CNN);
inputting the extracted features into a Long Short-Term Memory network (LSTM) for character recognition, recognizing the multiple characters and the positional relationship information between them.
With reference to the above image recognition method, the Long Short-Term Memory network LSTM is a bidirectional LSTM.
With reference to the above image recognition method, performing binarization on the image comprises:
performing binarization on the image using the Maximally Stable Extremal Regions (MSER) algorithm.
With reference to the above image recognition method, the multiple characters form a mathematical expression; and after recognizing the multiple characters and the positional relationship information between them, the method further comprises: outputting a LaTeX expression according to the recognized characters.
With reference to the above image recognition method, extracting the stroke information from the skeleton information comprises:
traversing the skeleton information by connected domain and extracting the stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is preferentially extracted.
Another aspect of the embodiments of the present invention discloses an image recognition apparatus, comprising:
a processing unit, configured to perform binarization on an image to obtain a binary image, the image including multiple characters;
an extraction unit, configured to perform skeleton extraction on the binary image to extract the skeleton information of the multiple characters;
an information extraction unit, configured to extract stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points;
a recognition unit, configured to analyze the stroke information with a sequence recognition engine based on a deep learning network to recognize the multiple characters and the positional relationship information between them.
With reference to the above image recognition apparatus, the extraction unit is specifically configured to iteratively perform erosion on the binary image until no new pixel is eroded compared with the binary image after the previous erosion, wherein each erosion iteration comprises traversing the pixels in the binary image in turn and eroding the pixels that satisfy a specified condition.
With reference to the above image recognition apparatus, the pixels satisfying the specified condition include target pixels satisfying any of the following conditions:
the number of pixels whose binary value is 1 among the 8 neighboring pixels around the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 neighboring pixels around the target pixel in clockwise order, the number of adjacent pixel pairs whose binary sequence is 01 is equal to a third threshold;
among the nearest neighboring pixels, there is at least one pixel whose binary value is 0, where the distance is the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
With reference to the above image recognition apparatus, the recognition unit comprises:
a feature extraction unit, configured to perform feature extraction on the stroke information through the convolutional neural network CNN;
a character recognition unit, configured to input the extracted features into the Long Short-Term Memory network LSTM for character recognition, recognizing the multiple characters and the positional relationship information between them.
With reference to the above image recognition apparatus, the multiple characters form a mathematical expression; and the recognition unit outputting the recognized characters comprises: outputting a LaTeX expression according to the recognized characters.
With reference to the above image recognition apparatus, the information extraction unit is specifically configured to traverse the skeleton information by connected domain and extract the stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is preferentially extracted.
Another aspect of the embodiments of the present invention discloses an image recognition device, comprising a processor and a memory connected to each other, wherein the memory is configured to store application program code, and the processor is configured to call the program code to execute the above image recognition method.
Another aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to execute the above image recognition method.
By implementing the embodiments of the present invention, skeleton extraction is performed on the binary image to extract the skeleton information of the multiple characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through the sequence recognition engine based on the deep learning network to recognize the multiple characters and the positional relationship information between them. No hand-crafted features are required, and no character segmentation is needed, which solves the technical problem in the prior art that segmentation algorithms cannot handle touching characters well, resulting in low recognition accuracy. In particular, the embodiments of the present invention recognize numerical characters through a sequence-based deep learning recognition model: the features extracted by the CNN are input into the bidirectional LSTM network, which directly outputs a LaTeX expression. There is no need to segment the characters in the image, nor to analyze the spatial positional relationships between characters; all of this information is learned by the deep learning recognition model, realizing end-to-end recognition. The embodiments of the present invention are therefore adaptable to a variety of complex scenes, and recognition accuracy is greatly improved.
Detailed description of the invention
In order to illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of an input image provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a binary image provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of image skeleton extraction provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of pixels provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of pixels provided by another embodiment of the present invention;
Fig. 7 is a schematic diagram of example pixel structures provided by another embodiment of the present invention;
Fig. 8 is a schematic diagram of image skeleton extraction provided by another embodiment of the present invention;
Fig. 9a is a schematic diagram of stroke information provided by an embodiment of the present invention;
Fig. 9b is a schematic diagram of stroke information provided by another embodiment of the present invention;
Fig. 10 is a schematic diagram of the principle of a sequence recognition engine provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of an LSTM network provided by an embodiment of the present invention;
Fig. 12 is a schematic diagram of the principle of a sequence recognition engine provided by another embodiment of the present invention;
Fig. 13 is a schematic structural diagram of a bidirectional LSTM network provided by an embodiment of the present invention;
Fig. 14 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention;
Fig. 15 is a schematic structural diagram of a recognition unit provided by an embodiment of the present invention;
Fig. 16 is a schematic structural diagram of an image recognition device provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
It should also be understood that the terms used in this description of the invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
In a specific implementation, the terminals or devices described in the embodiments of the present invention include, but are not limited to, desktop computers, portable mobile terminals such as laptop computers and tablet computers, and intelligent terminals such as smartphones, smartwatches and smart glasses.
In order to better understand the image recognition method, image recognition apparatus and image recognition device provided by the embodiments of the present invention, the image recognition scenario of the embodiments of the present invention is first described below. Image recognition in the embodiments of the present invention is the process in which, after the image recognition apparatus or device acquires an image to be recognized, where the image contains multiple characters such as a mathematical formula, the characters in the image are recognized and output. The output characters facilitate information entry by relevant personnel, sorting by a mail system, or matching of relevant information in subsequent searches.
The image recognition method, image recognition apparatus and image recognition device provided by the embodiments of the present invention are described in detail below with reference to the drawings. The schematic flowchart of the image recognition method provided by an embodiment of the present invention shown in Fig. 1 may include the following steps:
Step S100: performing binarization on an image to obtain a binary image.
Specifically, the image in the embodiments of the present invention may include multiple characters. Image binarization is the process of setting the gray value of each pixel in the image to 0 or 255 to obtain a binary image, that is, making the whole image present an obvious black-and-white effect. In the embodiments of the present invention, a pixel whose gray value after binarization is 0 may be represented by the binary value 0, and a pixel whose gray value is 255 by the binary value 1.
In one embodiment of the present invention, the binarization method may use the Maximally Stable Extremal Regions (MSER) algorithm, an affine-invariant region detector with good performance, to extract connected regions, filter out regions that are too small, too large or of abnormal aspect ratio, and output the binary image. With specific reference to the schematic diagram of the input image provided by an embodiment of the present invention shown in Fig. 2, the image in Fig. 2 contains multiple characters which together form a mathematical expression; after binarization is performed on the image through step S100, the schematic diagram of the binary image provided by an embodiment of the present invention shown in Fig. 3 is obtained, and the output image presents an obvious black-and-white effect.
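As a minimal sketch of the binarization step, a fixed gray threshold can be used in place of the MSER algorithm named above (which in practice would come from a library such as OpenCV); the threshold value of 128 and the 4*6 sample patch are illustrative assumptions, not values from the text:

```python
import numpy as np

def binarize(gray, threshold=128):
    # Pixels darker than the threshold are treated as character (foreground)
    # pixels and mapped to the binary value 1 (displayed as gray value 255);
    # all other pixels are mapped to the binary value 0 (gray value 0),
    # following the convention described in this embodiment.
    return (gray < threshold).astype(np.uint8)

# Hypothetical 4x6 grayscale patch: a dark stroke on a bright background.
patch = np.full((4, 6), 230, dtype=np.uint8)
patch[1:3, 1:5] = 20  # the "stroke"

binary = binarize(patch)
print(binary)
```

The resulting array contains 1 exactly where the dark stroke was, giving the black-and-white effect described above.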
Step S102: performing skeleton extraction on the binary image to extract the skeleton information of the multiple characters.
Specifically, as shown in the schematic diagram of image skeleton extraction provided by an embodiment of the present invention in Fig. 4, image skeleton extraction is to extract the central pixel contour of the target in the image, that is, to thin the target with respect to its center. Skeleton extraction algorithms can be divided into two major classes, iterative and non-iterative; iterative algorithms are further divided into parallel iteration and sequential iteration, and so on.
In one embodiment of the present invention, erosion may be performed iteratively on the binary image until no new pixel is eroded compared with the binary image after the previous erosion, wherein each erosion iteration comprises traversing the pixels in the binary image in turn and eroding the pixels that satisfy a specified condition.
It should be noted that erosion in the embodiments of the present invention may refer, in morphology, to removing certain parts of the image, and specifically to deleting certain pixels on the object boundary. Eroding the binary image may thus refer to deleting pixels whose binary value is 1, that is, turning pixels whose binary value is 1 into pixels whose binary value is 0.
Specifically, the specified condition may be set in the embodiments of the present invention according to the desired degree of thinning. For example, the pixels satisfying the specified condition may include target pixels satisfying any of the following conditions:
Condition a: the number of pixels whose binary value is 1 among the 8 neighboring pixels around the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold. Specifically, reference may be made to the following formula 1:
first threshold ≤ B(P1) ≤ second threshold (formula 1)
With reference to the schematic structural diagram of pixels provided by an embodiment of the present invention shown in Fig. 5, P1 is the target pixel to be judged for erosion (deletion), and the 8 neighboring pixels around P1 are labeled P2, P3, P4, P5, P6, P7, P8 and P9. Taking binary pixel values of 0 or 1 as an example, B(P1) is the number of pixels whose binary value is 1 among the 8 neighbors of the central pixel P1 (the target pixel), that is, B(P1) = P2 + P3 + P4 + P5 + P6 + P7 + P8 + P9. In one embodiment, the first threshold may be 2 and the second threshold may be 6.
Condition b: checking the 8 neighboring pixels around the target pixel in clockwise order, the number of adjacent pixel pairs whose binary sequence is 01 is equal to a third threshold. Specifically, reference may be made to the following formula 2:
A(P1) = third threshold (formula 2)
With reference to the schematic structural diagram of pixels provided by another embodiment of the present invention shown in Fig. 6, the clockwise direction is from P3 to P4 to P5 to P6, and so on until returning to P3 from P2. A(P1) is the number of adjacent pixel pairs whose binary sequence is 01 when the 8 neighboring pixels around the target pixel are checked in clockwise order.
In one embodiment, the third threshold may be 1. Taking the schematic diagram of example pixel structures provided by another embodiment of the present invention shown in Fig. 7 as an example: in the example on the left, the number of adjacent pixel pairs whose binary sequence is 01 is 2, since the sequence 01 appears from P2 to P3 and again from P6 to P7, so condition b is not satisfied; in the example on the right, the number of adjacent pixel pairs whose binary sequence is 01 is 1, since the sequence 01 appears only from P9 to P2, so condition b is satisfied and the point P1 is eroded.
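The count A(P1) described above can be sketched as a short function. The two neighborhoods below are hypothetical values constructed to match the left and right cases discussed for Fig. 7 (the figure itself is not reproduced here):

```python
def crossing_number(neighbors):
    # neighbors lists the binary values of P2..P9 in clockwise order;
    # count the adjacent pairs (including the wrap-around pair P9 -> P2)
    # whose sequence is 0 followed by 1.
    return sum(
        1
        for a, b in zip(neighbors, neighbors[1:] + neighbors[:1])
        if (a, b) == (0, 1)
    )

# Left-hand case: 01 occurs from P2 to P3 and from P6 to P7, so A(P1) = 2.
left = [0, 1, 1, 0, 0, 1, 1, 1]   # P2..P9 (hypothetical values)
# Right-hand case: 01 occurs only in the wrap-around P9 to P2, so A(P1) = 1.
right = [1, 1, 1, 1, 1, 1, 1, 0]  # P2..P9 (hypothetical values)

print(crossing_number(left), crossing_number(right))
```

With the third threshold set to 1, only the right-hand neighborhood satisfies condition b.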
Condition c: among the 4 nearest neighboring pixels, there is at least one pixel whose binary value is 0, where the distance is the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel. Specifically, reference may be made to the following formula 3:
P2 * P4 * P6 * P8 = 0 (formula 3)
With reference to the schematic structural diagram of pixels provided by an embodiment of the present invention shown in Fig. 5 above, taking P1 as the target pixel, the nearest neighboring pixels of P1 are P2, P4, P6 and P8; that is, the distances from the centers of P2, P4, P6 and P8 to the center of P1 are smaller than the distances from the centers of P3, P5, P7 and P9 to the center of P1. In the ideal case, the distances from the centers of P2, P4, P6 and P8 to the center of P1 are all equal, and all four are nearest neighbors; condition c of the embodiments of the present invention can thus also be stated as: among the nearest neighboring pixels there is at least one pixel whose binary value is 0. For example, if the binary value of P2 is 0, condition c is satisfied and the point P1 is eroded; if none of the binary values of P2, P4, P6 and P8 is 0, condition c is not satisfied.
Further, when the current iteration is an odd-numbered iteration, it may be judged whether P2 * P4 * P6 = 0 or P4 * P6 * P8 = 0 holds; if so, condition c is satisfied and the point P1 is eroded. When the current iteration is an even-numbered iteration, it may be judged whether P2 * P4 * P8 = 0 or P2 * P6 * P8 = 0 holds; if so, condition c is satisfied and the point P1 is eroded.
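The iteration scheme above closely resembles the classic Zhang-Suen thinning algorithm, in which a pixel is deleted only when all the neighborhood conditions hold together (2 ≤ B(P1) ≤ 6, A(P1) = 1, and the odd/even subiteration products being zero; note the classic formulation requires both products of each subiteration to vanish, which is the reading used here). A minimal sketch under those assumptions:

```python
import numpy as np

def neighbors(img, r, c):
    # P2..P9 in clockwise order, starting from the pixel directly above P1.
    return [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
            img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]

def crossing_number(n):
    # A(P1): number of 0 -> 1 transitions in the clockwise cycle P2..P9, P2.
    return sum((a, b) == (0, 1) for a, b in zip(n, n[1:] + n[:1]))

def thin(binary):
    """Zhang-Suen-style thinning of a 0/1 image; character pixels are 1."""
    img = np.pad(np.asarray(binary, dtype=np.uint8), 1)
    while True:
        eroded_this_pass = 0
        for odd in (True, False):  # odd- and even-numbered subiterations
            to_erode = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] != 1:
                        continue
                    n = neighbors(img, r, c)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if not (2 <= sum(n) <= 6 and crossing_number(n) == 1):
                        continue
                    if odd:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_erode.append((r, c))
            for r, c in to_erode:
                img[r, c] = 0  # erode: binary value 1 becomes 0
            eroded_this_pass += len(to_erode)
        # Stop when no new pixel is eroded compared with the previous pass.
        if eroded_this_pass == 0:
            return img[1:-1, 1:-1]
```

Applied to a thick bar of 1-pixels, this reduces it to a roughly one-pixel-wide skeleton while leaving endpoints (B(P1) < 2) and interior points (B(P1) > 6) untouched within each pass.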
Taking the binary image shown in Fig. 3 as an example, skeleton extraction is performed through step S102 to extract the skeleton information of the multiple characters; the resulting effect can be seen in the schematic diagram of image skeleton extraction provided by another embodiment of the present invention shown in Fig. 8. Thinning of the character image is achieved through repeated iterations of dilation and erosion, so that the targets in the image become thinner and thinner.
Step S104: extracting stroke information from the skeleton information.
Specifically, the embodiments of the present invention extract stroke information from the skeleton information through a stroke extraction algorithm, as shown in the schematic diagram of stroke information provided by an embodiment of the present invention in Fig. 9a. The stroke information in the embodiments of the present invention may include the number of stroke feature points and the positional information between adjacent stroke feature points. In Fig. 9a, each point is a stroke feature point, and positional relationships exist between adjacent stroke feature points; for example, the positional relationship from stroke feature point a to the adjacent stroke feature point b in Fig. 9a may be expressed by vector information indicating the direction angle from a to b.
In one embodiment, extracting the stroke information from the skeleton information may comprise traversing the skeleton information by connected domain and extracting the stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is preferentially extracted. The connected domain in the embodiments of the present invention may refer to the connected region of stroke feature points. A stroke bifurcation in the embodiments of the present invention may occur when, while traversing the stroke feature points along some direction starting from some stroke feature point, there are multiple next connected stroke feature points. The direction angle in the embodiments of the present invention refers to the angle between the current stroke feature point and the previously connected stroke feature point, specifically the angle between the traversal direction of the previously connected stroke feature point and the direction of the current stroke feature point. Specifically, as shown in the schematic diagram of stroke information provided by another embodiment of the present invention in Fig. 9b, which is an enlarged view of the stroke information at x in Fig. 9a: starting from stroke feature point c, the next stroke feature point d is traversed along the connected domain; at stroke feature point e a bifurcation appears, branching into stroke feature points f, g and h. The stroke feature point f, whose direction angle is 0 degrees, is traversed first; the stroke feature point g, whose direction angle is 90 degrees, is traversed next; and the stroke feature point h, whose direction angle is 270 degrees, is traversed last.
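The branch-ordering rule described for Fig. 9b can be sketched as follows. The coordinates, the point names, and the convention that the direction angle is measured counterclockwise from the incoming traversal direction and normalized to [0, 360) are assumptions made for illustration; the patent's figures are not reproduced here:

```python
import math

def direction_angle(prev_pt, cur_pt, next_pt):
    # Angle of the outgoing segment, measured from the incoming traversal
    # direction and normalized to [0, 360) degrees (assumed convention).
    incoming = math.atan2(cur_pt[1] - prev_pt[1], cur_pt[0] - prev_pt[0])
    outgoing = math.atan2(next_pt[1] - cur_pt[1], next_pt[0] - cur_pt[0])
    return math.degrees(outgoing - incoming) % 360.0

def order_branches(prev_pt, cur_pt, branches):
    # At a bifurcation, traverse the branch with the smaller direction
    # angle first, as in the Fig. 9b example (f at 0, g at 90, h at 270).
    return sorted(branches,
                  key=lambda kv: direction_angle(prev_pt, cur_pt, kv[1]))

# Hypothetical skeleton: traversal arrives at e from d, then forks to f, g, h.
d, e = (-1, 0), (0, 0)
branches = {"f": (1, 0), "g": (0, 1), "h": (0, -1)}
order = [name for name, _ in order_branches(d, e, branches.items())]
print(order)
```

With these coordinates, f lies at 0 degrees, g at 90 degrees and h at 270 degrees relative to the incoming direction, reproducing the traversal order f, g, h described above.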
Step S106: analyzing the stroke information with the sequence recognition engine based on the deep learning network to recognize the multiple characters and the positional relationship information between them.
The sequence recognition engine of the embodiments of the present invention may use a deep learning network based on a Long Short-Term Memory network (LSTM). Specifically, after the stroke information obtained in step S104 is input, the network may extract features through a Convolutional Neural Network (CNN), then input the extracted features into the LSTM network to complete the recognition of the multiple characters and the positional relationship information between them, and finally output the recognized characters.
Reference may be made to Fig. 10, a schematic diagram of the principle of the sequence recognition engine provided in an embodiment of the present invention. The input stroke information includes the number of stroke feature points and the positional information between adjacent stroke feature points, and features are extracted by the CNN network 10: two 3*3 convolutional layers with 64 channels followed by a pooling layer, then two 3*3 convolutional layers with 128 channels followed by a pooling layer, then two 3*3 convolutional layers with 256 channels followed by a pooling layer, and finally two 3*3 convolutional layers with 512 channels followed by a pooling layer, which outputs the extracted features. This embodiment of the present invention is not limited to the 3*3 convolutions in Fig. 10; 5*5 convolutions and the like may also be used. The extracted features can be divided into stroke information of multiple time-step units, which are then input in sequence into the LSTM network to complete the identification of the multiple characters and the positional relationship information between the characters, and the multiple characters identified are finally output. For the structure of the LSTM network, reference may be made to Fig. 11, a structural schematic diagram of the LSTM network provided in an embodiment of the present invention. Taking the image in Fig. 2 as an example, stroke information of 11 time-step units can be extracted from the CNN network; the stroke information of each time-step unit passes, in temporal order, through carefully designed structures called "gates" that remove information from or add information to the cell state, and the multiple characters identified can finally be output.
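As a rough check on the downsampling arithmetic just described, the sketch below tracks the feature-map shape through the four conv/pool stages. The 'same' padding, the 2*2 stride-2 pooling, and the 32*176 input size are all assumptions (the passage does not state them); the width was chosen so that it halves down to the 11 time-step units mentioned for the Fig. 2 example.

```python
def cnn_output_shape(h, w, stages=((64, 2), (128, 2), (256, 2), (512, 2))):
    """Spatial size and channel count after the Fig. 10 conv/pool stack:
    each stage applies two 3*3 convolutions (assumed 'same'-padded) that set
    the channel count, then one pooling layer (assumed 2*2, stride 2)."""
    channels = 1
    for ch, _num_convs in stages:
        channels = ch          # the convolutions set the channel count
        h, w = h // 2, w // 2  # the pooling layer halves each spatial dim
    return h, w, channels

# Hypothetical 32*176 input: the width halves four times, 176 -> 88 -> 44
# -> 22 -> 11, matching the 11 time-step units of the Fig. 2 example.
print(cnn_output_shape(32, 176))  # -> (2, 11, 512)
```

Each of the 11 resulting columns then serves as the stroke information of one time-step unit fed into the LSTM.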
By implementing this embodiment of the present invention, skeleton extraction is performed on the binary image to extract the skeleton information of multiple characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through the sequence recognition engine based on a deep learning network to identify the multiple characters and the positional relationship information between the characters. No manually designed features are required, and no character segmentation is needed, which solves the prior-art problem that segmentation algorithms cannot properly handle characters that are stuck together, resulting in low recognition accuracy.
Still further, as shown in Fig. 12, a schematic diagram of the principle of the sequence recognition engine according to another embodiment of the present invention, the LSTM in step S106 of this embodiment of the present invention may be a bidirectional LSTM; for its structure, reference may be made to Fig. 13, a structural schematic diagram of the bidirectional LSTM network provided in an embodiment of the present invention. Again taking the image in Fig. 2 as an example, stroke information of 11 time-step units can be extracted from the CNN network; the stroke information of each time-step unit passes, in temporal order, through the carefully designed "gate" structures that remove information from or add information to the cell state, and the multiple characters identified can finally be output.
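The "gate" structures referred to above can be illustrated with a single scalar LSTM step. The weights and inputs below are arbitrary illustrative values, not anything from the patent; a real implementation would use learned weight matrices over vector states.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM time step with scalar state. The forget gate removes
    information from the cell state, the input gate adds information, and
    the output gate exposes part of it -- the gate structures of Fig. 11."""
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate values
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    c = f * c + i * g    # remove, then add, information in the cell state
    h = o * math.tanh(c) # new hidden state
    return h, c

# Toy run over 11 time-step units of stroke features (arbitrary values).
w = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                      "wg", "ug", "bg", "wo", "uo", "bo")}
h = c = 0.0
for x in [0.1] * 11:
    h, c = lstm_step(x, h, c, w)
```

A bidirectional LSTM as in Fig. 13 would run a second such recurrence over the same 11 time-step units in reverse order and combine the forward and backward hidden states before decoding.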
In one of the embodiments, the multiple characters in this embodiment of the present invention may include a mathematical expression, and outputting the multiple characters identified may include: outputting a LaTeX expression according to the multiple characters identified. This embodiment of the present invention performs numerical character recognition with a timing-based deep learning recognition model: the features extracted by the CNN are input into the bidirectional LSTM network, which can directly output a LaTeX expression. There is no need to segment the characters in the image, nor to analyze the spatial positional relationship between characters; all of this information is learned by the deep learning recognition model, i.e., end-to-end recognition is achieved. Therefore, this embodiment of the present invention adapts to various complex scenes, and the recognition accuracy is greatly improved.
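The end-to-end flow of the method can be summarized as a sketch in which every stage is a trivial hypothetical stub standing in for the corresponding step; none of the stub bodies (including the fixed LaTeX output) reflect the patented implementation.

```python
def binarize(image):        # binarization, e.g. via MSER (claim 5); stub threshold
    return [[1 if px > 127 else 0 for px in row] for row in image]

def skeletonize(binary):    # skeleton extraction by iterative erosion (stub)
    return binary

def extract_strokes(skel):  # stroke feature points and their positions (stub)
    return [(r, c) for r, row in enumerate(skel) for c, v in enumerate(row) if v]

def cnn_features(strokes):  # CNN feature extraction, Fig. 10 (stub)
    return [len(strokes)]

def bilstm_decode(feats):   # bidirectional LSTM decoding (stub output)
    return r"\frac{1}{2}"

def recognize_image(image):
    """Hypothetical end-to-end pipeline mirroring steps S101-S106."""
    return bilstm_decode(cnn_features(extract_strokes(skeletonize(binarize(image)))))

print(recognize_image([[0, 200], [180, 0]]))  # -> \frac{1}{2}
```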
To facilitate better implementation of the above solutions of the embodiments of the present invention, the present invention correspondingly provides an image recognition apparatus, which is described in detail below with reference to the accompanying drawings:
As shown in Fig. 14, a structural schematic diagram of the image recognition apparatus provided in an embodiment of the present invention, the image recognition apparatus 14 may include: a processing unit 140, an extraction unit 142, an information extraction unit 144, and a recognition unit 146, wherein
the processing unit 140 is configured to perform binarization on an image to obtain a binary image, the image including multiple characters;
the extraction unit 142 is configured to perform skeleton extraction on the binary image to extract the skeleton information of the multiple characters;
the information extraction unit 144 is configured to extract stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points; and
the recognition unit 146 is configured to analyze the stroke information with the sequence recognition engine based on a deep learning network, identify the multiple characters and the positional relationship information between the characters, and output the multiple characters identified.
The extraction unit 142 is specifically configured to perform iterative erosion on the binary image until no new pixel of the binary image is eroded compared with the previous erosion; each erosion iteration includes traversing the pixels in the binary image in turn and eroding the pixels that meet a specified condition.
In this embodiment of the present invention, a pixel that meets the specified condition may include a target pixel that meets any one of the following conditions:
Condition a: among the 8 neighboring pixels around the target pixel, the number of pixels whose binary value is 1 is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
Condition b: checking the 8 neighboring pixels around the target pixel in a clockwise direction, the number of adjacent pixel pairs whose binary values form the sequence 01 is equal to a third threshold;
Condition c: among the 4 neighboring pixels at the smallest distance, there is at least one pixel whose binary value is 0; the distance includes the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
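The three conditions above read like the classic Zhang-Suen thinning criteria, with the thresholds left symbolic. The sketch below is therefore an assumption-laden illustration, not the patented implementation: it fixes the first, second, and third thresholds at 2, 6, and 1, interprets the "nearest 4 neighbors" as the 4-connected neighbors, and requires all three conditions together (as thinning algorithms typically do), even though the passage reads "any one".

```python
def neighbors8(img, r, c):
    """The 8 neighbors of (r, c), clockwise, starting from the pixel above."""
    return [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
            img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]

def meets_condition(img, r, c, t1=2, t2=6, t3=1):
    """Erosion test for a foreground pixel. t1/t2/t3 stand in for the
    first/second/third thresholds (values assumed, not from the patent)."""
    n = neighbors8(img, r, c)
    b = sum(n)                                    # condition a: 1-valued neighbors
    a = sum(1 for i in range(8)                   # condition b: clockwise 01 pairs
            if n[i] == 0 and n[(i + 1) % 8] == 1)
    four = [img[r-1][c], img[r][c+1], img[r+1][c], img[r][c-1]]  # condition c
    return t1 <= b <= t2 and a == t3 and 0 in four

def skeletonize(img):
    """Iterate until an erosion pass removes no new pixel."""
    changed = True
    while changed:
        changed = False
        to_erase = [(r, c)
                    for r in range(1, len(img) - 1)
                    for c in range(1, len(img[0]) - 1)
                    if img[r][c] == 1 and meets_condition(img, r, c)]
        for r, c in to_erase:
            img[r][c] = 0
            changed = True
    return img

# Demo: a 3-pixel-thick horizontal bar thins down to a 1-pixel line.
bar = [[0] * 7 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 6):
        bar[r][c] = 1
skeletonize(bar)
print(sum(map(sum, bar)))  # -> 3
```

Collecting `to_erase` before erasing makes each pass act on the unmodified image, which keeps the thinning symmetric within a pass; conditions a and b together preserve stroke endpoints and connectivity.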
In one embodiment of the present invention, the information extraction unit 144 may be specifically configured to traverse the skeleton information along connected domains and extract stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is extracted preferentially.
Specifically, the extraction unit 142 in this embodiment of the present invention may extract the stroke information from the skeleton information by a stroke extraction algorithm. As shown in Fig. 9a, a schematic diagram of the stroke information provided in an embodiment of the present invention, the stroke information in this embodiment of the present invention may include the number of stroke feature points and the positional information between adjacent stroke feature points. In Fig. 9a, each dot serves as a stroke feature point, and a positional relationship exists between adjacent stroke feature points; for example, in Fig. 9a a positional relationship exists from stroke feature point a to the adjacent stroke feature point b, and the direction angle from stroke feature point a to the adjacent stroke feature point b can be represented by vector information.
In one of the embodiments, the extraction unit 142 extracting stroke information from the skeleton information may include: traversing the skeleton information along connected domains and extracting stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is extracted preferentially. Specifically, as shown in Fig. 9b, a schematic diagram of the stroke information according to another embodiment of the present invention, the stroke information in Fig. 9b is an enlarged view of the stroke information at x in Fig. 9a. Starting from stroke feature point c, the next stroke feature point d is traversed along the connected domain. A bifurcation appears at stroke feature point e, branching into stroke feature points f, g, and h; stroke feature point f with a direction angle of 0 degrees is traversed first, then stroke feature point g with a direction angle of 90 degrees, and finally stroke feature point h with a direction angle of 270 degrees.
In one embodiment of the present invention, as shown in Fig. 15, a structural schematic diagram of the recognition unit provided in an embodiment of the present invention, the recognition unit 146 may include a feature extraction unit 1460 and a character recognition unit 1462, wherein
the feature extraction unit 1460 is configured to perform feature extraction on the stroke information through a convolutional neural network CNN; and
the character recognition unit 1462 is configured to input the extracted features into a long short-term memory network LSTM for character recognition, identifying the multiple characters and the positional relationship information between the characters.
In one embodiment of the present invention, the long short-term memory network LSTM may be a bidirectional LSTM.
In one embodiment of the present invention, the multiple characters may include a mathematical expression.
The sequence recognition engine in this embodiment of the present invention may employ a deep learning network based on a Long Short-Term Memory (LSTM) network. Specifically, after the stroke information obtained by the information extraction unit 144 is input, the network may extract features through a Convolutional Neural Network (CNN), then feed the extracted features into the LSTM network to complete the identification of the multiple characters and the positional relationship information between the characters, and finally output the multiple characters identified.
Reference may be made to Fig. 10, a schematic diagram of the principle of the sequence recognition engine provided in an embodiment of the present invention. The input stroke information includes the number of stroke feature points and the positional information between adjacent stroke feature points, and features are extracted by the CNN network 10; this embodiment of the present invention is not limited to the 3*3 convolutions in Fig. 10, and 5*5 convolutions and the like may also be used. The feature extraction unit 1460 may divide the extracted features into stroke information of multiple time-step units, which are then input in sequence into the LSTM network to complete the identification of the multiple characters and the positional relationship information between the characters, and the multiple characters identified are finally output. For the structure of the LSTM network, reference may be made to Fig. 11, a structural schematic diagram of the LSTM network provided in an embodiment of the present invention. Taking the image in Fig. 2 as an example, stroke information of 11 time-step units can be extracted from the CNN network; the character recognition unit 1462 passes the stroke information of each time-step unit, in temporal order, through the carefully designed "gate" structures that remove information from or add information to the cell state, and the multiple characters identified can finally be output.
By implementing this embodiment of the present invention, skeleton extraction is performed on the binary image to extract the skeleton information of multiple characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through the sequence recognition engine based on a deep learning network to identify the multiple characters and the positional relationship information between the characters. No manually designed features are required, and no character segmentation is needed, which solves the prior-art problem that segmentation algorithms cannot properly handle characters that are stuck together, resulting in low recognition accuracy.
Still further, as shown in Fig. 12, a schematic diagram of the principle of the sequence recognition engine according to another embodiment of the present invention, the LSTM of this embodiment of the present invention may be a bidirectional LSTM; for its structure, reference may be made to Fig. 13, a structural schematic diagram of the bidirectional LSTM network provided in an embodiment of the present invention. Again taking the image in Fig. 2 as an example, stroke information of 11 time-step units can be extracted from the CNN network; the character recognition unit 1462 passes the stroke information of each time-step unit, in temporal order, through the carefully designed "gate" structures that remove information from or add information to the cell state, and the multiple characters identified can finally be output.
In one of the embodiments, the multiple characters in this embodiment of the present invention may include a mathematical expression, and the recognition unit 146 outputting the multiple characters identified may include: outputting a LaTeX expression according to the multiple characters identified. This embodiment of the present invention performs numerical character recognition with a timing-based deep learning recognition model: the features extracted by the CNN are input into the bidirectional LSTM network, which can directly output a LaTeX expression. There is no need to segment the characters in the image, nor to analyze the spatial positional relationship between characters; all of this information is learned by the deep learning recognition model, i.e., end-to-end recognition is achieved. Therefore, this embodiment of the present invention adapts to various complex scenes, and the recognition accuracy is greatly improved.
To facilitate better implementation of the above solutions of the embodiments of the present invention, the present invention correspondingly provides an image recognition device, which is described in detail below with reference to the accompanying drawings:
As shown in Fig. 16, a structural schematic diagram of the image recognition device provided in an embodiment of the present invention, the image recognition device 16 may include a processor 161, an input unit 162, a recognition unit 163, a memory 164, and a communication unit 165, which may be connected to one another through a bus 166. The memory 164 may be a high-speed RAM memory or a non-volatile memory, for example, at least one disk memory; in this embodiment of the present invention, the memory includes flash. The memory 164 may optionally also be at least one storage system located remotely from the aforementioned processor 161. The memory 164 is configured to store application program code and may include an operating system, a network communication module, a user interface module, and an image recognition program; the communication unit 165 is configured to exchange information with an external unit; and the processor 161 is configured to call the program code to perform the following steps:
performing binarization on an input image to obtain a binary image, the image including multiple characters;
performing skeleton extraction on the binary image to extract the skeleton information of the multiple characters;
extracting stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points; and
passing the stroke information through the sequence recognition engine based on a deep learning network, identifying the multiple characters and the positional relationship information between the characters, and outputting the multiple characters identified.
In one of the embodiments, the processor 161 performing skeleton extraction on the binary image may include:
performing iterative erosion on the binary image until no new pixel of the binary image is eroded compared with the previous erosion, wherein each erosion iteration includes traversing the pixels in the binary image in turn and eroding the pixels that meet a specified condition.
In one of the embodiments, the pixels that meet the specified condition include target pixels that meet any one of the following conditions:
among the 8 neighboring pixels around the target pixel, the number of pixels whose binary value is 1 is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 neighboring pixels around the target pixel in a clockwise direction, the number of adjacent pixel pairs whose binary values form the sequence 01 is equal to a third threshold;
among the 4 neighboring pixels at the smallest distance, there is at least one pixel whose binary value is 0, the distance including the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
In one of the embodiments, the processor 161 passing the stroke information through the sequence recognition engine based on a deep learning network and identifying the multiple characters and the positional relationship information between the characters may include:
performing feature extraction on the stroke information through a convolutional neural network CNN; and
inputting the extracted features into a long short-term memory network LSTM for character recognition, identifying the multiple characters and the positional relationship information between the characters.
In one of the embodiments, the long short-term memory network LSTM is a bidirectional LSTM.
In one of the embodiments, the multiple characters may include a mathematical expression, and the processor 161 outputting the multiple characters identified may include: outputting a LaTeX expression according to the multiple characters identified.
In one of the embodiments, the processor 161 extracting stroke information from the skeleton information may include:
traversing the skeleton information along connected domains and extracting stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is extracted preferentially.
By implementing this embodiment of the present invention, skeleton extraction is performed on the binary image to extract the skeleton information of multiple characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through the sequence recognition engine based on a deep learning network to identify the multiple characters and the positional relationship information between the characters. No manually designed features are required, and no character segmentation is needed, which solves the prior-art problem that segmentation algorithms cannot properly handle characters that are stuck together, resulting in low recognition accuracy. In particular, this embodiment of the present invention performs numerical character recognition with a timing-based deep learning recognition model: the features extracted by the CNN are input into the bidirectional LSTM network, which can directly output a LaTeX expression. There is no need to segment the characters in the image, nor to analyze the spatial positional relationship between characters; all of this information is learned by the deep learning recognition model, i.e., end-to-end recognition is achieved. Therefore, this embodiment of the present invention adapts to various complex scenes, and the recognition accuracy is greatly improved.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above disclosure describes merely the preferred embodiments of the present invention, which certainly cannot be used to limit the scope of the rights of the present invention; therefore, equivalent changes made in accordance with the claims of the present invention still fall within the scope of the present invention.
Claims (15)
1. An image recognition method, characterized by comprising:
performing binarization on an image to obtain a binary image, the image including multiple characters;
performing skeleton extraction on the binary image to extract skeleton information of the multiple characters;
extracting stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points; and
analyzing the stroke information with a sequence recognition engine based on a deep learning network to identify the multiple characters and the positional relationship information between the characters.
2. The method according to claim 1, characterized in that performing skeleton extraction on the binary image comprises:
performing iterative erosion on the binary image until no new pixel of the binary image is eroded compared with the previous erosion, wherein each erosion iteration includes traversing the pixels in the binary image in turn and eroding the pixels that meet a specified condition.
3. The method according to claim 2, characterized in that the pixels that meet the specified condition include target pixels that meet any one of the following conditions:
among the 8 neighboring pixels around the target pixel, the number of pixels whose binary value is 1 is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 neighboring pixels around the target pixel in a clockwise direction, the number of adjacent pixel pairs whose binary values form the sequence 01 is equal to a third threshold;
among the 4 neighboring pixels at the smallest distance, there is at least one pixel whose binary value is 0, the distance including the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
4. The method according to claim 1, characterized in that analyzing the stroke information with the sequence recognition engine based on a deep learning network to identify the multiple characters and the positional relationship information between the characters comprises:
performing feature extraction on the stroke information through a convolutional neural network CNN; and
inputting the extracted features into a long short-term memory network LSTM for character recognition, identifying the multiple characters and the positional relationship information between the characters.
5. The method according to claim 1, characterized in that performing binarization on the image comprises:
performing binarization on the image using a maximally stable extremal regions MSER algorithm.
6. The method according to claim 4, characterized in that the multiple characters include a mathematical expression; and after identifying the multiple characters and the positional relationship information between the characters, the method further comprises: outputting a LaTeX expression according to the multiple characters identified.
7. The method according to claim 1, characterized in that extracting stroke information from the skeleton information comprises:
traversing the skeleton information along connected domains and extracting stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is extracted preferentially.
8. An image recognition apparatus, characterized by comprising:
a processing unit, configured to perform binarization on an image to obtain a binary image, the image including multiple characters;
an extraction unit, configured to perform skeleton extraction on the binary image to extract skeleton information of the multiple characters;
an information extraction unit, configured to extract stroke information from the skeleton information, the stroke information including the number of stroke feature points and the positional information between adjacent stroke feature points; and
a recognition unit, configured to analyze the stroke information with a sequence recognition engine based on a deep learning network to identify the multiple characters and the positional relationship information between the characters.
9. The apparatus according to claim 8, characterized in that the extraction unit is specifically configured to perform iterative erosion on the binary image until no new pixel of the binary image is eroded compared with the previous erosion, wherein each erosion iteration includes traversing the pixels in the binary image in turn and eroding the pixels that meet a specified condition.
10. The apparatus according to claim 9, characterized in that the pixels that meet the specified condition include target pixels that meet any one of the following conditions:
among the 8 neighboring pixels around the target pixel, the number of pixels whose binary value is 1 is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 neighboring pixels around the target pixel in a clockwise direction, the number of adjacent pixel pairs whose binary values form the sequence 01 is equal to a third threshold;
among the 4 neighboring pixels at the smallest distance, there is at least one pixel whose binary value is 0, the distance including the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
11. The apparatus according to claim 8, characterized in that the recognition unit comprises:
a feature extraction unit, configured to perform feature extraction on the stroke information through a convolutional neural network CNN; and
a character recognition unit, configured to input the extracted features into a long short-term memory network LSTM for character recognition, identifying the multiple characters and the positional relationship information between the characters.
12. The apparatus according to claim 11, characterized in that the multiple characters include a mathematical expression; and the recognition unit is further configured to output a LaTeX expression according to the multiple characters identified.
13. The apparatus according to claim 8, characterized in that the information extraction unit is specifically configured to traverse the skeleton information along connected domains and extract stroke feature points, wherein in the case of a stroke bifurcation, the stroke feature point with the smaller direction angle relative to the previous stroke feature point is extracted preferentially.
14. An image recognition device, characterized by comprising a processor and a memory connected to each other, wherein the memory is configured to store application program code, and the processor is configured to call the program code to execute the method according to any one of claims 1-7.
15. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810274802.XA CN110147785B (en) | 2018-03-29 | 2018-03-29 | Image recognition method, related device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110147785A true CN110147785A (en) | 2019-08-20 |
CN110147785B CN110147785B (en) | 2023-01-10 |
Family
ID=67588309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810274802.XA Active CN110147785B (en) | 2018-03-29 | 2018-03-29 | Image recognition method, related device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147785B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104945A (en) * | 2019-12-17 | 2020-05-05 | 上海博泰悦臻电子设备制造有限公司 | Object identification method and related product |
CN111428593A (en) * | 2020-03-12 | 2020-07-17 | 北京三快在线科技有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN112800987A (en) * | 2021-02-02 | 2021-05-14 | 中国联合网络通信集团有限公司 | Chinese character processing method and device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1996347A (en) * | 2006-09-14 | 2007-07-11 | Zhejiang University | Visualized reproduction method based on handwriting image |
US20140079297A1 (en) * | 2012-09-17 | 2014-03-20 | Saied Tadayon | Application of Z-Webs and Z-factors to Analytics, Search Engine, Learning, Recognition, Natural Language, and Other Utilities |
CN104408455A (en) * | 2014-11-27 | 2015-03-11 | University of Shanghai for Science and Technology | Segmentation method for touching characters |
CN105512692A (en) * | 2015-11-30 | 2016-04-20 | South China University of Technology | BLSTM-based online handwritten mathematical expression symbol recognition method |
CN105654127A (en) * | 2015-12-30 | 2016-06-08 | Chengdu Shulian Mingpin Technology Co., Ltd. | End-to-end continuous recognition method for character sequences in images |
CN106407971A (en) * | 2016-09-14 | 2017-02-15 | Beijing Xiaomi Mobile Software Co., Ltd. | Text recognition method and device |
CN107273897A (en) * | 2017-07-04 | 2017-10-20 | Huazhong University of Science and Technology | Character recognition method based on deep learning |
CN107403180A (en) * | 2017-06-30 | 2017-11-28 | Guangzhou Guangdian Property Management Co., Ltd. | Numeric equipment detection and recognition method and system |
- 2018-03-29: CN application CN201810274802.XA filed; granted as CN110147785B (Active)
Non-Patent Citations (6)
Title |
---|
ADNAN UL-HASAN: "Generic Text Recognition using Long Short-Term Memory Networks", 《RESEARCHGATE》 * |
JUN LIU et al.: "Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 * |
RONALDO MESSINA et al.: "Segmentation-free Handwritten Chinese Text Recognition with LSTM-RNN", 《2015 13TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR)》 * |
SUN YAN et al.: "A Character Recognition Algorithm for Unhealthy-text Embedded in Web Images", 《PROCEEDINGS OF 14TH YOUTH CONFERENCE ON COMMUNICATION》 * |
张九龙 et al.: "An Optimization Method for Skeleton Extraction of Calligraphy Characters", 《Journal of Xi'an University of Technology》 * |
陈睿 et al.: "CAPTCHA Recognition Based on Two-Dimensional RNN", 《Journal of Chinese Computer Systems》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104945A (en) * | 2019-12-17 | 2020-05-05 | Shanghai Pateo Yuezhen Electronic Equipment Manufacturing Co., Ltd. | Object identification method and related product |
CN111428593A (en) * | 2020-03-12 | 2020-07-17 | Beijing Sankuai Online Technology Co., Ltd. | Character recognition method and device, electronic equipment and storage medium |
CN112800987A (en) * | 2021-02-02 | 2021-05-14 | China United Network Communications Group Co., Ltd. | Chinese character processing method and device |
CN112800987B (en) * | 2021-02-02 | 2023-07-21 | China United Network Communications Group Co., Ltd. | Chinese character processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110147785B (en) | 2023-01-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110738207B (en) | Character detection method for fusing character area edge information in character image | |
CN111985464B (en) | Court judgment document-oriented multi-scale learning text recognition method and system | |
US20200065601A1 (en) | Method and system for transforming handwritten text to digital ink | |
CN106980856B (en) | Formula identification method and system and symbolic reasoning calculation method and system | |
WO2022142611A1 (en) | Character recognition method and apparatus, storage medium and computer device | |
WO2020164278A1 (en) | Image processing method and device, electronic equipment and readable storage medium | |
JP2014132453A (en) | Word detection for optical character recognition constant to local scaling, rotation and display position of character in document | |
CN110852311A (en) | Three-dimensional human hand key point positioning method and device | |
CN110147785A (en) | Image recognition method, related device and equipment | |
CN109685065A (en) | Layout analysis method and system for automatic content classification of examination papers | |
JP2019102061A5 (en) | ||
CN111242109A (en) | Method and device for manually fetching words | |
CN110414622B (en) | Classifier training method and device based on semi-supervised learning | |
CN109190615B (en) | Shape-near word recognition determination method, device, computer device and storage medium | |
CN112597940B (en) | Certificate image recognition method and device and storage medium | |
CN111401360B (en) | Method and system for optimizing license plate detection model, license plate detection method and system | |
CN111291712B (en) | Forest fire recognition method and device based on interpolation CN and capsule network | |
CN104598289A (en) | Recognition method and electronic device | |
CN108345943B (en) | Machine learning identification method based on embedded coding and contrast learning | |
Jia et al. | Grayscale-projection based optimal character segmentation for camera-captured faint text recognition | |
Bose et al. | Light Weight Structure Texture Feature Analysis for Character Recognition Using Progressive Stochastic Learning Algorithm | |
CN112200216A (en) | Chinese character recognition method, device, computer equipment and storage medium | |
CN113065480B (en) | Handwriting style identification method and device, electronic device and storage medium | |
CN113343983B (en) | License plate number recognition method and electronic equipment | |
Guruprasad | Handwritten Devanagari word recognition using robust invariant feature transforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||