CN110147785B - Image recognition method, related device and equipment - Google Patents

Info

Publication number: CN110147785B
Application number: CN201810274802.XA
Authority: CN (China)
Prior art keywords: characters; information; stroke; image; recognition
Legal status: Active (granted)
Other versions: CN110147785A (application publication, in Chinese)
Inventor: 李辉
Applicant and current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd

Classifications

    • G06T 5/30 Erosion or dilatation, e.g. thinning (image enhancement or restoration using local operators)
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T 2207/20081 Training; learning (indexing scheme for image analysis or enhancement)
    • G06T 2207/20084 Artificial neural networks [ANN] (indexing scheme for image analysis or enhancement)
    • G06V 30/293 Character recognition specially adapted to the type of the alphabet, of characters other than Kanji, Hiragana or Katakana

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an image recognition method comprising the following steps: performing binarization processing on an image to obtain a binary image, the image comprising a plurality of characters; performing skeleton extraction on the binary image to extract skeleton information of the plurality of characters; extracting stroke information from the skeleton information, the stroke information comprising the number of stroke feature points and position information between adjacent stroke feature points; and analyzing the stroke information through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters. The invention also discloses an image recognition apparatus and device. Because no features need to be designed manually and no character separation is required, the invention solves the prior-art problem of low recognition accuracy caused by separation algorithms that cannot process adhered (touching) characters well.

Description

Image recognition method, related device and equipment
Technical Field
The invention relates to the field of computers, and in particular to an image recognition method, a related apparatus and a device.
Background
Optical Character Recognition (OCR) refers to the process by which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates those shapes into computer text using a character recognition method. The misrecognition rate, or conversely the recognition accuracy, is an important index for measuring OCR performance.
At present, OCR character recognition is applied very widely and can replace keyboard entry to complete high-speed character input on many occasions. For example, OCR is used for the recognition and entry of printed documents, one of the methods most frequently used by office departments; it can also perform automatic segmentation and recognition of complex layouts mixing graphics, images and text. Recognition of handwritten digits enables automatic mail sorting; and handwritten form data can be entered automatically, which is widely applicable to the input and processing of forms such as statements and questionnaires in industries including government, tax, insurance, commerce, medical treatment, finance, and factories and mines.
In the prior art, when characters in an image are recognized, and in particular when a mathematical formula is recognized, the image is typically binarized, the characters are then separated, single mathematical characters are extracted by segmentation, the features of each mathematical character are extracted, and the mathematical expression is finally deduced using stochastic context-free grammar rules according to the position relationships between the characters. However, for adhered (touching) characters, the separation algorithm cannot process them well, so the recognition accuracy is low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide an image recognition method, an image recognition apparatus, an image recognition device, and a computer-readable storage medium, which solve the prior-art problem of low recognition accuracy caused by separation algorithms that cannot process adhered (touching) characters well.
In order to solve the above technical problem, one aspect of the embodiments of the present invention discloses an image recognition method, including:
carrying out binarization processing on the image to obtain a binary image; the image comprises a plurality of characters;
performing skeleton extraction on the binary image to extract skeleton information of the characters;
extracting stroke information from the skeleton information; the stroke information comprises the number of stroke feature points and position information between adjacent stroke feature points;
and analyzing the stroke information through a time sequence recognition engine based on a deep learning network, and recognizing the characters and the position relation information among the characters.
With reference to the above image recognition method, the performing skeleton extraction on the binary image includes:
performing iterative erosion on the binary image until no new pixel point is eroded relative to the binary image after the previous erosion; each erosion iteration comprises traversing the pixel points in the binary image in sequence and eroding the pixel points that meet specified conditions.
With reference to the image recognition method, the pixel points meeting the specified conditions include target pixel points meeting any one of the following conditions:
the number of pixels with a binary value of 1 among the 8 pixels adjacent to the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 pixels adjacent to the target pixel in clockwise order, the number of times the binary sequence of two consecutive pixels is 01 equals a third threshold;
among the 4 relatively nearest adjacent pixels, at least one pixel has a binary value of 0; here the distance refers to the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
In combination with the above image recognition method, passing the stroke information through the time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters includes:
performing feature extraction on the stroke information by a Convolutional Neural Network (CNN);
and inputting the extracted features into a Long Short-Term Memory network (LSTM) for character recognition, and recognizing the characters and the position relation information among the characters.
In combination with the above image recognition method, the long short-term memory (LSTM) network is a bidirectional LSTM.
With reference to the foregoing image recognition method, the performing binarization processing on the image includes:
performing binarization processing on the image by adopting the Maximally Stable Extremal Regions (MSER) algorithm.
In combination with the above-mentioned image recognition method, the plurality of characters include mathematical expressions;
after the plurality of characters and the position relation information among the characters are recognized, the method further comprises: outputting a LaTeX expression based on the recognized plurality of characters.
With reference to the foregoing image recognition method, the extracting stroke information from the skeleton information includes:
traversing the skeleton information by connected domain and extracting stroke feature points; in the case of a stroke bifurcation, preferentially extracting the stroke feature point whose direction angle relative to the previous stroke feature point is smaller.
Another aspect of an embodiment of the present invention discloses an image recognition apparatus, including:
the processing unit is used for carrying out binarization processing on the image to obtain a binary image; the image includes a plurality of characters;
an extraction unit, configured to perform skeleton extraction on the binary image, and extract skeleton information of the plurality of characters;
an information extraction unit for extracting stroke information from the skeleton information; the stroke information comprises the number of stroke feature points and position information between adjacent stroke feature points;
the recognition unit is used for analyzing the stroke information through a time sequence recognition engine based on a deep learning network, and recognizing the characters and the position relation information among the characters.
With reference to the image recognition apparatus, the extraction unit is specifically configured to perform iterative erosion on the binary image until no new pixel point is eroded relative to the binary image after the previous erosion; each erosion iteration comprises traversing the pixel points in the binary image in sequence and eroding the pixel points that meet the specified conditions.
With reference to the image recognition apparatus, the pixel points meeting the specified conditions include target pixel points meeting any one of the following conditions:
the number of pixels with a binary value of 1 among the 8 pixels adjacent to the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 pixels adjacent to the target pixel in clockwise order, the number of times the binary sequence of two consecutive pixels is 01 equals a third threshold;
among the 4 relatively nearest adjacent pixels, at least one pixel has a binary value of 0; here the distance refers to the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
In combination with the above image recognition apparatus, the recognition unit includes:
the characteristic extraction unit is used for extracting the characteristics of the stroke information through a Convolutional Neural Network (CNN);
and the character recognition unit is used for inputting the extracted features into the long short-term memory (LSTM) network for character recognition, recognizing the plurality of characters and the position relation information among the characters.
In combination with the above-mentioned image recognition apparatus, the plurality of characters include mathematical expressions;
the recognizing unit outputting the recognized characters includes: and outputting a LaTex expression according to the plurality of recognized characters.
With reference to the image recognition apparatus, the information extraction unit is specifically configured to traverse the skeleton information by connected domain and extract stroke feature points; in the case of a stroke bifurcation, the stroke feature point whose direction angle relative to the previous stroke feature point is smaller is extracted preferentially.
In another aspect of the embodiment of the present invention, an image recognition apparatus is disclosed, which includes a processor and a memory, where the processor and the memory are connected to each other, where the memory is used for storing application program codes, and the processor is configured to call the program codes to execute an image recognition method as described above.
Another aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, the computer program comprising program instructions, which, when executed by a processor, cause the processor to execute an image recognition method as described above.
By implementing the embodiments of the invention, skeleton extraction is performed on the binary image to extract the skeleton information of the plurality of characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters. No features need to be designed manually and no character separation is required, which solves the prior-art problem of low recognition accuracy caused by separation algorithms that cannot handle adhered characters well. In particular, the embodiments recognize the characters through a time-sequence-based deep learning recognition model: the features extracted by the CNN are input into a bidirectional LSTM network, which outputs a LaTeX expression without segmenting the characters of the image or analyzing the spatial position relationships among them; this information is obtained by the deep learning recognition model itself, i.e., end-to-end recognition is achieved. The embodiments can therefore adapt to various complex scenes, and the recognition accuracy is greatly improved.
Drawings
To illustrate the embodiments of the present invention and the solutions in the prior art, the drawings used in their description are briefly introduced below.
FIG. 1 is a schematic flow chart of an image recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an input image provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a binary map provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating image skeleton extraction provided by an embodiment of the invention;
fig. 5 is a schematic structural diagram of a pixel provided in an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a pixel point according to another embodiment of the present invention;
fig. 7 is a schematic structural diagram of an example of a pixel according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of image skeleton extraction according to another embodiment of the present invention;
FIG. 9a is a schematic diagram of stroke information provided by an embodiment of the present invention;
FIG. 9b is a schematic diagram of stroke information according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of a time sequence recognition engine provided by an embodiment of the invention;
fig. 11 is a schematic structural diagram of an LSTM network provided by an embodiment of the present invention;
FIG. 12 is a schematic diagram of a time sequence recognition engine according to another embodiment of the present invention;
fig. 13 is a schematic structural diagram of a bidirectional LSTM network provided by an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an identification unit provided in an embodiment of the present invention;
fig. 16 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
In specific implementations, the terminal or device described in the embodiments of the present invention includes, but is not limited to, desktop computers, portable terminals such as laptop computers and tablet computers, and smart terminals such as smart phones, smart watches, smart glasses, and the like.
In order to better understand the image recognition method, apparatus and device provided in the embodiments of the present invention, an image recognition scene in the embodiments of the present invention is described first. Image recognition in the embodiment of the present invention is the process by which, after the image recognition apparatus or device acquires an image to be recognized that contains a plurality of characters (for example, a mathematical formula), the characters in the image are recognized and output. The output characters make it convenient for relevant personnel to enter information, for a postal system to sort letters, or to subsequently search for information matching the recognized characters.
An image recognition method, an image recognition apparatus, and an image recognition device according to embodiments of the present invention are described in detail below with reference to the accompanying drawings. Fig. 1 shows a flow chart of an image recognition method according to an embodiment of the present invention, which may include the following steps:
step S100: carrying out binarization processing on the image to obtain a binary image;
specifically, the image in the embodiment of the present invention may include a plurality of characters; binarization (Image Binarization) of an Image is a process of setting the gray value of a pixel point on the Image to be 0 or 255 so as to obtain a binary Image, i.e. the whole Image presents an obvious black-white effect. The embodiment of the invention can represent the binary value of the pixel point with the gray value of 0 as 0 after the binary value, and represent the binary value of the pixel point with the gray value of 255 as 1.
In one embodiment of the present invention, the binarization algorithm may adopt the Maximally Stable Extremal Regions (MSER) algorithm, which performs well for affine-invariant regions, to extract connected regions, filter out regions that are too small, too large, or of abnormal aspect ratio, and output a binary image. Referring specifically to fig. 2, which is a schematic diagram of an input image provided by an embodiment of the present invention, the image in fig. 2 includes a plurality of characters that form a mathematical expression; after the image is binarized in step S100, the binary image shown in fig. 3 is obtained, presenting an obvious black-and-white effect.
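For illustration, the following is a minimal sketch of such MSER-based binarization using OpenCV and NumPy; the helper name binarize_with_mser and the filtering thresholds (min_area, max_area, max_aspect) are illustrative assumptions, not values taken from the patent.

    import cv2
    import numpy as np

    def binarize_with_mser(gray, min_area=10, max_area=5000, max_aspect=10.0):
        """Extract connected regions with MSER, filter out regions that are
        too small, too large, or of abnormal aspect ratio, and paint the
        surviving regions into a 0/1 binary image (foreground = 1)."""
        mser = cv2.MSER_create()
        regions, _ = mser.detectRegions(gray)  # each region is an Nx2 array of (x, y) points
        binary = np.zeros(gray.shape, dtype=np.uint8)
        for pts in regions:
            x, y, w, h = cv2.boundingRect(pts.reshape(-1, 1, 2))
            aspect = max(w, h) / max(1, min(w, h))
            if min_area <= w * h <= max_area and aspect <= max_aspect:
                binary[pts[:, 1], pts[:, 0]] = 1  # mark the region's pixels as foreground
        return binary

    # usage sketch:
    # gray = cv2.imread("formula.png", cv2.IMREAD_GRAYSCALE)
    # binary = binarize_with_mser(gray)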
Step S102: performing skeleton extraction on the binary image to extract skeleton information of the characters;
specifically, as shown in fig. 4, the image skeleton extraction is provided in the embodiment of the present invention, and the image skeleton extraction is to extract a central pixel contour of the target on the image, that is, to refine the target with reference to the target center. The framework extraction algorithm can be divided into two categories of iteration and non-iteration, and in the iteration algorithm, the framework extraction algorithm is divided into two categories of parallel iteration and sequential iteration, and the like.
In one embodiment of the present invention, iterative erosion may be performed on the binary image until no new pixel point is eroded relative to the binary image after the previous erosion; each erosion iteration comprises traversing the pixel points in the binary image in sequence and eroding the pixel points that meet specified conditions.
It should be noted that erosion in the embodiment of the present invention refers, in the morphological sense, to removing parts of the image, specifically deleting certain pixels on the boundary of an object. Erosion on the binary image therefore means deleting pixel points whose binary value is 1, i.e., changing pixel points with binary value 1 into pixel points with binary value 0.
Specifically, the specified conditions may be set according to the skeletonization requirements. For example, a pixel point meeting the specified conditions in the present invention may be a target pixel point meeting any one of the following conditions:
the condition a is that the number of pixels with binary value 1 in 8 adjacent pixels around a target pixel is greater than or equal to a first threshold and less than or equal to a second threshold; the first threshold is less than the second threshold; specifically, the following formula 1 may be referred to:
formula 1 where the first threshold is less than or equal to B (P1) and less than or equal to the second threshold
Referring to fig. 5, which shows a schematic structural diagram of pixel points provided in the embodiment of the present invention, P1 is the target pixel point for which erosion (deletion) is to be decided, and the 8 pixels adjacent to P1 are marked P2, P3, P4, P5, P6, P7, P8 and P9. In the embodiment of the present invention, taking binary pixel values of 0 or 1 as an example, B(P1) is the number of pixels with binary value 1 among the 8 pixels adjacent to the central pixel point P1 (i.e., the target pixel point), that is, B(P1) = P2 + P3 + P4 + P5 + P6 + P7 + P8 + P9. In one embodiment, the first threshold may be 2 and the second threshold may be 6.
Condition b: checking the 8 pixels adjacent to the target pixel point in clockwise order, the number of times the binary sequence of two consecutive pixels is 01 equals a third threshold. Specifically, refer to the following Formula 2:
A(P1) = third threshold     (Formula 2)
Referring to fig. 6, which shows a schematic structural diagram of pixel points according to another embodiment of the present invention, the checking order is clockwise, i.e., from P2 to P3, from P3 to P4, from P4 to P5, from P5 to P6, and so on; A(P1) is the number of times, when the 8 pixels adjacent to the target pixel point are checked in clockwise order, that the binary sequence of two consecutive pixels is 01.
In one embodiment, the third threshold may be 1. Taking fig. 7 as an example, which shows an example structural diagram of pixel points according to another embodiment of the present invention, in the left example the binary sequence of two consecutive pixels is 01 twice (the sequence from P2 to P3 is 01, and the sequence from P6 to P7 is 01), so condition b is not met; in the right example the binary sequence of two consecutive pixels is 01 only once (only the sequence from P9 to P2 is 01), so condition b is met and the point P1 is eroded.
Condition c: among the 4 relatively nearest adjacent pixel points, at least one pixel point has a binary value of 0; here the distance refers to the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel. Specifically, refer to the following Formula 3:
P2 × P4 × P6 × P8 = 0     (Formula 3)
With reference to the schematic diagram of pixel points shown in fig. 5, taking P1 as the target pixel point, the adjacent pixel points relatively nearest to P1 are P2, P4, P6 and P8; that is, the distances from the centers of P2, P4, P6 and P8 to the center of P1 are all smaller than the distances from the centers of P3, P5, P7 and P9 to the center of P1. In the ideal case, the distances from the centers of P2, P4, P6 and P8 to the center of P1 are equal, and all of them are nearest adjacent pixels; condition c in the embodiment of the present invention may therefore also be stated as: among the nearest adjacent pixel points, at least one pixel point has a binary value of 0. For example, if the binary value of P2 is 0, condition c is met and the point P1 is eroded; if none of the binary values of P2, P4, P6 and P8 is 0, condition c is not met.
Further, in odd-numbered iterations it may be determined whether P2 × P4 × P6 = 0 or P4 × P6 × P8 = 0 holds; if so, condition c is met and the point P1 is eroded. In even-numbered iterations it is determined whether P2 × P4 × P8 = 0 or P2 × P6 × P8 = 0 holds; if so, condition c is met and the point P1 is eroded.
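Conditions a, b and c, together with the alternation between odd and even iterations, closely resemble the classic Zhang-Suen thinning algorithm. The following Python sketch implements that classic variant under the thresholds mentioned above (first threshold 2, second threshold 6, third threshold 1); note that in the classic algorithm a pixel is deleted only when conditions a, b and c all hold at once, and both products of condition c must be zero in each subiteration, which is an assumption about the intended reading of this step.

    import numpy as np

    def thin_skeleton(binary):
        """Iteratively erode a 0/1 binary image until no new pixel point is
        eroded, following the Zhang-Suen-style conditions described above."""
        img = binary.astype(np.uint8).copy()
        changed = True
        while changed:
            changed = False
            for step in (0, 1):  # odd / even subiteration
                to_erase = []
                for r in range(1, img.shape[0] - 1):
                    for c in range(1, img.shape[1] - 1):
                        if img[r, c] != 1:
                            continue
                        # neighbors P2..P9, clockwise, starting directly above P1
                        p = [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
                             img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]
                        b = sum(p)  # condition a: 2 <= B(P1) <= 6
                        # condition b: number of 01 patterns among consecutive neighbors
                        a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                        if step == 0:  # condition c, odd subiteration
                            c_ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                        else:          # condition c, even subiteration
                            c_ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                        if 2 <= b <= 6 and a == 1 and c_ok:
                            to_erase.append((r, c))
                if to_erase:
                    changed = True
                for r, c in to_erase:
                    img[r, c] = 0  # erode: binary value 1 becomes 0
        return img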
Taking the binary image shown in fig. 3 as an example, skeleton extraction is performed in step S102 to extract the skeleton information of the plurality of characters; the resulting effect can be seen in the schematic diagram of image skeleton extraction of another embodiment shown in fig. 8. Skeletonization of the character image is achieved through multiple erosion iterations, so that the targets in the image become thinner and thinner.
Step S104: extracting stroke information from the skeleton information;
specifically, in the embodiment of the present invention, the stroke information is extracted from the skeleton information by using a stroke extraction algorithm, for example, as shown in a schematic diagram of the stroke information provided in the embodiment of the present invention in fig. 9a, the stroke information in the embodiment of the present invention may include the number of stroke feature points and position information between adjacent stroke feature points; as shown in fig. 9a, each point is a brush-tip feature point, and a positional relationship exists between adjacent brush-tip feature points, for example, a positional relationship exists between a brush-tip feature point a and an adjacent brush-tip feature point b in fig. 9a, and a direction angle from the brush-tip feature point a to the adjacent brush-tip feature point b can be represented by vector information.
In one embodiment, extracting the stroke information from the skeleton information may include traversing the skeleton information by connected domain and extracting stroke feature points; in the case of a stroke bifurcation, the stroke feature point whose direction angle relative to the previous stroke feature point is smaller is extracted preferentially. A connected domain in the embodiment of the invention may be a region in which the stroke feature points are connected. A stroke bifurcation occurs when, while traversing the stroke feature points from a certain point along a certain direction, several next connected stroke feature points exist. The direction angle in the embodiment of the present invention refers to the angle between the current stroke feature point and the previous connected stroke feature point, specifically the included angle between the direction of traversing to the previous connected stroke feature point and the direction of traversing to the current stroke feature point. Specifically, fig. 9b shows an enlarged view of the stroke information at x in fig. 9a: starting from stroke feature point c, the next stroke feature point d is traversed according to the connected domain; when the traversal reaches stroke feature point e and bifurcates into stroke feature point f, stroke feature point g and stroke feature point h, the stroke feature point f with a direction angle of 0 degrees is traversed first, the stroke feature point g with a direction angle of 90 degrees second, and the stroke feature point h with a direction angle of 270 degrees last.
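For illustration, the traversal just described might be sketched as follows; the depth-first strategy, the seed direction and the exact angle computation are assumptions made for the sketch, since the patent only states that branches with smaller direction angles are extracted preferentially.

    import numpy as np

    def trace_strokes(skel):
        """Traverse a 0/1 skeleton image connected domain by connected
        domain, emitting stroke feature points in visiting order; at a
        bifurcation, the neighbor whose direction deviates least from the
        incoming direction is visited first."""
        visited = np.zeros(skel.shape, dtype=bool)
        offs = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
        strokes = []

        def deviation(inc, out):
            # angle between the incoming and the candidate outgoing direction
            dot = inc[0] * out[0] + inc[1] * out[1]
            norm = np.hypot(*inc) * np.hypot(*out)
            return np.arccos(np.clip(dot / norm, -1.0, 1.0))

        for r0, c0 in zip(*np.nonzero(skel)):
            if visited[r0, c0]:
                continue
            stack = [((r0, c0), (0, 1))]  # assumed seed direction: left to right
            stroke = []
            while stack:
                (r, c), inc = stack.pop()
                if visited[r, c]:
                    continue
                visited[r, c] = True
                stroke.append((r, c))
                nbrs = [(r + dr, c + dc) for dr, dc in offs
                        if 0 <= r + dr < skel.shape[0] and 0 <= c + dc < skel.shape[1]
                        and skel[r + dr, c + dc] and not visited[r + dr, c + dc]]
                # push the largest deviation first so the smallest is popped first
                nbrs.sort(key=lambda n: deviation(inc, (n[0] - r, n[1] - c)), reverse=True)
                for rr, cc in nbrs:
                    stack.append(((rr, cc), (rr - r, cc - c)))
            strokes.append(stroke)
        return strokes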
Step S106: analyzing the stroke information through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters.
The time sequence recognition engine of the embodiment of the invention may adopt a deep learning network based on a Long Short-Term Memory (LSTM) network. Specifically, after the stroke information obtained in step S104 is input, the network may extract features with a Convolutional Neural Network (CNN), then input the extracted features into the LSTM network to complete the recognition of the plurality of characters and the position relation information among the characters, and finally output the plurality of recognized characters.
Referring to the schematic diagram of the time sequence recognition engine provided in the embodiment of the present invention as shown in fig. 10, the input stroke information includes the number of stroke feature points and the position information between adjacent stroke feature points. Features are extracted through the CNN network 10: two 3 × 3 convolution layers with 64 channels followed by a pooling layer, two 3 × 3 convolution layers with 128 channels followed by a pooling layer, two 3 × 3 convolution layers with 256 channels followed by a pooling layer, and finally two 3 × 3 convolution layers with 512 channels followed by a pooling layer, after which the extracted features are output. The embodiment of the present invention is not limited to the 3 × 3 convolutions of fig. 10; 5 × 5 convolutions and the like may also be used. The extracted features may be divided into stroke information of a number of time-sequence units, which is then input in order to the LSTM network to complete the recognition of the plurality of characters and the position relation information among the characters; the plurality of recognized characters are finally output. For the specific structure of the LSTM network, refer to the schematic diagram of the LSTM network shown in fig. 11. Taking the image in fig. 2 as an example, stroke information of 11 time-sequence units can be extracted by the CNN network; the stroke information of each time-sequence unit removes or adds information to the cell state, in time order, through a carefully designed structure called a 'gate', and the plurality of recognized characters are finally output.
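For illustration, the VGG-style stack of fig. 10 could be rendered in PyTorch as below; the input raster shape, the use of max pooling and the height-collapsing step that turns the feature map into a sequence of time-sequence units are assumptions, since the patent does not specify them.

    import torch.nn as nn

    class StrokeCNN(nn.Module):
        """Feature extractor following fig. 10: four blocks, each with two
        3x3 convolutions (64, 128, 256 and 512 channels) and a pooling layer."""
        def __init__(self, in_channels=1):
            super().__init__()
            layers, prev = [], in_channels
            for ch in (64, 128, 256, 512):
                layers += [nn.Conv2d(prev, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                           nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                           nn.MaxPool2d(2)]
                prev = ch
            self.features = nn.Sequential(*layers)

        def forward(self, x):          # x: (batch, 1, H, W) raster of stroke information
            f = self.features(x)       # (batch, 512, H/16, W/16)
            f = f.mean(dim=2)          # collapse height: (batch, 512, W/16)
            return f.permute(0, 2, 1)  # (batch, T, 512): T time-sequence units for the LSTM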
By implementing the embodiments of the invention, skeleton extraction is performed on the binary image to extract the skeleton information of the plurality of characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters. No features need to be designed manually and no character separation is required, which solves the prior-art problem of low recognition accuracy caused by separation algorithms that cannot handle adhered characters well.
Still further, fig. 12 shows a schematic diagram of a time sequence recognition engine according to another embodiment of the present invention: the LSTM in step S106 may be a bidirectional LSTM, whose structure is shown in fig. 13. Again taking the image of fig. 2 as an example, stroke information of 11 time-sequence units can be extracted by the CNN network; the stroke information of each time-sequence unit removes or adds information to the cell state, in time order, through the carefully designed 'gate' structure, and the plurality of recognized characters are finally output.
In one embodiment, the plurality of characters in the embodiments of the present invention may include a mathematical expression, and outputting the recognized plurality of characters may include: outputting a LaTeX expression according to the plurality of recognized characters. The embodiment of the invention recognizes the characters through a time-sequence-based deep learning recognition model: the features extracted by the CNN are input into the bidirectional LSTM network, which outputs a LaTeX expression without segmenting the characters of the image or analyzing the spatial position relationships among them; this information is obtained by the deep learning recognition model itself, i.e., end-to-end recognition is achieved. The embodiment can therefore adapt to various complex scenes, and the recognition accuracy is greatly improved.
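Continuing the sketch above, a bidirectional LSTM head over the CNN feature sequence might look as follows; the hidden width and the size of the LaTeX token vocabulary are illustrative assumptions.

    import torch.nn as nn

    class BiLSTMRecognizer(nn.Module):
        """Bidirectional LSTM that consumes the CNN feature sequence and
        emits per-time-step scores over an assumed LaTeX token vocabulary."""
        def __init__(self, feat_dim=512, hidden=256, vocab_size=128):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, vocab_size)  # forward + backward states

        def forward(self, feats):      # feats: (batch, T, feat_dim) from StrokeCNN
            out, _ = self.lstm(feats)  # (batch, T, 2 * hidden)
            return self.fc(out)        # (batch, T, vocab_size) token scores

Decoding the per-time-step scores into a final LaTeX string (for example with a CTC-style or attention decoder) is not described by the patent and is therefore left out of the sketch.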
In order to better implement the above scheme of the embodiment of the present invention, the present invention further provides an image recognition apparatus, which is described in detail below with reference to the accompanying drawings:
as shown in fig. 14, which is a schematic structural diagram of an image recognition apparatus provided in an embodiment of the present invention, the image recognition apparatus 14 may include: a processing unit 140, an extraction unit 142, an extracted information unit 144, and a recognition unit 146, wherein,
the processing unit 140 is configured to perform binarization processing on the image to obtain a binary image; the image includes a plurality of characters;
the extracting unit 142 is configured to perform skeleton extraction on the binary image, and extract skeleton information of the plurality of characters;
the information extraction unit 144 is configured to extract stroke information from the skeleton information; the stroke information comprises the number of stroke feature points and position information between adjacent stroke feature points;
the recognition unit 146 is configured to analyze the stroke information through a time sequence recognition engine based on a deep learning network, recognize the plurality of characters and the position relation information among the characters, and output the plurality of recognized characters.
The extraction unit 142 is specifically configured to perform iterative erosion on the binary image until no new pixel point is eroded relative to the binary image after the previous erosion; each erosion iteration comprises traversing the pixel points in the binary image in sequence and eroding the pixel points that meet the specified conditions.
The pixel points meeting the specified conditions in the embodiment of the invention may include target pixel points meeting any one of the following conditions:
condition a: the number of pixels with a binary value of 1 among the 8 pixels adjacent to the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
condition b: checking the 8 pixels adjacent to the target pixel in clockwise order, the number of times the binary sequence of two consecutive pixels is 01 equals a third threshold;
condition c: among the 4 relatively nearest adjacent pixels, at least one pixel has a binary value of 0; here the distance refers to the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
In one embodiment of the present invention, the information extraction unit 144 may be specifically configured to traverse the skeleton information by connected domain and extract stroke feature points; in the case of a stroke bifurcation, the stroke feature point whose direction angle relative to the previous stroke feature point is smaller is extracted preferentially.
Specifically, the information extraction unit 144 in the embodiment of the present invention may extract the stroke information from the skeleton information through a stroke extraction algorithm, as shown in the schematic diagram of stroke information in fig. 9a; the stroke information may include the number of stroke feature points and the position information between adjacent stroke feature points. As shown in fig. 9a, each point is a stroke feature point, and a positional relationship exists between adjacent stroke feature points; for example, a positional relationship exists between stroke feature point a and its adjacent stroke feature point b, and the direction angle from stroke feature point a to stroke feature point b can be represented by vector information.
In one embodiment, the information extraction unit 144 may extract the stroke information from the skeleton information by traversing the skeleton information by connected domain and extracting stroke feature points; in the case of a stroke bifurcation, the stroke feature point whose direction angle relative to the previous stroke feature point is smaller is extracted preferentially. Specifically, fig. 9b shows an enlarged view of the stroke information at x in fig. 9a: starting from stroke feature point c, the next stroke feature point d is traversed according to the connected domain; when the traversal reaches stroke feature point e and bifurcates into stroke feature point f, stroke feature point g and stroke feature point h, the stroke feature point f with a direction angle of 0 degrees is traversed first, the stroke feature point g with a direction angle of 90 degrees second, and the stroke feature point h with a direction angle of 270 degrees last.
In one embodiment of the present invention, as shown in fig. 15, which is a schematic structural diagram of the recognition unit provided in the embodiment of the present invention, the recognition unit 146 may include a feature extraction unit 1460 and a character recognition unit 1462, wherein,
the feature extraction unit 1460 is configured to perform feature extraction on the stroke information by using a convolutional neural network CNN;
the character recognition unit 1462 is configured to input the extracted features into the long short-term memory (LSTM) network for character recognition, recognizing the plurality of characters and the position relation information among the characters.
In one embodiment of the present invention, the long short term memory network LSTM may be a bidirectional LSTM.
In one embodiment of the present invention, the plurality of characters may include a mathematical expression;
the timing sequence recognition engine of the embodiment of the invention can adopt a deep learning network based on a Long Short-Term Memory network (LSTM). Specifically, after the stroke information obtained by the information unit 144 is extracted, the Network may extract features by a Convolutional Neural Network (CNN), and then input the extracted features into the LSTM Network to complete the recognition of the plurality of characters and the information of the position relationship between the characters, and finally output the plurality of recognized characters. FIG. 10 is a schematic diagram of a timing recognition engine according to an embodiment of the present invention
Referring to fig. 10, which is a schematic diagram of a time sequence recognition engine provided in an embodiment of the present invention, the input stroke information includes the number of stroke feature points and position information between adjacent stroke feature points, and features are extracted through the CNN network 10, the embodiment of the present invention is not limited to convolution with 3 × 3 in fig. 10, and may also be convolution with 5 × 5, and the features extracted by the feature extraction unit 1460 may be divided into stroke information of a plurality of time sequence units, and then the stroke information is sequentially input to the LSTM network to complete recognition of the plurality of characters and position relationship information between characters, and finally the plurality of recognized characters are output. For a specific structure of the LSTM network, referring to the schematic structural diagram of the LSTM network provided by the embodiment of the present invention shown in fig. 11, taking the image in fig. 2 as an example, the stroke information of 11 time sequence units can be extracted from the CNN network, the character recognition unit 1462 removes or adds information to the cell state according to the stroke information of each time sequence unit through a well-designed structure called "gate", and finally, the plurality of recognized characters can be output.
By implementing the embodiments of the invention, skeleton extraction is performed on the binary image to extract the skeleton information of the plurality of characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters. No features need to be designed manually and no character separation is required, which solves the prior-art problem of low recognition accuracy caused by separation algorithms that cannot handle adhered characters well.
Still further, fig. 12 shows a schematic diagram of a time sequence recognition engine according to another embodiment of the present invention: the LSTM of the embodiment of the present invention may be a bidirectional LSTM, whose structure is shown in fig. 13. Again taking the image of fig. 2 as an example, stroke information of 11 time-sequence units can be extracted by the CNN network; the character recognition unit 1462 removes or adds information to the cell state, in time order, from the stroke information of each time-sequence unit through the carefully designed 'gate' structure, and the plurality of recognized characters are finally output.
In one embodiment, the plurality of characters in the embodiment of the present invention may include mathematical expressions, and the recognition unit 146 outputting the recognized plurality of characters may include: outputting a LaTeX expression according to the plurality of recognized characters. The embodiment of the invention recognizes the characters through a time-sequence-based deep learning recognition model: the features extracted by the CNN are input into the bidirectional LSTM network, which can output a LaTeX expression without segmenting the characters of the image or analyzing the spatial position relationships among them; this information is obtained by the deep learning recognition model itself, i.e., end-to-end recognition is achieved.
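Putting the units together, an end-to-end run over a formula image could look like the following sketch, which reuses the hypothetical helpers introduced earlier (binarize_with_mser, thin_skeleton, trace_strokes, StrokeCNN, BiLSTMRecognizer); the rasterization of the traced stroke points back into a model input is a further assumption.

    import cv2
    import numpy as np
    import torch

    def recognize(image_path, cnn, recognizer):
        """Assumed pipeline: binarize -> skeletonize -> extract strokes ->
        rasterize -> CNN features -> bidirectional LSTM -> token scores."""
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        binary = binarize_with_mser(gray)         # step S100: binarization
        skeleton = thin_skeleton(binary)          # step S102: skeleton extraction
        strokes = trace_strokes(skeleton)         # step S104: stroke information
        raster = np.zeros(skeleton.shape, dtype=np.float32)
        for stroke in strokes:                    # paint the stroke feature points
            for r, c in stroke:
                raster[r, c] = 1.0
        x = torch.from_numpy(raster)[None, None]  # shape (1, 1, H, W)
        feats = cnn(x)                            # step S106: CNN feature sequence
        return recognizer(feats)                  # per-time-step LaTeX token scores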
In order to better implement the above solution of the embodiment of the present invention, the present invention further provides an image recognition apparatus, which is described in detail below with reference to the accompanying drawings:
as shown in fig. 16, which is a schematic structural diagram of the image recognition apparatus provided in the embodiment of the present invention, the image recognition apparatus 16 may include a processor 161, an input unit 162, a recognition unit 163, a memory 164, and a communication unit 165, and the processor 161, the input unit 162, the recognition unit 163, the memory 164, and the communication unit 165 may be connected to each other by a bus 166. The memory 164 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory, and the memory 704 includes a flash in an embodiment of the present invention. The memory 164 may optionally be at least one memory system located remotely from the processor 161. The memory 164 is used for storing application program codes and may include an operating system, a network communication module, a user interface module, and an image recognition program, and the communication unit 165 is used for information interaction with an external unit; processor 161 is configured to call the program code to perform the following steps:
carrying out binarization processing on an input image to obtain a binary image; the image includes a plurality of characters;
performing skeleton extraction on the binary image to extract skeleton information of the characters;
extracting stroke information from the skeleton information; the stroke information comprises the number of stroke feature points and position information between adjacent stroke feature points;
passing the stroke information through a time sequence recognition engine based on a deep learning network, recognizing the plurality of characters and the position relation information among the characters, and outputting the recognized characters.
In one embodiment, the processor 161 performs skeleton extraction on the binary image, which may include:
performing iterative erosion on the binary image until no new pixel point is eroded relative to the binary image after the previous erosion; each erosion iteration comprises traversing the pixel points in the binary image in sequence and eroding the pixel points that meet specified conditions.
In one embodiment, the pixel points meeting the specified condition include target pixel points meeting any one of the following conditions:
the number of pixels with a binary value of 1 among the 8 pixels adjacent to the target pixel is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
checking the 8 pixels adjacent to the target pixel in clockwise order, the number of times the binary sequence of two consecutive pixels is 01 equals a third threshold;
among the 4 relatively nearest adjacent pixels, at least one pixel has a binary value of 0; here the distance refers to the distance from the center of a pixel adjacent to the target pixel to the center of the target pixel.
In one embodiment, the processor 161 passing the stroke information through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters may include:
extracting the characteristics of the stroke information by a Convolutional Neural Network (CNN);
and inputting the extracted features into a long-short term memory network (LSTM) for character recognition, and recognizing the characters and the position relation information among the characters.
In one embodiment, the long short-term memory (LSTM) network is a bidirectional LSTM.
In one embodiment thereof, the plurality of characters may comprise a mathematical expression;
the processor 161 outputting the recognized plurality of characters may include: outputting a LaTeX expression according to the plurality of recognized characters.
In one embodiment, the processor 161 extracting the stroke information from the skeleton information may include:
traversing the skeleton information by connected domain and extracting stroke feature points; in the case of a stroke bifurcation, preferentially extracting the stroke feature point whose direction angle relative to the previous stroke feature point is smaller.
By implementing the embodiments of the invention, skeleton extraction is performed on the binary image to extract the skeleton information of the plurality of characters, stroke information is then extracted from the skeleton information, and the stroke information is passed through a time sequence recognition engine based on a deep learning network to recognize the plurality of characters and the position relation information among the characters. No features need to be designed manually and no character separation is required, which solves the prior-art problem of low recognition accuracy caused by separation algorithms that cannot handle adhered characters well. In particular, the embodiments recognize the characters through a time-sequence-based deep learning recognition model: the features extracted by the CNN are input into a bidirectional LSTM network, which outputs a LaTeX expression without segmenting the characters of the image or analyzing the spatial position relationships among them; this information is obtained by the deep learning recognition model itself, i.e., end-to-end recognition is achieved. The embodiments can therefore adapt to various complex scenes, and the recognition accuracy is greatly improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (15)

1. An image recognition method, comprising:
carrying out binarization processing on the image to obtain a binary image; the image comprises a plurality of characters;
performing skeleton extraction on the binary image to extract skeleton information of the characters;
extracting stroke information from the skeleton information; the stroke information comprises the number of stroke feature points and position information between adjacent stroke feature points;
analyzing the stroke information through a time sequence recognition engine based on a deep learning network, and recognizing the plurality of characters and the position relation information among the characters;
after obtaining the stroke information, extracting features through a convolutional neural network included in the time sequence recognition engine based on the deep learning network, and inputting the extracted features into a long-short term memory network included in the time sequence recognition engine based on the deep learning network to complete recognition of the characters and the position relation information among the characters.
2. The method of claim 1, wherein the skeleton extraction of the binary image comprises:
performing iterative erosion on the binary image until no new pixel point is eroded relative to the binary image after the previous erosion; each erosion iteration comprises traversing the pixel points in the binary image in sequence and eroding the pixel points that meet the specified conditions.
3. The method of claim 2, wherein the pixels meeting the specified condition comprise target pixels meeting any one of the following conditions:
among the 8 pixels adjacent to the target pixel, the number of pixels with binary value 1 is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
when the 8 pixels adjacent to the target pixel are checked in clockwise order, the number of times the binary values of two consecutive pixels form the sequence 01 equals a third threshold;
among the 4 nearest adjacent pixels, the binary value of at least one pixel is 0, where distance is measured from the center of an adjacent pixel to the center of the target pixel.
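The three conditions above are close to the classic Zhang-Suen thinning test. Below is a minimal single-pass sketch in Python/NumPy that assumes the conventional threshold values (first threshold 2, second threshold 6, third threshold 1), which the claims deliberately leave open; note that classical Zhang-Suen alternates two sub-passes with slightly different corner tests, so this is a simplified reading of the claims, not the patent's exact procedure.

import numpy as np

def thinning_pass(img):
    # One erosion pass over a 0/1 image; pixels meeting all three
    # neighbourhood conditions are collected first, then erased together.
    h, w = img.shape
    to_erase = []
    for y in range(1, h - 1):          # borders skipped for brevity
        for x in range(1, w - 1):
            if img[y, x] != 1:
                continue
            # 8-neighbourhood, clockwise from the pixel directly above
            n = [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                 img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]
            b = sum(n)                                                    # condition 1
            a = sum(n[i] == 0 and n[(i + 1) % 8] == 1 for i in range(8))  # condition 2
            four = (n[0], n[2], n[4], n[6])                               # N, E, S, W
            if 2 <= b <= 6 and a == 1 and 0 in four:                      # condition 3
                to_erase.append((y, x))
    for y, x in to_erase:
        img[y, x] = 0
    return len(to_erase)

def skeletonize(img):
    # Iterate until a pass erodes no new pixel -- the stop rule of claim 2.
    while thinning_pass(img):
        pass
    return img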
4. The method of claim 1, wherein the analyzing the stroke information through the time sequence recognition engine based on the deep learning network, and recognizing the plurality of characters and the positional relationship information among the characters, comprises:
performing feature extraction on the stroke information through a convolutional neural network (CNN);
inputting the extracted features into a long short-term memory (LSTM) network for character recognition, thereby recognizing the plurality of characters and the positional relationship information among the characters.
5. The method of claim 1, wherein the binarization processing on the image comprises:
performing binarization processing on the image by using a maximally stable extremal regions (MSER) algorithm.
6. The method of claim 4, wherein the plurality of characters comprise a mathematical expression;
after the plurality of characters and the positional relationship information among the characters are recognized, the method further comprises: outputting a LaTeX expression according to the plurality of recognized characters.
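A toy illustration of this final assembly step, turning recognized characters plus positional relations into a LaTeX string; the (character, relation) input format is an assumption of the sketch, and in the embodiment this mapping is produced end to end by the recognition model rather than hand-coded.

def to_latex(tokens):
    # tokens: list of (char, relation) pairs, with relation one of
    # 'inline', 'superscript', 'subscript' -- a simplified assumption.
    parts = []
    for ch, rel in tokens:
        if rel == 'superscript':
            parts.append('^{' + ch + '}')
        elif rel == 'subscript':
            parts.append('_{' + ch + '}')
        else:
            parts.append(ch)
    return ''.join(parts)

# to_latex([('x', 'inline'), ('2', 'superscript'),
#           ('+', 'inline'), ('1', 'inline')])  ->  'x^{2}+1'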
7. The method of claim 1, wherein the extracting stroke information from the skeleton information comprises:
traversing the skeleton information by connected component and extracting stroke feature points; in the case of a stroke bifurcation, preferentially extracting the stroke feature point whose direction forms the smaller angle with the previous stroke feature point.
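A sketch of the branch choice at a bifurcation: among unvisited skeleton neighbours of the current point, continue along the one whose direction deviates least from the incoming direction. The 8-neighbour traversal and the angle measure are one plausible reading of the claim, not the patent's exact procedure.

import numpy as np

def next_stroke_point(skel, prev, cur, visited):
    # Pick the next stroke feature point after cur, preferring the
    # neighbour with the smallest direction change relative to prev -> cur.
    in_angle = np.arctan2(cur[0] - prev[0], cur[1] - prev[1])
    best, best_dev = None, None
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = cur[0] + dy, cur[1] + dx
            if (dy, dx) == (0, 0) or (ny, nx) in visited:
                continue
            if not (0 <= ny < skel.shape[0] and 0 <= nx < skel.shape[1]):
                continue
            if not skel[ny, nx]:
                continue
            dev = abs(np.arctan2(dy, dx) - in_angle)
            dev = min(dev, 2 * np.pi - dev)     # wrap the difference to [0, pi]
            if best is None or dev < best_dev:
                best, best_dev = (ny, nx), dev
    return best                                 # None when the stroke ends

Traversing one connected component then amounts to calling next_stroke_point repeatedly, collecting one stroke-point sequence per stroke.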
8. An image recognition apparatus, comprising:
a processing unit, configured to perform binarization processing on an image to obtain a binary image, the image comprising a plurality of characters;
an extraction unit, configured to perform skeleton extraction on the binary image, and extract skeleton information of the plurality of characters;
an information extraction unit, configured to extract stroke information from the skeleton information, the stroke information comprising the number of stroke feature points and positional information between adjacent stroke feature points;
a recognition unit, configured to analyze the stroke information through a time sequence recognition engine based on a deep learning network, and to recognize the plurality of characters and the positional relationship information among the characters;
wherein, after the stroke information is obtained, the recognition unit is configured to extract features through a convolutional neural network included in the time sequence recognition engine, and to input the extracted features into a long short-term memory network included in the time sequence recognition engine to complete recognition of the plurality of characters and the positional relationship information among the characters.
9. The apparatus of claim 8, wherein the extraction unit is specifically configured to perform iterative erosion processing on the binary image until no new pixel is eroded relative to the binary image after the previous erosion pass; each erosion iteration comprises traversing the pixels of the binary image in order and eroding the pixels that meet specified conditions.
10. The apparatus of claim 9, wherein the pixels meeting the specified condition comprise target pixels meeting any of the following conditions:
among the 8 pixels adjacent to the target pixel, the number of pixels with binary value 1 is greater than or equal to a first threshold and less than or equal to a second threshold, the first threshold being less than the second threshold;
when the 8 pixels adjacent to the target pixel are checked in clockwise order, the number of times the binary values of two consecutive pixels form the sequence 01 equals a third threshold;
among the 4 nearest adjacent pixels, the binary value of at least one pixel is 0, where distance is measured from the center of an adjacent pixel to the center of the target pixel.
11. The apparatus of claim 8, wherein the identification unit comprises:
a feature extraction unit, configured to perform feature extraction on the stroke information through a convolutional neural network (CNN);
a character recognition unit, configured to input the extracted features into a long short-term memory (LSTM) network for character recognition, and to recognize the plurality of characters and the positional relationship information among the characters.
12. The apparatus of claim 11, wherein the plurality of characters comprise mathematical expressions;
the recognition unit is further configured to output a LaTeX expression according to the plurality of recognized characters.
13. The apparatus of claim 8, wherein the information extraction unit is specifically configured to traverse the skeleton information by connected component and extract stroke feature points; in the case of a stroke bifurcation, the stroke feature point whose direction forms the smaller angle with the previous stroke feature point is extracted preferentially.
14. An image recognition device comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store application program code, and wherein the processor is configured to invoke the program code to perform the method of any of claims 1-7.
15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
CN201810274802.XA 2018-03-29 2018-03-29 Image recognition method, related device and equipment Active CN110147785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810274802.XA CN110147785B (en) 2018-03-29 2018-03-29 Image recognition method, related device and equipment

Publications (2)

Publication Number Publication Date
CN110147785A (en) 2019-08-20
CN110147785B (en) 2023-01-10

Family

ID=67588309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810274802.XA Active CN110147785B (en) 2018-03-29 2018-03-29 Image recognition method, related device and equipment

Country Status (1)

Country Link
CN (1) CN110147785B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104945A (en) * 2019-12-17 2020-05-05 Shanghai Pateo Yuezhen Electronic Equipment Manufacturing Co., Ltd. Object identification method and related product
CN111428593A (en) * 2020-03-12 2020-07-17 Beijing Sankuai Online Technology Co., Ltd. Character recognition method and device, electronic equipment and storage medium
CN112800987B (en) * 2021-02-02 2023-07-21 China United Network Communications Group Co., Ltd. Chinese character processing method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996347A (en) * 2006-09-14 2007-07-11 Zhejiang University Visualized reproduction method based on handwriting image
CN104408455A (en) * 2014-11-27 2015-03-11 University of Shanghai for Science and Technology Adherent character partition method
CN105512692A (en) * 2015-11-30 2016-04-20 South China University of Technology BLSTM-based online handwritten mathematical expression symbol recognition method
CN105654127A (en) * 2015-12-30 2016-06-08 Chengdu Business Big Data Technology Co., Ltd. End-to-end-based picture character sequence continuous recognition method
CN106407971A (en) * 2016-09-14 2017-02-15 Beijing Xiaomi Mobile Software Co., Ltd. Text recognition method and device
CN107273897A (en) * 2017-07-04 2017-10-20 Huazhong University of Science and Technology A character recognition method based on deep learning
CN107403180A (en) * 2017-06-30 2017-11-28 Guangzhou Guangdian Property Management Co., Ltd. A numeric equipment detection and recognition method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8873813B2 (en) * 2012-09-17 2014-10-28 Z Advanced Computing, Inc. Application of Z-webs and Z-factors to analytics, search engine, learning, recognition, natural language, and other utilities

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A Character Recognition Algorithm for Unhealthy-text Embedded in Web Images; Sun Yan et al.; Proceedings of 14th Youth Conference on Communication; 2009-12-31; pp. 453-458 *
Generic Text Recognition using Long Short-Term Memory Networks; Adnan Ul-Hasan; ResearchGate; 2016-01-11; pp. 1-179 *
Segmentation-free Handwritten Chinese Text Recognition with LSTM-RNN; Ronaldo Messina et al.; 2015 13th International Conference on Document Analysis and Recognition (ICDAR); 2015-08-23; pp. 171-175 *
Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks; Jun Liu et al.; IEEE Transactions on Image Processing; 2017-12-19; Vol. 27, No. 4; pp. 1-13 *
An Optimization Method for Skeleton Extraction of Calligraphy Characters; Zhang Jiulong et al.; Journal of Xi'an University of Technology; 2016-03-30; Vol. 32, No. 1; pp. 35-38 *
CAPTCHA Recognition Based on Two-Dimensional RNN; Chen Rui et al.; Journal of Chinese Computer Systems; 2014-03-15; pp. 504-508 *

Also Published As

Publication number Publication date
CN110147785A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
Bhowmik et al. Text and non-text separation in offline document images: a survey
Saba et al. Annotated comparisons of proposed preprocessing techniques for script recognition
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
WO2010092952A1 (en) Pattern recognition device
US20140193029A1 (en) Text Detection in Images of Graphical User Interfaces
CN110147785B (en) Image recognition method, related device and equipment
EP3539051A1 (en) System and method of character recognition using fully convolutional neural networks
CN112329779A (en) Method and related device for improving certificate identification accuracy based on mask
KR20110051374A (en) Apparatus and method for processing data in terminal having touch screen
CN111340023A (en) Text recognition method and device, electronic equipment and storage medium
CN112115921A (en) True and false identification method and device and electronic equipment
Ayesh et al. A robust line segmentation algorithm for Arabic printed text with diacritics
Amin et al. Hand printed Arabic character recognition system
US10217020B1 (en) Method and system for identifying multiple strings in an image based upon positions of model strings relative to one another
Wicht et al. Camera-based sudoku recognition with deep belief network
Selvi et al. Recognition of Arabic numerals with grouping and ungrouping using back propagation neural network
Shi et al. Image enhancement for degraded binary document images
CN116030472A (en) Text coordinate determining method and device
Nasiri et al. A new binarization method for high accuracy handwritten digit recognition of slabs in steel companies
CN116229098A (en) Image recognition method based on mask contour tracking and related products
CN111488870A (en) Character recognition method and character recognition device
Bouchakour et al. Printed arabic characters recognition using combined features and cnn classifier
Omachi et al. Structure extraction from decorated characters using multiscale images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant