WO2021196013A1 - Word recognition method, device and storage medium
- Publication number
- WO2021196013A1 (PCT/CN2020/082566)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- recognized
- image
- character
- horizontal position
- Prior art date
Classifications
- G06V30/19173—Classification techniques
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T7/70—Determining position or orientation of objects or cameras
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V30/153—Segmentation of character regions using recognition of characters or words
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- The present invention relates to the technical field of text recognition, and in particular to a word recognition method, device, and storage medium.
- the translation pen is one of the tools for word recognition when people read materials or books.
- Some translation pens (such as point-translation pens) are relatively effective at recognizing clear, flat text content.
- However, when the text is slanted, in perspective, or curved, it may appear relatively blurred (for example, in a point-translation scene), and the accuracy of text recognition still needs further improvement.
- the present invention aims to solve one of the technical problems in the related art at least to a certain extent.
- the present invention proposes a word recognition method for this purpose, which can effectively improve the accuracy and recognition effect of word recognition, and improve the user experience.
- the invention also provides an electronic device.
- the present invention also provides a non-volatile computer-readable storage medium.
- An embodiment of the present invention provides a word recognition method, which includes the following steps:
- Words that are not in a horizontal position, such as slanted or curved words, are stretched into a horizontal position, and then word recognition is performed, thereby effectively improving the accuracy and effect of word recognition and enhancing the user experience. This solves the prior-art problem that the accuracy of text recognition needs further improvement when the text is slanted, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene).
- Another embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and running on the processor.
- When the processor executes the program, the following word recognition method is implemented, including:
- When the translation pen performs word recognition, it stretches words that are not in a horizontal position, such as slanted or curved words, into a horizontal position, and then performs word recognition, thereby effectively improving the accuracy and effect of word recognition and enhancing the user experience. This solves the prior-art problem that the accuracy of text recognition needs further improvement when the text is slanted, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene).
- Another embodiment of the present invention provides a non-volatile computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the following word recognition method is implemented:
- When the translation pen performs word recognition, it stretches words that are not in a horizontal position, such as slanted or curved words, into a horizontal position, and then performs word recognition, thereby effectively improving the accuracy and effect of word recognition.
- FIG. 1 is a schematic flowchart of a word recognition method provided by an embodiment of the present invention
- FIG. 2 is a schematic diagram of collecting an image of a word to be recognized according to an embodiment of the present invention
- FIG. 3 is a schematic diagram of a process of text correction as described in step 102 according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of determining the geometric position of a word to be recognized according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of selecting a base point according to an embodiment of the present invention.
- Fig. 6a is an example diagram of text correction provided by an embodiment of the present invention.
- Fig. 6b is an example diagram of another text correction provided by an embodiment of the present invention.
- FIG. 7 is a schematic flowchart of another word recognition method provided by an embodiment of the present invention.
- Figure 8 is a schematic structural diagram of the CRNN algorithm provided in the related art.
- FIG. 9 is a schematic structural diagram of an improved CRNN algorithm provided by an embodiment of the present invention.
- FIG. 1 is a schematic flowchart of a word recognition method provided by an embodiment of the present invention.
- the word recognition method of the embodiment of the present application can be used in translation scenarios, such as the recognition function of a translation pen.
- the following embodiments take the point translation pen as an example.
- the embodiment of the present invention provides a word recognition method.
- When the translation pen performs word recognition, it stretches words that are not in a horizontal position, such as slanted or curved words, into a horizontal position, and then performs word recognition, thereby effectively improving the accuracy and effect of word recognition.
- the word recognition method includes the following steps:
- Step 101 Collect an image of a word to be recognized.
- the word recognition method provided in the embodiment of the present invention may be executed by a processor, and the processor may be set in an electronic device such as a terminal or a cloud server provided in the embodiment of the present invention.
- the specific implementation of the terminal includes translation devices such as point-translation pens, point-reading machines, etc., which are not limited by the present invention.
- the image of the word that needs to be recognized provided by the scanning head of the translation pen can be obtained.
- the user specifies the word to be translated through the tip of the translation pen.
- The relative position of the pen tip and the camera is fixed, so the relative position of the pen tip in the image is also fixed. Therefore, in this embodiment, the following method can be used to collect the image of the word to be recognized:
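Because the pen tip appears at a fixed, known position in every camera frame, collecting the word image reduces to cropping a fixed region around that position. A minimal sketch of this idea; the tip coordinates and crop size below are hypothetical, chosen only for illustration:

```python
def crop_word_region(image, tip_row, tip_col, height=48, width=160):
    """Crop a fixed-size region above the pen tip from a 2D image
    (list of rows). tip_row/tip_col are the known, fixed pixel
    coordinates of the pen tip; the word is assumed to lie above it."""
    top = max(0, tip_row - height)          # region extends upward from the tip
    left = max(0, tip_col - width // 2)     # centered horizontally on the tip
    return [row[left:left + width] for row in image[top:tip_row]]

# Example: a 100x200 dummy image, tip fixed at row 90, column 100
img = [[r * 1000 + c for c in range(200)] for r in range(100)]
region = crop_word_region(img, tip_row=90, tip_col=100)
```

Since the crop window is constant, it can be computed once at calibration time rather than per frame.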
- Step 102 Identify the edge range of each character of the word to be recognized from the word image to be recognized, determine the geometric position of the word to be recognized, and stretch the geometric position of the word to be recognized to a horizontal position.
- If the word to be recognized is horizontal and clear in the image, it can be recognized directly; but if the word is slanted, in perspective, or curved in the image, recognition may be inaccurate or inefficient in practice. Therefore, in the embodiment of the present invention, words that are not in a horizontal position, such as slanted or curved words, are stretched into a horizontal position before recognition, so that the accuracy and effect of word recognition can be improved.
- Step 103 Recognize the word to be recognized in the horizontal position.
- When the translation pen performs word recognition, it stretches words that are not in a horizontal position, such as slanted or curved words, into a horizontal position, and then performs word recognition, thereby effectively improving the accuracy and effect of word recognition and enhancing the user experience. This solves the prior-art problem that the accuracy of text recognition needs further improvement when the text is slanted, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene).
- step 102 specifically includes:
- Step 201 Determine the edge range of each character in the word to be recognized using the Maximally Stable Extremal Regions (MSER) algorithm.
- the embodiment of the present invention can also determine the edge range of each character through other algorithms, such as projection method, character detection algorithm based on machine learning or deep learning, etc.
- In the embodiment of the present invention, the MSER (Maximally Stable Extremal Regions) algorithm is taken as an example, and no specific restriction is imposed.
- The edge ranges are determined at the positions of characters exceeding a preset threshold (for example, characters whose number accounts for more than 80% of a word).
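As the embodiment notes, simpler alternatives such as the projection method can also locate character edge ranges: summing ink pixels per column of a binarized image yields a profile whose zero-gaps separate characters. A minimal, purely illustrative sketch (assuming a binary image with 1 = text pixel):

```python
def char_edge_ranges(binary_image):
    """Return (start_col, end_col) column ranges containing text,
    using a vertical projection profile of a binary image (rows of 0/1)."""
    n_cols = len(binary_image[0])
    profile = [sum(row[c] for row in binary_image) for c in range(n_cols)]
    ranges, start = [], None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c                      # entering a character
        elif count == 0 and start is not None:
            ranges.append((start, c - 1))  # leaving a character
            start = None
    if start is not None:
        ranges.append((start, n_cols - 1))
    return ranges

# Two "characters" separated by one blank column
img = [[1, 1, 0, 1, 0],
       [1, 0, 0, 1, 0]]
```

Unlike MSER, this simple profile fails on touching or heavily slanted characters, which is one reason the embodiment prefers MSER.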
- Step 202 Determine the initial circumscribed rectangle of the character according to the edge range of the character, and use the center of the initial circumscribed rectangle as the center of rotation to determine the position of the main axis; continuously rotate and translate the main axis to determine the boundary, and obtain the minimum circumscribed rectangle by comparing the sizes of the areas enclosed by the boundaries; take any vertex of the word image to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis to establish a coordinate system; determine the coordinates of the four vertices of the minimum circumscribed rectangle, determine the height and width of the minimum circumscribed rectangle from these coordinates, and use the angle between one of the width edges and the x-axis as the character orientation angle; determine the geometric position of the word to be recognized according to the orientation angles of all characters.
- a coordinate system is established with the upper left corner of the entire image as the origin, the horizontal as the x-axis, and the vertical as the y-axis.
- The coordinates of the four vertices of the minimum bounding rectangle can be obtained as (x0, y0), (x1, y1), (x2, y2), (x3, y3), from which the height and width of the minimum bounding rectangle are determined.
- From this, the orientation of the character can be determined; then, according to the orientation angle, it can be determined whether the character is inclined or horizontal. From the orientations of all characters, it can be determined as a whole whether the word is inclined, curved, or horizontal, and thus the geometric position of the word can be determined.
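The character orientation angle described above is simply the angle between one width edge of the minimum circumscribed rectangle and the x-axis, which two adjacent vertices determine directly. A minimal sketch (the vertex ordering is assumed for illustration):

```python
import math

def orientation_angle(p0, p1):
    """Angle in degrees between the rectangle edge p0 -> p1 (taken as a
    width edge of the minimum bounding rectangle) and the x-axis."""
    (x0, y0), (x1, y1) = p0, p1
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

# A rectangle tilted 45 degrees: width edge runs from (0, 0) to (2, 2)
angle = orientation_angle((0, 0), (2, 2))
```

An angle near zero marks a horizontal character; a consistent nonzero angle across characters marks an inclined word, and a varying angle marks a curved one.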
- Step 203 Taking the center point of the minimum circumscribed rectangle as the center point of the character, connect the center points of all characters in order from left to right to obtain the text center line; sample multiple points at equal intervals on the text center line, and for each sampled point take one point on each of the two sides along the character orientation as reference points; calculate the thin-plate spline interpolation transformation matrix from the reference points, and perform bilinear interpolation sampling to stretch the geometric position of the word to be recognized into a horizontal position.
- Selecting the two points is equivalent to taking the midpoint of the bottom edge and drawing a line perpendicular to the bottom edge; this line passes through the center of the rectangle and intersects the top edge at one point. These two points are thus points on the bottom and top edges of the rectangle. All sampling points are handled this way to obtain the reference points. Of course, more points can also be taken on this line; the embodiment of the present invention takes two points as an example, which is not specifically limited here.
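For each equally spaced sample on the text center line, the two reference points are obtained by stepping half the character height perpendicular to the character's orientation, once upward and once downward. A minimal sketch of that construction (the function name and argument layout are illustrative, not from the patent):

```python
import math

def reference_points(center, angle_deg, char_height):
    """Return the two reference points for one center-line sample:
    offsets of +/- half the character height, perpendicular to the
    character orientation angle (in degrees)."""
    cx, cy = center
    half = char_height / 2.0
    # unit vector perpendicular to the orientation direction
    nx = -math.sin(math.radians(angle_deg))
    ny = math.cos(math.radians(angle_deg))
    top = (cx + half * nx, cy + half * ny)
    bottom = (cx - half * nx, cy - half * ny)
    return top, bottom

# Horizontal character (angle 0), height 10: points straight above/below
top, bottom = reference_points((5.0, 5.0), 0.0, 10.0)
```

For a horizontal character these are exactly the midpoints of the rectangle's top and bottom edges, matching the geometric description above.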
- For step 201, illustratively, the MSER algorithm is first used to determine the edge range of each character (each letter is a character) in "statements".
- step 202 exemplarily, the center point and character orientation of each character of the word text statements are determined according to the method of step 202.
- For step 203, exemplarily, the center points of the characters are connected in order to obtain the center line of the word text "statements". Five points are sampled at equal intervals on the center line, and for each of these five points two points are taken along the character direction at a distance of half the character height (that is, the midpoints of the bottom and top edges of the rectangular box where the point is located). The ten points taken in this way from the five equally spaced samples are the reference points. Finally, after the reference points are obtained, the thin-plate spline interpolation transformation matrix is calculated, and bilinear interpolation sampling is performed to obtain the corrected "statements" image.
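The final resampling step reads each output pixel from a fractional source coordinate produced by the thin-plate-spline mapping; bilinear interpolation then blends the four surrounding pixels so the stretched image stays smooth. A minimal sketch of the interpolation itself (the TPS mapping is omitted here):

```python
def bilinear_sample(image, x, y):
    """Sample a 2D image (list of rows, image[row][col]) at fractional
    coordinates (x = column, y = row) by bilinear interpolation."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    dx, dy = x - x0, y - y0
    # blend the four neighbours, weighted by opposite-corner areas
    return (image[y0][x0] * (1 - dx) * (1 - dy)
            + image[y0][x1] * dx * (1 - dy)
            + image[y1][x0] * (1 - dx) * dy
            + image[y1][x1] * dx * dy)

img = [[0.0, 1.0],
       [2.0, 3.0]]
value = bilinear_sample(img, 0.5, 0.5)   # centre of the 2x2 grid
```

This smooth blending is why the correction avoids the jagged artifacts that nearest-neighbour sampling would introduce on curved text.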
- The embodiment of the present invention can effectively stretch slanted, perspective, and curved word text to a horizontal position, while keeping the image smooth during the stretching process so that the text is unlikely to be distorted or deformed by the stretching, thereby improving the reliability of word text correction.
- this embodiment provides another word recognition method to illustrate how to perform word recognition after image correction.
- This embodiment and the previous embodiment each emphasize different content; for steps not described in detail here, refer to the other embodiment.
- the word recognition method includes the following steps:
- Step 701 Collect an image of a word to be recognized.
- Step 702 Identify the position of each character of the word to be recognized from the word image to be recognized, determine the geometric attributes of the text, and stretch the word image to be recognized into a horizontal image according to the geometric attributes.
- step 701 and step 702 please refer to the explanation of step 101 and step 102. To avoid redundancy, details are not described here.
- Step 703 Recognize the word to be recognized in the horizontal position through the CRNN algorithm.
- the embodiment of the present invention can effectively improve the calculation efficiency of the algorithm, thereby improving the efficiency and accuracy of word recognition, which will be described in detail below:
- the existing CRNN algorithm structure includes: convolutional neural network layer, recurrent neural network layer and transcription layer, specifically:
- The convolutional neural network layer is composed of the first to seventh convolutional layers (Convolution1-7 in the figure), max-pooling layers (Maxpooling), and a custom network layer (Map-to-Sequence); image features are extracted mainly through the convolutional layers.
- the custom network layer is the "bridge" between the convolutional neural network layer and the recurrent neural network layer.
- The recurrent neural network layer, which includes two bidirectional long short-term memory (LSTM) network layers, is used to predict over the feature vector sequence of the word to be recognized to obtain the prediction result.
- two bidirectional LSTM layers are used to learn the semantic relationship between the text sequences.
- The transcription layer is used to decode the prediction result into characters, removing the blank characters and repeated characters to obtain the recognition result of the word to be recognized.
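The transcription layer's decoding rule, collapsing consecutive repeated characters and then dropping the blank placeholder, can be sketched as follows (using "-" as a hypothetical blank symbol, in the style of CTC decoding):

```python
def transcribe(prediction, blank="-"):
    """Decode a per-frame character prediction into text: merge
    consecutive repeats, then drop blank characters (CTC-style)."""
    out, prev = [], None
    for ch in prediction:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

# Per-frame outputs with repeats and a blank collapse to the target word
word = transcribe(["s", "s", "t", "a", "a", "-", "t", "e", "e"])
```

Note the role of the blank: a repeated letter separated by a blank ("h", "-", "h") decodes to "hh" rather than collapsing to a single "h".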
- The embodiment of the present invention improves on the existing CRNN algorithm structure shown in FIG. 8 by replacing the second to sixth convolutional layers with the designed convolution blocks.
- The improved algorithm is shown in FIG. 9.
- The difference from the existing CRNN algorithm lies in the structure of the convolutional neural network layer.
- The recurrent neural network layer and the transcription layer have the same structure as in the existing CRNN algorithm; to avoid redundancy, they are not repeated here.
- The convolutional neural network layer is composed of convolutional layers (Convolution), convolution blocks (Conv Block), and max-pooling layers (Maxpooling).
- the convolutional neural network layer is used to extract features of the image to be recognized.
- The convolutional layer includes convolution (Conv), batch normalization (Batch Normalization), and an activation function (ReLU).
- The convolutional neural network layer includes, in logical order, the first convolutional layer, the first convolution block, the first max-pooling layer, the second convolution block, the second max-pooling layer, the third convolution block, the third max-pooling layer, the fourth convolution block, the fourth max-pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.
- The convolution block includes, in logical order, a first convolution sublayer, a first batch-normalization and activation-function sublayer, a depthwise separable convolution sublayer based on dilated (atrous) convolution, a second batch-normalization and activation-function sublayer, a second convolution sublayer, and a third batch-normalization and activation-function sublayer.
- the existing CRNN algorithm only uses the convolutional layer to extract the feature vector sequence of the word to be recognized; while the embodiment of the present invention performs image feature extraction through the convolutional layer and convolution block.
- The embodiment of the present invention effectively enlarges the receptive field of the algorithm, improves the algorithm's ability to distinguish similar characters, improves recognition accuracy, reduces the amount of computation, and speeds up calculation, so it is better suited to translation pens.
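The efficiency claim can be illustrated with a parameter count: a depthwise separable convolution splits a standard k x k convolution into a per-channel spatial convolution plus a 1 x 1 pointwise convolution, and dilation enlarges the receptive field at no parameter cost. A back-of-the-envelope sketch; the 256-channel sizes are hypothetical, not taken from the patent:

```python
def standard_conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out

def effective_kernel(k, dilation):
    """Effective receptive-field size of a dilated k x k kernel."""
    return k + (k - 1) * (dilation - 1)

std = standard_conv_params(3, 256, 256)        # 589824
sep = depthwise_separable_params(3, 256, 256)  # 67840
field = effective_kernel(3, 2)                 # a 3x3 kernel, dilation 2, covers 5x5
```

At these sizes the separable form uses roughly one-ninth of the parameters while the dilation widens the view, consistent with the stated gains in receptive field and computation.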
- the efficiency and accuracy of word recognition can be improved when the word to be recognized is recognized through the improved CRNN algorithm.
- the method further includes: scaling the image of the word to be recognized in the horizontal position to a preset height and a preset width.
- the preset height and preset width can be set according to the actual situation and are not limited here.
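Scaling the corrected word image to the preset height and width can be done with nearest-neighbour resampling. A minimal sketch; the 4 x 4 output below is only a toy size, and any preset such as the 32 x 100 common for CRNN inputs is a hypothetical choice the embodiment leaves open:

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D image (list of rows) to out_h x out_w by
    nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

img = [[1, 2],
       [3, 4]]
scaled = resize_nearest(img, 4, 4)   # each source pixel becomes a 2x2 block
```

A fixed input size lets every convolutional feature map shape be known in advance, which the network layers below rely on.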
- the word recognition method of the embodiment of the present invention improves the accuracy and speed of word recognition by improving the CRNN algorithm for word recognition, thereby reducing the time required for recognition and improving the efficiency of recognition.
- the present invention also provides an electronic device.
- the electronic device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor.
- the processor executes the program, the following word recognition method is implemented, including:
- identifying the edge range of each character of the word to be recognized from the word image to be recognized includes:
- The Maximally Stable Extremal Regions (MSER) algorithm is used to determine the edge range of each character in the word to be recognized.
- determining the geometric position of the word to be recognized includes:
- the geometric position of the word to be recognized is stretched to a horizontal position, which specifically includes:
- the thin plate spline interpolation transformation matrix is calculated according to the reference point, and bilinear interpolation sampling is performed to stretch the geometric position of the word to be recognized into a horizontal position.
- the CRNN algorithm recognizes the word to be recognized in the horizontal position.
- the method further includes:
- the image of the word to be recognized in the horizontal position is scaled to a preset height and a preset width.
- the CRNN algorithm includes:
- The convolutional neural network layer includes the first to second convolutional layers, the first to fifth convolution blocks, the first to fourth max-pooling layers, and the custom network layer; the convolutional layers and convolution blocks extract the feature vector sequence of the word to be recognized from the scaled image.
- The convolutional neural network layer includes, in logical order, the first convolutional layer, the first convolution block, the first max-pooling layer, the second convolution block, the second max-pooling layer, the third convolution block, the third max-pooling layer, the fourth convolution block, the fourth max-pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.
- Each convolution block is composed of first to second convolution sublayers, first to third batch-normalization and activation-function sublayers, and a depthwise separable convolution sublayer based on dilated convolution.
- The convolution block includes, in logical order, a first convolution sublayer, a first batch-normalization and activation-function sublayer, a depthwise separable convolution sublayer based on dilated convolution, a second batch-normalization and activation-function sublayer, a second convolution sublayer, and a third batch-normalization and activation-function sublayer.
- the CRNN algorithm also includes:
- The recurrent neural network layer, which includes two bidirectional long short-term memory (LSTM) network layers, is used to predict over the feature vector sequence of the word to be recognized to obtain the prediction result.
- the CRNN algorithm also includes:
- the transcription layer is used to decode the prediction result into characters, and remove the space characters and repeated characters to obtain the recognition result of the word to be recognized.
- collecting the image of the word to be recognized specifically includes:
- When the translation pen performs word recognition, it stretches words that are not in a horizontal position, such as slanted or curved words, into a horizontal position, and then performs word recognition, thereby effectively improving the accuracy and effect of word recognition and enhancing the user experience. This solves the prior-art problem that the accuracy of text recognition needs further improvement when the text is slanted, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene).
- the present invention also provides a non-volatile computer-readable storage medium.
- the non-volatile computer-readable storage medium has a computer program stored thereon, and when the program is executed by the processor, the following word recognition method is implemented:
- identifying the edge range of each character of the word to be recognized from the word image to be recognized includes:
- The Maximally Stable Extremal Regions (MSER) algorithm is used to determine the edge range of each character in the word to be recognized.
- determining the geometric position of the word to be recognized includes:
- the geometric position of the word to be recognized is stretched to a horizontal position, which specifically includes:
- the thin plate spline interpolation transformation matrix is calculated according to the reference point, and bilinear interpolation sampling is performed to stretch the geometric position of the word to be recognized into a horizontal position.
- the CRNN algorithm recognizes the word to be recognized in the horizontal position.
- the method further includes:
- the image of the word to be recognized in the horizontal position is scaled to a preset height and a preset width.
- the CRNN algorithm includes:
- The convolutional neural network layer includes the first to second convolutional layers, the first to fifth convolution blocks, the first to fourth max-pooling layers, and the custom network layer; the convolutional layers and convolution blocks extract the feature vector sequence of the word to be recognized from the scaled image.
- The convolutional neural network layer includes, in logical order, the first convolutional layer, the first convolution block, the first max-pooling layer, the second convolution block, the second max-pooling layer, the third convolution block, the third max-pooling layer, the fourth convolution block, the fourth max-pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.
- Each convolution block is composed of first to second convolution sublayers, first to third batch-normalization and activation-function sublayers, and a depthwise separable convolution sublayer based on dilated convolution.
- The convolution block includes, in logical order, a first convolution sublayer, a first batch-normalization and activation-function sublayer, a depthwise separable convolution sublayer based on dilated convolution, a second batch-normalization and activation-function sublayer, a second convolution sublayer, and a third batch-normalization and activation-function sublayer.
- the CRNN algorithm also includes:
- The recurrent neural network layer, which includes two bidirectional long short-term memory (LSTM) network layers, is used to predict over the feature vector sequence of the word to be recognized to obtain the prediction result.
- the CRNN algorithm also includes:
- the transcription layer is used to decode the prediction result into characters, and remove the space characters and repeated characters to obtain the recognition result of the word to be recognized.
- collecting the image of the word to be recognized specifically includes:
- When the translation pen performs word recognition, it stretches words that are not in a horizontal position, such as slanted or curved words, into a horizontal position, and then performs word recognition, thereby effectively improving the accuracy and effect of word recognition.
- The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, features defined with "first" and "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, such as two or three, unless otherwise specifically defined.
- a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
- examples of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM).
- the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example, by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
- each part of the present invention can be implemented by hardware, software, firmware or a combination thereof.
- multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
- for example, any of the following technologies known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
- a person of ordinary skill in the art can understand that all or part of the steps carried in the method of the foregoing embodiments can be implemented by a program instructing relevant hardware to complete.
- the program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
- the functional units in the various embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module.
- the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer readable storage medium.
- the aforementioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Discrimination (AREA)
Abstract
Description
Claims (21)
- A word recognition method, characterized by comprising: collecting an image of a word to be recognized; identifying, from the image of the word to be recognized, the edge range of each character of the word to be recognized, determining the geometric position of the word to be recognized, and stretching the geometric position of the word to be recognized into a horizontal position; and recognizing the word to be recognized in the horizontal position.
- The method according to claim 1, wherein identifying the edge range of each character of the word to be recognized from the image of the word to be recognized specifically comprises: determining the edge range of each character of the word to be recognized using a maximally stable extremal regions (MSER) algorithm.
- The method according to claim 1 or 2, wherein determining the geometric position of the word to be recognized specifically comprises: determining an initial bounding rectangle of each character according to the edge range of the character, and determining a principal-axis position with the center point of the initial bounding rectangle as the rotation center; continuously rotating and translating the principal axis to determine boundaries, and obtaining the bounding rectangle of minimum area by comparing the areas of the regions enclosed by the boundaries; establishing a coordinate system with any vertex of the image of the word to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis; determining the coordinates of the four vertices of the minimum bounding rectangle in the coordinate system, determining the height and width of the minimum bounding rectangle according to the coordinates of the four vertices, and taking the angle between one of the width edges and the x-axis as the character orientation angle; and determining the geometric position of the word to be recognized according to the orientation angles of all characters.
- The method according to claim 3, wherein stretching the geometric position of the word to be recognized into a horizontal position specifically comprises: taking the center point of each minimum bounding rectangle as the center point of the corresponding character, and connecting the center points of all characters in order from left to right to obtain a text center line; sampling a plurality of center points at equal intervals along the text center line, and taking one point on each of the two height edges along the character orientation corresponding to each sampling point as reference points; and computing a thin-plate spline interpolation transformation matrix according to the reference points and performing bilinear interpolation sampling, so as to stretch the geometric position of the word to be recognized into a horizontal position.
- The method according to claim 1, further comprising: recognizing the word to be recognized in the horizontal position by means of a CRNN algorithm.
- The method according to claim 5, wherein before the word to be recognized in the horizontal position is recognized by the CRNN algorithm, the method further comprises: scaling the image of the word to be recognized in the horizontal position to a preset height and a preset width.
- The method according to claim 6, wherein the CRNN algorithm comprises: a convolutional neural network layer, which comprises first and second convolutional layers, first to fifth convolution blocks, first to fourth max-pooling layers, and a custom network layer, wherein a feature vector sequence of the word to be recognized is extracted from the scaled image through the first and second convolutional layers and the first to fifth convolution blocks, the feature vector sequence being used to identify the word.
- The method according to claim 7, wherein the CRNN algorithm further comprises: a recurrent neural network layer, which comprises two bidirectional long short-term memory (LSTM) network layers used to predict the feature vector sequence of the word to be recognized to obtain a prediction result.
- The method according to claim 8, wherein the CRNN algorithm further comprises: a transcription layer, used to decode the prediction result generated by the recurrent neural network layer into characters and remove blank characters and repeated characters, so as to obtain the recognition result of the word to be recognized.
- The method according to claim 1, wherein collecting the image of the word to be recognized specifically comprises: setting a rectangular region of fixed size as a virtual pen tip, with the position of the pen tip as the center of its bottom edge and with its size set according to the size of the words in the image; computing the overlap area between the fixed-size rectangular region and each detected text box; and finding the text box whose overlap area accounts for the largest proportion of the fixed-size rectangular region, taking the word in that text box as the word to be recognized, and obtaining the image of the word to be recognized.
- An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the program, the following word recognition method is implemented: collecting an image of a word to be recognized; identifying, from the image of the word to be recognized, the edge range of each character of the word to be recognized, determining the geometric position of the word to be recognized, and stretching the geometric position of the word to be recognized into a horizontal position; and recognizing the word to be recognized in the horizontal position.
- The electronic device according to claim 11, wherein identifying the edge range of each character of the word to be recognized from the image of the word to be recognized specifically comprises: determining the edge range of each character of the word to be recognized using a maximally stable extremal regions (MSER) algorithm.
- The electronic device according to claim 11 or 12, wherein determining the geometric position of the word to be recognized specifically comprises: determining an initial bounding rectangle of each character according to the edge range of the character, and determining a principal-axis position with the center point of the initial bounding rectangle as the rotation center; continuously rotating and translating the principal axis to determine boundaries, and obtaining the bounding rectangle of minimum area by comparing the areas of the regions enclosed by the boundaries; establishing a coordinate system with any vertex of the image of the word to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis; determining the coordinates of the four vertices of the minimum bounding rectangle in the coordinate system, determining the height and width of the minimum bounding rectangle according to the coordinates of the four vertices, and taking the angle between one of the width edges and the x-axis as the character orientation angle; and determining the geometric position of the word to be recognized according to the orientation angles of all characters.
- The electronic device according to claim 13, wherein stretching the geometric position of the word to be recognized into a horizontal position specifically comprises: taking the center point of each minimum bounding rectangle as the center point of the corresponding character, and connecting the center points of all characters in order from left to right to obtain a text center line; sampling a plurality of center points at equal intervals along the text center line, and taking one point on each of the two height edges along the character orientation corresponding to each sampling point as reference points; and computing a thin-plate spline interpolation transformation matrix according to the reference points and performing bilinear interpolation sampling, so as to stretch the geometric position of the word to be recognized into a horizontal position.
- The electronic device according to claim 13, further comprising: recognizing the word to be recognized in the horizontal position by means of a CRNN algorithm.
- The electronic device according to claim 15, wherein before the word to be recognized in the horizontal position is recognized by the CRNN algorithm, the method further comprises: scaling the image of the word to be recognized in the horizontal position to a preset height and a preset width.
- The electronic device according to claim 15, wherein the CRNN algorithm comprises: a convolutional neural network layer, which comprises first and second convolutional layers, first to fifth convolution blocks, first to fourth max-pooling layers, and a custom network layer, wherein the feature vector sequence of the word to be recognized is extracted from the scaled image through the convolutional layers and the convolution blocks, the feature vector sequence being used to identify the word.
- The electronic device according to claim 17, wherein the CRNN algorithm further comprises: a recurrent neural network layer, which comprises two bidirectional long short-term memory (LSTM) network layers used to predict the feature vector sequence of the word to be recognized to obtain a prediction result.
- The electronic device according to claim 18, wherein the CRNN algorithm further comprises: a transcription layer, used to decode the prediction result of the recurrent neural network layer into characters and remove blank characters and repeated characters, so as to obtain the recognition result of the word to be recognized.
- The electronic device according to claim 11, wherein collecting the image of the word to be recognized specifically comprises: setting a rectangular region of fixed size as a virtual pen tip, with the position of the pen tip as the center of its bottom edge and with its size set according to the size of the words in the image; computing the overlap area between the fixed-size rectangular region and each detected text box; and finding the text box whose overlap area accounts for the largest proportion of the fixed-size rectangular region, taking the word in that text box as the word to be recognized, and obtaining the image of the word to be recognized.
- A non-volatile computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the word recognition method according to any one of claims 1-10 is implemented.
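Claims 3 and 13 define the character orientation angle as the angle between a width edge of the minimum-area bounding rectangle and the x-axis. In practice the rectangle itself is often obtained with OpenCV's `cv2.minAreaRect`; the angle step alone can be sketched in pure Python as below (the corner ordering and the choice of the longer edge as the "width" are illustrative assumptions, not fixed by the claims):

```python
import math

def orientation_angle(corners):
    """Estimate a character's orientation angle, in degrees, from the four
    corners of its minimum-area bounding rectangle, as the angle between one
    of the longer (width) edges and the x-axis. Corners are (x, y) tuples
    listed in order around the rectangle."""
    # Edge vectors between consecutive corners; opposite edges are parallel,
    # so two consecutive edges give the width and height directions.
    e1 = (corners[1][0] - corners[0][0], corners[1][1] - corners[0][1])
    e2 = (corners[2][0] - corners[1][0], corners[2][1] - corners[1][1])
    # Take the longer of the two edges as the width of the character box.
    width_edge = e1 if math.hypot(*e1) >= math.hypot(*e2) else e2
    return math.degrees(math.atan2(width_edge[1], width_edge[0]))
```

The per-character angles computed this way can then be combined, as the claim describes, to determine the geometric position of the whole word.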
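Claims 4 and 14 build a text center line by joining character centers left to right and then sampling it at equal intervals. A minimal pure-Python sketch of that equidistant sampling step (assuming at least two center points and n >= 2 samples; the thin-plate spline fit and bilinear resampling that follow are omitted):

```python
import math

def sample_centerline(points, n):
    """Return n points spaced at equal arc length along a polyline given as
    a list of (x, y) character center points ordered left to right."""
    # Cumulative arc length at each vertex of the polyline.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    samples = []
    for i in range(n):
        target = total * i / (n - 1)  # arc length of the i-th sample
        # Find the segment containing this arc length, interpolate linearly.
        j = 1
        while j < len(dists) - 1 and dists[j] < target:
            j += 1
        seg = dists[j] - dists[j - 1] or 1.0  # guard zero-length segments
        t = (target - dists[j - 1]) / seg
        (x0, y0), (x1, y1) = points[j - 1], points[j]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples
```

Each sampled center point, together with the two reference points taken on the character's height edges, supplies the control-point pairs for the thin-plate spline transformation.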
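Claims 7 and 17 extract a feature vector sequence whose length is determined by the preset image width and by the horizontal down-sampling of the convolutional and max-pooling layers. The claims fix only the layer inventory, not the strides, so the following sketch uses hypothetical strides and assumes "same" padding:

```python
def feature_sequence_length(width, horizontal_strides):
    """Length of the feature vector sequence a CRNN-style CNN produces for an
    input image of the given width, after layers with the given horizontal
    down-sampling strides ("same" padding assumed, so each layer divides the
    width by its stride, rounding up)."""
    for s in horizontal_strides:
        width = -(-width // s)  # ceiling division
    return width
```

For example, a 128-pixel-wide input passed through layers with hypothetical horizontal strides 2, 2, 1, 1 yields a sequence of 32 feature vectors, one per horizontal position of the final feature map.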
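The transcription layer of claims 9 and 19 follows the standard CTC collapsing rule: merge runs of repeated labels, then drop the blank (separator) label. A minimal sketch of greedy decoding (the character set and blank index are illustrative assumptions):

```python
def ctc_greedy_decode(argmax_ids, charset, blank=0):
    """Collapse a per-timestep best-class label sequence into text:
    merge consecutive repeats, then drop the blank label."""
    out = []
    prev = None
    for idx in argmax_ids:
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```

Note that a blank between two identical labels keeps them distinct, which is how CTC can emit genuinely doubled letters such as the "ll" in "hello".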
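The virtual pen tip of claims 10 and 20 is a fixed-size rectangle whose bottom edge is centered on the nib; the detected text box whose overlap covers the largest share of that rectangle is selected. A sketch assuming axis-aligned (x1, y1, x2, y2) boxes and image coordinates with y increasing downward:

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def pick_text_box(pen_tip, box_size, text_boxes):
    """Select the detected text box whose overlap with the fixed-size virtual
    pen-tip rectangle covers the largest share of that rectangle. pen_tip is
    the (x, y) nib position, taken as the center of the rectangle's bottom
    edge; box_size is (width, height)."""
    x, y = pen_tip
    w, h = box_size
    # With y growing downward, the rectangle extends upward from the nib.
    virtual = (x - w / 2, y - h, x + w / 2, y)
    return max(text_boxes, key=lambda b: overlap_area(virtual, b))
```

Since the virtual rectangle's own area is constant, maximizing the raw overlap area is equivalent to maximizing the overlap proportion the claim describes.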
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/082566 WO2021196013A1 (zh) | 2020-03-31 | 2020-03-31 | 单词识别方法、设备及存储介质 |
US17/263,418 US11651604B2 (en) | 2020-03-31 | 2020-03-31 | Word recognition method, apparatus and storage medium |
CN202080000447.2A CN113748429A (zh) | 2020-03-31 | 2020-03-31 | 单词识别方法、设备及存储介质 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/082566 WO2021196013A1 (zh) | 2020-03-31 | 2020-03-31 | 单词识别方法、设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021196013A1 true WO2021196013A1 (zh) | 2021-10-07 |
Family
ID=77926983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/082566 WO2021196013A1 (zh) | 2020-03-31 | 2020-03-31 | 单词识别方法、设备及存储介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11651604B2 (zh) |
CN (1) | CN113748429A (zh) |
WO (1) | WO2021196013A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113902046A (zh) * | 2021-12-10 | 2022-01-07 | 北京惠朗时代科技有限公司 | 一种特效字体识别方法及装置 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783760B (zh) * | 2020-06-30 | 2023-08-08 | 北京百度网讯科技有限公司 | 文字识别的方法、装置、电子设备及计算机可读存储介质 |
CN115527215A (zh) * | 2022-10-10 | 2022-12-27 | 杭州睿胜软件有限公司 | 包含文本的图像处理方法、系统及存储介质 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140161365A1 (en) * | 2012-12-12 | 2014-06-12 | Qualcomm Incorporated | Method of Perspective Correction For Devanagari Text |
CN104239861A (zh) * | 2014-09-10 | 2014-12-24 | 深圳市易讯天空网络技术有限公司 | 卷曲文本图像预处理方法和彩票扫描识别方法 |
CN106951896A (zh) * | 2017-02-22 | 2017-07-14 | 武汉黄丫智能科技发展有限公司 | 一种车牌图像倾斜校正方法 |
CN107516096A (zh) * | 2016-06-15 | 2017-12-26 | 阿里巴巴集团控股有限公司 | 一种字符识别方法及装置 |
CN108647681A (zh) * | 2018-05-08 | 2018-10-12 | 重庆邮电大学 | 一种带有文本方向校正的英文文本检测方法 |
CN108985137A (zh) * | 2017-06-02 | 2018-12-11 | 杭州海康威视数字技术股份有限公司 | 一种车牌识别方法、装置及系统 |
CN110321755A (zh) * | 2018-03-28 | 2019-10-11 | 中移(苏州)软件技术有限公司 | 一种识别方法及装置 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI319547B (en) * | 2006-12-01 | 2010-01-11 | Compal Electronics Inc | Method for generating typographical line |
US9977976B2 (en) * | 2016-06-29 | 2018-05-22 | Konica Minolta Laboratory U.S.A., Inc. | Path score calculating method for intelligent character recognition |
US10783400B2 (en) * | 2018-04-06 | 2020-09-22 | Dropbox, Inc. | Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks |
2020
- 2020-03-31 US US17/263,418 patent/US11651604B2/en active Active
- 2020-03-31 CN CN202080000447.2A patent/CN113748429A/zh active Pending
- 2020-03-31 WO PCT/CN2020/082566 patent/WO2021196013A1/zh active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140161365A1 (en) * | 2012-12-12 | 2014-06-12 | Qualcomm Incorporated | Method of Perspective Correction For Devanagari Text |
CN104239861A (zh) * | 2014-09-10 | 2014-12-24 | 深圳市易讯天空网络技术有限公司 | 卷曲文本图像预处理方法和彩票扫描识别方法 |
CN107516096A (zh) * | 2016-06-15 | 2017-12-26 | 阿里巴巴集团控股有限公司 | 一种字符识别方法及装置 |
CN106951896A (zh) * | 2017-02-22 | 2017-07-14 | 武汉黄丫智能科技发展有限公司 | 一种车牌图像倾斜校正方法 |
CN108985137A (zh) * | 2017-06-02 | 2018-12-11 | 杭州海康威视数字技术股份有限公司 | 一种车牌识别方法、装置及系统 |
CN110321755A (zh) * | 2018-03-28 | 2019-10-11 | 中移(苏州)软件技术有限公司 | 一种识别方法及装置 |
CN108647681A (zh) * | 2018-05-08 | 2018-10-12 | 重庆邮电大学 | 一种带有文本方向校正的英文文本检测方法 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113902046A (zh) * | 2021-12-10 | 2022-01-07 | 北京惠朗时代科技有限公司 | 一种特效字体识别方法及装置 |
CN113902046B (zh) * | 2021-12-10 | 2022-02-18 | 北京惠朗时代科技有限公司 | 一种特效字体识别方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
CN113748429A (zh) | 2021-12-03 |
US11651604B2 (en) | 2023-05-16 |
US20220036112A1 (en) | 2022-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10983596B2 (en) | Gesture recognition method, device, electronic device, and storage medium | |
WO2019128646A1 (zh) | 人脸检测方法、卷积神经网络参数的训练方法、装置及介质 | |
CN110232311B (zh) | 手部图像的分割方法、装置及计算机设备 | |
WO2021196013A1 (zh) | 单词识别方法、设备及存储介质 | |
CN110147786B (zh) | 用于检测图像中的文本区域的方法、装置、设备以及介质 | |
US7302099B2 (en) | Stroke segmentation for template-based cursive handwriting recognition | |
US7369702B2 (en) | Template-based cursive handwriting recognition | |
WO2021238446A1 (zh) | 文本识别方法、设备及存储介质 | |
US10621759B2 (en) | Beautifying freeform drawings using transformation adjustments | |
CA2481828C (en) | System and method for detecting a hand-drawn object in ink input | |
JP2933801B2 (ja) | 文字の切り出し方法及びその装置 | |
WO2020228187A1 (zh) | 边缘检测方法、装置、电子设备和计算机可读存储介质 | |
CN110222703B (zh) | 图像轮廓识别方法、装置、设备和介质 | |
JP2018524734A (ja) | 複数のオブジェクトの入力を認識するためのシステムならびにそのための方法および製品 | |
WO2019041424A1 (zh) | 验证码识别方法、装置、计算机设备及计算机存储介质 | |
CN110688947A (zh) | 一种同步实现人脸三维点云特征点定位和人脸分割的方法 | |
CN107545223B (zh) | 图像识别方法及电子设备 | |
JP6877446B2 (ja) | 多重オブジェクト構造を認識するためのシステムおよび方法 | |
US10579868B2 (en) | System and method for recognition of objects from ink elements | |
US11823474B2 (en) | Handwritten text recognition method, apparatus and system, handwritten text search method and system, and computer-readable storage medium | |
WO2022222096A1 (zh) | 手绘图形识别方法、装置和系统,以及计算机可读存储介质 | |
CN111492407A (zh) | 用于绘图美化的系统和方法 | |
US9418281B2 (en) | Segmentation of overwritten online handwriting input | |
CN111753812A (zh) | 文本识别方法及设备 | |
CN113177542A (zh) | 识别印章文字的方法、装置、设备和计算机可读介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20928790 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20928790 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20928790 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20928790 Country of ref document: EP Kind code of ref document: A1 |