WO2021196013A1 - Word recognition method, device, and storage medium - Google Patents

Word recognition method, device, and storage medium

Info

Publication number
WO2021196013A1
WO2021196013A1 · PCT/CN2020/082566 · CN2020082566W
Authority
WO
WIPO (PCT)
Prior art keywords
word
recognized
image
character
horizontal position
Prior art date
Application number
PCT/CN2020/082566
Other languages
English (en)
French (fr)
Inventor
黄光伟
李月
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to PCT/CN2020/082566 priority Critical patent/WO2021196013A1/zh
Priority to US17/263,418 priority patent/US11651604B2/en
Priority to CN202080000447.2A priority patent/CN113748429A/zh
Publication of WO2021196013A1 publication Critical patent/WO2021196013A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • The present invention relates to the technical field of text recognition, and in particular to a word recognition method, device, and storage medium.
  • A translation pen is one of the tools people use for word recognition when reading materials or books.
  • Some translation pens (such as point-translation pens) are relatively effective at recognizing clear, flat text.
  • However, when the text is oblique, in perspective, or curved, the characters may become relatively blurred (for example, in a point-translation scene), and the accuracy of text recognition still needs further improvement.
  • The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
  • To this end, the present invention proposes a word recognition method that can effectively improve the accuracy and effect of word recognition and enhance the user experience.
  • The invention also provides an electronic device.
  • The present invention also provides a non-volatile computer-readable storage medium.
  • An embodiment of the present invention provides a word recognition method, which includes the following steps:
  • Words that are not in a horizontal position (e.g., inclined or curved) are stretched into a horizontal position before word recognition is performed, thereby effectively improving the accuracy and recognition effect of word recognition and enhancing the user experience. This solves the prior-art technical problem that, when text is oblique, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene), the accuracy of text recognition still needs further improvement.
  • Another embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor.
  • When the processor executes the program, the following word recognition method is implemented, including:
  • When the translation pen performs word recognition, words that are not in a horizontal position (e.g., tilted or curved) are stretched into a horizontal position before recognition, thereby effectively improving the accuracy and recognition effect of word recognition and enhancing the user experience. This solves the prior-art technical problem that, when text is oblique, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene), the accuracy of text recognition still needs further improvement.
  • Another embodiment of the present invention provides a non-volatile computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the following word recognition method is implemented:
  • When the translation pen performs word recognition, words that are not in a horizontal position (e.g., inclined or curved) are stretched into a horizontal position before recognition, thereby effectively improving the accuracy and recognition effect of word recognition.
  • FIG. 1 is a schematic flowchart of a word recognition method provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of collecting an image of a word to be recognized according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a process of text correction as described in step 102 according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of determining the geometric position of a word to be recognized according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of selecting a base point according to an embodiment of the present invention.
  • Fig. 6a is an example diagram of text correction provided by an embodiment of the present invention.
  • Fig. 6b is an example diagram of another text correction provided by an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of another word recognition method provided by an embodiment of the present invention.
  • Figure 8 is a schematic structural diagram of a CRNN algorithm in the related art.
  • FIG. 9 is a schematic structural diagram of an improved CRNN algorithm provided by an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a word recognition method provided by an embodiment of the present invention.
  • the word recognition method of the embodiment of the present application can be used in translation scenarios, such as the recognition function of a translation pen.
  • the following embodiments take the point translation pen as an example.
  • the embodiment of the present invention provides a word recognition method.
  • When the translation pen performs word recognition, words that are not in a horizontal position (e.g., tilted or curved) are stretched into a horizontal position before recognition, thereby effectively improving the accuracy and recognition effect of word recognition.
  • the word recognition method includes the following steps:
  • Step 101 Collect an image of a word to be recognized.
  • the word recognition method provided in the embodiment of the present invention may be executed by a processor, and the processor may be set in an electronic device such as a terminal or a cloud server provided in the embodiment of the present invention.
  • the specific implementation of the terminal includes translation devices such as point-translation pens, point-reading machines, etc., which are not limited by the present invention.
  • The image of the word to be recognized, provided by the scanning head of the translation pen, can be obtained.
  • the user specifies the word to be translated through the tip of the translation pen.
  • The relative position of the pen tip and the camera is fixed, so the position of the pen tip in the image is fixed. Therefore, in this embodiment, the following method can be used to collect the image of the word to be recognized:
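Because the pen tip sits at a fixed position in the camera frame, collecting the word image can amount to cropping a fixed region around the tip. A minimal sketch (the tip coordinates and crop size below are illustrative assumptions, not values from the patent):

```python
import numpy as np

def crop_word_region(frame, tip_xy=(320, 460), box=(200, 80)):
    """Crop a fixed region above the pen tip from the camera frame.

    tip_xy and box are hypothetical values; in a real device they are
    fixed by the camera/tip geometry of the translation pen.
    """
    x, y = tip_xy
    w, h = box
    x0 = max(0, x - w // 2)
    y0 = max(0, y - h)  # the word lies just above the tip
    return frame[y0:y, x0:x0 + w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy camera frame
word_img = crop_word_region(frame)
print(word_img.shape)  # (80, 200, 3)
```

In practice the crop would feed directly into the correction step described below.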
  • Step 102 Identify the edge range of each character of the word to be recognized from the word image to be recognized, determine the geometric position of the word to be recognized, and stretch the geometric position of the word to be recognized to a horizontal position.
  • If the word to be recognized is horizontal and clear in the image, it can be recognized directly; but if it appears tilted, in perspective, or curved in the image, recognition in practice may be inaccurate or inefficient. Therefore, in the embodiment of the present invention, words that are not in a horizontal position (e.g., oblique or curved) are stretched to a horizontal position before recognition, which improves the accuracy and recognition effect of word recognition.
  • Step 103 Recognize the word to be recognized in the horizontal position.
  • When the translation pen performs word recognition, words that are not in a horizontal position (e.g., inclined or curved) are stretched into a horizontal position before recognition, thereby effectively improving the accuracy and recognition effect of word recognition and enhancing the user experience. This solves the prior-art technical problem that, when text is oblique, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene), the accuracy of text recognition still needs further improvement.
  • step 102 specifically includes:
  • Step 201: Determine the edge range of each character in the word to be recognized using the Maximally Stable Extremal Regions (MSER) algorithm.
  • The embodiment of the present invention can also determine the edge range of each character through other algorithms, such as the projection method or character detection algorithms based on machine learning or deep learning.
  • The MSER algorithm is taken here only as an example and is not specifically limited.
  • The edge ranges of the positions of the characters exceeding a preset threshold (for example, the characters accounting for more than 80% of a word) are retained.
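The stability criterion behind MSER can be illustrated with a toy example: a region is "maximally stable" when its area barely changes as the binarization threshold varies. The sketch below (a simplified illustration of the criterion, not the full MSER algorithm) measures the size of one connected dark region across thresholds on a made-up 5×5 grayscale patch:

```python
def region_size(img, thresh, seed):
    """Size of the connected region of pixels <= thresh containing seed
    (iterative 4-connected flood fill)."""
    h, w = len(img), len(img[0])
    stack, seen = [seed], set()
    while stack:
        r, c = stack.pop()
        if (r, c) in seen or not (0 <= r < h and 0 <= c < w) or img[r][c] > thresh:
            continue
        seen.add((r, c))
        stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return len(seen)

# Toy patch: a dark character stroke (value 40) on a bright page (200),
# with a mid-gray smudge (120) touching the stroke.
img = [
    [200, 200, 200, 200, 200],
    [200,  40,  40,  40, 200],
    [200,  40, 120, 200, 200],
    [200,  40,  40,  40, 200],
    [200, 200, 200, 200, 200],
]
seed = (1, 1)  # a pixel inside the stroke
sizes = {t: region_size(img, t, seed) for t in range(50, 201, 30)}
# The stroke's area is constant over a wide threshold range -> stable region.
print(sizes)  # {50: 7, 80: 7, 110: 7, 140: 8, 170: 8, 200: 25}
```

Real MSER additionally tracks all nested regions and selects the thresholds where the relative area change is minimal; the stable stroke region gives the character's edge range.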
  • Step 202: Determine the initial circumscribed rectangle of each character according to the character's edge range, and use the center of the initial circumscribed rectangle as the center of rotation to determine the position of the main axis. Continuously rotate and translate the main axis to determine the boundary, and obtain the minimum circumscribed rectangle by comparing the sizes of the areas enclosed by the boundary. Take any vertex of the word image as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis to establish a coordinate system. Determine the coordinates of the four vertices of the minimum circumscribed rectangle, determine the height and width of the rectangle from those coordinates, and take the angle between one of the width edges and the x-axis as the character's orientation angle. Determine the geometric position of the word to be recognized from the orientation angles of all characters.
  • For example, a coordinate system is established with the upper-left corner of the entire image as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis.
  • The coordinates of the four vertices of the minimum bounding rectangle can then be obtained as (x0, y0), (x1, y1), (x2, y2), (x3, y3), from which the height and width of the rectangle are determined.
  • From the orientation angle of a character it can be determined whether that character is inclined or horizontal; from the orientations of all the characters it can be determined whether the word as a whole is inclined, curved, or horizontal, and thus the geometric position of the word can be determined.
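The orientation-angle computation of step 202 can be sketched as follows: given the four vertices of a character's minimum-area bounding rectangle (assumed here to be listed in order around the rectangle), take the angle between a width (longer) edge and the x-axis. This is an illustrative simplification of the step, not the patent's exact procedure:

```python
import math

def char_orientation(vertices):
    """Orientation angle (degrees) of a character: the angle between one of
    the longer ("width") edges of its minimum-area bounding rectangle and
    the x-axis. Vertices are assumed ordered around the rectangle."""
    edges = [(vertices[(i + 1) % 4][0] - vertices[i][0],
              vertices[(i + 1) % 4][1] - vertices[i][1]) for i in range(4)]
    # of the two adjacent edge directions, the width is the longer one
    wx, wy = max(edges[:2], key=lambda e: math.hypot(*e))
    return math.degrees(math.atan2(wy, wx))

# Axis-aligned 4x2 rectangle: the width edge lies along the x-axis.
print(char_orientation([(0, 0), (4, 0), (4, 2), (0, 2)]))  # 0.0
# A rectangle tilted by 45 degrees.
print(round(char_orientation([(0, 0), (2, 2), (1, 3), (-1, 1)]), 6))
```

Aggregating these per-character angles (and center points) across the word yields its overall geometric position, as described above.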
  • Step 203: Taking the center point of the minimum circumscribed rectangle as the center point of each character, connect the center points of all characters from left to right in order to obtain the text center line. Sample multiple points at equal intervals on the text center line and, for each sampled point, take one point on each side along the character's orientation at half the character height as reference points. Calculate the thin-plate spline interpolation transformation matrix from the reference points and perform bilinear interpolation sampling to stretch the geometric position of the word to be recognized into a horizontal position.
  • Selecting the two points is equivalent to taking a midpoint on the bottom edge and drawing a line perpendicular to the bottom edge; this line passes through the center of the rectangle and intersects the top edge at one point. These two points are the points on the bottom and top edges of the rectangle. All sampling points are treated in this way to obtain the reference points. Of course, more points can also be taken on the line.
  • The embodiment of the present invention takes two points as an example, which is not specifically limited here.
  • Illustratively, in step 201, the MSER algorithm is first used to determine the edge range of each character (each letter is a character) in "statements".
  • Illustratively, in step 202, the center point and orientation of each character of the word "statements" are determined according to the method of step 202.
  • Illustratively, in step 203, the center points of the characters are connected in order to obtain the center line of the word "statements". Five points are sampled at equal intervals on the center line, and for each of these five points two points are taken along the character orientation at a distance of half the character height (that is, the midpoints of the bottom and top edges of the rectangular box at each point). The ten points taken in this way from the five equally spaced samples are the reference points. Finally, the thin-plate spline interpolation transformation matrix is calculated from the reference points, and bilinear interpolation sampling is performed to obtain the corrected "statements" image.
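The thin-plate spline (TPS) solve of step 203 can be sketched in NumPy: reference points on the curved text line are mapped to horizontal targets, and the fitted transform interpolates the control points exactly. This is the standard TPS formulation (radial basis U(r) = r² log r² plus an affine part); the patent does not give its exact matrix construction, and the bilinear resampling of image pixels is omitted here:

```python
import numpy as np

def tps_fit(src, dst):
    """Fit TPS parameters mapping 2-D control points src -> dst.
    Returns (w, a): n x 2 radial weights and 3 x 2 affine part."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    d2 = np.sum((src[:, None] - src[None, :]) ** 2, axis=-1)
    K = np.where(d2 > 0, d2 * np.log(np.where(d2 > 0, d2, 1.0)), 0.0)
    P = np.hstack([np.ones((n, 1)), src])          # [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]

def tps_map(pts, src, w, a):
    """Apply the fitted TPS transform to a set of points."""
    pts, src = np.asarray(pts, float), np.asarray(src, float)
    d2 = np.sum((pts[:, None] - src[None, :]) ** 2, axis=-1)
    U = np.where(d2 > 0, d2 * np.log(np.where(d2 > 0, d2, 1.0)), 0.0)
    return a[0] + pts @ a[1:] + U @ w

# Reference points on a tilted text line, mapped to a horizontal line.
src = [(0, 0), (1, 0.5), (2, 1.0), (0, 1), (1, 1.5), (2, 2.0)]
dst = [(0, 0), (1, 0.0), (2, 0.0), (0, 1), (1, 1.0), (2, 1.0)]
w, a = tps_fit(src, dst)
print(np.allclose(tps_map(src, src, w, a), dst))  # True: TPS interpolates
```

In the full pipeline, `tps_map` would be evaluated on a grid over the output image and the input image bilinearly sampled at the mapped coordinates, producing the rectified word.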
  • The embodiment of the present invention can effectively stretch oblique, perspective, or curved word text to a horizontal position while keeping the image smooth during stretching, so that the text is unlikely to be distorted or deformed by the stretching, improving the reliability of word text correction.
  • this embodiment provides another word recognition method to illustrate how to perform word recognition after image correction.
  • This embodiment and the previous embodiment each emphasize different content; for the steps not described here, refer to the previous embodiment.
  • the word recognition method includes the following steps:
  • Step 701 Collect an image of a word to be recognized.
  • Step 702 Identify the position of each character of the word to be recognized from the word image to be recognized, determine the geometric attributes of the text, and stretch the word image to be recognized into a horizontal image according to the geometric attributes.
  • step 701 and step 702 please refer to the explanation of step 101 and step 102. To avoid redundancy, details are not described here.
  • Step 703 Recognize the word to be recognized in the horizontal position through the CRNN algorithm.
  • the embodiment of the present invention can effectively improve the calculation efficiency of the algorithm, thereby improving the efficiency and accuracy of word recognition, which will be described in detail below:
  • The structure of the existing CRNN algorithm includes a convolutional neural network layer, a recurrent neural network layer, and a transcription layer. Specifically:
  • The convolutional neural network layer is composed of the first to seventh convolutional layers (Convolution1-7 in the figure), maximum pooling layers (Maxpooling), and a custom network layer (Map-to-Sequence), and performs image feature extraction mainly through the convolutional layers.
  • The custom network layer is the "bridge" between the convolutional neural network layer and the recurrent neural network layer.
  • The recurrent neural network layer, which includes two bidirectional long short-term memory (LSTM) network layers, is used to predict the feature vector sequence of the word to be recognized to obtain the prediction result.
  • The two bidirectional LSTM layers are used to learn the semantic relationships within the text sequence.
  • The transcription layer is used to decode the prediction result into characters and to remove the blank (space) characters and repeated characters to obtain the recognition result of the word to be recognized.
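The transcription layer's removal of repeated characters and blank characters corresponds to greedy (best-path) CTC-style decoding, which can be sketched as follows (the charset and per-frame indices are made-up illustrative values):

```python
def ctc_decode(indices, charset, blank=0):
    """Collapse a per-frame prediction into a word:
    drop consecutive repeats, then drop the blank symbol."""
    out, prev = [], None
    for i in indices:
        if i != prev and i != blank:
            out.append(charset[i])
        prev = i
    return "".join(out)

charset = "-abcdefghijklmnopqrstuvwxyz"  # index 0 ('-') is the blank
# Hypothetical per-frame argmax indices from the recurrent layer.
frames = [0, 19, 19, 0, 20, 1, 1, 0, 20, 5, 5]
print(ctc_decode(frames, charset))  # "state"
```

Note that the blank between the two 't' frames is what allows a genuinely doubled letter to survive the repeat-collapsing step.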
  • The embodiment of the present invention improves on the existing CRNN structure shown in Fig. 8 by replacing the second to sixth convolutional layers with the designed convolution blocks.
  • The improved algorithm is shown in Fig. 9.
  • The improved algorithm differs from the existing CRNN algorithm in the structure of the convolutional neural network layer.
  • The recurrent neural network layer and the transcription layer have the same structure as in the existing CRNN algorithm; to avoid redundancy, they are not repeated here.
  • In the embodiment of the present invention, the convolutional neural network layer is composed of convolution layers (Convolution), convolution blocks (Conv Block), and maximum pooling layers (Maxpooling).
  • The convolutional neural network layer is used to extract features of the image to be recognized.
  • Each convolutional layer includes convolution (Conv), batch normalization (Batch Normalization), and an activation function (ReLU).
  • The convolutional neural network layer includes, in logical order, the first convolution layer, the first convolution block, the first maximum pooling layer, the second convolution block, the second maximum pooling layer, and the third convolution block.
  • Each convolution block includes, in logical order, a first convolution sublayer, a first batch normalization and activation function sublayer, a depthwise separable convolution sublayer based on dilated (hole) convolution, a second batch normalization and activation function sublayer, a second convolution sublayer, and a third batch normalization and activation function sublayer.
  • The existing CRNN algorithm uses only convolutional layers to extract the feature vector sequence of the word to be recognized, while the embodiment of the present invention performs image feature extraction through both convolutional layers and convolution blocks.
  • The embodiment of the present invention effectively enlarges the receptive field of the algorithm, improves its ability to distinguish similar characters, improves recognition accuracy, reduces the amount of computation, and speeds up calculation, making it more suitable for translation pens.
  • When the word to be recognized is recognized by the improved CRNN algorithm, the efficiency and accuracy of word recognition can be improved.
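The efficiency claim can be made concrete with a parameter count: a depthwise separable convolution replaces a standard k×k convolution's cin·cout·k² weights with cin·k² + cin·cout, and dilation enlarges the receptive field at no extra weight cost. The channel counts below are illustrative arithmetic, not figures from the patent:

```python
def conv_params(cin, cout, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return cin * cout * k * k

def dw_separable_params(cin, cout, k):
    """Depthwise k x k conv (one k x k filter per input channel)
    followed by a 1 x 1 pointwise conv mixing the channels."""
    return cin * k * k + cin * cout

def dilated_rf(k, d):
    """Receptive field of a k x k kernel with dilation d."""
    return d * (k - 1) + 1

cin = cout = 128  # hypothetical channel counts
std = conv_params(cin, cout, 3)
sep = dw_separable_params(cin, cout, 3)
print(std, sep, round(std / sep, 1))   # 147456 17536 8.4
print(dilated_rf(3, 1), dilated_rf(3, 2))  # 3 5
```

So a dilated depthwise separable 3×3 layer here uses roughly an eighth of the weights of a standard 3×3 convolution while covering a 5×5 receptive field, which is consistent with the stated goals of a larger perception field and lower computation.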
  • the method further includes: scaling the image of the word to be recognized in the horizontal position to a preset height and a preset width.
  • the preset height and preset width can be set according to the actual situation and are not limited here.
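The scaling step can be sketched as a fixed-size resize; nearest-neighbour sampling is used here for brevity, and the 32×128 target is an illustrative assumption, since the patent leaves the preset height and width open:

```python
import numpy as np

def scale_to(img, height=32, width=128):
    """Nearest-neighbour rescale of the rectified word image to a fixed
    network input size (32 x 128 is a hypothetical preset)."""
    h, w = img.shape[:2]
    rows = (np.arange(height) * h / height).astype(int)
    cols = (np.arange(width) * w / width).astype(int)
    return img[rows][:, cols]

img = np.random.randint(0, 255, (57, 211), dtype=np.uint8)  # dummy word image
print(scale_to(img).shape)  # (32, 128)
```

A production pipeline would typically use bilinear interpolation instead, matching the bilinear sampling already used in the correction step.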
  • the word recognition method of the embodiment of the present invention improves the accuracy and speed of word recognition by improving the CRNN algorithm for word recognition, thereby reducing the time required for recognition and improving the efficiency of recognition.
  • the present invention also provides an electronic device.
  • the electronic device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor.
  • the processor executes the program, the following word recognition method is implemented, including:
  • identifying the edge range of each character of the word to be recognized from the word image to be recognized includes:
  • The Maximally Stable Extremal Regions (MSER) algorithm is used to determine the edge range of each character in the word to be recognized.
  • determining the geometric position of the word to be recognized includes:
  • the geometric position of the word to be recognized is stretched to a horizontal position, which specifically includes:
  • the thin plate spline interpolation transformation matrix is calculated according to the reference point, and bilinear interpolation sampling is performed to stretch the geometric position of the word to be recognized into a horizontal position.
  • the CRNN algorithm recognizes the word to be recognized in the horizontal position.
  • the method further includes:
  • the image of the word to be recognized in the horizontal position is scaled to a preset height and a preset width.
  • the CRNN algorithm includes:
  • The convolutional neural network layer includes the first and second convolutional layers, the first to fifth convolution blocks, the first to fourth maximum pooling layers, and the custom network layer; the feature vector sequence of the word to be recognized in the scaled image is extracted through the convolutional layers and convolution blocks.
  • The convolutional neural network layer includes, in logical order, the first convolutional layer, the first convolution block, the first maximum pooling layer, the second convolution block, the second maximum pooling layer, the third convolution block, the third maximum pooling layer, the fourth convolution block, the fourth maximum pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.
  • Each convolution block is composed of the first and second convolution sublayers, the first to third batch normalization and activation function sublayers, and a depthwise separable convolution sublayer based on dilated (hole) convolution.
  • Each convolution block includes, in logical order, a first convolution sublayer, a first batch normalization and activation function sublayer, a depthwise separable convolution sublayer based on dilated (hole) convolution, a second batch normalization and activation function sublayer, a second convolution sublayer, and a third batch normalization and activation function sublayer.
  • the CRNN algorithm also includes:
  • The recurrent neural network layer, which includes two bidirectional long short-term memory (LSTM) network layers, is used to predict the feature vector sequence of the word to be recognized to obtain the prediction result.
  • the CRNN algorithm also includes:
  • the transcription layer is used to decode the prediction result into characters, and remove the space characters and repeated characters to obtain the recognition result of the word to be recognized.
  • collecting the image of the word to be recognized specifically includes:
  • When the translation pen performs word recognition, words that are not in a horizontal position (e.g., inclined or curved) are stretched into a horizontal position before recognition, thereby effectively improving the accuracy and recognition effect of word recognition and enhancing the user experience. This solves the prior-art technical problem that, when text is oblique, in perspective, or curved and therefore relatively blurred (for example, in a point-translation scene), the accuracy of text recognition still needs further improvement.
  • the present invention also provides a non-volatile computer-readable storage medium.
  • the non-volatile computer-readable storage medium has a computer program stored thereon, and when the program is executed by the processor, the following word recognition method is implemented:
  • identifying the edge range of each character of the word to be recognized from the word image to be recognized includes:
  • The Maximally Stable Extremal Regions (MSER) algorithm is used to determine the edge range of each character in the word to be recognized.
  • determining the geometric position of the word to be recognized includes:
  • the geometric position of the word to be recognized is stretched to a horizontal position, which specifically includes:
  • the thin plate spline interpolation transformation matrix is calculated according to the reference point, and bilinear interpolation sampling is performed to stretch the geometric position of the word to be recognized into a horizontal position.
  • the CRNN algorithm recognizes the word to be recognized in the horizontal position.
  • the method further includes:
  • the image of the word to be recognized in the horizontal position is scaled to a preset height and a preset width.
  • the CRNN algorithm includes:
  • The convolutional neural network layer includes the first and second convolutional layers, the first to fifth convolution blocks, the first to fourth maximum pooling layers, and the custom network layer; the feature vector sequence of the word to be recognized in the scaled image is extracted through the convolutional layers and convolution blocks.
  • The convolutional neural network layer includes, in logical order, the first convolutional layer, the first convolution block, the first maximum pooling layer, the second convolution block, the second maximum pooling layer, the third convolution block, the third maximum pooling layer, the fourth convolution block, the fourth maximum pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.
  • Each convolution block is composed of the first and second convolution sublayers, the first to third batch normalization and activation function sublayers, and a depthwise separable convolution sublayer based on dilated (hole) convolution.
  • Each convolution block includes, in logical order, a first convolution sublayer, a first batch normalization and activation function sublayer, a depthwise separable convolution sublayer based on dilated (hole) convolution, a second batch normalization and activation function sublayer, a second convolution sublayer, and a third batch normalization and activation function sublayer.
  • the CRNN algorithm also includes:
  • The recurrent neural network layer, which includes two bidirectional long short-term memory (LSTM) network layers, is used to predict the feature vector sequence of the word to be recognized to obtain the prediction result.
  • the CRNN algorithm also includes:
  • the transcription layer is used to decode the prediction result into characters, and remove the space characters and repeated characters to obtain the recognition result of the word to be recognized.
  • collecting the image of the word to be recognized specifically includes:
  • When the translation pen performs word recognition, words that are not in a horizontal position (e.g., inclined or curved) are stretched into a horizontal position before recognition, thereby effectively improving the accuracy and recognition effect of word recognition.
  • The terms "first" and "second" are used only for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, such as two or three, unless otherwise specifically defined.
  • A "computer-readable medium" can be any device that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
  • More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example, by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • each part of the present invention can be implemented by hardware, software, firmware or a combination thereof.
  • multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, any one or a combination of the following techniques known in the art may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
  • a person of ordinary skill in the art can understand that all or part of the steps carried in the method of the foregoing embodiments can be implemented by a program instructing relevant hardware to complete.
  • the program can be stored in a computer-readable storage medium. When executed, it includes one of the steps of the method embodiment or a combination thereof.
  • the functional units in the various embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer readable storage medium.
  • the aforementioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention provides a word recognition method, device, and storage medium. The method includes: capturing an image of a word to be recognized; identifying, from the image, the edge extent of each character of the word to be recognized, determining the geometric position of the word, and stretching the geometric position of the word into a horizontal position; and recognizing the word in the horizontal position. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred.

Description

Word Recognition Method, Device, and Storage Medium

Technical Field

The present invention relates to the field of text recognition, and in particular to a word recognition method, device, and storage medium.

Background Art

A translation pen is one of the tools people use to recognize words while reading materials or books. Some translation pens (such as point-and-translate pens) perform relatively well on clear, flat text. However, when the text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario), text recognition accuracy still needs improvement.
Summary of the Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, the present invention provides a word recognition method that effectively improves the accuracy and quality of word recognition and enhances the user experience.

The present invention further provides an electronic device.

The present invention further provides a non-volatile computer-readable storage medium.

An embodiment of one aspect of the present invention provides a word recognition method, including the following steps:

capturing an image of a word to be recognized;

identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word, and stretching the geometric position of the word into a horizontal position; and

recognizing the word in the horizontal position.

In the word recognition method of embodiments of the present invention, when a translation pen performs word recognition, words that are not in a horizontal position (tilted, curved, etc.) are stretched into a horizontal position before recognition, which effectively improves recognition accuracy and quality and enhances the user experience. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario).
An embodiment of another aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the following word recognition method is implemented:

capturing an image of a word to be recognized;

identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word, and stretching the geometric position of the word into a horizontal position; and

recognizing the word in the horizontal position.

With the electronic device of embodiments of the present invention, when a translation pen performs word recognition, words that are not in a horizontal position (tilted, curved, etc.) are stretched into a horizontal position before recognition, which effectively improves recognition accuracy and quality and enhances the user experience. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario).

An embodiment of yet another aspect of the present invention provides a non-volatile computer-readable storage medium storing a computer program that, when executed by a processor, implements the following word recognition method:

capturing an image of a word to be recognized;

identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word, and stretching the geometric position of the word into a horizontal position; and

recognizing the word in the horizontal position.

With the non-volatile computer-readable storage medium of embodiments of the present invention, when a translation pen performs word recognition, words that are not in a horizontal position (tilted, curved, etc.) are stretched into a horizontal position before recognition, which effectively improves recognition accuracy and quality and enhances the user experience. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario).

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned by practice of the present invention.
Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a word recognition method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of capturing an image of a word to be recognized, provided by an embodiment of the present invention;

FIG. 3 is a schematic flowchart of the text rectification described in step 102, provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of determining the geometric position of a word to be recognized, provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of selecting base points, provided by an embodiment of the present invention;

FIG. 6a is an example diagram of text rectification provided by an embodiment of the present invention;

FIG. 6b is another example diagram of text rectification provided by an embodiment of the present invention;

FIG. 7 is a schematic flowchart of another word recognition method provided by an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a CRNN algorithm provided by the related art;

FIG. 9 is a schematic structural diagram of an improved CRNN algorithm provided by an embodiment of the present invention.
Detailed Description

Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements, or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary, intended to explain the present invention, and are not to be construed as limiting it.

The word recognition method and apparatus of embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a schematic flowchart of a word recognition method provided by an embodiment of the present invention.

The word recognition method of embodiments of the present application can be used in translation scenarios, for example for the recognition function of a translation pen. The following embodiments use a point-and-translate pen as an example.

When OCR is applied to printed or handwritten text (such as words), tilt, curvature, and the like lead to inaccurate recognition and poor results.

To address this problem, embodiments of the present invention provide a word recognition method: when a translation pen performs word recognition, words that are not horizontal (tilted, curved, etc.) are first stretched into a horizontal position and then recognized, effectively improving recognition accuracy and quality and enhancing the user experience. As shown in FIG. 1, the word recognition method includes the following steps:
Step 101: capture an image of a word to be recognized.

Specifically, the word recognition method provided by embodiments of the present invention may be executed by a processor, which may be provided in an electronic device such as a terminal or a cloud server. Specific implementations of the terminal include translation devices such as point-and-translate pens and point-reading machines, to which the present invention poses no limitation.

For example, in the point-and-translate pen scenario, the image of the word to be recognized may be obtained from the pen's scanning head.

In practice, the user designates the word to be translated with the pen tip. Because the relative position of the pen tip and the camera is fixed, the pen tip's position in the image is also fixed. In this embodiment, the image of the word to be recognized may therefore be captured as follows:

taking the position of the pen tip as the center of the bottom edge, set a rectangular region of fixed size according to the size of the words in the image, as a virtual pen tip; the virtual pen tip is shown in FIG. 2, where the fixed-size rectangular region is the virtual detection box shown in FIG. 2;

compute the overlap area between the fixed-size rectangular region and each detected text box; the text boxes are shown in FIG. 2;

find the text box whose overlap area occupies the largest proportion of the fixed-size rectangular region, take the word in that text box as the word to be recognized, and obtain the image of the word to be recognized.
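The virtual pen-tip selection above reduces to simple rectangle arithmetic. The following is a minimal illustrative sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner pairs; the patent does not specify a box format, so the representation here is an assumption:

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def pick_text_box(virtual_tip, text_boxes):
    """Return the detected text box whose overlap with the virtual
    pen-tip region covers the largest fraction of that region."""
    tip_area = (virtual_tip[2] - virtual_tip[0]) * (virtual_tip[3] - virtual_tip[1])
    best, best_ratio = None, 0.0
    for box in text_boxes:
        ratio = overlap_area(virtual_tip, box) / tip_area
        if ratio > best_ratio:
            best, best_ratio = box, ratio
    return best, best_ratio
```

For a 10x10 virtual detection box at the origin and candidate boxes (5, 5, 20, 20) and (8, 0, 30, 10), the first box wins with an overlap ratio of 0.25 against 0.20 for the second.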
Step 102: identify, from the image of the word to be recognized, the edge extent of each character of the word, determine the geometric position of the word, and stretch the geometric position of the word into a horizontal position.

Here, the geometric position may be tilted, curved, horizontal, and so on; determining the geometric position of a word can be understood as determining whether the word to be recognized appears tilted, curved, horizontal, etc. in the image.

If the word to be recognized is horizontal and clear in the image, it can be recognized directly; but if it is tilted, viewed in perspective, or curved in the image, recognition in practice may be inaccurate or inefficient. Therefore, in embodiments of the present invention, words that are not horizontal (tilted, curved, etc.) are stretched into a horizontal position before recognition, which improves recognition accuracy and quality.

Step 103: recognize the word to be recognized, now in a horizontal position.

In summary, with the word recognition method of embodiments of the present invention, when a translation pen performs word recognition, words that are not horizontal (tilted, curved, etc.) are stretched into a horizontal position before recognition, effectively improving recognition accuracy and quality and enhancing the user experience. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario).
In the point-and-translate pen scenario, if the paper is a flexible material, the text in the captured image may be tilted, viewed in perspective, or curved, which in practice may make recognition inaccurate or inefficient. To improve recognition accuracy, embodiments of the present invention therefore first stretch words that are not horizontal (tilted, curved, etc.) into a horizontal position. The text rectification of step 102 is detailed below; as shown in FIG. 3, step 102 specifically includes:

Step 201: use the maximally stable extremal regions algorithm to determine the edge extent of each character of the word to be recognized.

Other algorithms may also be used to determine the character edge extents, such as projection methods or character detection based on machine learning or deep learning; the maximally stable extremal regions algorithm is used here only as an example, without limitation.

It can be understood that, when processing a word to be recognized comprising one or more characters, the maximally stable extremal regions (MSER) algorithm is used to determine the edge extent of the position of each character of the word, or of characters whose proportion exceeds a preset threshold (for example, more than 80% of the characters in a word).

Step 202: determine an initial bounding rectangle of each character from its edge extent, take the center of the initial bounding rectangle as the rotation center, and determine the principal axis; repeatedly rotate and translate the principal axis to determine boundaries, and obtain the bounding rectangle of minimum area by comparing the areas enclosed by the boundaries; establish a coordinate system with any vertex of the word image as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis; determine the coordinates of the four vertices of the minimum bounding rectangle, determine its height and width from the four vertex coordinates, and take the angle between one of the width edges and the x-axis as the character orientation angle; determine the geometric position of the word to be recognized from all character orientation angles.

For example, as shown in FIG. 4, a coordinate system is established with the top-left corner of the whole image as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis. In this coordinate system the four vertex coordinates of the minimum bounding rectangle are (x0, y0), (x1, y1), (x2, y2), and (x3, y3), from which the rectangle's height and width can be determined. The angle θ between a width edge and the x-axis (horizontal direction) is taken as the character orientation angle, which gives the character's orientation; the magnitude of this angle then indicates whether the character is tilted or horizontal, and the orientations of all characters together indicate whether the word as a whole is tilted, curved, or horizontal, thereby determining the word's geometric position.
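The orientation angle of step 202 can be computed from the vertex coordinates alone. The helper below is an illustrative sketch, not the patent's implementation: it assumes the vertices are listed in order around the rectangle and treats the shorter of the two distinct edges as the width, which holds for upright characters that are taller than they are wide:

```python
import math

def orientation_angle(corners):
    """Angle (degrees) between a width edge of a minimum bounding
    rectangle and the x-axis. `corners` lists the rectangle's vertices
    in order around its perimeter; the shorter edge is treated as the
    width, as for characters whose height exceeds their width."""
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    e1 = (x1 - x0, y1 - y0)  # one edge of the rectangle
    e2 = (x2 - x1, y2 - y1)  # the adjacent edge
    width = min(e1, e2, key=lambda v: math.hypot(*v))
    return math.degrees(math.atan2(width[1], width[0]))
```

An axis-aligned character box yields 0 degrees; a box rotated by 30 degrees yields 30 degrees, so the magnitude of the result indicates how far the character deviates from horizontal.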
Step 203: take the center of the minimum bounding rectangle as the character's center, and connect the centers of all characters in order from left to right to obtain the text centerline; sample multiple center points at equal intervals along the text centerline, and on each of the two height edges in the character orientation at each sample point, take one point, as base points; compute a thin-plate spline interpolation transform matrix from the base points, and perform bilinear interpolation sampling to stretch the geometric position of the word into a horizontal position.

For example, when selecting base points, as shown in FIG. 5, the two selected points amount to taking the midpoint of the bottom edge and drawing a line perpendicular to the bottom edge; this line passes through the rectangle's center and intersects the top edge at one point. The two points are thus on the rectangle's bottom and top edges, respectively; points are taken in this way for all sample points, yielding the base points. Of course, more points could be taken on this line; embodiments of the present invention take two points as an example, without limitation.
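The centerline construction and equidistant sampling of step 203 can be sketched with plain arc-length interpolation; the thin-plate spline transform itself is omitted here. The function below is an illustrative sketch assuming character centers are (x, y) tuples, sorted left to right by x as the step describes:

```python
import math

def sample_centerline(centers, k):
    """Connect character centers left to right into a polyline and
    return k points sampled at equal arc-length intervals along it."""
    centers = sorted(centers)  # left-to-right order by x coordinate
    seg = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    total = sum(seg)
    samples = []
    for i in range(k):
        target = total * i / (k - 1)
        # walk the polyline to the segment containing `target`
        j, acc = 0, 0.0
        while j < len(seg) - 1 and acc + seg[j] < target:
            acc += seg[j]
            j += 1
        t = (target - acc) / seg[j] if seg[j] else 0.0
        (x0, y0), (x1, y1) = centers[j], centers[j + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples
```

Each sampled point would then contribute two base points (bottom-edge and top-edge midpoints along the character orientation), so k samples give 2k base points for the thin-plate spline.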
As an example of the method of FIG. 3: as shown in FIG. 6a, the word "statements" in the image is tilted and curved; as shown in FIG. 6b, the word "Example" in the image is tilted. Text rectification is illustrated below using FIG. 6a as the example:

In step 201, illustratively, the MSER algorithm is first used to determine the edge extent of each character of "statements" (each letter being one character). In step 202, illustratively, the center point and orientation of each character of the word "statements" are determined by the method of step 202. In step 203, illustratively, the center points of the characters are connected in sequence to obtain the centerline of the word "statements". Five points are sampled at equal intervals on the centerline; for each of these five points, two points are taken along the character orientation at a distance of half the character height (i.e., the midpoints of the bottom and top edges of the rectangle containing the point). The ten points thus obtained from the five equidistant samples are the base points. Finally, with the base points, the thin-plate spline interpolation transform matrix is computed, and bilinear interpolation sampling yields the rectified "statements" image.

As can be seen from the text images before and after rectification in FIG. 6a, embodiments of the present invention can effectively stretch tilted, perspective-distorted, or curved words to horizontal while making the image smoother during stretching, so that the characters are unlikely to be distorted by the stretching.

Through the text rectification method above, embodiments of the present invention can effectively stretch tilted, perspective-distorted, or curved word text to horizontal while keeping the image smooth during stretching, avoiding distortion of the characters and improving the reliability of word text rectification.
Building on the previous embodiment, this embodiment provides another word recognition method to explain how words are recognized after image rectification. This embodiment and the previous one each emphasize different content; for steps not fully described in one embodiment, refer to the other. In this embodiment, as shown in FIG. 7, the word recognition method includes the following steps:

Step 701: capture an image of a word to be recognized.

Step 702: identify, from the image, the position of each character of the word to be recognized, determine the geometric attributes of the text, and stretch the word image into a horizontal image according to the geometric attributes.

For steps 701 and 702, see the explanations of steps 101 and 102; to avoid redundancy, they are not repeated here.

Step 703: recognize the word to be recognized, now in a horizontal position, using the CRNN algorithm.

By improving the structure of the existing CRNN algorithm, embodiments of the present invention can effectively increase the algorithm's computational efficiency, and thus the efficiency and accuracy of word recognition, as detailed below.
First, as shown in FIG. 8, the existing CRNN structure comprises a convolutional neural network layer, a recurrent neural network layer, and a transcription layer. Specifically:

The convolutional neural network layer consists of the first through seventh convolutional layers (Convolution1-7 in the figure), max-pooling layers (Maxpooling), and a custom network layer (Map-to-Sequence); image features are extracted mainly by the convolutional layers. The custom network layer is the "bridge" between the convolutional neural network layer and the recurrent neural network layer.

The recurrent neural network layer includes two bidirectional long short-term memory (LSTM) layers, used to predict over the feature vector sequence of the word to be recognized and obtain a prediction result. Since both the forward and the backward information of a text sequence help predict the sequence, two bidirectional LSTM layers are used to learn the semantic relations within the text sequence.

The transcription layer decodes the prediction result into characters and removes separator characters and repeated characters, to obtain the recognition result for the word to be recognized.
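The transcription step described here is the standard CTC-style greedy decoding: merge consecutive repeated symbols, then drop the separator (blank) symbol. A minimal sketch follows; the actual blank symbol and decoding strategy used by the patent are not specified, so "-" as the separator is an assumption:

```python
def ctc_collapse(labels, blank="-"):
    """CTC-style transcription: merge consecutive repeated symbols,
    then drop the separator (blank) symbol. `labels` is the per-frame
    best-label sequence predicted by the recurrent layer."""
    out = []
    prev = None
    for ch in labels:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)
```

Note how a blank between two identical labels preserves a genuine double letter: "hh-e-ll-lo" decodes to "hello", whereas the repeats within "hh" and "ll" are merged.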
Embodiments of the present invention improve on the existing CRNN structure of FIG. 8 by replacing the second through sixth convolutional layers with designed convolution blocks; the improved algorithm is shown in FIG. 9, and it differs from the existing CRNN in the structure of the convolutional neural network layer. The recurrent neural network layer and the transcription layer are the same as in the existing CRNN and, to avoid redundancy, are not described again. As shown in FIG. 9, the convolutional neural network layer consists of convolutional layers (Convolution), convolution blocks (Conv Block), and max-pooling layers (Maxpooling), and is used to extract the features of the image to be recognized. A convolutional layer comprises convolution (Conv), batch normalization (Batch Normalization), and an activation function (ReLU).

Specifically, the convolutional neural network layer comprises, in logical order: a first convolutional layer, a first convolution block, a first max-pooling layer, a second convolution block, a second max-pooling layer, a third convolution block, a third max-pooling layer, a fourth convolution block, a fourth max-pooling layer, a fifth convolution block, a second convolutional layer, and a custom network layer. Each convolution block comprises, in logical order: a first convolutional sublayer, a first batch-normalization-and-activation sublayer, a depthwise separable convolution sublayer based on dilated convolution, a second batch-normalization-and-activation sublayer, a second convolutional sublayer, and a third batch-normalization-and-activation sublayer.
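The patent gives neither channel counts nor kernel sizes, but the motivation for the block design can be illustrated with standard parameter-count arithmetic: a depthwise separable convolution needs far fewer weights than a standard convolution, and dilation enlarges the receptive field at no parameter cost. The figures below are an illustrative sketch, not values from the patent:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    return c_in * k * k + c_in * c_out

def dilated_receptive_field(k, dilation):
    """Effective kernel extent of a k x k convolution with the
    given dilation rate."""
    return dilation * (k - 1) + 1
```

For example, at 256 input and 256 output channels with a 3x3 kernel, a standard convolution holds 589,824 weights versus 67,840 for the depthwise separable version (roughly 8.7x fewer), while a dilation rate of 2 widens the 3x3 kernel's extent to 5x5, consistent with the stated goals of a larger receptive field and reduced computation.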
Taking extraction of the feature sequence of the word to be recognized as an example: the existing CRNN extracts the feature vector sequence only through convolutional layers, whereas embodiments of the present invention extract image features through convolutional layers and convolution blocks. Compared with the existing CRNN, embodiments of the present invention thus effectively enlarge the algorithm's receptive field, improve its ability to distinguish similar characters, raise recognition accuracy, reduce computation, and speed up the algorithm, making it better suited to the practical scenario of a translation pen. Accordingly, recognizing words with the improved CRNN algorithm improves both the efficiency and the accuracy of word recognition.

Further, before recognizing the horizontally positioned word with the CRNN algorithm, the method also includes: scaling the image of the horizontally positioned word to a preset height and preset width. The preset height and width can be set according to the actual situation and are not limited here.
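The patent leaves the preset size open; a common convention for CRNN-style recognizers is a fixed input height (often 32 pixels) with the width scaled proportionally and capped at a preset maximum. The sketch below is under that assumption, not a detail from the patent:

```python
def scale_to_preset(w, h, preset_w, preset_h):
    """Scale an image of size (w, h) so its height matches preset_h
    while preserving aspect ratio, then clamp the width to preset_w
    (wider images would be padded or compressed to fit in practice)."""
    new_w = round(w * preset_h / h)
    return min(new_w, preset_w), preset_h
```

For instance, a 200x50 crop scaled to a 100x32 preset becomes 100x32 (width clamped), while a 60x30 crop becomes 64x32.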
In summary, the word recognition method of embodiments of the present invention performs word recognition with an improved CRNN algorithm, effectively improving recognition accuracy and speed, thereby shortening the time needed for recognition and improving recognition efficiency.
To implement the above embodiments, the present invention further provides an electronic device.

The electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the following word recognition method is implemented:

capturing an image of a word to be recognized;

identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word, and stretching the geometric position of the word into a horizontal position;

recognizing the word in the horizontal position.

Optionally, identifying the edge extent of each character of the word to be recognized from the image specifically includes:

using the maximally stable extremal regions algorithm to determine the edge extent of each character of the word to be recognized.

Further, determining the geometric position of the word to be recognized specifically includes:

determining an initial bounding rectangle of a character from the character's edge extent, taking the center of the initial bounding rectangle as the rotation center, and determining a principal axis;

repeatedly rotating and translating the principal axis to determine boundaries, and obtaining the bounding rectangle of minimum area by comparing the areas enclosed by the boundaries;

establishing a coordinate system with any vertex of the image of the word to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis;

determining the coordinates of the four vertices of the minimum bounding rectangle in the coordinate system, determining the height and width of the minimum bounding rectangle from the four vertex coordinates, and taking the angle between one of the width edges and the x-axis as the character orientation angle;

determining the geometric position of the word to be recognized from all character orientation angles.

Further, stretching the geometric position of the word to be recognized into a horizontal position specifically includes:

taking the center of the minimum bounding rectangle as the character's center, and connecting the centers of all characters in order from left to right to obtain the text centerline;

sampling multiple center points at equal intervals along the text centerline, and taking one point on each of the two height edges in the character orientation at each sample point, as base points;

computing a thin-plate spline interpolation transform matrix from the base points, and performing bilinear interpolation sampling, to stretch the geometric position of the word into a horizontal position.

Further, the method also includes:

recognizing the word to be recognized in the horizontal position using the CRNN algorithm.

Further, before recognizing the horizontally positioned word with the CRNN algorithm, the method also includes:

scaling the image of the horizontally positioned word to a preset height and preset width.

Further, the CRNN algorithm includes:

a convolutional neural network layer comprising first and second convolutional layers, first through fifth convolution blocks, first through fourth max-pooling layers, and a custom network layer, with the feature vector sequence of the word in the scaled image extracted through the convolutional layers and convolution blocks.

Further, the convolutional neural network layer comprises, in logical order: the first convolutional layer, the first convolution block, the first max-pooling layer, the second convolution block, the second max-pooling layer, the third convolution block, the third max-pooling layer, the fourth convolution block, the fourth max-pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.

Further, each convolution block consists of first and second convolutional sublayers, first through third batch-normalization-and-activation sublayers, and a depthwise separable convolution sublayer based on dilated convolution.

Further, each convolution block comprises, in logical order: the first convolutional sublayer, the first batch-normalization-and-activation sublayer, the depthwise separable convolution sublayer based on dilated convolution, the second batch-normalization-and-activation sublayer, the second convolutional sublayer, and the third batch-normalization-and-activation sublayer.

Further, the CRNN algorithm also includes:

a recurrent neural network layer comprising two bidirectional long short-term memory layers, used to predict over the feature vector sequence of the word to be recognized and obtain a prediction result.

Further, the CRNN algorithm also includes:

a transcription layer, used to decode the prediction result into characters and remove separator characters and repeated characters, to obtain the recognition result for the word to be recognized.

Further, capturing the image of the word to be recognized specifically includes:

taking the position of the pen tip as the center of the bottom edge, and setting a rectangular region of fixed size according to the size of the words in the image, as a virtual pen tip;

computing the overlap area between the fixed-size rectangular region and each detected text box;

finding the text box whose overlap area occupies the largest proportion of the fixed-size rectangular region, taking the word in that text box as the word to be recognized, and obtaining the image of the word to be recognized.

It should be noted that the foregoing explanations of the word recognition method embodiments also apply to the electronic device of this embodiment and are not repeated here.

With the electronic device of embodiments of the present invention, when a translation pen performs word recognition, words that are not horizontal (tilted, curved, etc.) are stretched into a horizontal position before recognition, effectively improving recognition accuracy and quality and enhancing the user experience. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario).
To implement the above embodiments, the present invention further provides a non-volatile computer-readable storage medium.

The non-volatile computer-readable storage medium stores a computer program that, when executed by a processor, implements the following word recognition method:

capturing an image of a word to be recognized;

identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word, and stretching the geometric position of the word into a horizontal position;

recognizing the word in the horizontal position.

Optionally, identifying the edge extent of each character of the word to be recognized from the image specifically includes:

using the maximally stable extremal regions algorithm to determine the edge extent of each character of the word to be recognized.

Further, determining the geometric position of the word to be recognized specifically includes:

determining an initial bounding rectangle of a character from the character's edge extent, taking the center of the initial bounding rectangle as the rotation center, and determining a principal axis;

repeatedly rotating and translating the principal axis to determine boundaries, and obtaining the bounding rectangle of minimum area by comparing the areas enclosed by the boundaries;

establishing a coordinate system with any vertex of the image of the word to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis;

determining the coordinates of the four vertices of the minimum bounding rectangle in the coordinate system, determining the height and width of the minimum bounding rectangle from the four vertex coordinates, and taking the angle between one of the width edges and the x-axis as the character orientation angle;

determining the geometric position of the word to be recognized from all character orientation angles.

Further, stretching the geometric position of the word to be recognized into a horizontal position specifically includes:

taking the center of the minimum bounding rectangle as the character's center, and connecting the centers of all characters in order from left to right to obtain the text centerline;

sampling multiple center points at equal intervals along the text centerline, and taking one point on each of the two height edges in the character orientation at each sample point, as base points;

computing a thin-plate spline interpolation transform matrix from the base points, and performing bilinear interpolation sampling, to stretch the geometric position of the word into a horizontal position.

Further, the method also includes:

recognizing the word to be recognized in the horizontal position using the CRNN algorithm.

Further, before recognizing the horizontally positioned word with the CRNN algorithm, the method also includes:

scaling the image of the horizontally positioned word to a preset height and preset width.

Further, the CRNN algorithm includes:

a convolutional neural network layer comprising first and second convolutional layers, first through fifth convolution blocks, first through fourth max-pooling layers, and a custom network layer, with the feature vector sequence of the word in the scaled image extracted through the convolutional layers and convolution blocks.

Further, the convolutional neural network layer comprises, in logical order: the first convolutional layer, the first convolution block, the first max-pooling layer, the second convolution block, the second max-pooling layer, the third convolution block, the third max-pooling layer, the fourth convolution block, the fourth max-pooling layer, the fifth convolution block, the second convolutional layer, and the custom network layer.

Further, each convolution block consists of first and second convolutional sublayers, first through third batch-normalization-and-activation sublayers, and a depthwise separable convolution sublayer based on dilated convolution.

Further, each convolution block comprises, in logical order: the first convolutional sublayer, the first batch-normalization-and-activation sublayer, the depthwise separable convolution sublayer based on dilated convolution, the second batch-normalization-and-activation sublayer, the second convolutional sublayer, and the third batch-normalization-and-activation sublayer.

Further, the CRNN algorithm also includes:

a recurrent neural network layer comprising two bidirectional long short-term memory layers, used to predict over the feature vector sequence of the word to be recognized and obtain a prediction result.

Further, the CRNN algorithm also includes:

a transcription layer, used to decode the prediction result into characters and remove separator characters and repeated characters, to obtain the recognition result for the word to be recognized.

Further, capturing the image of the word to be recognized specifically includes:

taking the position of the pen tip as the center of the bottom edge, and setting a rectangular region of fixed size according to the size of the words in the image, as a virtual pen tip;

computing the overlap area between the fixed-size rectangular region and each detected text box;

finding the text box whose overlap area occupies the largest proportion of the fixed-size rectangular region, taking the word in that text box as the word to be recognized, and obtaining the image of the word to be recognized.

It should be noted that the foregoing explanations of the word recognition method embodiments also apply to the non-volatile computer-readable storage medium of this embodiment and are not repeated here.

With the non-volatile computer-readable storage medium of embodiments of the present invention, when a translation pen performs word recognition, words that are not horizontal (tilted, curved, etc.) are stretched into a horizontal position before recognition, effectively improving recognition accuracy and quality and enhancing the user experience. This addresses the technical problem in the prior art that text recognition accuracy still needs improvement when text is tilted, viewed in perspective, or curved, which may render the characters relatively blurred (for example, in the point-and-translate pen scenario).
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic statements of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they are not mutually contradictory, those skilled in the art may combine the different embodiments or examples described in this specification and the features thereof.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, features qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise specifically and explicitly defined.

Any process or method description in the flowcharts or otherwise described herein may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or steps of the process; and the scope of preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which embodiments of the present invention belong.

The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wirings, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.

Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may physically exist separately, or two or more units may be integrated into one module. The above integrated modules may be implemented in the form of hardware or in the form of software functional modules. If an integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (21)

  1. A word recognition method, characterized by comprising:
    capturing an image of a word to be recognized;
    identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word to be recognized, and stretching the geometric position of the word to be recognized into a horizontal position; and
    recognizing the word to be recognized in the horizontal position.
  2. The method according to claim 1, characterized in that identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized specifically comprises:
    using the maximally stable extremal regions algorithm to determine the edge extent of each character of the word to be recognized.
  3. The method according to claim 1 or 2, characterized in that determining the geometric position of the word to be recognized specifically comprises:
    determining an initial bounding rectangle of a character from the character's edge extent, taking the center of the initial bounding rectangle as the rotation center, and determining a principal axis;
    repeatedly rotating and translating the principal axis to determine boundaries, and obtaining the bounding rectangle of minimum area by comparing the areas enclosed by the boundaries;
    establishing a coordinate system with any vertex of the image of the word to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis;
    determining the coordinates of the four vertices of the minimum bounding rectangle in the coordinate system, determining the height and width of the minimum bounding rectangle from the coordinates of the four vertices, and taking the angle between one of the width edges and the x-axis as the character orientation angle;
    determining the geometric position of the word to be recognized from all character orientation angles.
  4. The method according to claim 3, characterized in that stretching the geometric position of the word to be recognized into a horizontal position specifically comprises:
    taking the center of the minimum bounding rectangle as the center of the character, and connecting the centers of all characters in order from left to right to obtain a text centerline;
    sampling a plurality of center points at equal intervals along the text centerline, and taking one point on each of the two height edges in the character orientation at each sample point, as base points;
    computing a thin-plate spline interpolation transform matrix from the base points, and performing bilinear interpolation sampling, to stretch the geometric position of the word to be recognized into a horizontal position.
  5. The method according to claim 1, characterized by further comprising:
    recognizing the word to be recognized in the horizontal position using a CRNN algorithm.
  6. The method according to claim 5, characterized in that, before recognizing the word to be recognized in the horizontal position using the CRNN algorithm, the method further comprises:
    scaling the image of the word to be recognized in the horizontal position to a preset height and a preset width.
  7. The method according to claim 6, characterized in that the CRNN algorithm comprises:
    a convolutional neural network layer comprising first and second convolutional layers, first through fifth convolution blocks, first through fourth max-pooling layers, and a custom network layer, wherein a feature vector sequence of the word to be recognized in the scaled image is extracted through the first and second convolutional layers and the first through fifth convolution blocks, the feature vector sequence being used to identify the word.
  8. The method according to claim 7, characterized in that the CRNN algorithm further comprises:
    a recurrent neural network layer comprising two bidirectional long short-term memory network layers, the bidirectional long short-term memory network layers being used to predict over the feature vector sequence of the word to be recognized to obtain a prediction result.
  9. The method according to claim 8, characterized in that the CRNN algorithm further comprises:
    a transcription layer, used to decode the prediction result generated by the recurrent neural network layer into characters and remove separator characters and repeated characters, to obtain a recognition result for the word to be recognized.
  10. The method according to claim 1, characterized in that capturing the image of the word to be recognized specifically comprises:
    taking the position of the pen tip as the center of the bottom edge, and setting a rectangular region of fixed size according to the size of the words in the image, as a virtual pen tip;
    computing the overlap area between the fixed-size rectangular region and each detected text box;
    finding the text box whose overlap area occupies the largest proportion of the fixed-size rectangular region, taking the word in that text box as the word to be recognized, and obtaining the image of the word to be recognized.
  11. An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the program, the following word recognition method is implemented:
    capturing an image of a word to be recognized;
    identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized, determining the geometric position of the word to be recognized, and stretching the geometric position of the word to be recognized into a horizontal position; and
    recognizing the word to be recognized in the horizontal position.
  12. The electronic device according to claim 11, characterized in that identifying, from the image of the word to be recognized, the edge extent of each character of the word to be recognized specifically comprises:
    using the maximally stable extremal regions algorithm to determine the edge extent of each character of the word to be recognized.
  13. The electronic device according to claim 11 or 12, characterized in that determining the geometric position of the word to be recognized specifically comprises:
    determining an initial bounding rectangle of a character from the character's edge extent, taking the center of the initial bounding rectangle as the rotation center, and determining a principal axis;
    repeatedly rotating and translating the principal axis to determine boundaries, and obtaining the bounding rectangle of minimum area by comparing the areas enclosed by the boundaries;
    establishing a coordinate system with any vertex of the image of the word to be recognized as the origin, the horizontal direction as the x-axis, and the vertical direction as the y-axis;
    determining the coordinates of the four vertices of the minimum bounding rectangle in the coordinate system, determining the height and width of the minimum bounding rectangle from the coordinates of the four vertices, and taking the angle between one of the width edges and the x-axis as the character orientation angle;
    determining the geometric position of the word to be recognized from all character orientation angles.
  14. The electronic device according to claim 13, characterized in that stretching the geometric position of the word to be recognized into a horizontal position specifically comprises:
    taking the center of the minimum bounding rectangle as the center of the character, and connecting the centers of all characters in order from left to right to obtain a text centerline;
    sampling a plurality of center points at equal intervals along the text centerline, and taking one point on each of the two height edges in the character orientation at each sample point, as base points;
    computing a thin-plate spline interpolation transform matrix from the base points, and performing bilinear interpolation sampling, to stretch the geometric position of the word to be recognized into a horizontal position.
  15. The electronic device according to claim 13, characterized by further comprising:
    recognizing the word to be recognized in the horizontal position using a CRNN algorithm.
  16. The electronic device according to claim 15, characterized in that, before recognizing the word to be recognized in the horizontal position using the CRNN algorithm, the method further comprises:
    scaling the image of the word to be recognized in the horizontal position to a preset height and a preset width.
  17. The electronic device according to claim 15, characterized in that the CRNN algorithm comprises:
    a convolutional neural network layer comprising first and second convolutional layers, first through fifth convolution blocks, first through fourth max-pooling layers, and a custom network layer, wherein a feature vector sequence of the word to be recognized in the scaled image is extracted through the convolutional layers and the convolution blocks, the feature vector sequence being used to identify the word.
  18. The electronic device according to claim 17, characterized in that the CRNN algorithm further comprises:
    a recurrent neural network layer comprising two bidirectional long short-term memory network layers, used to predict over the feature vector sequence of the word to be recognized to obtain a prediction result.
  19. The electronic device according to claim 18, characterized in that the CRNN algorithm further comprises:
    a transcription layer, used to decode the prediction result of the recurrent neural network layer into characters and remove separator characters and repeated characters, to obtain a recognition result for the word to be recognized.
  20. The electronic device according to claim 11, wherein capturing the image of the word to be recognized specifically comprises:
    taking the position of the pen tip as the center of the bottom edge, and setting a rectangular region of fixed size according to the size of the words in the image, as a virtual pen tip;
    computing the overlap area between the fixed-size rectangular region and each detected text box;
    finding the text box whose overlap area occupies the largest proportion of the fixed-size rectangular region, taking the word in that text box as the word to be recognized, and obtaining the image of the word to be recognized.
  21. A non-volatile computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the word recognition method according to any one of claims 1-10.
PCT/CN2020/082566 2020-03-31 2020-03-31 Word recognition method, device and storage medium WO2021196013A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2020/082566 WO2021196013A1 (zh) 2020-03-31 2020-03-31 Word recognition method, device and storage medium
US17/263,418 US11651604B2 (en) 2020-03-31 2020-03-31 Word recognition method, apparatus and storage medium
CN202080000447.2A CN113748429A (zh) 2020-03-31 2020-03-31 Word recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/082566 WO2021196013A1 (zh) 2020-03-31 2020-03-31 Word recognition method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2021196013A1 true WO2021196013A1 (zh) 2021-10-07

Family

ID=77926983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082566 WO2021196013A1 (zh) 2020-03-31 2020-03-31 Word recognition method, device and storage medium

Country Status (3)

Country Link
US (1) US11651604B2 (zh)
CN (1) CN113748429A (zh)
WO (1) WO2021196013A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902046A (zh) * 2021-12-10 2022-01-07 北京惠朗时代科技有限公司 Special-effect font recognition method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783760B (zh) * 2020-06-30 2023-08-08 北京百度网讯科技有限公司 Character recognition method and device, electronic device, and computer-readable storage medium
CN115527215A (zh) * 2022-10-10 2022-12-27 杭州睿胜软件有限公司 Method and system for processing images containing text, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140161365A1 (en) * 2012-12-12 2014-06-12 Qualcomm Incorporated Method of Perspective Correction For Devanagari Text
CN104239861A (zh) * 2014-09-10 2014-12-24 深圳市易讯天空网络技术有限公司 Curled text image preprocessing method and lottery-ticket scanning recognition method
CN106951896A (zh) * 2017-02-22 2017-07-14 武汉黄丫智能科技发展有限公司 License plate image tilt correction method
CN107516096A (zh) * 2016-06-15 2017-12-26 阿里巴巴集团控股有限公司 Character recognition method and device
CN108647681A (zh) * 2018-05-08 2018-10-12 重庆邮电大学 English text detection method with text orientation correction
CN108985137A (zh) * 2017-06-02 2018-12-11 杭州海康威视数字技术股份有限公司 License plate recognition method, device and system
CN110321755A (zh) * 2018-03-28 2019-10-11 中移(苏州)软件技术有限公司 Recognition method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI319547B (en) * 2006-12-01 2010-01-11 Compal Electronics Inc Method for generating typographical line
US9977976B2 (en) * 2016-06-29 2018-05-22 Konica Minolta Laboratory U.S.A., Inc. Path score calculating method for intelligent character recognition
US10783400B2 (en) * 2018-04-06 2020-09-22 Dropbox, Inc. Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140161365A1 (en) * 2012-12-12 2014-06-12 Qualcomm Incorporated Method of Perspective Correction For Devanagari Text
CN104239861A (zh) * 2014-09-10 2014-12-24 深圳市易讯天空网络技术有限公司 Curled text image preprocessing method and lottery-ticket scanning recognition method
CN107516096A (zh) * 2016-06-15 2017-12-26 阿里巴巴集团控股有限公司 Character recognition method and device
CN106951896A (zh) * 2017-02-22 2017-07-14 武汉黄丫智能科技发展有限公司 License plate image tilt correction method
CN108985137A (zh) * 2017-06-02 2018-12-11 杭州海康威视数字技术股份有限公司 License plate recognition method, device and system
CN110321755A (zh) * 2018-03-28 2019-10-11 中移(苏州)软件技术有限公司 Recognition method and device
CN108647681A (zh) * 2018-05-08 2018-10-12 重庆邮电大学 English text detection method with text orientation correction

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902046A (zh) * 2021-12-10 2022-01-07 北京惠朗时代科技有限公司 Special-effect font recognition method and device
CN113902046B (zh) * 2021-12-10 2022-02-18 北京惠朗时代科技有限公司 Special-effect font recognition method and device

Also Published As

Publication number Publication date
CN113748429A (zh) 2021-12-03
US11651604B2 (en) 2023-05-16
US20220036112A1 (en) 2022-02-03

Similar Documents

Publication Publication Date Title
US10983596B2 (en) Gesture recognition method, device, electronic device, and storage medium
WO2019128646A1 (zh) 人脸检测方法、卷积神经网络参数的训练方法、装置及介质
CN110232311B (zh) 手部图像的分割方法、装置及计算机设备
WO2021196013A1 (zh) 单词识别方法、设备及存储介质
CN110147786B (zh) 用于检测图像中的文本区域的方法、装置、设备以及介质
US7302099B2 (en) Stroke segmentation for template-based cursive handwriting recognition
US7369702B2 (en) Template-based cursive handwriting recognition
WO2021238446A1 (zh) 文本识别方法、设备及存储介质
US10621759B2 (en) Beautifying freeform drawings using transformation adjustments
CA2481828C (en) System and method for detecting a hand-drawn object in ink input
JP2933801B2 (ja) 文字の切り出し方法及びその装置
WO2020228187A1 (zh) 边缘检测方法、装置、电子设备和计算机可读存储介质
CN110222703B (zh) 图像轮廓识别方法、装置、设备和介质
JP2018524734A (ja) 複数のオブジェクトの入力を認識するためのシステムならびにそのための方法および製品
WO2019041424A1 (zh) 验证码识别方法、装置、计算机设备及计算机存储介质
CN110688947A (zh) 一种同步实现人脸三维点云特征点定位和人脸分割的方法
CN107545223B (zh) 图像识别方法及电子设备
JP6877446B2 (ja) 多重オブジェクト構造を認識するためのシステムおよび方法
US10579868B2 (en) System and method for recognition of objects from ink elements
US11823474B2 (en) Handwritten text recognition method, apparatus and system, handwritten text search method and system, and computer-readable storage medium
WO2022222096A1 (zh) 手绘图形识别方法、装置和系统,以及计算机可读存储介质
CN111492407A (zh) 用于绘图美化的系统和方法
US9418281B2 (en) Segmentation of overwritten online handwriting input
CN111753812A (zh) 文本识别方法及设备
CN113177542A (zh) 识别印章文字的方法、装置、设备和计算机可读介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928790

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928790

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2023)
