CN110796130A - Method, device and computer storage medium for character recognition - Google Patents


Info

Publication number
CN110796130A
Authority
CN
China
Prior art keywords: sub, point, character, area, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910887052.8A
Other languages
Chinese (zh)
Inventor
王枫
邵帅
俞刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd
Priority to CN201910887052.8A
Publication of CN110796130A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/413 - Classification of content, e.g. text, photographs or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

Embodiments of the invention provide a method and an apparatus for character recognition, and a computer storage medium. The method comprises the following steps: acquiring the text region in which the characters of a target image are located; dividing the text region into several sub-regions, and determining a feature vector for each of the sub-regions; obtaining a feature vector for the text region from the feature vector of each sub-region; and performing recognition based on the feature vector of the text region to obtain a recognition result for the characters in the target image. Because the text region is divided into sub-regions and its feature vector is built from the sub-region feature vectors, the curved shape of bent text is taken into account, and the recognition result is therefore more accurate.

Description

Method, device and computer storage medium for character recognition
Technical Field
The present invention relates to the field of image processing, and more particularly, to a method, an apparatus, and a computer storage medium for character recognition.
Background
Text in images can be detected and recognized using Optical Character Recognition (OCR) techniques based on deep learning, i.e., methods that use deep neural networks for optical character detection and recognition.
A typical OCR pipeline first detects the characters in an image with a detection network, then extracts features from the detection box, and finally recognizes the characters with a recognition network. However, current detection and recognition can only extract a rotated rectangular box for feature extraction, which introduces irrelevant background information during recognition; as a result, accuracy on curved text is low and the recognition effect is poor.
Disclosure of Invention
The invention provides a method and an apparatus for character recognition, and a computer storage medium, which can recognize curved text in an image and improve recognition accuracy.
According to an aspect of the present invention, there is provided a method for text recognition, comprising:
acquiring the text region in which the characters of a target image are located;
dividing the text region into several sub-regions, and determining a feature vector for each of the sub-regions;
obtaining a feature vector for the text region from the feature vector of each of the sub-regions;
and performing recognition based on the feature vector of the text region to obtain a recognition result for the characters in the target image.
In an embodiment of the present invention, the dividing the text area into a plurality of sub-areas includes: sampling points on the outline of the character area to obtain a plurality of sampling points; and dividing the text area into a plurality of sub-areas according to the plurality of sampling points.
In an embodiment of the present invention, the sampling points on the outline of the text region to obtain a plurality of sampling points includes: and uniformly sampling points on the outline of the character area to obtain a plurality of sampling points.
In an embodiment of the present invention, the sampling points on the outline of the text region to obtain a plurality of sampling points includes: and sampling is carried out based on the curvature change between points on the outline of the character area to obtain a plurality of sampling points.
In one embodiment of the invention, sampling based on curvature changes between points on the outline of the text region comprises: if the absolute value of the difference between a first slope, taken between a first point on the outline and a second point adjacent to it on its left, and a second slope, taken between the second point and a third point adjacent to the second point on its left, is greater than a preset threshold, selecting the first point as a sampling point.
In one embodiment of the invention, sampling based on curvature changes between points on the outline of the text region comprises: if the absolute value of the difference between a first slope, taken between a first point on the outline and a second point adjacent to it on its right, and a second slope, taken between the second point and a third point adjacent to the second point on its right, is greater than a preset threshold, selecting the first point as a sampling point.
In an embodiment of the present invention, the sampling points on the outline of the text region includes: sampling is performed in a clockwise direction or a counterclockwise direction, starting from a specific point on the contour.
In an embodiment of the present invention, the obtaining of the feature vector of the text region from the feature vector of each sub-region includes: concatenating the feature vectors of the sub-regions along the width dimension to obtain the feature vector of the text region.
In an embodiment of the present invention, the acquiring a text region where a text in a target image is located includes: and inputting the target image to a character detection module of a neural network to obtain the character area.
In an embodiment of the present invention, before acquiring the text region where the text in the target image is located, the method further includes: and obtaining the neural network through training based on a training data set, wherein the neural network comprises a character detection module, a feature extraction module and a character recognition module.
According to another aspect of the present invention, there is provided an apparatus for text recognition, the apparatus being configured to implement the steps of the method of the preceding aspect or any implementation manner, the apparatus comprising:
the character detection unit is used for acquiring a character area where characters in the target image are located;
the feature extraction unit is used for dividing the text region into a plurality of sub-regions, determining a feature vector of each of the sub-regions, and obtaining the feature vector of the text region from the feature vector of each of the sub-regions;
and the character recognition unit is used for recognizing based on the feature vector of the character area to obtain a recognition result of the characters in the target image.
According to a further aspect of the present invention, there is provided an apparatus for character recognition comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for character recognition set forth in the preceding aspect or any implementation thereof.
According to a further aspect of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method for text recognition as set forth in the preceding aspect or any implementation.
Therefore, during character recognition the text region is divided into several sub-regions and its feature vector is obtained from the feature vector of each sub-region; the curved shape of bent text is thereby taken into account, so the recognition result is more accurate.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic block diagram of an electronic device of an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for text recognition in accordance with an embodiment of the present invention;
FIG. 3 is a schematic view of an outline of a text region of an embodiment of the invention;
FIG. 4 is another schematic flow chart diagram of a method for text recognition in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of selecting a sample point from a plurality of points in accordance with an embodiment of the present invention;
FIGS. 6(a) - (d) are a schematic illustration of partitioned sub-regions of an embodiment of the present invention;
FIG. 7 is a schematic diagram of the various modules of a neural network of an embodiment of the present invention;
FIG. 8 is a schematic block diagram of an apparatus for text recognition in accordance with an embodiment of the present invention;
FIG. 9 is another schematic block diagram of an apparatus for text recognition in accordance with an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described herein without inventive step, shall fall within the scope of protection of the invention.
The embodiment of the present invention can be applied to an electronic device, and fig. 1 is a schematic block diagram of the electronic device according to the embodiment of the present invention. The electronic device 10 shown in FIG. 1 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, an image sensor 110, and one or more non-image sensors 114, which are interconnected via a bus system 112 and/or otherwise. It should be noted that the components and configuration of the electronic device 10 shown in FIG. 1 are exemplary only, and not limiting, and that the electronic device may have other components and configurations as desired.
The processor 102 may include a Central Processing Unit (CPU) 1021 and a Graphics Processing Unit (GPU) 1022, or other forms of processing units having data processing capability and/or instruction execution capability, such as a Field-Programmable Gate Array (FPGA) or an Advanced RISC Machine (ARM), and the processor 102 may control other components in the electronic device 10 to perform desired functions.
The storage 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory 1041 and/or non-volatile memory 1042. The volatile memory 1041 may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory 1042 may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement various desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to an external (e.g., user), and may include one or more of a display, a speaker, and the like.
The image sensor 110 may take images (e.g., photographs, videos, etc.) desired by the user and store the taken images in the storage device 104 for use by other components.
It should be noted that the components and structure of the electronic device 10 shown in Fig. 1 are merely exemplary. Although the electronic device 10 includes a plurality of different devices, some of them may be omitted and others may be present in greater numbers as required; the invention is not limited in this respect.
FIG. 2 is a schematic flow chart of a method for text recognition in accordance with an embodiment of the present invention. The method shown in fig. 2 comprises:
s210, acquiring a character area where characters in the target image are located;
s220, dividing the text area into a plurality of sub-areas, and determining a feature vector of each sub-area in the plurality of sub-areas;
s230, obtaining a feature vector of the character region according to the feature vector of each sub-region in the sub-regions;
s240, identifying based on the feature vector of the character area to obtain an identification result of the characters in the target image.
Illustratively, embodiments of the invention may be implemented using neural networks, which may include a word detection module, a feature extraction module, and a word recognition module. This neural network may be obtained by training prior to the method shown in fig. 2. In particular, the neural network may be derived by training based on a training data set comprising a large amount of training data.
Specifically, in S210, the target image may be input to a text detection module of the neural network, so as to obtain a text region where the text in the target image is located.
Alternatively, in S210, the target image may first be preprocessed, and the preprocessed image then input to the character detection module of the neural network. The preprocessing may comprise resizing the target image so that it matches the input size of the neural network, followed by a normalization operation. Optionally, the preprocessing may also include other operations, such as color/grayscale conversion, brightness adjustment, denoising, and so on, which are not listed here.
That is, S210 may include: adjusting the size of the target image so that it matches a preset size, for example 256 × 256; normalizing the resized image to obtain a tensor that can be input to the neural network; and inputting the normalized image to the character detection module of the neural network to obtain the text region in which the characters are located.
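As a rough illustration, the resize-and-normalize preprocessing above can be sketched as follows. The 256 × 256 target size comes from the example in the text; the nearest-neighbour resize and the division by 255 are assumptions for illustration, not the patent's prescribed method:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 256) -> np.ndarray:
    """Resize an H x W x 3 uint8 image to size x size and normalize to [0, 1]."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize via index maps (avoids external dependencies).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]
    # Normalize pixel values so the tensor is suitable as network input.
    return resized.astype(np.float32) / 255.0

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape)  # (256, 256, 3)
```

In practice a library resize (e.g. bilinear) and dataset-specific mean/std normalization would typically be used instead.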
Illustratively, the text region obtained in S210 has a boundary contour, which includes or is composed of a series of points. As an example, the outline of the character area may be as shown in fig. 3.
Specifically, the dividing of the text area into several sub-areas in S220 may include: sampling points on the outline of the character area to obtain a plurality of sampling points; and dividing the text area into a plurality of sub-areas according to the plurality of sampling points.
That is, as shown in fig. 4, S220 may include:
s2201, sampling points on the outline of the character area to obtain a plurality of sampling points.
And S2202, dividing the character area into a plurality of sub-areas according to the plurality of sampling points.
S2203, determining a feature vector of each of the plurality of sub-regions.
As an implementation, S2201 may include: and uniformly sampling points on the outline of the character area to obtain a plurality of sampling points. For example, the sampling may be performed uniformly in a clockwise or counterclockwise direction starting from a specific point on the contour, resulting in a plurality of sampling points.
For example, the leftmost point on the contour may be used as the starting point, and a sampling point taken every M points (i.e., one sampling point in every M + 1 points) in a clockwise or counterclockwise direction until the starting point is reached again, yielding N sampling points.
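A minimal sketch of this uniform sampling, assuming the contour is an ordered list of (x, y) points (starting from the leftmost point) and M is a hypothetical stride:

```python
def uniform_sample(contour, M):
    """Take one sampling point in every M + 1 points of an ordered contour,
    starting from the first point (e.g. the leftmost)."""
    return [contour[i] for i in range(0, len(contour), M + 1)]

# Hypothetical closed contour of 12 points, sampled with stride M = 2:
contour = [(i, i % 3) for i in range(12)]
samples = uniform_sample(contour, 2)
print(samples)  # [(0, 0), (3, 0), (6, 0), (9, 0)]
```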
As another implementation, S2201 may include: and sampling is carried out based on the curvature change between points on the outline of the character area to obtain a plurality of sampling points. Illustratively, analysis may be performed point by point in a clockwise or counterclockwise direction with a certain specific point on the contour as a starting point, and a point where the curvature change is large is determined as a sampling point.
As an example, the slope between every two adjacent points can be calculated in a clockwise or counterclockwise direction, using the leftmost point on the contour as the starting point. For example, in the clockwise direction: if the absolute value of the difference between a first slope, taken between a first point on the contour and the second point adjacent to it on its left, and a second slope, taken between the second point and the third point adjacent to the second point on its left, is greater than a preset threshold, the first point is selected as a sampling point. As another example, in the counterclockwise direction: if the absolute value of the difference between a first slope, taken between a first point and the second point adjacent to it on its right, and a second slope, taken between the second point and the third point adjacent to the second point on its right, is greater than a preset threshold, the first point is selected as a sampling point. This point-by-point analysis continues until the starting point is reached again, yielding a plurality of sampling points (say, N).
For example, once the contour is obtained, i.e., the coordinates of each point on the contour are known, the slope of the line connecting two adjacent points can be computed from their coordinates. Assume the sampling points are selected in the clockwise direction. Three consecutive points on the contour are shown in Fig. 5, where P1, P2, and P3 denote the first, second, and third points, respectively. If the first point P1 has coordinates (x1, y1), the second point P2 adjacent to P1 has coordinates (x2, y2), and the third point P3 adjacent to P2 has coordinates (x3, y3), then the first slope of the line between P1 and P2 is k1 = (y2 - y1)/(x2 - x1), and the second slope of the line between P2 and P3 is k2 = (y3 - y2)/(x3 - x2). If the absolute value of the difference between the first slope k1 and the second slope k2 is greater than the preset threshold, the first point P1 is taken as a sampling point. Selecting sampling points in the counterclockwise direction is similar to the clockwise process of Fig. 5 and is not repeated here.
It can be understood that an absolute difference between the first slope k1 and the second slope k2 greater than the preset threshold indicates that the slope changes sharply when proceeding from the third point P3 to the first point P1; in other words, the contour stops being smooth at the first point, which is why the first point is selected as one of the sampling points.
Alternatively, whether to select the first point as a sampling point may be determined from the ratio of the first slope k1 to the second slope k2. For example, the ratio k1/k2 may be computed and the absolute value of the difference between this ratio and 1 calculated; if this absolute value is greater than another preset threshold, the first point is selected as a sampling point.
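The slope-difference criterion above can be sketched as follows (the ratio-based variant is indicated in a comment). The threshold value, the eps guard against near-vertical segments, and the traversal over an open polyline are illustrative assumptions:

```python
def curvature_sample(points, slope_thresh=0.5, eps=1e-9):
    """Select points where the slope between adjacent points changes sharply.
    `points` is an ordered polyline of (x, y) coordinates."""
    samples = []
    for i in range(2, len(points)):
        x3, y3 = points[i - 2]   # third point P3
        x2, y2 = points[i - 1]   # second point P2
        x1, y1 = points[i]       # first point P1 (candidate sampling point)
        k1 = (y2 - y1) / (x2 - x1 + eps)  # slope of segment P1-P2
        k2 = (y3 - y2) / (x3 - x2 + eps)  # slope of segment P2-P3
        # Criterion: absolute slope difference exceeds the preset threshold.
        # (Ratio variant: abs(k1 / (k2 + eps) - 1) > another threshold.)
        if abs(k1 - k2) > slope_thresh:
            samples.append(points[i])
    return samples

# A polyline that turns sharply at the 4th point: slopes go from ~0 to steep.
pts = [(0, 0), (1, 0), (2, 0), (2.001, 1), (2.002, 2)]
print(curvature_sample(pts, slope_thresh=0.5))  # [(2.001, 1)]
```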
Both the preset threshold and the other preset threshold may be set in advance according to the scenario and accuracy requirements: for a high accuracy requirement they may be set smaller, and for a low accuracy requirement, larger.
In addition, it should be understood that S2201 yields the coordinates of the plurality of sampling points, which may be stored, for example, in a list.
Specifically, in S2202, the text region may be divided into several sub-regions according to the positional relationship between the plurality of sampling points. Illustratively, the number of sampling points contained by different sub-regions may not be equal.
As an example, for the outline of the text region shown in Fig. 3, Fig. 6 shows the region divided into 4 sub-regions: Figs. 6(a) to 6(d) show the first, second, third, and fourth sub-regions in turn. The first sub-region in Fig. 6(a) can be understood as having four sides: the upper, left, and lower sides each contain at least two sampling points (and possibly more), while the right side contains exactly two sampling points, i.e., no more than two. In the second sub-region in Fig. 6(b) and the third sub-region in Fig. 6(c), the upper and lower sides may contain at least two sampling points, but the left and right sides each contain only two. In the fourth sub-region in Fig. 6(d), the upper, lower, and right sides may contain at least two sampling points, but the left side contains only two.
It can be understood that S2202 adopts the idea of approximating curves with straight lines, so that each sub-region approximates a rectangular box bounded by straight segments. That is, a side of each sub-region shown in Fig. 6 may be a single straight segment, or a polyline through several sampling points that is close to a straight line.
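One possible way to realize this division, assuming the sampling points have already been separated into upper-outline and lower-outline sequences ordered left to right. This pairing scheme is a hypothetical sketch, not necessarily the patent's exact method:

```python
def divide_into_subregions(top_pts, bottom_pts):
    """Pair corresponding sampling points on the upper and lower outline
    into consecutive quadrilateral sub-regions (assumes equal numbers of
    top and bottom points, ordered left to right)."""
    assert len(top_pts) == len(bottom_pts)
    quads = []
    for i in range(len(top_pts) - 1):
        # Each sub-region is bounded by two adjacent top points and the
        # two bottom points beneath them, listed clockwise.
        quads.append([top_pts[i], top_pts[i + 1], bottom_pts[i + 1], bottom_pts[i]])
    return quads

top = [(0, 0), (2, 1), (4, 1), (6, 0)]      # hypothetical upper outline samples
bottom = [(0, 3), (2, 4), (4, 4), (6, 3)]   # hypothetical lower outline samples
quads = divide_into_subregions(top, bottom)
print(len(quads))  # 3
```

Each quadrilateral then approximates one nearly-straight piece of the curved text region, consistent with the straight-line approximation described above.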
Specifically, in S2203, a feature vector of each sub-region may be obtained for each sub-region. Illustratively, the feature vectors of the sub-regions can be obtained by a feature extraction method.
Specifically, in S230, the feature vectors of the sub-regions may be concatenated along the width dimension to obtain the feature vector of the text region.
In general, a feature vector has three dimensions, a width (W) dimension, a height (H) dimension, and a Channel (CH) dimension. Stitching along the width direction in S230 means stitching along the width dimension of the feature vector.
Illustratively, the feature vectors of the various sub-regions may have the same channel (CH) dimension and the same or different width (W) dimensions. Concatenating along the W dimension yields the feature vector of the text region: it has the same CH dimension as each sub-region feature vector, and its W dimension is the sum of the W dimensions of the sub-region feature vectors.
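The width-wise concatenation can be sketched with numpy arrays shaped (H, W, CH); the specific dimensions below are hypothetical:

```python
import numpy as np

# Hypothetical per-sub-region feature maps: same height and channel
# dimensions, possibly different widths.
f1 = np.zeros((8, 4, 32))   # sub-region 1: H=8, W=4, CH=32
f2 = np.zeros((8, 6, 32))   # sub-region 2: H=8, W=6, CH=32
f3 = np.zeros((8, 5, 32))   # sub-region 3: H=8, W=5, CH=32

# Concatenate along the width (W) axis: CH stays the same and the
# resulting width is the sum of the sub-region widths, 4 + 6 + 5 = 15.
text_region_feature = np.concatenate([f1, f2, f3], axis=1)
print(text_region_feature.shape)  # (8, 15, 32)
```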
Specifically, in S240, the feature vector of the text region may be input to the text recognition module of the neural network to obtain the recognition result. For the example shown in Fig. 3, the recognition result may be "DOME".
Similarly, for the example shown in Fig. 3, "Le", "du", and "MARAIS" may also be recognized.
Therefore, during character recognition the text region is divided into several sub-regions, and the feature vector of the text region is obtained from the feature vector of each sub-region; the curved shape of bent text is thereby taken into account, making the recognition result more accurate. Moreover, the output of the text detection module is better exploited: when dividing the region, the curved shape of the text region's outline is considered, points on the outline are sampled, and the division is based on those sampling points, which ensures the accuracy of the division and, in turn, the reliability of the recognition result. In addition, the detection and recognition method of the embodiments can recognize text of any shape, and is not limited to straight or curved text.
In the embodiment of the invention, a neural network is used during character recognition, and the neural network comprises a character detection module, a feature extraction module and a character recognition module, as shown in fig. 7. The text detection module can output text areas including contour information thereof, i.e., coordinate positions of points on the contour. The feature extraction module may output a feature vector for the text region. The text recognition module may output the recognized text.
In training the neural network, the training data in the training data set may carry annotation information, which may include: the coordinate position of the text region in which the characters are located, the feature vector of each sub-region, the feature vector of the text region obtained by concatenation along the W dimension, and the corresponding characters. In addition, the training data items may all have the same size, e.g., 256 × 256; alternatively, each training data item may be resized and normalized before being input to the neural network.
FIG. 8 is a schematic block diagram of an apparatus for text recognition in accordance with an embodiment of the present invention. The apparatus 70 shown in fig. 8 may include a text detection unit 710, a feature extraction unit 720, and a text recognition unit 730.
A character detection unit 710, configured to obtain a character region where characters in the target image are located;
the feature extraction unit 720 is configured to divide the text region into a plurality of sub-regions, determine a feature vector of each of the plurality of sub-regions, and obtain the feature vector of the text region according to the feature vector of each of the plurality of sub-regions;
and a character recognition unit 730, configured to perform recognition based on the feature vector of the character region, so as to obtain a recognition result of the characters in the target image.
Exemplarily, the text detection unit 710 may be specifically configured to: and inputting the target image to a character detection module of a neural network to obtain the character area.
The apparatus 70 may further include a training unit configured to obtain the neural network through training based on a training data set, where the neural network includes a text detection module, a feature extraction module, and a text recognition module.
Exemplarily, the feature extraction unit 720 may include a region dividing subunit, a first feature determination subunit, and a second feature determination subunit. The region dividing subunit is used to divide the text region into a plurality of sub-regions. The first feature determination subunit is configured to determine a feature vector for each of the sub-regions. The second feature determination subunit is configured to obtain the feature vector of the text region from the feature vector of each of the sub-regions.
Exemplarily, the region dividing subunit is specifically configured to: sampling points on the outline of the character area to obtain a plurality of sampling points; and dividing the text area into a plurality of sub-areas according to the plurality of sampling points.
Exemplarily, the region dividing subunit is specifically configured to: and uniformly sampling points on the outline of the character area to obtain a plurality of sampling points.
Exemplarily, the region dividing subunit is specifically configured to: and sampling is carried out based on the curvature change between points on the outline of the character area to obtain a plurality of sampling points.
For example, the region division subunit may be specifically configured to select a first point on the contour of the text region as a sampling point if the absolute value of the difference between a first slope, between the first point and a second point left-adjacent to it, and a second slope, between the second point and a third point left-adjacent to the second point, is greater than a preset threshold. Alternatively, the region division subunit may be specifically configured to select the first point as a sampling point if the absolute value of the difference between a first slope, between the first point and a second point right-adjacent to it, and a second slope, between the second point and a third point right-adjacent to the second point, is greater than a preset threshold.
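The slope-difference rule described above can be sketched directly. The large stand-in value for vertical segments and the negative-index wrap-around for a closed contour are assumptions; the patent leaves both cases unspecified:

```python
def sample_by_slope_change(points, threshold):
    # Select point i as a sampling point when the slope from point i to
    # its neighbour differs from the slope between the next two
    # neighbours by more than `threshold` (a proxy for curvature change).
    def slope(a, b):
        dx = a[0] - b[0]
        return (a[1] - b[1]) / dx if dx else 1e9  # large stand-in for vertical
    picked = []
    for i in range(len(points)):
        # Negative indexing closes the contour at the start.
        p1, p2, p3 = points[i], points[i - 1], points[i - 2]
        if abs(slope(p1, p2) - slope(p2, p3)) > threshold:
            picked.append(i)
    return picked
```

Note that with this "left-adjacent" formulation a corner is flagged at the point just after the direction change, not at the corner vertex itself.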
For example, the region division subunit may be specifically configured to sample in a clockwise or counterclockwise direction, starting from a specific point on the contour.
For example, the second feature determination subunit may be specifically configured to concatenate the feature vectors of the sub-regions along the width direction to obtain the feature vector of the text region.
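The width-direction concatenation can be illustrated with NumPy. The (channels, height, width) layout and the per-sub-region map sizes below are assumptions, since the patent does not fix a tensor convention:

```python
import numpy as np

# Suppose each of five sub-regions yields a (channels, height, width)
# feature map of shape (64, 8, 4) from a shared feature extractor.
sub_features = [np.random.rand(64, 8, 4) for _ in range(5)]

# Concatenating along the width axis yields one feature map for the
# whole text region while preserving left-to-right reading order.
region_feature = np.concatenate(sub_features, axis=2)
print(region_feature.shape)  # (64, 8, 20)
```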
The apparatus 70 shown in fig. 8 can implement the method for text recognition shown in fig. 2 or fig. 4; details are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In addition, an embodiment of the present invention provides another apparatus for text recognition, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the steps of the method for text recognition shown in fig. 2 or fig. 4.
As shown in fig. 9, the apparatus 80 may include a memory 810 and a processor 820. The memory 810 stores computer program code for implementing corresponding steps in a method for text recognition according to an embodiment of the present invention. The processor 820 is used to run the computer program code stored in the memory 810 to perform the corresponding steps of the method for text recognition according to the embodiment of the present invention, and to implement the respective modules in the apparatus described in fig. 8 according to the embodiment of the present invention.
Illustratively, the computer program code, when executed by the processor 820, performs the following steps: acquiring a text region where text in a target image is located; dividing the text region into a plurality of sub-regions, and determining a feature vector of each of the plurality of sub-regions; obtaining a feature vector of the text region according to the feature vector of each of the plurality of sub-regions; and performing recognition based on the feature vector of the text region to obtain a recognition result of the text in the target image.
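The four steps above can be sketched end-to-end on a grayscale image. Every stand-in below (whole image as the text region, equal vertical slices as sub-regions, row-means as features, a threshold as the recognizer) is a toy assumption replacing an unspecified neural-network module:

```python
import numpy as np

def recognize_text(image, n_sub_regions=4):
    # Toy sketch of the claimed pipeline on a grayscale (H, W) image.
    region = image                                          # 1. text region (stand-in: whole image)
    slices = np.array_split(region, n_sub_regions, axis=1)  # 2. divide into sub-regions
    sub_vectors = [s.mean(axis=1) for s in slices]          # 2. feature vector per sub-region
    region_vector = np.concatenate(sub_vectors)             # 3. region feature by concatenation
    return "text" if region_vector.mean() > 0.5 else "blank"  # 4. recognition (stand-in)
```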
In addition, an embodiment of the present invention further provides an electronic device, which may include the apparatus 70 shown in fig. 8. The electronic device can implement the method for text recognition shown in fig. 2 or fig. 4. As an example, it may be the electronic device shown in fig. 1.
In addition, an embodiment of the present invention further provides a computer storage medium on which a computer program is stored. When executed by a processor, the computer program may implement the steps of the method for text recognition shown in fig. 2 or fig. 4, as described above. For example, the computer storage medium is a computer-readable storage medium.
The computer storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB flash drive, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media, for example one containing computer-readable program code for text recognition.
In addition, an embodiment of the present invention further provides a computer program or computer program product including computer program code which, when executed, causes a computer to implement the steps of the method for text recognition shown in fig. 2 or fig. 4 described above.
Illustratively, when executed, the computer program code enables a computer to: acquire a text region where text in a target image is located; divide the text region into a plurality of sub-regions, and determine a feature vector of each of the plurality of sub-regions; obtain a feature vector of the text region according to the feature vector of each of the plurality of sub-regions; and perform recognition based on the feature vector of the text region to obtain a recognition result of the text in the target image.
Therefore, when performing text recognition, the text region is divided into a plurality of sub-regions and the feature vector of the text region is obtained from the feature vectors of the sub-regions, which takes the curved shape of bent text into account and makes the recognition result more accurate. Moreover, the output of the text detection module is better exploited: when dividing the region, the curved shape of the contour of the text region is considered, points on the contour are sampled, and the region is divided based on the sampling points, which ensures the accuracy of the region division and thus the reliability of the recognition result. In addition, the text detection and recognition method of the embodiments of the present invention can recognize text of any shape, including but not limited to straight and curved text.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some modules in an apparatus for text recognition according to embodiments of the present invention. The present invention may also be embodied as programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may take the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above description is only of specific embodiments of the present invention; the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and such changes or substitutions shall be covered by the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method for text recognition, the method comprising:
acquiring a text region where text in a target image is located;
dividing the text region into a plurality of sub-regions, and determining a feature vector of each of the plurality of sub-regions;
obtaining a feature vector of the text region according to the feature vector of each of the plurality of sub-regions;
and performing recognition based on the feature vector of the text region to obtain a recognition result of the text in the target image.
2. The method of claim 1, wherein the dividing the text region into a plurality of sub-regions comprises:
sampling points on the contour of the text region to obtain a plurality of sampling points;
and dividing the text region into a plurality of sub-regions according to the plurality of sampling points.
3. The method of claim 2, wherein the sampling points on the contour of the text region to obtain a plurality of sampling points comprises:
uniformly sampling points on the contour of the text region to obtain the plurality of sampling points.
4. The method of claim 2, wherein the sampling points on the contour of the text region to obtain a plurality of sampling points comprises:
sampling based on the curvature change between points on the contour of the text region to obtain the plurality of sampling points.
5. The method of claim 4, wherein the sampling based on the curvature change between points on the contour of the text region comprises:
selecting a first point on the contour of the text region as a sampling point if the absolute value of the difference between a first slope, between the first point and a second point left-adjacent to it, and a second slope, between the second point and a third point left-adjacent to the second point, is greater than a preset threshold;
or,
selecting the first point as a sampling point if the absolute value of the difference between a first slope, between the first point and a second point right-adjacent to it, and a second slope, between the second point and a third point right-adjacent to the second point, is greater than a preset threshold.
6. The method of claim 2, wherein the sampling points on the contour of the text region comprises:
sampling in a clockwise or counterclockwise direction, starting from a specific point on the contour.
7. The method according to claim 1, wherein the obtaining the feature vector of the text region according to the feature vector of each of the sub-regions comprises:
concatenating the feature vectors of the sub-regions along the width direction to obtain the feature vector of the text region.
8. The method of claim 1, wherein the acquiring the text region where the text in the target image is located comprises:
inputting the target image to a text detection module of a neural network to obtain the text region.
9. The method according to claim 8, further comprising, before the acquiring the text region where the text in the target image is located:
obtaining the neural network through training based on a training data set, wherein the neural network comprises a text detection module, a feature extraction module, and a text recognition module.
10. An apparatus for text recognition, the apparatus comprising:
a text detection unit, configured to acquire a text region where text in a target image is located;
a feature extraction unit, configured to divide the text region into a plurality of sub-regions, determine a feature vector of each of the plurality of sub-regions, and obtain a feature vector of the text region according to the feature vector of each of the plurality of sub-regions;
and a text recognition unit, configured to perform recognition based on the feature vector of the text region to obtain a recognition result of the text in the target image.
11. An apparatus for text recognition, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the steps of the method of any one of claims 1 to 9 are implemented when the computer program is executed by the processor.
12. A computer storage medium on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
CN201910887052.8A 2019-09-19 2019-09-19 Method, device and computer storage medium for character recognition Pending CN110796130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910887052.8A CN110796130A (en) 2019-09-19 2019-09-19 Method, device and computer storage medium for character recognition


Publications (1)

Publication Number Publication Date
CN110796130A true CN110796130A (en) 2020-02-14

Family

ID=69438564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910887052.8A Pending CN110796130A (en) 2019-09-19 2019-09-19 Method, device and computer storage medium for character recognition

Country Status (1)

Country Link
CN (1) CN110796130A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036396A (en) * 2020-09-14 2020-12-04 上海高德威智能交通系统有限公司 Ship name recognition method and device, electronic equipment and computer readable storage medium
WO2021164479A1 (en) * 2020-02-21 2021-08-26 华为技术有限公司 Video text tracking method and electronic device
CN115331213A (en) * 2022-10-17 2022-11-11 基合半导体(宁波)有限公司 Character recognition method, chip, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150063688A1 (en) * 2013-09-05 2015-03-05 Anurag Bhardwaj System and method for scene text recognition
CN105069450A (en) * 2015-07-16 2015-11-18 福州大学 Quick multi-character recognition method
WO2016065701A1 (en) * 2014-10-27 2016-05-06 深圳Tcl数字技术有限公司 Image text recognition method and device
CN107507208A (en) * 2017-07-12 2017-12-22 天津大学 A kind of characteristics of image point extracting method based on Curvature Estimation on profile
CN110147786A (en) * 2019-04-11 2019-08-20 北京百度网讯科技有限公司 For text filed method, apparatus, equipment and the medium in detection image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
R. Legault et al.: "A comparison of methods of extracting curvature features", 11th IAPR International Conference on Pattern Recognition *
Zhanzhan Cheng et al.: "AON: Towards Arbitrarily-Oriented Text Recognition", Conference on Computer Vision and Pattern Recognition *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200214)