CN112257719A

CN112257719A - Character recognition method, system and storage medium

Info

Publication number: CN112257719A
Application number: CN202011099422.0A
Authority: CN
Inventors: 吴中山; 黎维春
Original assignee: Shenzhen Tianwei Big Data Technology Co ltd
Current assignee: Shenzhen Tianwei Big Data Technology Co ltd
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2021-01-22

Abstract

The invention discloses a character recognition method, a character recognition system and a storage medium. The character recognition method comprises the following steps: acquiring a character database and establishing a character recognition model according to the character database; acquiring and sending a character image to be recognized; dividing the character image to be recognized into regions to obtain one or more character recognition region images; sequentially importing each character recognition region image into the character recognition model for character recognition, and generating and sending one or more character recognition results; and generating and sending a character recognition report according to the character recognition results. The invention effectively improves character recognition efficiency while ensuring recognition accuracy.

Description

Character recognition method, system and storage medium

Technical Field

The invention relates to the technical field of character recognition, in particular to a character recognition method, a character recognition system and a storage medium.

Background

With the rapid development of science and technology, character recognition technology has also developed rapidly and is widely applied across industries. People often need to process the text contained in pictures in their work; because text in a picture cannot be edited directly, it must first be recognized.

Existing character recognition methods generally import the whole image into a character recognition model to recognize the characters in it, and their recognition accuracy is low. Moreover, when the image is large, recognizing it as a whole greatly increases the computational load on the system, which lowers operating efficiency and, in turn, character recognition efficiency.

Disclosure of Invention

To overcome, or at least partially solve, the above problems, embodiments of the present invention provide a character recognition method, system, and storage medium that can effectively improve character recognition efficiency while ensuring recognition accuracy.

The embodiments of the invention are implemented as follows:

In a first aspect, an embodiment of the present invention provides a character recognition method, comprising the following steps:

acquiring a character database, and establishing a character recognition model according to the character database;

acquiring and sending a character image to be recognized;

dividing the character image to be recognized into regions to obtain one or more character recognition region images;

sequentially importing each character recognition region image into the character recognition model for character recognition, and generating and sending one or more character recognition results;

and generating and sending a character recognition report according to the character recognition results.

When characters in an image are to be recognized, data is first acquired from an existing character database, and a character recognition model is established from that data. The character recognition model is a mathematical model that, based on character images and character data, recognizes the characters in an image and converts them into text for output. After a character recognition request is received, the character image to be recognized is acquired and sent, and is then divided into regions to obtain one or more character recognition region images, so that each region image can subsequently be recognized quickly, improving both recognition accuracy and computational efficiency. Each character recognition region image is then imported into the character recognition model in turn; the model recognizes the characters in each region image and generates and sends one or more character recognition results. Finally, a character recognition report is generated and sent according to the results; the report includes character text information, text paragraph information, character text image information, character type information, and the like.

By dividing the whole image into several regions and recognizing each separately, the method effectively reduces the computational load and improves both character recognition efficiency and accuracy.
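The five steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not specify an image format or a model architecture, so `build_model` and `divide_regions` are hypothetical callables supplied by the caller, and the report fields are illustrative only.

```python
from typing import Any, Callable, Dict, List, Sequence

# Opaque placeholder types: the source does not fix an image or model format.
Image = Any
Model = Callable[[Image], str]


def recognize_characters(
    database: Sequence[Any],
    image: Image,
    build_model: Callable[[Sequence[Any]], Model],
    divide_regions: Callable[[Image], List[Image]],
) -> Dict[str, Any]:
    """Run the five-step flow; every callable argument is a hypothetical stand-in."""
    model = build_model(database)              # build the model from the character database
    regions = divide_regions(image)            # divide the acquired image into region images
    results = [model(r) for r in regions]      # recognize each region image in turn
    # Assemble a minimal report from the per-region results (fields are illustrative)
    return {"results": results, "text": "\n".join(results)}


# Toy usage: "images" are strings, the "model" upper-cases text, regions split on "|".
report = recognize_characters([], "ab|cd", lambda db: str.upper, lambda img: img.split("|"))
print(report)  # {'results': ['AB', 'CD'], 'text': 'AB\nCD'}
```

Passing the model builder and region divider as arguments keeps the flow independent of any particular OCR engine, which mirrors how the method treats each step as a separate stage.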

Based on the first aspect, in some embodiments of the present invention, dividing the character image to be recognized into regions to obtain one or more character recognition region images comprises the following steps:

extracting the character types of the characters in the character image to be recognized;

and dividing the character image to be recognized into regions according to the character types to obtain one or more character recognition region images.

Based on the first aspect, in some embodiments of the present invention, generating and sending a character recognition report according to the character recognition results comprises the following steps:

A1: determining whether there is only one character recognition result; if so, proceeding to step A2; if not, proceeding to step A3;

A2: marking the character recognition result as a unique recognition result, and generating and sending a character recognition report according to the unique recognition result;

A3: integrating the multiple character recognition results in import order to obtain a complete recognition result, and generating and sending a character recognition report according to the complete recognition result.

Based on the first aspect, in some embodiments of the present invention, the character recognition method further comprises the following step:

comparing the character recognition result with the character image to be recognized to determine whether the characters in the character image to be recognized have been completely recognized; if so, sending the character recognition result; if not, marking the unrecognized region image and importing it into the character recognition model for character recognition.

In some embodiments, the method further comprises: performing semantic coherence matching on the characters in the character recognition result to obtain a coherent text; and generating and sending a character recognition report according to the coherent text.

In some embodiments, the method further comprises: optimizing the character image to be recognized with an image clarity enhancement method to obtain a clear character image to be recognized.

In a second aspect, an embodiment of the present invention provides a character recognition system, comprising a model building module, an image acquisition module, a region division module, a character recognition module, and a report generation module, wherein:

the model building module is configured to acquire a character database and establish a character recognition model according to the character database;

the image acquisition module is configured to acquire and send a character image to be recognized;

the region division module is configured to divide the character image to be recognized into regions to obtain one or more character recognition region images;

the character recognition module is configured to sequentially import each character recognition region image into the character recognition model for character recognition, and to generate and send one or more character recognition results;

and the report generation module is configured to generate and send a character recognition report according to the character recognition results.

When characters in an image are to be recognized, the model building module first acquires data from an existing character database and establishes a character recognition model from that data. The character recognition model is a mathematical model that, based on character images and character data, recognizes the characters in an image and converts them into text for output. After a character recognition request is received, the image acquisition module acquires and sends the character image to be recognized, and the region division module divides it into regions to obtain one or more character recognition region images, so that each region image can subsequently be recognized quickly, improving both recognition accuracy and computational efficiency. The character recognition module then imports each character recognition region image into the character recognition model in turn; the model recognizes the characters in each region image and generates and sends one or more character recognition results. Finally, the report generation module generates and sends a character recognition report according to the results; the report includes character text information, text paragraph information, character text image information, character type information, and the like.

By dividing the whole image into several regions and recognizing each separately, the system effectively reduces the computational load and improves both character recognition efficiency and accuracy.

Based on the second aspect, in some embodiments of the present invention, the region division module comprises a type submodule and a region submodule, wherein:

the type submodule is configured to extract the character types of the characters in the character image to be recognized;

and the region submodule is configured to divide the character image to be recognized into regions according to the character types to obtain one or more character recognition region images.

Based on the second aspect, in some embodiments of the present invention, the report generation module comprises a judgment submodule, an identification submodule, and an integration submodule, wherein:

the judgment submodule is configured to determine whether there is only one character recognition result; if so, the identification submodule is invoked; if not, the integration submodule is invoked;

the identification submodule is configured to mark the character recognition result as a unique recognition result, and to generate and send a character recognition report according to the unique recognition result;

and the integration submodule is configured to integrate the multiple character recognition results in import order to obtain a complete recognition result, and to generate and send a character recognition report according to the complete recognition result.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the character recognition method described above.

The embodiments of the invention have at least the following advantages or beneficial effects:

The embodiments of the invention provide a character recognition method in which a character recognition model is first established from the data in an existing character database. After a character recognition request is received, the character image to be recognized is acquired and divided into one or more character recognition region images, each region image is imported into the model in turn to generate one or more character recognition results, and a character recognition report (including character text information, text paragraph information, character text image information, character type information, and the like) is generated and sent according to those results. Because the whole image is divided into regions that are recognized separately, the method effectively reduces the computational load and improves both character recognition efficiency and accuracy.

The embodiments of the invention also provide a character recognition system in which the model building module, image acquisition module, region division module, character recognition module, and report generation module carry out these steps respectively. Because the system likewise divides the whole image into regions that are recognized separately, it reduces the computational load and improves both character recognition efficiency and accuracy.

The embodiment of the invention also provides a computer readable storage medium which can store computer executable instructions for executing the character recognition method.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from these drawings without inventive effort.

FIG. 1 is a flowchart of a character recognition method according to an embodiment of the present invention;

FIG. 2 is a flowchart of report generation in a character recognition method according to an embodiment of the present invention;

FIG. 3 is a schematic block diagram of a character recognition system according to an embodiment of the present invention.

Reference numerals: 100, model building module; 200, image acquisition module; 300, region division module; 310, type submodule; 320, region submodule; 400, character recognition module; 500, report generation module; 510, judgment submodule; 520, identification submodule; 530, integration submodule.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the same element.

In the description of the embodiments of the present invention, "a plurality" represents at least 2.

Examples

As shown in FIG. 1, in a first aspect, an embodiment of the present invention provides a character recognition method, comprising the following steps:

S1: acquiring a character database, and establishing a character recognition model according to the character database;

S2: acquiring and sending a character image to be recognized;

S3: dividing the character image to be recognized into regions to obtain one or more character recognition region images;

S4: sequentially importing each character recognition region image into the character recognition model for character recognition, and generating and sending one or more character recognition results;

S5: generating and sending a character recognition report according to the character recognition results.

When characters in an image are to be recognized, data is first acquired from an existing character database, and a character recognition model is established from that data; the model is a mathematical model that, based on character images and character data, recognizes the characters in an image and converts them into text for output. After a character recognition request is received, the character image to be recognized is acquired and sent, and is then divided into regions to obtain one or more character recognition region images, so that each region image can subsequently be recognized quickly, improving both recognition accuracy and computational efficiency. Each character recognition region image is then imported into the character recognition model in turn, in top-to-bottom, left-to-right order; the model recognizes the characters in each region image and generates and sends one or more character recognition results. Finally, a character recognition report is generated and sent according to the results; the report includes character text information, text paragraph information, character text image information, character type information, and the like.
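The top-to-bottom, left-to-right import order can be sketched as follows. This is a minimal sketch assuming each region image is described by an `(x, y, width, height)` bounding box; the `row_tolerance` parameter is a hypothetical addition, needed because boxes on the same text line rarely share an identical top coordinate.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a region image; an assumed format


def reading_order(boxes: List[Box], row_tolerance: int = 10) -> List[Box]:
    """Order region boxes top-to-bottom, then left-to-right within a row.

    Boxes whose top edges are within `row_tolerance` pixels of the first box
    in the current row are treated as the same row (hypothetical parameter).
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):        # scan by top edge
        if rows and box[1] - rows[-1][0][1] <= row_tolerance:
            rows[-1].append(box)                         # same row as the current row
        else:
            rows.append([box])                           # start a new row
    ordered: List[Box] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0]))  # left-to-right within the row
    return ordered


boxes = [(300, 12, 80, 20), (10, 10, 80, 20), (10, 60, 80, 20)]
print(reading_order(boxes))
# [(10, 10, 80, 20), (300, 12, 80, 20), (10, 60, 80, 20)]
```

A plain sort on `(y, x)` would place the box at `y = 12` after any box at `y = 10`, regardless of horizontal position; grouping into rows first avoids that.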

When the character image to be recognized is processed, primary processing is performed first: the character types of the characters in the image are extracted. Character types include handwritten text, machine-printed text, text in different languages, and the like. After the character types are extracted, the character image to be recognized is divided into regions according to the different character types to obtain one or more character recognition region images, each of which is subsequently recognized.
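Division by character type can be sketched as merging the bounding boxes of all text items that share a type label. This assumes the type labels (for example "printed" or "handwritten") have already been produced by the primary processing, which the source does not specify; the `(x0, y0, x1, y1)` box format is also an assumption.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) corner coordinates; an assumed format


def divide_by_type(items: List[Tuple[str, Box]]) -> Dict[str, Box]:
    """Merge detected text boxes that share a character type into one region each."""
    regions: Dict[str, Box] = {}
    for label, (x0, y0, x1, y1) in items:
        if label in regions:
            rx0, ry0, rx1, ry1 = regions[label]
            # Grow the existing region's bounding box to cover this item too
            regions[label] = (min(rx0, x0), min(ry0, y0), max(rx1, x1), max(ry1, y1))
        else:
            regions[label] = (x0, y0, x1, y1)
    return regions


items = [
    ("printed", (0, 0, 100, 20)),
    ("handwritten", (0, 50, 80, 70)),
    ("printed", (0, 25, 120, 45)),
]
print(divide_by_type(items))
# {'printed': (0, 0, 120, 45), 'handwritten': (0, 50, 80, 70)}
```

One box per type keeps the sketch short, but if text of different types is interleaved on the page the merged regions can overlap; splitting each type into connected blocks would avoid that.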

When the final recognition report is generated, it is first determined, once the character recognition results are obtained, whether there is only one character recognition result. If so, that result is marked as the unique recognition result, and a character recognition report is generated and sent according to it. If there are multiple character recognition results, they are integrated in import order, that is, first imported, first output, to obtain a complete recognition result, and a character recognition report is generated and sent according to the complete recognition result. The report includes character text information, text paragraph information, character text image information, character type information, and the like.
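The branch in steps A1 to A3 can be sketched as below. It assumes the results list is already in import order, and the report dictionary holds only a hypothetical subset of the contents the source lists (`kind`, `text`, and a region count are illustrative field names).

```python
from typing import Dict, List


def generate_report(results: List[str]) -> Dict[str, object]:
    """Steps A1-A3: label a single result, or merge several in import order.

    `results` is assumed to be ordered first-imported-first; the report fields
    are a hypothetical subset of the contents the source lists.
    """
    if not results:
        raise ValueError("no character recognition results")
    if len(results) == 1:                     # A1 -> A2: exactly one result
        return {"kind": "unique", "text": results[0], "regions": 1}
    # A1 -> A3: integrate multiple results in import (first in, first out) order
    return {"kind": "complete", "text": "\n".join(results), "regions": len(results)}


print(generate_report(["Invoice No. 42"]))
# {'kind': 'unique', 'text': 'Invoice No. 42', 'regions': 1}
print(generate_report(["Header", "Body", "Footer"])["text"].splitlines())
# ['Header', 'Body', 'Footer']
```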

After the character recognition result is obtained, to ensure that the character image to be recognized has been completely recognized, the character recognition result is compared with the character image to be recognized to determine whether all of its characters have been recognized. If so, the character recognition result is sent. If not, that is, if part or all of the image has not been recognized, the unrecognized region image is marked and imported into the character recognition model again for character recognition; this process is repeated until recognition is complete.
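The re-recognition loop can be sketched as follows. Two simplifications are assumptions rather than part of the source: the comparison against the original image is reduced to the model returning `None` for a region it could not recognize, and a hypothetical `max_rounds` cap is added so that a region that never succeeds cannot loop forever.

```python
from typing import Callable, Dict, List, Optional


def recognize_until_complete(
    regions: List[str],
    model: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Dict[int, str]:
    """Re-import unrecognized regions until every region is recognized.

    `model` returns None for an unrecognized region (an assumed convention);
    `max_rounds` is an added safeguard not present in the source.
    """
    recognized: Dict[int, str] = {}
    pending = list(range(len(regions)))          # indices of regions still unrecognized
    for _ in range(max_rounds):
        still_pending = []
        for i in pending:
            text = model(regions[i])
            if text is None:
                still_pending.append(i)          # mark the region as unrecognized
            else:
                recognized[i] = text
        pending = still_pending
        if not pending:                          # recognition is complete
            break
    return recognized


# Toy model that fails on region "b" the first time only.
seen = set()

def flaky(region: str) -> Optional[str]:
    if region == "b" and region not in seen:
        seen.add(region)
        return None
    return region.upper()

print(recognize_until_complete(["a", "b"], flaky))  # {0: 'A', 1: 'B'}
```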

After the character recognition result is obtained, to keep the text semantically coherent and make it easier for users to read and check later, a semantic analysis method is applied to the characters in the character recognition result, and semantic coherence matching is performed on the analysis result to obtain a coherent text. A character recognition report is then generated and sent according to the coherent text; the report includes character text information, text segmentation adjustment information, and the like.
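The source does not name a particular semantic analysis method, so the sketch below substitutes a plain punctuation heuristic as a stand-in: recognized lines that do not end a sentence are joined to the following line. It illustrates only the segmentation-adjustment idea, not semantic matching itself; the `SENTENCE_END` set is an assumption.

```python
from typing import Iterable, List

# Assumed sentence terminators (Chinese and Latin punctuation)
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


def merge_fragments(lines: Iterable[str]) -> List[str]:
    """Rejoin recognized lines that were split mid-sentence.

    A punctuation heuristic standing in for the unspecified semantic analysis:
    a line that does not end a sentence is joined to the next one.
    """
    sentences: List[str] = []
    buffer = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                              # skip empty lines
        # Joins with a space, which suits Latin text; CJK text would be joined directly
        buffer = f"{buffer} {line}" if buffer else line
        if buffer.endswith(SENTENCE_END):
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)                  # keep a trailing unfinished fragment
    return sentences


print(merge_fragments(["The model reads", "each region.", "", "Done!"]))
# ['The model reads each region.', 'Done!']
```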

So that the image can be recognized more efficiently afterwards, once the character image to be recognized is acquired, it is optimized with a pixel-level optimization method or an image restoration method (both image clarity enhancement methods) to improve its clarity and obtain a clear character image to be recognized.
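As one example of a pixel-level clarity enhancement, the sketch below applies unsharp masking to a grayscale image. Unsharp masking is a swapped-in, commonly used technique chosen for illustration; the source does not say which pixel-level method it uses. The image is assumed to be a list of rows of 0 to 255 integers.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of 0-255 values (an assumed format)


def unsharp_mask(img: Gray, amount: float = 1.0) -> Gray:
    """Sharpen a grayscale image: out = img + amount * (img - blur), clamped to 0..255.

    A 3x3 box blur is used; edge pixels average only the neighbours that exist.
    """
    h, w = len(img), len(img[0])
    out: Gray = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            blur = sum(neigh) / len(neigh)
            value = img[y][x] + amount * (img[y][x] - blur)
            row.append(max(0, min(255, round(value))))
        out.append(row)
    return out


print(unsharp_mask([[100, 100, 200, 200]]))  # [[100, 67, 233, 200]]
```

The soft step from 100 to 200 becomes steeper (67 next to 233), which is the contrast boost at character edges that unsharp masking is meant to provide; flat areas are left unchanged.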

In a second aspect, an embodiment of the present invention provides a character recognition system, comprising a model building module 100, an image acquisition module 200, a region division module 300, a character recognition module 400, and a report generation module 500, wherein:

the model building module 100 is configured to acquire a character database and establish a character recognition model according to the character database;

the image acquisition module 200 is configured to acquire and send a character image to be recognized;

the region division module 300 is configured to divide the character image to be recognized into regions to obtain one or more character recognition region images;

the character recognition module 400 is configured to sequentially import each character recognition region image into the character recognition model for character recognition, and to generate and send one or more character recognition results;

and the report generation module 500 is configured to generate and send a character recognition report according to the character recognition results.

When characters in an image are to be recognized, the model building module 100 first acquires data from an existing character database and establishes a character recognition model from that data; the model is a mathematical model that, based on character images and character data, recognizes the characters in an image and converts them into text for output. After a character recognition request is received, the image acquisition module 200 acquires and sends the character image to be recognized, and the region division module 300 divides it into regions to obtain one or more character recognition region images, so that each region image can subsequently be recognized quickly, improving both recognition accuracy and computational efficiency. The character recognition module 400 then imports each character recognition region image into the character recognition model in turn, in top-to-bottom, left-to-right order; the model recognizes the characters in each region image and generates and sends one or more character recognition results. Finally, the report generation module 500 generates and sends a character recognition report according to the results; the report includes character text information, text paragraph information, character text image information, character type information, and the like.

Based on the second aspect, in some embodiments of the present invention, the region division module 300 comprises a type submodule 310 and a region submodule 320, wherein:

the type submodule 310 is configured to extract the character types of the characters in the character image to be recognized;

the region submodule 320 is configured to divide the character image to be recognized into regions according to the character types to obtain one or more character recognition region images.

When the character image to be recognized is processed, the type submodule 310 first performs primary processing on it, that is, it extracts the character types of the characters in the image; character types include handwritten text, machine-printed text, text in different languages, and the like. After the character types are extracted, the region submodule 320 divides the character image to be recognized into regions according to the different character types to obtain one or more character recognition region images, each of which is subsequently recognized separately.

Based on the second aspect, in some embodiments of the present invention, the report generation module 500 comprises a judgment submodule 510, an identification submodule 520, and an integration submodule 530, wherein:

the judgment submodule 510 is configured to determine whether there is only one character recognition result; if so, the identification submodule 520 is invoked; if not, the integration submodule 530 is invoked;

the identification submodule 520 is configured to mark the character recognition result as a unique recognition result, and to generate and send a character recognition report according to the unique recognition result;

and the integration submodule 530 is configured to integrate the multiple character recognition results in import order to obtain a complete recognition result, and to generate and send a character recognition report according to the complete recognition result.

When the final recognition report is generated, the judgment submodule 510 first determines, once the character recognition results are obtained, whether there is only one character recognition result. If so, the identification submodule 520 marks that result as the unique recognition result and generates and sends a character recognition report according to it. If there are multiple character recognition results, the integration submodule 530 integrates them in import order, that is, first imported, first output, to obtain a complete recognition result, and generates and sends a character recognition report according to it. The report includes character text information, text paragraph information, character text image information, character type information, and the like.

The memory may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.

The above is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and changes will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims

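1. A character recognition method, characterized by comprising the following steps:

acquiring a character database, and establishing a character recognition model according to the character database;

acquiring and sending a character image to be recognized;

carrying out region division on the character image to be recognized to obtain one or more character recognition region images;

sequentially importing each character recognition region image into the character recognition model for character recognition, and generating and sending one or more character recognition results; and

generating and sending a character recognition report according to the character recognition results.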

2. The character recognition method of claim 1, wherein dividing the character image to be recognized into regions to obtain one or more character recognition region images comprises the following steps:

3. The character recognition method of claim 1, wherein generating and sending a character recognition report according to the character recognition result comprises the following steps:

4. The character recognition method of claim 1, further comprising the steps of:

5. The character recognition method of claim 1, wherein generating and sending a character recognition report according to the character recognition result comprises the following steps:

6. The character recognition method of claim 1, further comprising the steps of:

7. A character recognition system, characterized by comprising a model establishing module, an image obtaining module, a region dividing module, a character recognition module, and a report generating module, wherein:

8. The character recognition system of claim 7, wherein the region dividing module comprises a type sub-module and a region sub-module, wherein:

9. The character recognition system of claim 7, wherein the report generating module comprises a judging sub-module, an identifying sub-module, and an integration sub-module, wherein:

10. A computer-readable storage medium having stored thereon computer-executable instructions for performing the character recognition method according to any one of claims 1 to 6.
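One way the five modules named in claim 7 might be wired together is shown in the following minimal Python sketch. Everything concrete in it is an assumption made for illustration. The image is a list of pixel rows, and the region dividing module cuts fixed-height horizontal bands; the type and region sub-modules of claim 8 are not modelled. The "character recognition model" is a toy exact-lookup table built from a character database, not a trained recognizer. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]            # hypothetical grayscale image: rows of pixel values
Key = Tuple[Tuple[int, ...], ...]  # hashable form of a region image
Model = Callable[[Image], str]


def to_key(region: Image) -> Key:
    return tuple(tuple(row) for row in region)


def build_model(database: Dict[Key, str]) -> Model:
    """Model establishing module (toy stand-in): exact lookup in a character database."""
    return lambda region: database.get(to_key(region), "?")


def divide_regions(image: Image, band_height: int) -> List[Image]:
    """Region dividing module (assumed strategy): fixed-height horizontal bands."""
    if band_height <= 0:
        raise ValueError("band_height must be positive")
    return [image[i:i + band_height] for i in range(0, len(image), band_height)]


def recognize_regions(regions: List[Image], model: Model) -> List[str]:
    """Character recognition module: import region images one by one, in order."""
    return [model(region) for region in regions]


def build_report(results: List[str]) -> str:
    """Report generating module: one result is used as-is; several are joined in import order."""
    return results[0] if len(results) == 1 else "\n".join(results)


def recognize_image(image: Image, database: Dict[Key, str], band_height: int) -> str:
    model = build_model(database)
    regions = divide_regions(image, band_height)
    return build_report(recognize_regions(regions, model))
```

For example, with `database = {((1, 1),): "A", ((0, 1),): "B"}`, calling `recognize_image([[1, 1], [0, 1]], database, 1)` splits the image into two one-row regions, recognizes them in order, and returns `"A\nB"`. Because each region is processed independently, only one region at a time needs to be handled by the model. That is the efficiency argument the background section makes against recognizing the whole image at once.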