WO2022160598A1 - Text recognition method and device, and storage medium - Google Patents

Text recognition method and device, and storage medium

Info

Publication number
WO2022160598A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
text
category
candidate
language
Prior art date
Application number
PCT/CN2021/103787
Other languages
French (fr)
Chinese (zh)
Inventor
蔡晓聪
侯军
伊帅
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2022160598A1 publication Critical patent/WO2022160598A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V30/10 Character recognition

Definitions

  • the present disclosure relates to the field of computer vision, and in particular, to a character recognition method and device, and a storage medium.
  • in the case where a text image includes text written in multiple text languages, the recognition accuracy is likely to drop.
  • the present disclosure provides a character recognition method and device, and a storage medium.
  • a method for character recognition, comprising: acquiring a character image including characters to be recognized and other characters; obtaining, based on a feature sequence corresponding to the character image, a category judgment result of each character in the character image, the category judgment result being used to represent the character category; and determining, based on the category judgment result, a target character recognition result of performing character recognition on the characters to be recognized.
  • the method further includes: determining a candidate area where the to-be-recognized characters and/or the other characters in the character image are located; dividing the candidate area into a plurality of sub-areas; and determining the feature sequence corresponding to the text image based on feature information corresponding to at least part of the sub-areas in the plurality of sub-areas.
  • the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and, for each character, taking the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
  • the determining, based on the category judgment result, a target character recognition result of performing character recognition on the characters to be recognized includes: for each character, determining, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs; determining, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the text to be recognized or to the irrelevant characters corresponding to the other characters; and taking the character structure corresponding to the target characters as the target character recognition result of performing character recognition on the characters to be recognized.
  • the determining, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the characters to be recognized or to the irrelevant characters corresponding to the other characters includes: in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of a plurality of first character categories or a plurality of second character categories, determining that the character belongs to the target characters; and in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is a third character category, determining that the character belongs to the irrelevant characters.
  • the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first text language, where the first text language is the text language corresponding to the text to be recognized; the plurality of second character categories include: character categories respectively corresponding to a plurality of Arabic numerals; and the third character category is the same character category corresponding to the plurality of characters included in a plurality of second text languages, where a second text language is a text language different from the first text language.
  • the determining the feature sequence corresponding to the text image includes: taking the text image as an input of a target neural network used for character category judgment, and obtaining the feature sequence corresponding to the text image output by the target neural network.
  • the method further includes: acquiring a sample text image that includes both text corresponding to the first text language and text corresponding to at least one second text language, where the first text language is the text language corresponding to the text to be recognized and the second text language is a text language different from the first text language; and taking the sample text image as the input of a preset neural network and the character category labels in the sample text image as supervision, training the preset neural network to obtain the target neural network used for character category judgment.
  • the acquiring a sample text image that includes both text corresponding to the first text language and text corresponding to at least one second text language includes: acquiring a first candidate text image including text corresponding to the first text language; acquiring a candidate text corpus corresponding to the at least one second text language; and generating the sample text image based on the candidate text corpus and the first candidate text image.
  • the acquiring a sample text image that includes both text corresponding to the first text language and text corresponding to at least one second text language includes: acquiring a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and generating the sample text image based on the first candidate text image and the second candidate text image.
  • the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels respectively corresponding to the plurality of characters included in the first text language; at least one of a plurality of second character category labels respectively corresponding to the plurality of Arabic numerals; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages.
  • the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa;
  • the text image includes the target data required when applying for the electronic visa.
  • the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining, based on the feature sequence corresponding to the text image of the target data, whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to other characters;
  • the determining the target character recognition result of performing character recognition on the characters to be recognized includes: determining the target character recognition result of performing character recognition on the target characters in the text image of the target data; and the method further includes: issuing the electronic visa based on the target character recognition result.
  • a character recognition device, comprising: an image acquisition module, configured to acquire a text image including text to be recognized and other text; a character category determination module, configured to obtain, based on the feature sequence corresponding to the text image, the category judgment result of each character in the text image, the category judgment result being used to characterize the character category; and a character recognition module, configured to determine, based on the category judgment result, the target text recognition result of performing text recognition on the text to be recognized.
  • the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized text and/or the other text in the text image is located; a division module, configured to divide the candidate region into multiple sub-regions; and a feature sequence determination module, configured to determine the feature sequence corresponding to the text image based on feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
  • the character category determination module includes: a first determination sub-module, configured to determine, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and a second determination sub-module, configured to, for each character, take the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
  • the character recognition module includes: a third determination sub-module, configured to, for each character, determine, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs; a fourth determination sub-module, configured to determine, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the characters to be recognized or to the irrelevant characters corresponding to the other characters; and a fifth determination sub-module, configured to take the character structure corresponding to the target characters as the target character recognition result of performing character recognition on the characters to be recognized.
  • the fourth determination sub-module includes: a first determination unit, configured to, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of the plurality of first character categories or the plurality of second character categories, determine that the character belongs to the target characters; and a second determination unit, configured to, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is the third character category, determine that the character belongs to the irrelevant characters.
  • the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first text language, where the first text language is the text language corresponding to the text to be recognized; the plurality of second character categories include: character categories respectively corresponding to a plurality of Arabic numerals; and the third character category is the same character category corresponding to the plurality of characters included in a plurality of second text languages, where a second text language is a text language different from the first text language.
  • the feature sequence determination module includes: a sixth determination sub-module, configured to take the text image as an input of the target neural network used for character category judgment and obtain the feature sequence corresponding to the text image output by the target neural network.
  • the apparatus further includes: a sample text image acquisition module, configured to acquire a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language;
  • the first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language;
  • a training module, configured to take the sample text image as the input of a preset neural network and the character category labels in the sample text image as supervision, and to train the preset neural network to obtain the target neural network used for character category judgment.
  • the sample text image acquisition module includes: a first acquisition sub-module, configured to acquire a first candidate text image including text corresponding to the first text language; a second acquisition sub-module, configured to acquire the candidate text corpus corresponding to the at least one second text language; and a first generation sub-module, configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
  • the sample text image acquisition module includes: a third acquisition sub-module, configured to acquire a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and a second generation sub-module, configured to generate the sample text image based on the first candidate text image and the second candidate text image.
  • the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels respectively corresponding to the plurality of characters included in the first text language; at least one of a plurality of second character category labels respectively corresponding to the plurality of Arabic numerals; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages.
  • the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa;
  • the text image includes the target data required when applying for the electronic visa.
  • the character category determination module includes: a seventh determination sub-module, configured to determine, based on the feature sequence corresponding to the text image of the target data, whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to other characters;
  • the character recognition module includes: an eighth determination sub-module, configured to determine the target character recognition result of performing character recognition on the target characters in the text image of the target data;
  • the apparatus further includes: an execution module, configured to issue the electronic visa based on the target character recognition result.
  • a computer-readable storage medium where the storage medium stores a computer program, and the computer program is configured to execute the character recognition method according to any one of the above-mentioned first aspect.
  • a character recognition device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the character recognition method described in any one of the first aspects.
  • the character category judgment can be performed on each character in the text image, so that, based on the category judgment result, the target characters corresponding to the characters to be recognized and the irrelevant characters corresponding to other characters can be determined in a text image including multiple languages; the irrelevant characters are filtered out, and character recognition is performed on the characters corresponding to the characters to be recognized to obtain the target character recognition result.
  • the present disclosure performs character category judgment on the text to be recognized and the other text, so as to filter out the irrelevant characters corresponding to the other text before performing text recognition on the text to be recognized; this reduces the probability of misjudging other text as the text to be recognized and improves the accuracy of text recognition of the text to be recognized in text images that mix multiple text languages.
  • FIG. 1 is a flowchart of a method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 2 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 3A is a schematic diagram of a scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
  • FIG. 3B is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
  • FIG. 3C is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
  • FIG. 4 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 6 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 7 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 8 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 9A is a schematic structural diagram corresponding to a character recognition process according to an exemplary embodiment of the present disclosure.
  • FIG. 9B is a schematic diagram of determining a feature sequence according to an exemplary embodiment of the present disclosure.
  • FIG. 10 is a block diagram of a character recognition apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a character recognition device according to an exemplary embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms; these terms are only used to distinguish the same type of information from each other.
  • first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure.
  • word “if” as used herein may be interpreted as "at the time of” or "when” or "in response to determining.”
  • the text recognition can be performed by ignoring other text, that is, during the training of the text recognition model, only the text label corresponding to the text to be recognized is included in the sample text image.
  • in the judgment process of a text recognition model obtained in this way, it is easy to misjudge other text as the text to be recognized, and the accuracy cannot be guaranteed.
  • an embodiment of the present disclosure provides a character recognition solution.
  • character category judgment can be performed on each character in the character image, so that, based on the category judgment result, the target character recognition result of performing character recognition on the characters to be recognized can be determined.
  • FIG. 1 shows a character recognition method according to an exemplary embodiment, including the following steps:
  • step 101 a text image including the text to be recognized and other texts is acquired.
  • a text image including the text to be recognized can be acquired through cameras deployed in different application scenarios.
  • the different application scenarios include but are not limited to signboard text recognition scenarios, license plate recognition scenarios, bill recognition scenarios, and the like.
  • the acquired text images may include, but are not limited to, signboards, license plates, bills, and the like written in the first text language corresponding to the text to be recognized.
  • the acquired text image also includes text content written in a second text language corresponding to other text, and the second text language includes but is not limited to a text language different from the first text language.
  • the text content written in the second text language may be the same, at least partially the same, or different from the text content written in the first text language.
  • the acquired text image includes a signboard written in Thai, and also includes the same signboard content written in English.
  • the first text language is Thai
  • the second text language is Chinese
  • the acquired text image includes the content of the receipt written in Thai, and also includes part of the content of the receipt written in Chinese.
  • the first text language is English
  • the second text language is Chinese
  • the acquired text image includes text content written in Thai, and also includes completely different text content written in Chinese.
  • step 102 based on the feature sequence corresponding to the character image, the category judgment result of each character in the character image is obtained.
  • the number of feature sequences corresponding to the text image may be one or more, and each feature sequence may be composed of at least part of the feature information included in the candidate region where the text to be recognized and/or the other text is located in the text image.
  • the candidate area is the area where the character to be recognized and/or other characters may be located determined in the character image.
  • the candidate region can be further divided into multiple sub-regions, and at least part of the feature information included in the candidate region can be composed of the feature information corresponding to at least some of the sub-regions, where the feature information corresponding to at least some of the sub-regions refers to all of the feature information corresponding to those sub-regions.
  • the multiple sub-regions include sub-region 1, sub-region 2, and sub-region 3, and at least part of the feature information included in the candidate region may be composed of all the feature information of sub-region 1 and sub-region 2.
  • the category judgment result for each character in the text image may be determined according to the feature sequence corresponding to the text image.
  • the category judgment result can be used to characterize the character category.
  • a corresponding first character category may be determined in advance for each character included in the first text language, a corresponding second character category may be determined for each Arabic numeral, and the same single third character category may be defined for all characters included in the second text languages.
  • the first text language may be the text language corresponding to the text to be recognized, each character included in the first text language may refer to each letter element and each punctuation element included in the first text language, and the second text language is a text language different from the first text language.
  • each of the 26 letters included in English (case sensitive) and each English punctuation mark may correspond to a respective first character category.
  • Arabic numerals 0 to 9 each correspond to a second character class.
  • the second text language is any text language other than English, and it is assumed that it can include Chinese, Thai, Arabic, Korean, etc. All characters included in all the second text languages correspond to the same third character category.
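  • as an illustrative aid (not part of the disclosure), the category assignment described above could be enumerated roughly as in the Python sketch below, assuming English as the first text language; every name in it is a hypothetical placeholder.

    import string

    def build_category_map():
        """Enumerate character categories: one first character category per
        English letter (case sensitive) and per English punctuation mark, one
        second character category per Arabic numeral, and a single shared
        third character category for every character of any second text
        language (Chinese, Thai, Arabic, Korean, ...)."""
        category = {}
        next_id = 1
        for ch in string.ascii_letters + string.punctuation:  # first character categories
            category[ch] = next_id
            next_id += 1
        for ch in string.digits:                              # second character categories
            category[ch] = next_id
            next_id += 1
        third_category = next_id                              # shared third character category
        return category, third_category

    def category_of(ch, category, third_category):
        # Any character outside the first text language and the Arabic numerals
        # falls into the same third character category.
        return category.get(ch, third_category)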
  • step 103 based on the category judgment result, a target character recognition result for performing character recognition on the to-be-recognized character is determined.
  • based on the above category judgment results, it is possible to determine the target characters corresponding to the characters to be recognized and the irrelevant characters belonging to other characters, filter out the irrelevant characters, and finally obtain the character structures of only the target characters corresponding to the characters to be recognized, that is, the target character recognition result of performing character recognition on the characters to be recognized.
  • the character category judgment can be performed on each character in the text image, so that, based on the category judgment result, the target characters corresponding to the characters to be recognized and the irrelevant characters corresponding to other characters are determined in a text image including multiple languages; the irrelevant characters are filtered out, and character recognition is performed on the characters corresponding to the characters to be recognized to obtain the target character recognition result.
  • in this way, the present disclosure performs character category judgment on the text to be recognized and the other text, so as to filter out the irrelevant characters corresponding to the other text before performing text recognition on the text to be recognized; this reduces the probability of misjudging other text as the text to be recognized and improves the accuracy of text recognition of the text to be recognized in text images that mix multiple text languages.
  • the above method may further include steps 104 to 106:
  • step 104 a candidate region where the to-be-recognized text and/or the other text is located in the text image is determined.
  • the candidate area is the area where the character to be recognized and/or the other character may be located determined in the character image.
  • a Region Proposal Network (RPN) may be used to determine the candidate region where the to-be-recognized text and/or the other text may be located in the text image.
  • step 105 the candidate area is divided into a plurality of sub-areas.
  • the candidate area may be divided into a plurality of sub-areas, and the size of each sub-area may be the same or different.
  • the candidate region may be divided evenly according to a preset number to obtain multiple sub-regions with the same size. For example, as shown in FIG. 3A , the candidate region is divided into three sub-regions with the same size.
  • the candidate region may be divided according to a preset uniform size, so as to obtain N sub-regions with the same size, or (N-1) sub-regions with the same size and one sub-region whose size differs from the others, as shown in FIG. 3B.
  • the obtained sub-regions 1 to 3 have the same size, and the size of the sub-region 4 is different from that of the other three sub-regions.
  • the candidate region may be divided according to a preset sequence of multiple different sizes. For example, as shown in FIG. 3C , three sub-regions with different sizes may be obtained.
  • step 106 a feature sequence corresponding to the character image is determined based on feature information corresponding to at least part of the sub-regions in the plurality of sub-regions.
  • feature information corresponding to each sub-region included in the candidate region can be determined. Based on the feature information corresponding to at least some of the sub-regions, that is, according to all the feature information corresponding to some or all of the multiple sub-regions, the feature sequence corresponding to the text image is obtained.
  • all feature information corresponding to each sub-region may correspond to one feature sequence, or all feature information corresponding to multiple sub-regions may correspond to one feature sequence, or all feature information corresponding to each sub-region may correspond to multiple feature sequences .
  • the present disclosure does not limit this.
  • the order in which each sub-region appears in the text image may be determined first according to the writing order of the text, for example, from left to right. After the feature sequences are determined according to the feature information corresponding to at least part of the sub-regions, the feature sequences are sorted according to the order in which the corresponding sub-regions appear in the text image; for example, the feature sequence corresponding to the leftmost sub-region of the text image is ranked first, and the feature sequence corresponding to the rightmost sub-region is ranked last. After sorting and combining the multiple feature sequences, the feature sequence corresponding to the text image is obtained.
  • for example, the candidate area is divided into sub-area 1, sub-area 2 and sub-area 3, and the at least part of the sub-areas includes sub-area 2 and sub-area 3, where sub-area 2 corresponds to feature sequences 2 and 3 and sub-area 3 corresponds to feature sequence 4; then the feature sequences corresponding to the text image obtained after sorting are feature sequence 2, feature sequence 3 and feature sequence 4.
  • a corresponding feature sequence may be obtained after processing, such as pooling and/or sampling, the feature information corresponding to at least part of the sub-regions. Through pooling and/or sampling, the feature information corresponding to the parts with obvious features in each sub-region can be selected to determine the feature sequence, which improves the efficiency of determining the feature sequence corresponding to the text image while ensuring its accuracy, thereby improving the efficiency of character recognition of the characters to be recognized.
  • step 102 may then be executed to determine, based on the feature sequence corresponding to the text image, the category judgment result of character category judgment for each character in the text image.
  • the candidate area in which the character to be recognized and/or the other characters are located in the character image may be divided into multiple sub-areas, and the feature sequence corresponding to the text image may be determined based on the feature information corresponding to all or part of the multiple sub-areas, so that the category judgment result of character category judgment for each character in the text image can subsequently be determined based on the feature sequence corresponding to the text image; the implementation is simple and the usability is high.
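  • the division of a candidate region into sub-regions and the pooling of their feature information into a feature sequence could look roughly like the following sketch; it assumes a (channels, height, width) feature map for one candidate region and equal-width vertical sub-regions, which is only one of the division options mentioned above, not a requirement of the disclosure.

    import numpy as np

    def feature_sequence_from_region(region_features, num_subregions=8):
        """region_features: (C, H, W) feature map of one candidate region.
        Splits the region into vertical sub-regions from left to right,
        average-pools each sub-region, and returns the pooled feature
        vectors in reading order as the feature sequence."""
        C, H, W = region_features.shape
        bounds = np.linspace(0, W, num_subregions + 1).astype(int)
        sequence = []
        for i in range(num_subregions):
            sub = region_features[:, :, bounds[i]:bounds[i + 1]]
            if sub.size == 0:                       # skip empty slices of very narrow regions
                continue
            sequence.append(sub.mean(axis=(1, 2)))  # pooling keeps the salient feature information
        return np.stack(sequence)                   # shape: (number of kept sub-regions, C)

    # Feature sequences of several candidate regions can then be concatenated
    # in the order in which the regions appear in the text image.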
  • step 102 may include step 102-1 and step 102-2.
  • step 102-1 based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category are determined.
  • the feature sequence corresponding to the text image can be used as the input of a classifier, and the classification prediction result output by the classifier can be obtained; the classification prediction result includes, but is not limited to, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category.
  • the text image includes 2 characters, the first character corresponds to 2 candidate character categories, and the second character corresponds to 3 candidate character categories.
  • the possibility probability value of the first character belonging to candidate character category 1 is a, that is, the recognition rate corresponding to candidate character category 1 is a; the possibility probability value of the first character belonging to candidate character category 2 is b, that is, the recognition rate corresponding to candidate character category 2 is b.
  • the possibility probability values of the second character belonging to candidate character category 3, candidate character category 4 and candidate character category 5 are c, d and e respectively, that is, the recognition rates of candidate character category 3, candidate character category 4 and candidate character category 5 are c, d and e respectively.
  • step 102-2 for each character, the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs is used as the category judgment result of the character.
  • the candidate character category corresponding to the maximum recognition rate in at least one candidate character category to which a certain character belongs may be used as the category judgment result of the character.
  • a certain character included in the text image corresponds to two candidate character categories.
  • the recognition rate of the character belonging to candidate character category 1 is a
  • the recognition rate of the character belonging to candidate character category 2 is b. If a is greater than b, then candidate character category 1 can be used as the category judgment result corresponding to the character.
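  • step 102-2 amounts to taking, for each character, the candidate category with the maximum recognition rate; a minimal sketch, with purely illustrative data, is shown below.

    def category_judgment(recognition_rates):
        """recognition_rates: one dict per character, mapping each candidate
        character category to its recognition rate, e.g.
        [{1: 0.7, 2: 0.3}, {3: 0.2, 4: 0.5, 5: 0.3}].
        For each character, the candidate category with the maximum
        recognition rate is taken as the category judgment result."""
        return [max(rates, key=rates.get) for rates in recognition_rates]

    # category_judgment([{1: 0.7, 2: 0.3}, {3: 0.2, 4: 0.5, 5: 0.3}]) -> [1, 4]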
  • the candidate character categories to which each character included in the text image may belong and the recognition rate of each candidate character category may be determined based on the feature sequence corresponding to the text image, so that the candidate character category corresponding to the maximum recognition rate can be used as the category judgment result of character category judgment for the character; based on the category judgment result, the target characters belonging to the characters to be recognized and the irrelevant characters belonging to other characters can later be determined, so as to filter out the irrelevant characters and improve the accuracy of text recognition of the text to be recognized in a text image mixing multiple text languages.
  • step 103 may include steps 103-1 to 103-3.
  • step 103-1 for each character, according to the correspondence between the character category and the character structure, the character structure corresponding to the candidate character category with the highest recognition rate to which the character belongs is determined.
  • different character classes and corresponding character structures are preset, for example, the character structure corresponding to character class 1 is 'a', the character structure corresponding to character class 2 is 'b', and so on.
  • the character structure corresponding to the candidate character category with the highest recognition rate to which each character belongs may be determined based on the previously determined category judgment result and the above-mentioned corresponding relationship.
  • each character included in the first character language corresponds to a different first character category
  • each first character category corresponds to a different character structure.
  • Different Arabic numerals correspond to different second character categories, and these second character categories also correspond to different character structures, such as character structures '0', '1', and so on.
  • All characters included in multiple second script languages may correspond to the same third character category, and this third character category may correspond to the same character structure.
  • multiple second script languages include Chinese, Arabic, Thai, etc.
  • All characters included in the second script language may correspond to a third character category, assuming a character category 70, this character category 70 may correspond to the same character structure, for example, all correspond to the Chinese character structure 'ah'.
  • the above-mentioned first text language is the text language corresponding to the text to be recognized, and other text languages other than the first text language can be used as the second text language.
  • the character structure corresponding to the candidate character category with the maximum recognition rate to which each character belongs can be determined.
  • a text image includes 4 characters, and the candidate character categories with the maximum recognition rate to which the characters belong are 1, 2, 3 and 70 in sequence; the corresponding character structures are then 'a', 'b', 'c' and 'ah'.
  • step 103-2 according to the candidate character category of the maximum recognition rate to which the character belongs, it is determined that the character belongs to the target character corresponding to the character to be recognized or belongs to the irrelevant character corresponding to the other characters.
  • if the candidate character category with the highest recognition rate to which a certain character belongs is one of the multiple first character categories or the multiple second character categories, the character belongs to the target characters corresponding to the text to be recognized.
  • the plurality of first character categories include: character categories corresponding to a plurality of characters included in the first character language
  • the first character language is the character language corresponding to the character to be recognized
  • the plurality of second character categories include: character categories respectively corresponding to multiple Arabic numerals.
  • the candidate character category with the highest recognition rate to which a certain character belongs is the third character category, then it can be determined that the character belongs to the irrelevant character corresponding to the other characters.
  • the first character language is English
  • the plurality of first character classes include character classes 1 to 59
  • the plurality of second character classes corresponding to Arabic numerals include character classes 60 to 69
  • the third character class includes character class 70
  • the candidate character categories of the maximum recognition rate to which each character belongs are 1, 2, 3, and 70 in sequence, then it can be determined that the first 3 characters belong to the target character, and the last character belongs to the irrelevant character.
  • step 103-3 the character structure corresponding to the target character is used as the target character recognition result of performing character recognition on the to-be-recognized character.
  • for example, the character structures corresponding to the four characters included in the text image are 'a', 'b', 'c' and 'ah', and the last character is an irrelevant character; after the character structure corresponding to the irrelevant character is filtered out, the obtained target character recognition result is 'abc'.
  • a preset program can be called to filter out the character structure corresponding to the irrelevant characters, so as to obtain the character structure corresponding to the target character.
  • the preset program may be a pre-written program for filtering the specified character structure. For example, if the character structure is specified as 'ah', the preset program can filter the character structure 'ah', so as to obtain the character structure of the target character corresponding to the character to be recognized.
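  • steps 103-1 to 103-3 can be pictured as a lookup followed by a filter, as in the sketch below; the category numbering (1 to 59, 60 to 69, 70) follows the example above, and the truncated structure table is illustrative only.

    # Correspondence between character categories and character structures,
    # following the example above: categories 1-59 for the first text language
    # (English), 60-69 for the Arabic numerals, 70 as the shared third category.
    STRUCTURE = {1: 'a', 2: 'b', 3: 'c', 70: 'ah'}   # truncated for illustration
    FIRST_CATEGORIES = set(range(1, 60))
    SECOND_CATEGORIES = set(range(60, 70))

    def target_character_recognition(category_results):
        """Keep only target characters (first or second character categories)
        and join their character structures into the target character
        recognition result; third-category characters are irrelevant and
        are filtered out."""
        structures = []
        for category in category_results:
            if category in FIRST_CATEGORIES or category in SECOND_CATEGORIES:
                structures.append(STRUCTURE[category])
        return ''.join(structures)

    # target_character_recognition([1, 2, 3, 70]) -> 'abc'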
  • the text image can be directly used as the input of the target neural network, and the feature sequence corresponding to the text image output by the target neural network can be obtained.
  • the target neural network is a neural network used for character category judgment on characters.
  • the target neural network is obtained by training based on the preset neural network, and the corresponding feature sequence can be determined from the text image.
  • the preset neural network includes, but is not limited to, a Visual Geometry Group (VGG) network, a GoogLeNet, a residual network (ResNet), and the like.
  • the text image can be used as the input of the target neural network used for character category judgment, so as to obtain the feature sequence corresponding to the text image output by the target neural network; the character category corresponding to each character included in the text image is subsequently determined based on this feature sequence, so that text recognition can be performed on the text to be recognized in the text image, which improves the accuracy of character recognition of the text to be recognized.
  • as shown in FIG. 6 (FIG. 6 is only an exemplary illustration; in practical applications, the execution order of the following steps 100-1 to 100-2 is not limited to being executed before step 101), the above method may further include step 100-1 and step 100-2.
  • step 100-1 a sample text image including both the text corresponding to the first text language and the text corresponding to the second text language is acquired.
  • the above-mentioned sample text images can be directly obtained from the sample image database.
  • step 100-2 the sample text image is used as the input of the preset neural network, and the character category labels in the sample text image are used as supervision to train the preset neural network, so as to obtain the target neural network used for character category judgment.
  • the character category labels in the sample text image include at least one of the following: at least one of the multiple first character category labels corresponding to the multiple characters included in the first text language; at least one of the multiple second character category labels respectively corresponding to the multiple Arabic numerals; and the same third character category label corresponding to the multiple characters included in the multiple second text languages.
  • a Connectionist Temporal Classification (CTC) supervised training method may be used to train the preset neural network, thereby obtaining the target neural network.
  • the CTC supervised training method means that the neural network directly learns the input sequence without having to mark the mapping relationship between the input sequence and the output result in the training data in advance.
  • the preset neural network outputs the character categories included in the sample text image, the loss function is determined according to the difference between the output result of the preset neural network and the character category labels in the sample text image, and the preset neural network is iteratively trained through gradient updates of the network parameters to obtain the target neural network.
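  • a minimal CTC-supervised training loop might look like the PyTorch sketch below; the placeholder backbone, the number of categories, the number of timesteps and the optimizer settings are all assumptions chosen for illustration and do not reflect the actual network of the disclosure.

    import torch
    import torch.nn as nn

    NUM_CATEGORIES = 70                      # e.g. 59 + 10 + 1 shared third category

    class PresetNet(nn.Module):
        """Placeholder for the preset neural network (the disclosure mentions
        VGG, GoogLeNet and ResNet backbones); outputs per-timestep logits."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, 32)))                 # collapse height, keep 32 timesteps
            self.head = nn.Conv2d(64, NUM_CATEGORIES + 1, 1)   # +1 for the CTC blank

        def forward(self, x):                                  # x: (N, 3, H, W)
            f = self.head(self.backbone(x))                    # (N, C, 1, T)
            return f.squeeze(2).permute(2, 0, 1)               # (T, N, C), as CTCLoss expects

    preset_net = PresetNet()
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
    optimizer = torch.optim.Adam(preset_net.parameters(), lr=1e-4)

    def train_step(images, label_seqs, label_lengths):
        """One iteration: the character category labels supervise the network
        without pre-aligning input positions to output characters."""
        logits = preset_net(images)                            # (T, N, C)
        log_probs = logits.log_softmax(dim=2)
        input_lengths = torch.full((logits.size(1),), logits.size(0), dtype=torch.long)
        loss = ctc_loss(log_probs, label_seqs, input_lengths, label_lengths)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()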
  • a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language can be obtained, and the sample text image includes multiple kinds of character category labels; training with such samples yields the target neural network for character category judgment and improves the accuracy and robustness of the target neural network.
  • any one or a combination of the following methods may be used to obtain the sample text images.
  • a sample text image is generated based on a first candidate text image including text corresponding to the first text language and a candidate text corpus corresponding to the second text language.
  • step 100-1 may include the following steps 201 to 203.
  • step 201 a first candidate text image including text corresponding to the first text language is acquired.
  • a first candidate text image that only includes text corresponding to the first text language may be acquired.
  • the first text language is the text language corresponding to the text to be recognized. For example, if the text to be recognized is English, the first text language is English. If the text to be recognized is Thai, the first text language is Thai.
  • step 202 the candidate text corpus corresponding to the at least one second text language is acquired.
  • the candidate text corpus is a sample corpus corresponding to at least one second text language
  • the second text language is a text language different from the first text language.
  • for example, if the first text language is Thai, then text languages other than Thai, such as Chinese, Arabic, and Korean, can be used as the second text language.
  • the candidate text corpus includes, but is not limited to, multiple characters and multiple character strings composed of characters.
  • the candidate text corpus may also include multiple characters (each character may consist of at least one character or at least one character string), a plurality of words (each word may be composed of at least one word and/or at least one character), and a plurality of sentences (each sentence may be composed of at least one word and/or character).
  • the characters, words and/or sentences in the candidate text corpus may have semantics or no semantics, which is not limited in the present disclosure.
  • having semantics means having linguistic meaning, such as stating or describing something; having no semantics means having no linguistic meaning, for example, when multiple characters are combined to form a trademark (logo) or a license plate, the combination of characters does not have any linguistic meaning.
  • step 203 the sample text image is generated based on the candidate text corpus and the first candidate text image.
  • the foreground content and the background content included in the first candidate text image can be obtained respectively, and the candidate text corpus and the foreground content included in the first candidate text image can be combined to obtain the foreground content of the sample text image.
  • the background content included in the first candidate text image is used as the background content of the sample text image, thereby generating the sample text image.
  • the foreground content includes text written in the first text language
  • the combination of the foreground content and the candidate text corpus includes, but is not limited to, placing the two parts of text content in different relative positions while ensuring that the foreground content and the text content of the candidate text corpus do not overlap.
  • Relative positions include, but are not limited to, where one is positioned above, below, to the left, to the right of the other, and the like.
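  • a rough sketch of this first generation method is shown below, using PIL to draw a second-language candidate text corpus onto a first candidate text image; the font file, the placement heuristic and the file names are assumptions, not part of the disclosure.

    from PIL import Image, ImageDraw, ImageFont

    def make_sample_text_image(first_candidate_path, corpus_text, out_path,
                               font_path="NotoSansSC-Regular.otf", font_size=24):
        """Render second-language corpus text onto a first candidate text image.
        The image keeps its background and first-language foreground; the corpus
        text is drawn near the bottom edge so the two parts of text content
        stay in different relative positions and do not overlap."""
        image = Image.open(first_candidate_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        font = ImageFont.truetype(font_path, font_size)
        x, y = 10, image.height - font_size - 10     # placement heuristic, illustrative only
        draw.text((x, y), corpus_text, fill=(0, 0, 0), font=font)
        image.save(out_path)

    # make_sample_text_image("thai_signboard.jpg", "示例中文语料", "sample.jpg")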
  • a first candidate text image including the text corresponding to the first text language and a second candidate text image including the text corresponding to the second text language are respectively acquired, thereby generating a sample text image.
  • step 100-1 may include the following steps 301 to 302.
  • step 301 a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language are acquired.
  • step 302 the sample text image is generated based on the first candidate text image and the second candidate text image.
  • the foreground content included in the first candidate text image and the foreground content included in the second candidate text image can be obtained respectively, and the foreground content corresponding to the sample text image is obtained by combining the two foreground contents.
  • the foreground content included in the first candidate text image includes text written in the first text language
  • the foreground content included in the second candidate text image includes text written in the second text language
  • the combination of the two foreground contents includes, but is not limited to, placing the two parts of text content in different relative positions under the condition that they do not overlap.
  • the background content included in the first candidate text image or the background content included in the second candidate text image may be used as the background content corresponding to the sample text image, or a preset background image may also be used as the background content corresponding to the sample text image.
  • the background image may include, but is not limited to, different pre-set solid-color background images, background images with different background content, and the background content may be real objects, scenery, and the like.
  • the background images may be obtained in a corresponding manner based on the number of background images.
  • if the number of preset background images is large, the background images may be obtained by random sampling.
  • the manner of obtaining the background images can be determined according to the order of magnitude of the number of background images, the number interval to which the number of background images belongs, or the size relationship between the number of background images and a number threshold.
  • the order of magnitude, the division of the quantity interval, and the setting of the quantity threshold can be obtained based on the empirical value when obtaining the first candidate text image or the second candidate text image, which is not limited herein.
  • if the number of preset background images is small, a part of the background images can be randomly selected from an existing background image database, or, if there is no background image database, different areas of the existing background images can be randomly combined to obtain multiple background images, so as to ensure the diversity of the final sample text images.
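  • the small-pool case described above could be handled roughly as in the following sketch, which recombines randomly cropped regions of the existing backgrounds into new background images; the tile size and the 2x2 layout are illustrative assumptions.

    import random
    from PIL import Image

    def expand_background_pool(background_images, target_count, tile=(128, 128)):
        """Given a small pool of PIL background images, build additional
        backgrounds by pasting randomly cropped regions of existing ones
        onto a new canvas, keeping the generated sample text images diverse."""
        expanded = list(background_images)
        while len(expanded) < target_count:
            canvas = Image.new("RGB", (tile[0] * 2, tile[1] * 2))
            for i in range(2):
                for j in range(2):
                    src = random.choice(background_images)
                    x = random.randint(0, max(0, src.width - tile[0]))
                    y = random.randint(0, max(0, src.height - tile[1]))
                    patch = src.crop((x, y, x + tile[0], y + tile[1]))
                    canvas.paste(patch, (i * tile[0], j * tile[1]))
            expanded.append(canvas)
        return expanded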
  • the sample text image can be generated.
  • a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language can be obtained, which solves the problem of the difficulty of obtaining sample text images, so that the accuracy and robustness of the target neural network can subsequently be improved.
  • take the case where the first text language is Thai, the second text language is English, and the application scenario is a parking lot as an example: the license plate content corresponding to Thai and Arabic numerals belongs to the text to be recognized, and the license plate content corresponding to English belongs to the other text.
  • the collected text image of the license plate can be used as the input of the target neural network to obtain the feature sequence corresponding to the text image output by the target neural network; the feature sequence is then used as the input of the classifier, and the classifier determines at least one candidate character category to which each character included in the text image belongs and the recognition rate corresponding to each candidate character category.
  • the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category of each character is used as the category judgment result of character category judgment for that character. Further, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which each character belongs is determined; according to that candidate character category, the target characters belonging to Thai or Arabic numerals and the irrelevant characters belonging to English are determined; the character structures corresponding to the irrelevant characters are then filtered out from the above character structures to obtain the character structures corresponding to Thai and Arabic numerals, and finally the target character recognition result of performing character recognition on the Thai characters and Arabic numerals in the text image is obtained.
  • in this way, for vehicles entering and leaving the parking lot whose license plates include Thai, Arabic numerals and English, the purpose of character recognition of the Thai characters and Arabic numerals is realized, misjudgment is less likely to occur, and the recognition accuracy is improved.
  • the camera deployed at the entrance and exit of the parking lot can collect the text images of the license plates including Thai, Arabic numerals and English of the vehicles entering and leaving the entrance and exit of the parking lot.
  • the acquisition method of the text image may include, but is not limited to, frame selection of the video stream collected by the camera.
  • a periodic or aperiodic frame selection operation can be performed on the video stream to obtain a text image obtained by photographing the license plate of the same vehicle including Thai, Arabic numerals and English in one or more frames.
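  • periodic frame selection from the camera's video stream could be done as in this OpenCV sketch; the stream source and the sampling interval are assumptions chosen only for illustration.

    import cv2

    def select_frames(stream_source, every_n_frames=30):
        """Grab every Nth frame from the video stream; each selected frame is a
        candidate text image of a license plate for subsequent recognition."""
        capture = cv2.VideoCapture(stream_source)
        frames, index = [], 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % every_n_frames == 0:
                frames.append(frame)
            index += 1
        capture.release()
        return frames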
  • the text images of the same license plate including Thai, Arabic numerals and English that are input to the target neural network may include one or more images, which is not limited herein.
  • in the case of a single text image, its recognition result can be used as the final recognition result; in the case of multiple text images, the recognition results of all of them, or of some of the text images, can be comprehensively considered. The specific implementation is not limited here and may include, but is not limited to, the cases exemplified above.
  • assuming the text image is a text image of a license plate including Thai, Arabic numerals and English, the target neural network first determines the candidate regions where the Thai, Arabic numeral and/or English characters are located.
  • in FIG. 9B, it is assumed that 2 candidate regions are obtained and that candidate region 1 is divided into 8 sub-regions (the present disclosure uses 8 only as an example; in practical applications the number may be fewer or more than 8); each sub-region can correspondingly yield a feature sequence, as shown in FIG. 9B.
  • likewise, at least one feature sequence (not shown in FIG. 9B) can be obtained for candidate region 2, and the combination of the feature sequences corresponding to all the sub-regions obtained by dividing the two candidate regions is used as the feature sequence corresponding to the text image.
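  • The slicing of a candidate region into sub-region feature vectors could be sketched as follows, assuming the region's features are available as a (channels, height, width) array; the split count of 8 and the mean pooling are assumptions made only for illustration.

```python
import numpy as np

def region_to_feature_sequence(region_features, num_subregions=8):
    """Split a candidate region's feature map (C, H, W) into vertical slices and
    pool each slice into one vector, giving a left-to-right feature sequence."""
    slices = np.array_split(region_features, num_subregions, axis=2)  # split along width
    return [s.mean(axis=(1, 2)) for s in slices]  # one C-dimensional vector per sub-region

# Example: an 8-step feature sequence for candidate region 1.
sequence = region_to_feature_sequence(np.random.rand(256, 32, 128))
```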
  • At least one candidate character category corresponding to each character included in the text image and the recognition rate corresponding to each candidate character category can be obtained through the classifier.
  • the candidate character category with the maximum recognition rate may be used as the category judgment result.
  • the target characters belonging to Thai and Arabic numerals and the irrelevant characters belonging to English can then be determined, and the character structures corresponding to the target characters are used as the target character recognition result, that is, the target text recognition result of performing character recognition on the Thai characters and Arabic numerals therein.
  • the above-mentioned target neural network may be obtained after training the preset neural network.
  • the sample text images can be obtained from existing first candidate text images that include Thai text, together with an English corpus.
  • a first candidate text image including only Thai text and a second candidate text image including only English text may be separately acquired, and a sample text image is generated based on the first candidate text image and the second candidate text image.
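  • A hedged sketch of such sample generation, assuming Pillow; the font path, the layout, and the idea of drawing an English corpus line below a Thai-only candidate image are assumptions, since the disclosure does not prescribe a specific compositing method. For the two-image variant, an English-only crop could be pasted instead of rendering corpus text.

```python
from PIL import Image, ImageDraw, ImageFont  # assumption: Pillow is available

def make_sample_image(thai_candidate_path, english_corpus_line, out_path,
                      font_path="arial.ttf"):
    """Compose a sample text image containing both Thai text and an English corpus line."""
    base = Image.open(thai_candidate_path).convert("RGB")
    canvas = Image.new("RGB", (base.width, base.height + 32), "white")
    canvas.paste(base, (0, 0))
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, 24)  # placeholder font
    draw.text((4, base.height + 4), english_corpus_line, fill="black", font=font)
    canvas.save(out_path)
```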
  • the sample text image is used as the input of the preset neural network, the character category labels in the sample text image are used as supervision, and the required target neural network is obtained through CTC (Connectionist Temporal Classification) supervised training.
  • the character category labels in the sample text image include at least one of the following: at least one of the plurality of first character category labels corresponding to the plurality of characters included in the first text language, that is, the first character category labels corresponding to the respective Thai characters; at least one of the plurality of second character category labels corresponding to the respective Arabic numerals; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages, that is, the shared third character category label corresponding to the English characters.
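  • The label scheme and CTC supervision could be sketched as follows, assuming PyTorch; the truncated Thai alphabet, the `<en>` placeholder for the shared third category, and the network interface are illustrative assumptions rather than details of the disclosure.

```python
import torch
import torch.nn as nn  # assumption: PyTorch is used for the preset neural network

# Character category labels: one label per Thai character (first character category
# labels), one per Arabic numeral (second), and a single shared label "<en>" for all
# English characters (third). Index 0 is reserved for the CTC blank.
thai_characters = ["ก", "ข", "ค"]                          # truncated, illustrative
categories = thai_characters + list("0123456789") + ["<en>"]
label_of = {category: index + 1 for index, category in enumerate(categories)}

ctc_loss = nn.CTCLoss(blank=0)

def training_step(network, optimizer, images, targets, target_lengths):
    """One CTC-supervised update; `network` is assumed to map a batch of images to
    per-timestep class scores of shape (T, N, num_classes)."""
    log_probs = network(images).log_softmax(2)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```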
  • in this way, a large amount of sample training data can be obtained to meet the training accuracy requirements of the preset neural network and improve the robustness of the target neural network; the scheme is highly versatile and can be quickly deployed to any device to achieve the goal of text recognition.
  • the character recognition solution provided by the present disclosure can be used in scenarios such as signboard character recognition, bill recognition, and the above-mentioned license plate recognition.
  • the character recognition scheme can also be used to issue an electronic visa.
  • the user who needs to apply for the electronic visa uploads the target data required for applying for the electronic visa
  • the target data includes, but is not limited to, at least one of the following: electronic forms including at least one of round-trip flight information and hotel information, ticket information for round-trip flights, successful reservation confirmation issued by the hotel, passport, proof of income, medical examination information, and other information required for the electronic visa application.
  • after the user uploads the above target data, the information content needs to be checked manually in order to issue an electronic visa.
  • the user can upload the text image of the target data
  • the electronic visa system can first determine, according to the text recognition scheme provided by the present disclosure, the feature sequence corresponding to the text image of each piece of target data, and further, based on the feature sequence, determine whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to text other than the first text language.
  • the irrelevant characters in the text image of the target data are filtered out, and text recognition is performed on the text to be recognized in the text image of the target data, so as to obtain the target text recognition result.
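  • A short illustrative sketch of this filtering step; the per-character input format and the category names are assumptions and not part of the disclosure.

```python
def extract_visa_text(character_results):
    """`character_results` is assumed to be a list of (character, category) pairs
    produced by the classifier. Keep only characters of the first text language."""
    kept_categories = {"latin_letter", "digit"}  # illustrative category names
    return "".join(ch for ch, cat in character_results if cat in kept_categories)
```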
  • the characters to be recognized include characters corresponding to the first character language
  • the first character language is the character language corresponding to the electronic visa.
  • an electronic visa will be issued in English.
  • the electronic visa system can determine, in the text image of the target data uploaded by the user, that each character belongs to an English character or an irrelevant character corresponding to other characters.
  • the English characters in the text image are recognized, and the target text recognition result is obtained.
  • the electronic visa system can issue the electronic visa based on the target character recognition result. For example, the electronic visa system verifies that the user meets the conditions for issuing an electronic visa based on the target character recognition result, and automatically issues an electronic visa for the user.
  • irrelevant characters corresponding to other text can be filtered out of the text images of the target data required when applying for an electronic visa, and the text corresponding to the electronic visa in those text images can be recognized, which improves the accuracy and timeliness of electronic visa issuance and provides high availability.
  • the present disclosure also provides device embodiments.
  • FIG. 10 is a block diagram of a character recognition apparatus shown in the present disclosure according to an exemplary embodiment.
  • the apparatus includes: an image acquisition module 410, configured to acquire a text image including characters to be recognized and other characters; a character category determination module 420, configured to obtain, based on the feature sequence corresponding to the text image, the category judgment result of each character in the text image, where the category judgment result is used to characterize the character category; and a character recognition module 430, configured to determine, based on the category judgment result, the target character recognition result of performing character recognition on the characters to be recognized.
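  • The module split of FIG. 10 could be mirrored by a minimal wrapper class such as the following sketch; the injected callables stand in for the feature-sequence extractor, classifier and decoder and are assumptions rather than disclosed components.

```python
class TextRecognitionApparatus:
    """Illustrative mirror of FIG. 10; the injected callables are assumptions."""

    def __init__(self, acquire_image, extract_feature_sequence,
                 classify_characters, filter_and_decode):
        self.acquire_image = acquire_image                 # image acquisition module 410
        self.extract_feature_sequence = extract_feature_sequence
        self.classify_characters = classify_characters     # character category determination module 420
        self.filter_and_decode = filter_and_decode         # character recognition module 430

    def recognize(self, source):
        image = self.acquire_image(source)
        feature_sequence = self.extract_feature_sequence(image)
        category_results = self.classify_characters(feature_sequence)
        return self.filter_and_decode(category_results)
```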
  • the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized character and/or the other characters in the character image are located; a division module, configured to The candidate region is divided into multiple sub-regions; the feature sequence determination module is configured to determine the feature sequence corresponding to the character image based on the feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
  • the character category determination module includes: a first determination submodule, configured to determine, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and a second determination submodule, configured to, for each character, use the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
  • the character recognition module includes: a third determination submodule, configured to, for each character, determine, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs; a fourth determination submodule, configured to determine, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the characters to be recognized or to the irrelevant characters corresponding to the other characters; and a fifth determination submodule, configured to use the character structure corresponding to the target characters as the target character recognition result of performing character recognition on the characters to be recognized.
  • the fifth determination submodule includes: a first determination unit, configured to determine, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of a plurality of first character categories or a plurality of second character categories, that the character belongs to the target characters; and a second determination unit, configured to determine, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is a third character category, that the character belongs to the irrelevant characters.
  • the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
  • the feature sequence determination module includes: a sixth determination submodule, configured to use the text image as an input of a target neural network for character category judgment on characters, and obtain the target neural network The feature sequence corresponding to the text image output by the network.
  • the apparatus further includes: a sample text image acquisition module, configured to acquire a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language;
  • the first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language;
  • the training module is configured to use the sample text image as the input of a preset neural network and, with the character category labels in the sample text image as supervision, train the preset neural network to obtain the target neural network for performing character category judgment on characters.
  • the sample text image acquisition module includes: a first acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language; a second acquisition submodule, configured to acquire a candidate text corpus corresponding to the at least one second text language; and a first generation submodule, configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
  • the sample text image acquisition module includes: a third acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and a second generation submodule, configured to generate the sample text image based on the first candidate text image and the second candidate text image.
  • the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels corresponding to the plurality of characters included in the first text language; at least one of a plurality of second character category labels corresponding to the plurality of Arabic numerals respectively; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages.
  • the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa;
  • the text image includes a text image of the target data required when applying for the electronic visa.
  • the character category determination module includes: a seventh determination submodule, configured to determine, based on the feature sequence corresponding to the text image of the target data, whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to other text;
  • the character recognition module includes: an eighth determination submodule, configured to determine the target character recognition result of performing character recognition on the target characters in the text image of the target data;
  • the apparatus further includes: an execution module, configured to issue the electronic visa based on the target character recognition result.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute any one of the above-described character recognition methods.
  • embodiments of the present disclosure further provide a computer program product, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the character recognition method provided by any of the above embodiments.
  • the embodiments of the present disclosure further provide another computer program product for storing computer-readable instructions, and when the instructions are executed, the computer executes the character recognition method provided by any of the foregoing embodiments.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • An embodiment of the present disclosure further provides a character recognition device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the character recognition method described in any one of the above items.
  • FIG. 11 is a schematic diagram of a hardware structure of a character recognition device provided by an embodiment of the present disclosure.
  • the character recognition device 510 includes a processor 511 , and may further include an input device 512 , an output device 513 and a memory 514 .
  • the input device 512, the output device 513, the memory 514 and the processor 511 are connected to each other through a bus.
  • Memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.
  • Input means are used for inputting data and/or signals, and output means are used for outputting data and/or signals.
  • the output device and the input device can be independent devices or an integral device.
  • the processor may include one or more processors, such as one or more central processing units (CPUs).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • Memory is used to store program codes and data for network devices.
  • the processor is configured to call the program code and data in the memory to execute the steps in the above method embodiments. For details, refer to the description in the method embodiment, which is not repeated here.
  • FIG. 11 only shows a simplified design of a character recognition device.
  • the character recognition device may also include other necessary components, including but not limited to any number of input/output devices, processors, controllers, memories, etc., and all character recognition devices that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.

Abstract

A text recognition method and device, and a storage medium. The method comprises: obtaining a text image comprising a text to be recognized and other texts (101); obtaining a category determination result of each character of the text image on the basis of a feature sequence corresponding to the text image (102), the category determination result being used for representing a character category; and on the basis of the category determination result, determining a target text recognition result of performing text recognition on the text to be recognized (103).

Description

文字识别方法及装置、存储介质Character recognition method and device, storage medium
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本公开要求于2021年1月29日提交的、申请号为202110127630.5、发明名称为“文字识别方法及装置、存储介质”的中国专利申请的优先权,该中国专利申请公开的全部内容以引用的方式并入本文中。This disclosure claims the priority of the Chinese patent application filed on January 29, 2021 with the application number of 202110127630.5 and the invention titled "character recognition method and device, storage medium", the entire contents of which are disclosed by reference manner is incorporated herein.
技术领域technical field
本公开涉及计算机视觉领域,尤其涉及一种文字识别方法及装置、存储介质。The present disclosure relates to the field of computer vision, and in particular, to a character recognition method and device, and a storage medium.
背景技术Background technique
在不同应用场景中进行文字识别,已经成为计算机视觉以及智能视频分析的一大研究方向。Character recognition in different application scenarios has become a major research direction in computer vision and intelligent video analysis.
但是进行文字识别时,如果采集的文字图像中不止包括待识别文字,还包括了其他文字,那么识别的准确率很可能会下降。However, when performing text recognition, if the collected text images include not only the text to be recognized, but also other texts, the recognition accuracy is likely to drop.
发明内容SUMMARY OF THE INVENTION
本公开提供了一种文字识别方法及装置、存储介质。The present disclosure provides a character recognition method and device, and a storage medium.
根据本公开实施例的第一方面,提供一种文字识别方法,所述方法包括:获取包括待识别文字和其他文字的文字图像;基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,所述类别判断结果用于表征字符类别;基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果。According to a first aspect of the embodiments of the present disclosure, there is provided a method for character recognition, the method comprising: acquiring a character image including characters to be recognized and other characters; The category judgment result of each character, the category judgment result is used to represent the character category; based on the category judgment result, the target character recognition result for character recognition of the to-be-recognized character is determined.
在一些可选实施例中,所述方法还包括:确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域;将所述候选区域划分为多个子区域;基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In some optional embodiments, the method further includes: determining a candidate area where the to-be-recognized character and/or the other characters in the character image are located; dividing the candidate area into a plurality of sub-areas; feature information corresponding to at least part of the sub-regions in the plurality of sub-regions, to determine the feature sequence corresponding to the text image.
在一些可选实施例中,所述基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,包括:基于所述文字图像对应的特征序列,确定所述文字图像包括的每个字符所属的至少一个备选字符类别和每个备选字符类别的识别率;针对所述每个字符,将该字符所属的所述至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In some optional embodiments, the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining the text based on the feature sequence corresponding to the text image At least one candidate character category to which each character included in the image belongs and the recognition rate of each candidate character category; for each character, the maximum recognition rate in the at least one candidate character category to which the character belongs corresponds to The candidate character category of , as the category judgment result of the character.
在一些可选实施例中,所述基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果,包括:针对所述每个字符,根据所述字符类别和字符结构之间的对应关系,确定该字符所属的最大识别率的备选字符类别对应的字符结构;根据该字符所属的最大识别率的备选字符类别,确定该字符属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符;将所述目标字符对应的所述字符结构,作为对所述待识别文字进行文字识别的所述目标文字识别结果。In some optional embodiments, the determining a target character recognition result for performing character recognition on the character to be recognized based on the category judgment result includes: for each character, according to the character category and character structure Determine the character structure corresponding to the candidate character category of the maximum recognition rate to which the character belongs; according to the candidate character category of the maximum recognition rate to which the character belongs, determine that the character belongs to the target corresponding to the text to be recognized character or an irrelevant character corresponding to the other characters; the character structure corresponding to the target character is used as the target character recognition result of performing character recognition on the character to be recognized.
在一些可选实施例中,所述根据该字符所属的最大识别率的备选字符类别,确定该字符属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符,包括:响应于确定该字符所属的最大识别率的备选字符类别是多个第一字符类别或多个第二字符类别中的一个,确定该字符属于所述目标字符;响应于确定该字符所属的最大识别率的备选字符类别是第三字符类别,确定该字符属于所述无关字符。In some optional embodiments, determining that the character belongs to the target character corresponding to the character to be recognized or belongs to the irrelevant character corresponding to the other character according to the candidate character category with the maximum recognition rate to which the character belongs, including: In response to determining that the candidate character class with the highest recognition rate to which the character belongs is one of a plurality of first character classes or a plurality of second character classes, it is determined that the character belongs to the target character; in response to determining that the character belongs to the largest character class The candidate character class for the recognition rate is the third character class, which is determined to belong to the irrelevant character.
在一些可选实施例中,所述多个第一字符类别包括:与第一文字语言包括的多个字符分别对应的字符类别;其中,所述第一文字语言是所述待识别文字对应的文字语言;所述多个第二字符类别包括:与多个阿拉伯数字分别对应的字符类别;所述第三字符类别包括:与多种第二文字语言包括的多个字符对应的相同的字符类别;其中,所述第二文字语言是不同于所述第一文字语言的文字语言。In some optional embodiments, the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
在一些可选实施例中,所述确定所述文字图像对应的特征序列,包括:将所述文字图像作为用于对字符进行字符类别判断的目标神经网络的输入,获得所述目标神经网络输出的所述文字图像对应的特征序列。In some optional embodiments, the determining the feature sequence corresponding to the text image includes: using the text image as an input of a target neural network for character category judgment on characters, and obtaining an output of the target neural network The feature sequence corresponding to the text image.
在一些可选实施例中,所述方法还包括:获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像;其中,所述第一文字语言是所述待识别文字对应的文字语言,所述第二文字语言是不同于所述第一文字语言的文字语言;将所述样本文字图像作为预设神经网络的输入,以所述样本文字图像中的字符类别标签为监督,对所述预设神经网络进行训练,得到用于对字符进行字符类别判断的目标神经网络。In some optional embodiments, the method further includes: acquiring a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language; wherein the first text language is the Identify the text language corresponding to the text, and the second text language is a text language different from the first text language; use the sample text image as the input of the preset neural network, and use the character category label in the sample text image For supervision, the preset neural network is trained to obtain a target neural network for character category judgment on characters.
在一些可选实施例中,所述获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像,包括:获取包括所述第一文字语言对应的文字的第一备选文字图像;获取所述至少一种第二文字语言对应的备选文字语料;基于所述备选文字语料和所述第一备选文字图像,生成所述样本文字图像。In some optional embodiments, the acquiring a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language includes: acquiring a first text image that includes the text corresponding to the first text language candidate text image; acquiring candidate text corpus corresponding to the at least one second text language; generating the sample text image based on the candidate text corpus and the first candidate text image.
在一些可选实施例中,所述获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像,包括:获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述至少一种第二文字语言对应的文字的第二备选文字图像;基于所述第一备选文字图像和所述第二备选文字图像,生成所述样本文字图像。In some optional embodiments, the acquiring a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language includes: acquiring a first text image that includes the text corresponding to the first text language A candidate text image and a second candidate text image including text corresponding to the at least one second text language; based on the first candidate text image and the second candidate text image, the sample text is generated image.
在一些可选实施例中,所述样本文字图像中的字符类别标签包括以下至少一个:与所述第一文字语言包括的多个字符分别对应的多个第一字符类别标签中的至少一个;与多个阿拉伯数字分别对应的多个第二字符类别标签中的至少一个;与多种第二文字语言包括的多个字符对应的相同的第三字符类别标签。In some optional embodiments, the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels corresponding to a plurality of characters included in the first text language; and At least one of the plurality of second character class labels corresponding to the plurality of Arabic numerals respectively; the same third character class label corresponding to the plurality of characters included in the plurality of second script languages.
在一些可选实施例中,所述待识别文字包括第一文字语言对应的文字,所述第一文字语言是电子签证对应的文字语言;所述文字图像包括申请所述电子签证时需要的目标资料的文字图像;所述基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,包括:基于所述目标资料的文字图像对应的特征序列,确定所述目标资料的文字图像中每个字符属于所述第一文字语言对应的目标字符,或属于其他文字对应的无关字符;所述确定对所述待识别文字进行文字识别的目标文字识别结果,包括:确定对所述目标资料的文字图像中所述目标字符进行文字识别的目标文字识别结果;所述方法还包括:基于所述目标文字识别结果,签发所述电子签证。In some optional embodiments, the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa; the text image includes the target data required when applying for the electronic visa. text image; the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining the target data based on the feature sequence corresponding to the text image of the target data Each character in the text image belongs to a target character corresponding to the first text language, or belongs to an irrelevant character corresponding to other characters; the determining the target character recognition result of performing character recognition on the to-be-recognized character includes: determining the target character recognition result for the character to be recognized. The target character recognition result of the character recognition performed by the target character in the text image of the target data; the method further includes: issuing the electronic visa based on the target character recognition result.
根据本公开实施例的第二方面,提供一种文字识别装置,包括:图像获取模块,用于获取包括待识别文字和其他文字的文字图像;字符类别确定模块,用于基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,所述类别判断结果用于表征字符类别;文字识别模块,用于基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果。According to a second aspect of the embodiments of the present disclosure, there is provided a character recognition device, comprising: an image acquisition module for acquiring text images including characters to be recognized and other characters; a character category determination module for corresponding The feature sequence is obtained, and the category judgment result of each character in the text image is obtained, and the category judgment result is used to characterize the character category; the character recognition module is used for determining the to-be-recognized text based on the category judgment result. The target text recognition result of text recognition.
在一些可选实施例中,所述装置还包括:区域确定模块,用于确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域;划分模块,用于将所述候选区域划分为多个子区域;特征序列确定模块,用于基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In some optional embodiments, the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized character and/or the other characters in the character image are located; a division module, configured to The candidate region is divided into multiple sub-regions; the feature sequence determination module is configured to determine the feature sequence corresponding to the character image based on the feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
在一些可选实施例中,所述字符类别确定模块包括:第一确定子模块,用于基于所 述文字图像对应的特征序列,确定所述文字图像包括的每个字符所属的至少一个备选字符类别和每个备选字符类别的识别率;第二确定子模块,用于针对所述每个字符,将该字符所属的所述至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In some optional embodiments, the character category determination module includes: a first determination submodule, configured to determine at least one candidate to which each character included in the text image belongs based on a feature sequence corresponding to the text image The character category and the recognition rate of each candidate character category; the second determination submodule is used for, for each character, the candidate character corresponding to the maximum recognition rate in the at least one candidate character category to which the character belongs category, as the category judgment result of the character.
在一些可选实施例中,所述文字识别模块包括:第三确定子模块,用于针对所述每个字符,根据所述字符类别和字符结构之间的对应关系,确定该字符所属的最大识别率的备选字符类别对应的字符结构;第四确定子模块,用于根据该字符所属的最大识别率的备选字符类别,确定该字符属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符;第五确定子模块,用于将所述目标字符对应的所述字符结构,作为对所述待识别文字进行文字识别的所述目标文字识别结果。In some optional embodiments, the character recognition module includes: a third determination submodule, configured to, for each character, determine the maximum value of the character to which the character belongs according to the correspondence between the character category and the character structure. The character structure corresponding to the candidate character category of the recognition rate; the fourth determination submodule is used to determine that the character belongs to the target character corresponding to the character to be recognized or belongs to the target character corresponding to the character to be recognized according to the candidate character category of the maximum recognition rate to which the character belongs. The irrelevant characters corresponding to the other characters are described; the fifth determination sub-module is configured to use the character structure corresponding to the target character as the target character recognition result of performing character recognition on the character to be recognized.
在一些可选实施例中,所述第五确定子模块包括:第一确定单元,用于响应于确定该字符所属的最大识别率的备选字符类别是多个第一字符类别或多个第二字符类别中的一个,确定该字符属于所述目标字符;第二确定单元,用于响应于确定该字符所属的最大识别率的备选字符类别是第三字符类别,确定该字符属于所述无关字符。In some optional embodiments, the fifth determination sub-module includes: a first determination unit, configured to respond to determining that the candidate character category with the maximum recognition rate to which the character belongs is a plurality of first character categories or a plurality of first character categories One of the two character categories, determining that the character belongs to the target character; a second determining unit, configured to determine that the character belongs to the irrelevant characters.
在一些可选实施例中,所述多个第一字符类别包括:与第一文字语言包括的多个字符分别对应的字符类别;其中,所述第一文字语言是所述待识别文字对应的文字语言;所述多个第二字符类别包括:与多个阿拉伯数字分别对应的字符类别;所述第三字符类别包括:与多种第二文字语言包括的多个字符对应的相同的字符类别;其中,所述第二文字语言是不同于所述第一文字语言的文字语言。In some optional embodiments, the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
在一些可选实施例中,所述特征序列确定模块包括:第六确定子模块,用于将所述文字图像作为用于对字符进行字符类别判断的目标神经网络的输入,获得所述目标神经网络输出的所述文字图像对应的特征序列。In some optional embodiments, the feature sequence determination module includes: a sixth determination submodule, configured to use the text image as an input of a target neural network for character category judgment on characters, and obtain the target neural network The feature sequence corresponding to the text image output by the network.
在一些可选实施例中,所述装置还包括:样本文字图像获取模块,用于获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像;其中,所述第一文字语言是所述待识别文字对应的文字语言,所述第二文字语言是不同于所述第一文字语言的文字语言;训练模块,用于将所述样本文字图像作为预设神经网络的输入,以所述样本文字图像中的字符类别标签为监督,对所述预设神经网络进行训练,得到用于对字符进行字符类别判断的目标神经网络。In some optional embodiments, the apparatus further includes: a sample text image acquisition module, configured to acquire a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language; The first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language; the training module is used to use the sample text image as a preset neural network. Input, with the character category label in the sample text image as supervision, the preset neural network is trained to obtain a target neural network for judging the character category of the characters.
在一些可选实施例中,所述样本文字图像获取模块包括:第一获取子模块,用于获取包括所述第一文字语言对应的文字的第一备选文字图像;第二获取子模块,用于获取所述至少一种第二文字语言对应的备选文字语料;第一生成子模块,用于基于所述备选文字语料和所述第一备选文字图像,生成所述样本文字图像。In some optional embodiments, the sample text image acquisition module includes: a first acquisition sub-module for acquiring a first candidate text image including text corresponding to the first text language; a second acquisition sub-module for using for acquiring the candidate text corpus corresponding to the at least one second text language; a first generating submodule is configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
在一些可选实施例中,所述样本文字图像获取模块包括:第三获取子模块,用于获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述至少一种第二文字语言对应的文字的第二备选文字图像;第二生成子模块,用于基于所述第一备选文字图像和所述第二备选文字图像,生成所述样本文字图像。In some optional embodiments, the sample character image obtaining module includes: a third obtaining sub-module, configured to obtain a first candidate character image including characters corresponding to the first character language and a first candidate character image including the at least one first character image. The second candidate text image of the text corresponding to the two text languages; the second generation sub-module is configured to generate the sample text image based on the first candidate text image and the second candidate text image.
在一些可选实施例中,所述样本文字图像中的字符类别标签包括以下至少一个:与所述第一文字语言包括的多个字符分别对应的多个第一字符类别标签中的至少一个;与多个阿拉伯数字分别对应的多个第二字符类别标签中的至少一个;与多种第二文字语言包括的多个字符对应的相同的第三字符类别标签。In some optional embodiments, the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels corresponding to a plurality of characters included in the first text language; and At least one of the plurality of second character class labels corresponding to the plurality of Arabic numerals respectively; the same third character class label corresponding to the plurality of characters included in the plurality of second script languages.
在一些可选实施例中,所述待识别文字包括第一文字语言对应的文字,所述第一文字语言是电子签证对应的文字语言;所述文字图像包括申请所述电子签证时需要的目标资料的文字图像;所述字符类别确定模块包括:第七确定子模块,用于基于所述目标资料的文字图像对应的特征序列,确定所述目标资料的文字图像中每个字符属于所述第一 文字语言对应的目标字符,或属于其他文字对应的无关字符;所述文字识别模块包括:第八确定子模块,用于确定对所述目标资料的文字图像中所述目标字符进行文字识别的目标文字识别结果;所述装置还包括:执行模块,用于基于所述目标文字识别结果,签发所述电子签证。In some optional embodiments, the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa; the text image includes the target data required when applying for the electronic visa. text image; the character category determination module includes: a seventh determination sub-module for determining, based on the feature sequence corresponding to the text image of the target data, that each character in the text image of the target data belongs to the first text language Corresponding target characters, or irrelevant characters belonging to other characters; the character recognition module includes: an eighth determination sub-module, used to determine the target character recognition for character recognition of the target characters in the character image of the target data Result; the apparatus further includes: an execution module, configured to issue the electronic visa based on the target character recognition result.
根据本公开实施例的第三方面,提供一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于执行上述第一方面任一所述的文字识别方法。According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is configured to execute the character recognition method according to any one of the above-mentioned first aspect.
根据本公开实施例的第四方面,提供一种文字识别装置,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器中存储的可执行指令,实现第一方面任一项所述的文字识别方法。According to a fourth aspect of the embodiments of the present disclosure, there is provided a character recognition device, comprising: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the memory stored in the memory The executable instructions of the first aspect implement the character recognition method described in any one of the first aspects.
本公开的实施例提供的技术方案可以包括以下有益效果:The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
本公开实施例中,针对同时包括待识别文字和其他文字的文字图像,可以对文字图像中的每个字符进行字符类别判断,从而基于类别判断结果,在包括多种语言的文字图像中确定出待识别文字对应的字符,以及其他文字对应的无关字符,过滤掉无关字符,对待识别文字对应的字符进行文字识别,得到目标文字识别结果。本公开对待识别文字和其他文字进行字符类别判断,以便在对待识别文字进行文字识别之前,过滤掉其他文字对应的无关字符,从而降低将其他文字误判为待识别文字的概率,在混合了多种文字语言的文字图像中,提高了对其中待识别文字进行文字识别的准确率。In the embodiment of the present disclosure, for a text image that includes both the text to be recognized and other texts, the character category judgment can be performed on each character in the text image, so that based on the category judgment result, it can be determined in the text image including multiple languages. The characters corresponding to the characters to be recognized and the irrelevant characters corresponding to other characters are filtered out, and the characters corresponding to the characters to be recognized are subjected to character recognition to obtain a target character recognition result. The present disclosure performs character classification judgment on the text to be recognized and other texts, so as to filter out irrelevant characters corresponding to other texts before performing text recognition on the text to be recognized, so as to reduce the probability of misjudging other texts as texts to be recognized. In the text images of different language languages, the accuracy of text recognition of the text to be recognized is improved.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
附图说明Description of drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.
图1是本公开根据一示例性实施例示出的一种文字识别方法流程图。FIG. 1 is a flowchart of a method for character recognition according to an exemplary embodiment of the present disclosure.
图2是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 2 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图3A是本公开根据一示例性实施例示出的一种对候选区域进行划分的场景示意图。FIG. 3A is a schematic diagram of a scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
图3B是本公开根据一示例性实施例示出的另一种对候选区域进行划分的场景示意图。FIG. 3B is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
图3C是本公开根据一示例性实施例示出的另一种对候选区域进行划分的场景示意图。FIG. 3C is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
图4是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 4 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图5是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 5 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图6是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 6 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图7是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 7 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图8是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 8 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图9A是本公开根据一示例性实施例示出的一种文字识别过程对应的架构示意图。FIG. 9A is a schematic structural diagram corresponding to a character recognition process according to an exemplary embodiment of the present disclosure.
图9B是本公开根据一示例性实施例示出的一种确定特征序列的示意图。FIG. 9B is a schematic diagram of a determination feature sequence according to an exemplary embodiment of the present disclosure.
图10是本公开根据一示例性实施例示出的一种文字识别装置框图。FIG. 10 is a block diagram of a character recognition apparatus according to an exemplary embodiment of the present disclosure.
图11是本公开根据一示例性实施例示出的一种文字识别装置的一结构示意图。FIG. 11 is a schematic structural diagram of a character recognition device according to an exemplary embodiment of the present disclosure.
具体实施方式Detailed ways
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.
在本公开运行的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所运行的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中运行的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "the," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所运行的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other. For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure. Depending on the context, the word "if" as used herein may be interpreted as "at the time of" or "when" or "in response to determining."
目前,如果文字图像中同时包括待识别文字和其他文字,可以采用忽略其他文字的方式进行文字识别,即在文字识别模型训练过程中,样本文字图像中只包括待识别文字对应的文字标签。但是这样得到的文字识别模型在判断过程中,容易将其他文字误判为待识别文字,准确率无法保证。At present, if the text image contains both the text to be recognized and other text, the text recognition can be performed by ignoring other text, that is, during the training of the text recognition model, only the text label corresponding to the text to be recognized is included in the sample text image. However, in the judgment process of the text recognition model obtained in this way, it is easy to misjudge other texts as the text to be recognized, and the accuracy cannot be guaranteed.
为了解决上述问题,本公开实施例提供了一种文字识别方案,针对同时包括待识别文字和其他文字的文字图像,可以对文字图像中每个字符进行字符类别判断,从而基于类别判断结果,得到对其中的待识别文字进行文字识别的目标文字识别结果。In order to solve the above problem, an embodiment of the present disclosure provides a character recognition solution. For a character image including the character to be recognized and other characters at the same time, character category judgment can be performed on each character in the character image, so that based on the category judgment result, the The target character recognition result of performing character recognition on the characters to be recognized.
例如图1所示,图1是根据一示例性实施例示出的一种文字识别方法,包括以下步骤:For example, as shown in FIG. 1, FIG. 1 shows a character recognition method according to an exemplary embodiment, including the following steps:
在步骤101中,获取包括待识别文字和其他文字的文字图像。In step 101, a text image including the text to be recognized and other texts is acquired.
在本公开实施例中,可以通过部署在不同应用场景中的摄像头,获取包括待识别文字的文字图像。该不同应用场景包括但不限于招牌文字识别场景、车牌识别场景、票据识别场景等。相应地,获取到的文字图像中可以包括但不限于用待识别文字对应的第一文字语言书写的招牌、车牌、票据等。另外,在本公开实施例中,获取到的文字图像中还同时包括用其他文字对应的第二文字语言书写的文字内容,第二文字语言包括但不限于不同于第一文字语言的文字语言。In the embodiment of the present disclosure, a text image including the text to be recognized can be acquired through cameras deployed in different application scenarios. The different application scenarios include but are not limited to signboard text recognition scenarios, license plate recognition scenarios, bill recognition scenarios, and the like. Correspondingly, the acquired text images may include, but are not limited to, signboards, license plates, bills, and the like written in the first text language corresponding to the text to be recognized. In addition, in the embodiment of the present disclosure, the acquired text image also includes text content written in a second text language corresponding to other text, and the second text language includes but is not limited to a text language different from the first text language.
在本公开实施例中,用第二文字语言书写的文字内容可以与用第一文字语言书写的文字内容相同、至少部分相同或者不同。In the embodiment of the present disclosure, the text content written in the second text language may be the same, at least partially the same, or different from the text content written in the first text language.
例如,第一文字语言为泰文,第二文字语言为英文,获取到的文字图像中包括了用泰文书写的招牌,同时还包括了用英文书写的相同的招牌内容。再例如,第一文字语言为泰文,第二文字语言为中文,获取到的文字图像中包括了用泰文书写的票据内容,同时还包括了用中文书写的该票据中的部分内容。再例如,第一文字语言为英文,第二文字语言为中文,获取到的文字图像中包括了用泰文书写的文字内容,同时还包括了用中文书写的完全不同的文字内容。For example, if the first text language is Thai, and the second text language is English, the acquired text image includes a signboard written in Thai, and also includes the same signboard content written in English. For another example, the first text language is Thai, the second text language is Chinese, and the acquired text image includes the content of the receipt written in Thai, and also includes part of the content of the receipt written in Chinese. For another example, the first text language is English, the second text language is Chinese, and the acquired text image includes text content written in Thai, and also includes completely different text content written in Chinese.
在步骤102中,基于所述文字图像对应的特征序列,得到对所述文字图像中每个字符的类别判断结果。In step 102, based on the feature sequence corresponding to the character image, a result of determining the category of each character in the character image is obtained.
在本公开实施例中,文字图像对应的特征序列的数目可以为一个或多个,每个特征序列可以由该文字图像中待识别文字和/或其他文字所在的候选区域包括的至少部分特 征信息构成。In this embodiment of the present disclosure, the number of feature sequences corresponding to the text image may be one or more, and each feature sequence may be composed of at least part of the feature information included in the candidate region where the text to be recognized and/or other texts are located in the text image constitute.
其中,候选区域是在文字图像中确定出的待识别文字和/或其他文字可能所在的区域。候选区域可以再次被划分为多个子区域,候选区域包括的至少部分特征信息可以是由至少部分子区域对应的特征信息构成,至少部分子区域对应的特征信息是指至少部分子区域对应的全部特征信息。例如,多个子区域包括子区域1、子区域2和子区域3,候选区域包括的至少部分特征信息可以由子区域1和子区域2的全部特征信息构成。Wherein, the candidate area is the area where the character to be recognized and/or other characters may be located determined in the character image. The candidate region can be divided into multiple sub-regions again, and at least part of the feature information included in the candidate region can be composed of feature information corresponding to at least some of the sub-regions, and the feature information corresponding to at least some of the sub-regions refers to all the features corresponding to at least some of the sub-regions. information. For example, the multiple sub-regions include sub-region 1, sub-region 2, and sub-region 3, and at least part of the feature information included in the candidate region may be composed of all the feature information of sub-region 1 and sub-region 2.
在本公开实施例中,进一步地,可以根据该文字图像对应的特征序列,确定对文字图像中的每个字符的类别判断结果。其中,该类别判断结果可以用于表征字符类别。In the embodiment of the present disclosure, further, the category judgment result for each character in the text image may be determined according to the feature sequence corresponding to the text image. The category judgment result can be used to characterize the character category.
在本公开实施例中,可以预先针对第一文字语言包括的每个字符,确定对应的第一字符类别,以及针对每个阿拉伯数字确定对应的第二字符类别,同时,还可以针对多种第二文字语言包括的所有字符确定相同的一个第三字符类别。其中,第一文字语言可以是待识别文字对应的文字语言,第一文字语言包括的每个字符可以指第一文字语言包括的每一个字母元素、以及每一个标点符号元素,所述第二文字语言是不同于所述第一文字语言的文字语言。In the embodiment of the present disclosure, a corresponding first character category may be determined for each character included in the first text language in advance, and a corresponding second character category may be determined for each Arabic numeral. All characters included in the literal language define the same one third character class. The first text language may be the text language corresponding to the text to be recognized, each character included in the first text language may refer to each letter element and each punctuation element included in the first text language, and the second text language is different A textual language in the first textual language.
例如,第一文字语言为英文,那么英文所包括的26个字母(区分大小写)、以及英文标点符号中的每个字母和每个标点符号可以对应一个第一字符类别。阿拉伯数字0至9分别对应一个第二字符类别。第二文字语言就是除了英文之外的任一种文字语言,假设可以包括中文、泰文、阿拉伯文、韩文等等,所有第二文字语言包括的所有字符都对应同一个第三字符类别。For example, if the first text language is English, then the 26 letters (case sensitive) included in English and each letter and each punctuation mark in the English punctuation mark may correspond to a first character category. Arabic numerals 0 to 9 each correspond to a second character class. The second text language is any text language other than English, and it is assumed that it can include Chinese, Thai, Arabic, Korean, etc. All characters included in all the second text languages correspond to the same third character category.
在步骤103中,基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果。In step 103, based on the category judgment result, a target character recognition result for performing character recognition on the to-be-recognized character is determined.
在本公开实施例中,基于上述的类别判断结果,就可以确定其中属于待识别文字对应的目标字符,以及属于其他文字的无关字符,过滤掉其中的无关字符,最终只得到属于待识别文字对应的目标字符的字符结构,即得到对所述待识别文字进行文字识别的目标文字识别结果。In the embodiment of the present disclosure, based on the above category judgment results, it is possible to determine the target characters corresponding to the characters to be recognized and the irrelevant characters belonging to other characters, filter out the irrelevant characters, and finally obtain only the characters corresponding to the characters to be recognized. The character structure of the target character is obtained, that is, the target character recognition result of performing character recognition on the character to be recognized is obtained.
上述实施例中,针对同时包括待识别文字和其他文字的文字图像,可以对文字图像中的每个字符进行字符类别判断,从而基于类别判断结果,在包括多种语言的文字图像中确定出待识别文字对应的字符,以及其他文字对应的无关字符,过滤掉无关字符,对待识别文字对应的字符进行文字识别,得到目标文字识别结果。本公开对待识别文字和其他文字进行字符类别判断,以便在对待识别文字进行文字识别之前,过滤掉其他文字对应的无关字符,从而降低将其他文字误判为待识别文字的概率,在混合了多种文字语言的文字图像中,提高了对其中待识别文字进行文字识别的准确率。In the above-mentioned embodiment, for a text image that includes both the text to be recognized and other texts, the character category judgment can be performed on each character in the text image, so that based on the category judgment result, the text image including multiple languages is determined. Characters corresponding to characters and irrelevant characters corresponding to other characters are identified, the irrelevant characters are filtered out, and the characters corresponding to the characters to be recognized are subjected to character recognition to obtain a target character recognition result. The present disclosure performs character classification judgment on the text to be recognized and other texts, so as to filter out irrelevant characters corresponding to other texts before performing text recognition on the text to be recognized, so as to reduce the probability of misjudging other texts as texts to be recognized. In the text images of different language languages, the accuracy of text recognition of the text to be recognized is improved.
在一些可选实施例中,例如图2所示,上述方法还可以包括步骤104至步骤106:In some optional embodiments, such as shown in FIG. 2 , the above method may further include steps 104 to 106:
在步骤104中,确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域。In step 104, a candidate region where the to-be-recognized text and/or the other text is located in the text image is determined.
其中,候选区域是在文字图像中确定出的待识别文字和/或所述其他文字可能所在的区域。Wherein, the candidate area is the area where the character to be recognized and/or the other character may be located determined in the character image.
在一个示例中,可以采用区域预测网络(Region Proposal Network,RPN)来确定文字图像中所述待识别文字和/或所述其他文字可能所在的候选区域。In one example, a region prediction network (Region Proposal Network, RPN) may be used to determine a candidate region where the to-be-recognized text and/or the other text may be located in the text image.
在步骤105中,将所述候选区域划分为多个子区域。In step 105, the candidate area is divided into a plurality of sub-areas.
在本公开实施例中,在确定了待识别文字和/或所述其他文字所在的候选区域后,可以将该候选区域划分为多个子区域,每个子区域的尺寸可以相同或不同。In the embodiment of the present disclosure, after the candidate area where the character to be recognized and/or the other character is located is determined, the candidate area may be divided into a plurality of sub-areas, and the size of each sub-area may be the same or different.
在一个示例中,可以对候选区域按照预设数目进行平均划分,从而得到尺寸相同的多个子区域,例如图3A所示,将候选区域划分为3个尺寸相同的子区域。In an example, the candidate region may be divided evenly according to a preset number to obtain multiple sub-regions with the same size. For example, as shown in FIG. 3A , the candidate region is divided into three sub-regions with the same size.
In another example, the candidate region may be divided according to a preset uniform size, so that either N sub-regions of the same size are obtained, or (N-1) sub-regions of the same size plus one sub-region whose size differs from the others, as shown for example in FIG. 3B, where sub-region 1 to sub-region 3 have the same size and sub-region 4 differs in size from the other three.
在另一个示例中,可以对候选区域按照预设的多个不同尺寸顺序进行划分,例如图3C所示,可以得到3个尺寸互不相同的子区域。In another example, the candidate region may be divided according to a preset sequence of multiple different sizes. For example, as shown in FIG. 3C , three sub-regions with different sizes may be obtained.
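A minimal sketch of the three division strategies above, assuming a candidate region given as (x, y, width, height) in pixels; the widths and ordering are illustrative only:

```python
def split_evenly(region, n):
    """Split a candidate region (x, y, w, h) into n sub-regions of equal width."""
    x, y, w, h = region
    step = w / n
    return [(x + i * step, y, step, h) for i in range(n)]

def split_by_fixed_width(region, width):
    """Split into sub-regions of a preset width; the last one may be narrower."""
    x, y, w, h = region
    subs, offset = [], 0.0
    while offset < w:
        sub_w = min(width, w - offset)
        subs.append((x + offset, y, sub_w, h))
        offset += width
    return subs

def split_by_widths(region, widths):
    """Split according to a preset sequence of (possibly different) widths."""
    x, y, w, h = region
    subs, offset = [], 0.0
    for sub_w in widths:
        subs.append((x + offset, y, min(sub_w, w - offset), h))
        offset += sub_w
        if offset >= w:
            break
    return subs
```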
在步骤106中,基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In step 106, a feature sequence corresponding to the character image is determined based on feature information corresponding to at least part of the sub-regions in the plurality of sub-regions.
在本公开实施例中,基于文字图像对应的特征图,可以确定候选区域所包括的每个子区域对应的特征信息。基于其中至少部分子区域对应的特征信息,即根据多个子区域中部分或全部子区域对应的全部特征信息,得到文字图像对应的特征序列。In the embodiment of the present disclosure, based on the feature map corresponding to the text image, feature information corresponding to each sub-region included in the candidate region can be determined. Based on the feature information corresponding to at least some of the sub-regions, that is, according to all the feature information corresponding to some or all of the multiple sub-regions, the feature sequence corresponding to the text image is obtained.
In one example, all the feature information corresponding to each sub-region may correspond to one feature sequence, all the feature information corresponding to multiple sub-regions may correspond to one feature sequence, or all the feature information corresponding to each sub-region may correspond to multiple feature sequences. The present disclosure does not limit this.
In another example, the order in which each sub-region appears in the text image may first be determined according to the writing order of the text, for example from left to right. Further, after the feature sequences are determined from the feature information corresponding to at least some of the sub-regions, the feature sequences are ordered according to the order in which the corresponding sub-regions appear in the text image; for example, the feature sequence corresponding to the leftmost sub-region of the text image is placed first and the feature sequence corresponding to the rightmost sub-region is placed last, and the combination of the ordered feature sequences yields the feature sequence corresponding to the text image.
For example, in left-to-right order the candidate region is divided into sub-region 1, sub-region 2 and sub-region 3, the at least some sub-regions include sub-region 2 and sub-region 3, sub-region 2 corresponds to feature sequences 2 and 3, and sub-region 3 corresponds to feature sequence 4; after ordering, the feature sequence corresponding to the text image consists of feature sequence 2, feature sequence 3 and feature sequence 4. In another example, the feature information corresponding to at least some of the sub-regions may be processed by pooling and/or sampling to obtain the corresponding feature sequences. Through pooling and/or sampling, the feature information corresponding to the most salient parts of each sub-region can be selected to determine the feature sequences, which, while ensuring the accuracy of the obtained feature sequences, improves the efficiency of determining the feature sequence corresponding to the text image and thus the efficiency of text recognition of the text to be recognized.
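The left-to-right assembly and the pooling step could look like the following sketch (NumPy assumed; the choice of max pooling and the coordinate handling are illustrative):

```python
import numpy as np

def features_to_sequence(sub_regions, feature_map):
    """Concatenate per-sub-region features in reading order (left to right).

    sub_regions: list of (x, y, w, h) tuples
    feature_map: array of shape (C, H, W) for the whole text image
    """
    # Sort sub-regions by their horizontal position, i.e. the writing order.
    ordered = sorted(sub_regions, key=lambda r: r[0])
    sequence = []
    for x, y, w, h in ordered:
        # Crop the feature map to the sub-region (coordinates assumed to be
        # already mapped into feature-map resolution).
        crop = feature_map[:, int(y):int(y + h), int(x):int(x + w)]
        # Pool over the spatial dimensions so each sub-region contributes
        # one feature vector; max pooling keeps the most salient responses.
        sequence.append(crop.max(axis=(1, 2)))
    return np.stack(sequence)          # shape: (num_sub_regions, C)
```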
In the embodiments of the present disclosure, after the feature sequence corresponding to the text image has been determined, step 102 may be executed to determine, based on the feature sequence corresponding to the text image, the category judgment result of the character category judgment for each character in the text image.
In the above embodiment, the candidate region where the text to be recognized and/or the other text is located in the text image may be divided into multiple sub-regions, and the feature sequence corresponding to the text image is determined based on the feature information corresponding to all or some of the multiple sub-regions, so that the category judgment result of the character category judgment for each character in the text image can subsequently be determined based on that feature sequence; this is simple to implement and highly usable.
在一些可选实施例中,例如图4所示,步骤102可以包括步骤102-1和步骤102-2。In some optional embodiments, such as shown in FIG. 4 , step 102 may include step 102-1 and step 102-2.
在步骤102-1中,基于所述文字图像对应的特征序列,确定所述文字图像包括的每个字符所属的至少一个备选字符类别和每个备选字符类别的识别率。In step 102-1, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category are determined.
In one example, the feature sequence corresponding to the text image may be used as the input of a classifier, and the classification prediction result output by the classifier is obtained; the classification prediction result includes, but is not limited to, at least one candidate character category to which each character included in the text image belongs and the recognition rate corresponding to each candidate character category, i.e., the probability value that the character belongs to that candidate character category.
For example, the text image includes 2 characters, the first character corresponds to 2 candidate character categories and the second character corresponds to 3 candidate character categories. The probability value that the first character belongs to candidate character category 1 is a, i.e., the recognition rate corresponding to candidate character category 1 is a, and the probability value that it belongs to candidate character category 2 is b, i.e., the recognition rate corresponding to candidate character category 2 is b. The probability values that the second character belongs to candidate character category 3, candidate character category 4 and candidate character category 5 are c, d and e respectively, i.e., the recognition rates of candidate character categories 3, 4 and 5 are c, d and e respectively.
在步骤102-2中,针对所述每个字符,将该字符所属的所述至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In step 102-2, for each character, the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs is used as the category judgment result of the character.
在本公开实施例中,为了便于后续确定目标文字识别结果,可以将某个字符所属的至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In the embodiment of the present disclosure, in order to facilitate the subsequent determination of the target character recognition result, the candidate character category corresponding to the maximum recognition rate in at least one candidate character category to which a certain character belongs may be used as the category judgment result of the character.
例如,文字图像中包括的某个字符对应2个备选字符类别。其中,该字符属于备选字符类别1的识别率为a,属于备选字符类别2的识别率为b,a大于b,那么备选字符类别1可以作为该字符对应的类别判断结果。For example, a certain character included in the text image corresponds to two candidate character categories. Wherein, the recognition rate of the character belonging to candidate character category 1 is a, and the recognition rate of the character belonging to candidate character category 2 is b. If a is greater than b, then candidate character category 1 can be used as the category judgment result corresponding to the character.
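Assuming the classifier returns, for each character position, a mapping from candidate categories to recognition rates, the category judgment result is simply the arg-max; a sketch:

```python
def judge_categories(per_char_candidates):
    """per_char_candidates: list like
       [{1: 0.8, 2: 0.2}, {3: 0.5, 4: 0.3, 5: 0.2}, ...]
    Returns, for each character, the candidate category with the largest
    recognition rate (the category judgment result)."""
    return [max(cands, key=cands.get) for cands in per_char_candidates]

# Example: first character -> category 1, second character -> category 3
print(judge_categories([{1: 0.8, 2: 0.2}, {3: 0.5, 4: 0.3, 5: 0.2}]))
```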
In the above embodiment, the candidate character categories to which each character included in the text image may belong and the recognition rate of each candidate character category can be determined based on the feature sequence corresponding to the text image, and the candidate character category with the largest recognition rate is taken as the category judgment result of the character category judgment for that character. Based on this category judgment result, the target characters belonging to the text to be recognized and the irrelevant characters belonging to the other text can subsequently be determined, so that the irrelevant characters can be filtered out, which improves the accuracy of text recognition of the text to be recognized in a text image mixing multiple text languages.
In some optional embodiments, as shown for example in FIG. 5, step 103 may include steps 103-1 to 103-3.
在步骤103-1中,针对每个字符,根据字符类别和字符结构之间的对应关系,确定该字符所属的最大识别率的备选字符类别对应的字符结构。In step 103-1, for each character, according to the correspondence between the character category and the character structure, the character structure corresponding to the candidate character category with the highest recognition rate to which the character belongs is determined.
在本公开实施例中,预先设置了不同的字符类别和对应的字符结构,例如,字符类别1对应的字符结构为‘a’,字符类别2对应的字符结构为‘b’,等等。可以基于之前确定的类别判断结果和上述对应关系,确定每个字符所属的最大识别率的备选字符类别对应的字符结构。In the embodiment of the present disclosure, different character classes and corresponding character structures are preset, for example, the character structure corresponding to character class 1 is 'a', the character structure corresponding to character class 2 is 'b', and so on. The character structure corresponding to the candidate character category with the highest recognition rate to which each character belongs may be determined based on the previously determined category judgment result and the above-mentioned corresponding relationship.
在本公开实施例中,第一文字语言包括的每个字符对应不同的第一字符类别,每个第一字符类别分别对应不同的字符结构。不同的阿拉伯数字对应不同的第二字符类别,这些第二字符类别也分别对应不同的字符结构,例如字符结构‘0’、‘1’等。而多种第二文字语言包括的所有字符可以对应同一个第三字符类别,这个第三字符类别可以对应相同的一个字符结构,例如,多种第二文字语言包括中文、阿拉伯文、泰文等,第二文字语言包括的所有的字符可以都对应一个第三字符类别,假设为字符类别70,这个字符类别70可以对应同一个字符结构,例如都对应中文的字符结构‘啊’。In the embodiment of the present disclosure, each character included in the first character language corresponds to a different first character category, and each first character category corresponds to a different character structure. Different Arabic numerals correspond to different second character categories, and these second character categories also correspond to different character structures, such as character structures '0', '1', and so on. All characters included in multiple second script languages may correspond to the same third character category, and this third character category may correspond to the same character structure. For example, multiple second script languages include Chinese, Arabic, Thai, etc. All characters included in the second script language may correspond to a third character category, assuming a character category 70, this character category 70 may correspond to the same character structure, for example, all correspond to the Chinese character structure 'ah'.
当然,上述第一文字语言是待识别文字对应的文字语言,除了第一文字语言之外的其他文字语言均可以作为第二文字语言。Of course, the above-mentioned first text language is the text language corresponding to the text to be recognized, and other text languages other than the first text language can be used as the second text language.
在本公开实施例中,根据上述对应关系,就可以确定每个字符所属的最大识别率的备选字符类别对应的字符结构。In the embodiment of the present disclosure, according to the above-mentioned correspondence, the character structure corresponding to the candidate character category with the maximum recognition rate to which each character belongs can be determined.
For example, a text image includes 4 characters whose candidate character categories with the largest recognition rate are 1, 2, 3 and 70 in sequence; according to the above correspondence, the corresponding character structures are determined to be 'a', 'b', 'c' and '啊' in sequence.
在步骤103-2中,根据该字符所属的最大识别率的备选字符类别,确定该字符 属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符。In step 103-2, according to the candidate character category of the maximum recognition rate to which the character belongs, it is determined that the character belongs to the target character corresponding to the character to be recognized or belongs to the irrelevant character corresponding to the other characters.
In the embodiments of the present disclosure, if it is determined that the candidate character category with the largest recognition rate to which a character belongs is one of the multiple first character categories or one of the multiple second character categories, it can be determined that the character is a target character corresponding to the text to be recognized. The multiple first character categories include character categories respectively corresponding to the multiple characters included in the first text language, the first text language being the text language corresponding to the text to be recognized, and the multiple second character categories include character categories respectively corresponding to the multiple Arabic numerals.
如果确定某个字符所属的最大识别率的备选字符类别是第三字符类别,那么可以确定该字符属于所述其他文字对应的无关字符。If it is determined that the candidate character category with the highest recognition rate to which a certain character belongs is the third character category, then it can be determined that the character belongs to the irrelevant character corresponding to the other characters.
For example, the first text language is English, the multiple first character categories include character categories 1 to 59, the multiple second character categories corresponding to the Arabic numerals include character categories 60 to 69, and the third character category is character category 70. If the text image includes 4 characters whose candidate character categories with the largest recognition rate are 1, 2, 3 and 70 in sequence, it can be determined that the first 3 characters are target characters and the last character is an irrelevant character.
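Continuing this example (category IDs 1-59 for the first language, 60-69 for the digits, 70 for all other scripts; the numbers are illustrative), the target/irrelevant decision is a simple range check:

```python
FIRST_CATEGORIES = set(range(1, 60))    # first text language (e.g. English)
SECOND_CATEGORIES = set(range(60, 70))  # Arabic numerals 0-9
THIRD_CATEGORY = 70                     # all other scripts share one category

def is_target_character(category_id):
    """A character is a target character if its best category is a first or
    second character category; the third category marks an irrelevant character."""
    return category_id in FIRST_CATEGORIES or category_id in SECOND_CATEGORIES

# The four characters with best categories 1, 2, 3, 70:
print([is_target_character(c) for c in (1, 2, 3, 70)])  # [True, True, True, False]
```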
在步骤103-3中,将所述目标字符对应的所述字符结构,作为对所述待识别文字进行文字识别的所述目标文字识别结果。In step 103-3, the character structure corresponding to the target character is used as the target character recognition result of performing character recognition on the to-be-recognized character.
It has been determined above that the character structures corresponding to the 4 characters included in the text image are 'a', 'b', 'c' and '啊' in sequence, and the last character is an irrelevant character; the character structure corresponding to the irrelevant character can be filtered out, leaving only the character structures corresponding to the target characters, so that the target text recognition result is obtained, for example 'abc'.
在一个示例中,可以调用预设程序,过滤掉无关字符对应的字符结构,从而得到目标字符对应的所述字符结构。其中,预设程序可以是预先编写的用于过滤指定字符结构的程序。例如,指定字符结构为‘啊’,该预设程序可以过滤字符结构‘啊’,从而得到待识别文字对应的目标字符的字符结构。In one example, a preset program can be called to filter out the character structure corresponding to the irrelevant characters, so as to obtain the character structure corresponding to the target character. Wherein, the preset program may be a pre-written program for filtering the specified character structure. For example, if the character structure is specified as 'ah', the preset program can filter the character structure 'ah', so as to obtain the character structure of the target character corresponding to the character to be recognized.
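A sketch of such a filtering step, assuming a category-to-structure table and a placeholder structure used for the third category (both illustrative):

```python
CATEGORY_TO_STRUCTURE = {1: 'a', 2: 'b', 3: 'c', 70: '啊'}  # illustrative table
PLACEHOLDER = '啊'  # single structure shared by all third-category characters

def decode_target_text(best_categories):
    """Look up each character's structure and drop the placeholder used
    for irrelevant (other-language) characters."""
    structures = [CATEGORY_TO_STRUCTURE[c] for c in best_categories]
    return ''.join(s for s in structures if s != PLACEHOLDER)

print(decode_target_text([1, 2, 3, 70]))  # -> 'abc'
```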
In the above embodiment, based on the category judgment results, it can be determined whether each character in the text image is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text, so that the character structures corresponding to the irrelevant characters can be filtered out and only the character structures of the target characters corresponding to the text to be recognized are retained, yielding the target text recognition result of performing text recognition on the text to be recognized and improving the accuracy of text recognition of the text to be recognized in a text image mixing multiple text languages.
在一些可选实施例中,针对上述步骤102,可以将文字图像直接作为目标神经网络的输入,获得目标神经网络输出的所述文字图像对应的特征序列。其中,所述目标神经网络是用于对字符进行字符类别判断的神经网络。In some optional embodiments, for the above step 102, the text image can be directly used as the input of the target neural network, and the feature sequence corresponding to the text image output by the target neural network can be obtained. Wherein, the target neural network is a neural network used for character category judgment on characters.
In the embodiments of the present disclosure, the target neural network is obtained by training a preset neural network and can determine the corresponding feature sequence from a text image. The preset neural network includes, but is not limited to, a Visual Geometry Group (VGG) network, GoogLeNet, a residual network (ResNet) and the like.
In the above embodiment, the text image can be used as the input of the target neural network for character category judgment so as to obtain the feature sequence corresponding to the text image output by the target neural network; the character category corresponding to each character included in the text image is subsequently determined based on this feature sequence, and text recognition can then be performed on the text to be recognized in the text image, which improves the accuracy of text recognition of the text to be recognized.
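A minimal PyTorch-style sketch of such a target network (the architecture is an assumption for illustration, not the disclosed network): a small convolutional backbone whose output is collapsed vertically so that each horizontal position of the feature map becomes one element of the feature sequence.

```python
import torch
import torch.nn as nn

class TinyTextBackbone(nn.Module):
    """Toy stand-in for a VGG/ResNet-style backbone: maps a text image to a
    feature sequence with one feature vector per horizontal position."""
    def __init__(self, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # halve H and W
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # collapse height to 1
        )

    def forward(self, image):                     # image: (N, 3, H, W)
        fmap = self.features(image)               # (N, C, 1, W')
        return fmap.squeeze(2).permute(0, 2, 1)   # (N, W', C): the feature sequence

seq = TinyTextBackbone()(torch.randn(1, 3, 32, 128))
print(seq.shape)   # torch.Size([1, 64, 64])
```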
在一些可选实施例中,例如图6所示(图6仅为示例性说明,实际应用中可以不限定下列步骤100-1至100-2的执行顺序必须要在步骤101之前执行),上述方法还可以包括步骤100-1和步骤100-2。In some optional embodiments, such as shown in FIG. 6 (FIG. 6 is only an exemplary illustration, in practical applications, the execution order of the following steps 100-1 to 100-2 may not be limited to be executed before step 101), the above The method may further include step 100-1 and step 100-2.
在步骤100-1中,获取同时包括第一文字语言对应的文字和第二文字语言对应的文字的样本文字图像。In step 100-1, a sample text image including both the text corresponding to the first text language and the text corresponding to the second text language is acquired.
在本公开实施例中,可以直接从样本图像数据库中获得上述样本文字图像。In the embodiment of the present disclosure, the above-mentioned sample text images can be directly obtained from the sample image database.
In step 100-2, the sample text image is used as the input of a preset neural network, and the preset neural network is trained with the character category labels in the sample text image as supervision, obtaining a target neural network used for character category judgment of characters.
In the embodiments of the present disclosure, the character category labels in the sample text image include at least one of the following: at least one of the multiple first character category labels respectively corresponding to the multiple characters included in the first text language; at least one of the multiple second character category labels respectively corresponding to the multiple Arabic numerals; and the same third character category label corresponding to the multiple characters included in the multiple second text languages.
In the embodiments of the present disclosure, a Connectionist Temporal Classification (CTC) supervised training method may be used to train the preset neural network, thereby obtaining the target neural network. CTC-supervised training means that the neural network learns directly from the input sequence without the mapping between the input sequence and the output result having to be annotated in the training data in advance.
In the embodiments of the present disclosure, the preset neural network outputs the character categories included in the sample text image; a loss function is determined according to the difference between the output of the preset neural network and the character category labels in the sample text image, and the preset neural network is trained iteratively by back-propagating gradients to the network parameters, so as to obtain the target neural network.
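A hedged sketch of such CTC-supervised training using PyTorch's nn.CTCLoss (the backbone, classifier head and data layout are placeholders; category 0 is reserved here for the CTC blank):

```python
import torch
import torch.nn as nn

num_classes = 70 + 1                     # 70 character categories (1..70) plus the CTC blank (0)
backbone = TinyTextBackbone()            # from the sketch above (illustrative)
head = nn.Linear(64, num_classes)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
optim = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-4)

def train_step(images, targets, target_lengths):
    """images: (N, 3, H, W); targets: concatenated category labels in 1..70."""
    seq = backbone(images)                         # (N, T, C)
    log_probs = head(seq).log_softmax(-1)          # (N, T, num_classes)
    log_probs = log_probs.permute(1, 0, 2)         # CTCLoss expects (T, N, C)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optim.zero_grad()
    loss.backward()                                # gradient back-propagation
    optim.step()
    return loss.item()
```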
In the above embodiment, sample text images that include both text corresponding to the first text language and text corresponding to at least one second text language can be obtained; the sample text images carry multiple kinds of character category labels, and by training the preset neural network, a target neural network used for character category judgment is obtained, which improves the accuracy and robustness of the target neural network.
在一些可选实施例中,考虑到样本文字图像数目可能较少,为了确保目标神经网络的精度和鲁棒性,可以采用以下方式中的任一种或多种的组合来得到样本文字图像。In some optional embodiments, considering that the number of sample text images may be small, in order to ensure the accuracy and robustness of the target neural network, any one or a combination of the following methods may be used to obtain the sample text images.
第一种方式,基于包括所述第一文字语言对应的文字的第一备选文字图像,以及所述第二文字语言对应的备选文字语料,生成样本文字图像。In a first manner, a sample text image is generated based on a first candidate text image including text corresponding to the first text language and a candidate text corpus corresponding to the second text language.
例如图7所示,步骤100-1可以包括以下步骤201至步骤203。For example, as shown in FIG. 7 , step 100 - 1 may include the following steps 201 to 203 .
在步骤201中,获取包括所述第一文字语言对应的文字的第一备选文字图像。In step 201, a first candidate text image including text corresponding to the first text language is acquired.
在本公开实施例中,可以获取只包括第一文字语言对应的文字的第一备选文字图像。其中,第一文字语言是待识别文字对应的文字语言,例如待识别文字为英文,那么第一文字语言就是英文,如果待识别文字为泰文,那么第一文字语言就是泰文。In this embodiment of the present disclosure, a first candidate text image that only includes text corresponding to the first text language may be acquired. The first text language is the text language corresponding to the text to be recognized. For example, if the text to be recognized is English, the first text language is English. If the text to be recognized is Thai, the first text language is Thai.
在步骤202中,获取所述至少一种第二文字语言对应的备选文字语料。In step 202, the candidate text corpus corresponding to the at least one second text language is acquired.
备选文字语料是至少一种第二文字语言对应的样本语料,所述第二文字语言是不同于第一文字语言的文字语言,例如第一文字语言是泰文,那么除了泰文之外的中文、阿拉伯文、韩文等都可以作为第二文字语言。The candidate text corpus is a sample corpus corresponding to at least one second text language, and the second text language is a text language different from the first text language. For example, the first text language is Thai, then Chinese and Arabic other than Thai , Korean, etc. can be used as the second text language.
The candidate text corpus includes, but is not limited to, multiple characters and multiple character strings composed of characters; in addition, the candidate text corpus may also include multiple words (each composed of at least one character or at least one character string), multiple terms (each composed of at least one word and/or at least one character) and multiple sentences (each composed of at least one word and/or term).
The words, terms and/or sentences in the candidate text corpus may or may not carry semantics, which is not limited by the present disclosure. Having semantics means having linguistic meaning, for example stating a fact or describing an object; having no semantics means having no linguistic meaning, for example when multiple characters are combined into a logo or a license plate, the combination of characters does not carry any linguistic meaning.
在步骤203中,基于所述备选文字语料和所述第一备选文字图像,生成所述样本文字图像。In step 203, the sample text image is generated based on the candidate text corpus and the first candidate text image.
In the embodiments of the present disclosure, the foreground content and the background content included in the first candidate text image can be obtained separately; the candidate text corpus is combined with the foreground content included in the first candidate text image to obtain the foreground content of the sample text image, and the background content included in the first candidate text image is used as the background content of the sample text image, thereby generating the sample text image.
The foreground content includes text written in the first text language, and combining the foreground content with the candidate text corpus includes, but is not limited to, placing the two pieces of text content at different relative positions while ensuring that they do not overlap. The relative positions include, but are not limited to, one piece being above, below, to the left of or to the right of the other.
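A minimal Pillow-based sketch of this compositing step (fonts, positions and the corpus line are illustrative assumptions):

```python
from PIL import Image, ImageDraw, ImageFont

def synthesize_sample(first_candidate_path, corpus_text, out_path):
    """Paste a second-language corpus line onto a first-language candidate
    image, below the existing foreground, so the two texts do not overlap."""
    img = Image.open(first_candidate_path).convert('RGB')
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()   # placeholder; a real font file covering
                                      # the second-language glyphs would be needed
    # Draw the corpus text near the bottom of the image, assuming the original
    # first-language foreground occupies the upper part.
    draw.text((10, img.height - 20), corpus_text, fill=(0, 0, 0), font=font)
    img.save(out_path)

# synthesize_sample('thai_plate.png', '混合语料', 'sample_0001.png')
```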
第二种方式,分别获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述第二文字语言对应的文字的第二备选文字图像,从而生成样本文字图像。In the second manner, a first candidate text image including the text corresponding to the first text language and a second candidate text image including the text corresponding to the second text language are respectively acquired, thereby generating a sample text image.
例如图8所示,步骤100-1可以包括以下步骤301至步骤302。For example, as shown in FIG. 8 , step 100 - 1 may include the following steps 301 to 302 .
在步骤301中,获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述至少一种第二文字语言对应的文字的第二备选文字图像。In step 301, a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language are acquired.
在步骤302中,基于所述第一备选文字图像和所述第二备选文字图像,生成所述样本文字图像。In step 302, the sample text image is generated based on the first candidate text image and the second candidate text image.
In the embodiments of the present disclosure, the foreground content included in the first candidate text image and the foreground content included in the second candidate text image can be obtained separately, and the two foreground contents are combined to obtain the foreground content of the sample text image. The foreground content of the first candidate text image includes text written in the first text language, the foreground content of the second candidate text image includes text written in the second text language, and combining the two foreground contents includes, but is not limited to, placing the two pieces of text content at different relative positions while ensuring that they do not overlap.
可以将第一备选文字图像包括的背景内容、或第二备选文字图像包括的背景内容作为样本文字图像对应的背景内容,或者还可以将预设背景图作为样本文字图像对应的背景内容。The background content included in the first candidate text image or the background content included in the second candidate text image may be used as the background content corresponding to the sample text image, or a preset background image may also be used as the background content corresponding to the sample text image.
在本公开实施例中,背景图可以包括但不限于预先设置好的不同的纯色背景图、存在不同背景内容的背景图,背景内容可以为实物、景色等。In this embodiment of the present disclosure, the background image may include, but is not limited to, different pre-set solid-color background images, background images with different background content, and the background content may be real objects, scenery, and the like.
In one implementation, the background images may be obtained in a manner that depends on their number. For example, if the number of preset background images is large, at least one of them may be obtained by random sampling. Whether the number of background images is large or small may be determined according to the order of magnitude of the number, the interval to which the number belongs, or the relationship between the number and a quantity threshold; the order of magnitude, the division of intervals and the setting of the threshold may be derived from empirical values obtained when acquiring the first candidate text image or the second candidate text image, and are not limited here.
If the number of preset background images is small, some background images may be randomly selected from an existing background image database, or, if there is no background image database, different regions of the existing background images may be randomly recombined to obtain multiple background images, thereby ensuring the diversity of the finally obtained sample text images.
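The background selection step might look like the following sketch (the threshold and the reuse strategy are assumptions):

```python
import random

def pick_backgrounds(background_pool, k, threshold=1000):
    """Randomly pick k background images for sample synthesis.

    If the pool is large (above an assumed threshold), plain random sampling
    is enough; if it is small, backgrounds are reused with replacement and the
    caller may recombine regions of different backgrounds to keep samples diverse."""
    if len(background_pool) >= threshold:
        return random.sample(background_pool, k)
    return [random.choice(background_pool) for _ in range(k)]
```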
在本公开实施例中,确定了样本文字图像的前景内容和背景内容后,可以生成样本文字图像。In the embodiment of the present disclosure, after the foreground content and the background content of the sample text image are determined, the sample text image can be generated.
In the above embodiment, sample text images that include both text corresponding to the first text language and text corresponding to at least one second text language can be obtained, which solves the problem of sample text images being difficult to acquire and thus allows the accuracy and robustness of the target neural network to be improved subsequently.
在一些可选实施例中,以第一文字语言为泰文,第二文字语言为英文,应用场景为停车场为例。在采集了包括泰文、阿拉伯数字和英文的车牌内容的文字图像后,需 要对其中的泰文和阿拉伯数字进行文字识别。其中,泰文和阿拉伯数字对应的车牌内容就属于待识别文字,英文对应的车牌内容属于其他文字。In some optional embodiments, the first text language is Thai, the second text language is English, and the application scenario is a parking lot as an example. After collecting the text image of the license plate content including Thai, Arabic numerals and English, it is necessary to perform character recognition on the Thai and Arabic numerals. Among them, the license plate contents corresponding to Thai and Arabic numerals belong to the characters to be recognized, and the license plate contents corresponding to English belong to other characters.
In the embodiments of the present disclosure, the collected license plate text image can be used as the input of the target neural network to obtain the feature sequence corresponding to the text image output by the target neural network; the feature sequence is then used as the input of the classifier, and the classifier determines at least one candidate character category to which each character included in the text image belongs and the recognition rate corresponding to each candidate character category.
Based on the above classifier output, the candidate character category with the largest recognition rate among the at least one candidate character category of each character is taken as the category judgment result of the character category judgment for each character in the text image. Further, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the largest recognition rate to which each character belongs is determined; according to these best candidate character categories, the target characters belonging to Thai text or Arabic numerals and the irrelevant characters belonging to English are determined, the character structures corresponding to the irrelevant characters are filtered out of the character structures obtained above to leave the character structures corresponding to the Thai text and Arabic numerals, finally yielding the target text recognition result of performing text recognition on the Thai text and Arabic numerals in the text image.
For the license plates of vehicles entering and leaving the parking lot, which contain Thai text, Arabic numerals and English, this achieves text recognition of the Thai text and Arabic numerals, is not prone to misjudgment, and improves the recognition accuracy.
In implementation, text images of the license plates, containing Thai text, Arabic numerals and English, of vehicles entering and leaving the parking lot can first be collected by cameras deployed at the parking lot entrances and exits. It should be noted that the text images may be acquired by, but not limited to, selecting frames from the video stream captured by the camera. For example, periodic or aperiodic frame selection may be performed on the video stream to obtain one or more frames of text images in which the license plate of the same vehicle, containing Thai text, Arabic numerals and English, is captured. During frame selection, one or more factors that affect the quality of the text image and/or the recognition accuracy, such as shooting angle, sharpness and brightness, may be taken into account to obtain the text image input to the target neural network. One or more text images of the same license plate containing Thai text, Arabic numerals and English may be input to the target neural network, which is not limited here. If there is one image, its recognition result may be taken as the final recognition result; if there are multiple images, the recognition results of all or some of the images may be considered together to obtain the final recognition result, or one image may be selected from the multiple images and the final recognition result obtained from that image. The specific implementation is not limited here and includes, but is not limited to, the cases exemplified above.
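A sketch of periodic frame selection with a simple sharpness score (OpenCV assumed; the interval and the Laplacian-variance criterion are illustrative choices):

```python
import cv2

def select_frames(video_path, every_n=25, min_sharpness=100.0):
    """Grab every n-th frame from the stream and keep only frames whose
    Laplacian variance (a rough sharpness measure) exceeds a threshold."""
    cap = cv2.VideoCapture(video_path)
    kept, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if cv2.Laplacian(gray, cv2.CV_64F).var() >= min_sharpness:
                kept.append(frame)
        index += 1
    cap.release()
    return kept
```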
For example, as shown in FIG. 9A, in the application scenario provided by the present disclosure the text image is a text image of a license plate that contains Thai text, Arabic numerals and English. The target neural network first determines the candidate regions where the Thai, Arabic-numeral and/or English characters are located; as shown for example in FIG. 9B, suppose 2 candidate regions are obtained and candidate region 1 is divided into 8 sub-regions (8 is used here only for illustration, and in practical applications the number of feature sequences obtained may be less than or greater than 8), each sub-region yielding one corresponding feature sequence, as shown in FIG. 9B. Similarly, at least one feature sequence can also be obtained for candidate region 2 (not shown in FIG. 9B), and the combination of the feature sequences corresponding to all sub-regions obtained by dividing the two candidate regions is used as the feature sequence corresponding to the text image.
在得到目标神经网络输出的对应文字图像的特征序列后,可以通过分类器得到该文字图像包括的每个字符对应的至少一个备选字符类别和每个备选字符类别对应的识别率。在本公开实施例中,可以将最大识别率的备选字符类别作为类别判断结果。After obtaining the feature sequence corresponding to the text image output by the target neural network, at least one candidate character category corresponding to each character included in the text image and the recognition rate corresponding to each candidate character category can be obtained through the classifier. In the embodiment of the present disclosure, the candidate character category with the maximum recognition rate may be used as the category judgment result.
Further, according to the category judgment results, the target characters belonging to Thai text and Arabic numerals and the irrelevant characters belonging to English can be determined, and the character structures corresponding to the target characters are taken as the target text recognition result, i.e., the target text recognition result of performing text recognition on the Thai text and Arabic numerals therein.
在本公开实施例中,可以对预设神经网络进行训练后,得到上述目标神经网络。In the embodiment of the present disclosure, the above-mentioned target neural network may be obtained after training the preset neural network.
在对目标神经网络进行训练的过程中,可以通过已有的包括泰文文字的第一备 选文字图像,以及英文语料,得到样本文字图像。In the process of training the target neural network, the sample text images can be obtained through the existing first candidate text images including Thai text and English corpus.
或者可以单独获取只包括泰文文字的第一备选文字图像,以及只包括英文文字的第二备选文字图像,基于第一备选文字图像和第二备选文字图像,生成样本文字图像。Alternatively, a first candidate text image including only Thai text and a second candidate text image including only English text may be separately acquired, and a sample text image is generated based on the first candidate text image and the second candidate text image.
The sample text images are used as the input of the preset neural network, the multiple kinds of character category labels in the sample text images are used as supervision, and the required target neural network is obtained through CTC-supervised training. The character category labels in the sample text image include at least one of the following: at least one of the multiple first character category labels respectively corresponding to the multiple characters included in the first text language, i.e., at least one of the first character category labels respectively corresponding to the Thai characters; at least one of the multiple second character category labels respectively corresponding to the multiple Arabic numerals; and the same third character category label corresponding to the multiple characters included in the multiple second text languages, i.e., the same third character category label shared by the English characters.
In the above embodiment, a large amount of sample training data can be obtained, which meets the training accuracy requirements of the preset neural network, improves the robustness of the target neural network, and offers high generality, so that the network can be quickly deployed on any device for the purpose of text recognition.
在一些可选实施例中,本公开提供的文字识别方案可以用于招牌文字识别、票据识别、上述的车牌识别等场景中。在本公开实施例中,该文字识别方案还可以用于签发电子签证。In some optional embodiments, the character recognition solution provided by the present disclosure can be used in scenarios such as signboard character recognition, bill recognition, and the above-mentioned license plate recognition. In the embodiment of the present disclosure, the character recognition scheme can also be used to issue an electronic visa.
During the issuance of an electronic visa, the user applying for the electronic visa needs to upload the target materials required for the application, which include but are not limited to at least one of the following: an electronic form containing at least one of round-trip flight information and hotel information, ticket information of the round-trip flights, booking confirmation issued by the hotel, a passport, proof of income, medical examination information, and other information required for applying for the electronic visa. After the user uploads the above target materials, the information in them needs to be checked manually so that the electronic visa can be issued.
在本公开实施例中,用户可以上传目标资料的文字图像,电子签证系统可以按照本公开提供的文字识别方案,先确定每个目标资料的文字图像对应的特征序列,进一步地,基于该特征序列,确定目标资料的文字图像中每个字符属于第一文字语言对应的目标字符,或是属于除了所述第一文字语言之外的其他文字对应的无关字符。过滤掉目标资料的文字图像中的无关字符,对目标资料的文字图像中待识别文字进行文字识别,从而得到目标文字识别结果。其中,待识别文字包括第一文字语言对应的文字,所述第一文字语言是电子签证对应的文字语言。In the embodiment of the present disclosure, the user can upload the text image of the target data, and the electronic visa system can first determine the feature sequence corresponding to the text image of each target data according to the text recognition scheme provided by the present disclosure, and further, based on the feature sequence , determining that each character in the text image of the target data belongs to the target character corresponding to the first text language or to an irrelevant character corresponding to other texts except the first text language. The irrelevant characters in the text image of the target data are filtered out, and text recognition is performed on the text to be recognized in the text image of the target data, so as to obtain the target text recognition result. Wherein, the characters to be recognized include characters corresponding to the first character language, and the first character language is the character language corresponding to the electronic visa.
For example, if the electronic visa is issued in English, the electronic visa system can determine, in the text images of the target materials uploaded by the user, whether each character is an English character or an irrelevant character corresponding to other text; after filtering out the irrelevant characters, text recognition is performed on the English characters in the text images of the target materials to obtain the target text recognition result.
进一步地,电子签证系统可以基于目标文字识别结果,签发电子签证。例如,电子签证系统基于目标文字识别结果,验证该用户符合签发电子签证的条件,自动为该用户签发电子签证。Further, the electronic visa system can issue the electronic visa based on the target character recognition result. For example, the electronic visa system verifies that the user meets the conditions for issuing an electronic visa based on the target character recognition result, and automatically issues an electronic visa for the user.
In the above embodiment, the irrelevant characters corresponding to other text can be filtered out of the text images of the target materials required when applying for an electronic visa, and text recognition is performed on the text corresponding to the electronic visa in those images, which improves the accuracy and timeliness of electronic visa issuance and offers high usability.
与前述方法实施例相对应,本公开还提供了装置的实施例。Corresponding to the foregoing method embodiments, the present disclosure also provides device embodiments.
FIG. 10 is a block diagram of a text recognition apparatus according to an exemplary embodiment of the present disclosure. The apparatus includes: an image acquisition module 410, configured to acquire a text image including text to be recognized and other text; a character category determination module 420, configured to obtain, based on a feature sequence corresponding to the text image, a category judgment result for each character in the text image, the category judgment result being used to characterize a character category; and a text recognition module 430, configured to determine, based on the category judgment results, a target text recognition result of performing text recognition on the text to be recognized.
在一些可选实施例中,所述装置还包括:区域确定模块,用于确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域;划分模块,用于将所述候选区域划分为多个子区域;特征序列确定模块,用于基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In some optional embodiments, the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized character and/or the other characters in the character image are located; a division module, configured to The candidate region is divided into multiple sub-regions; the feature sequence determination module is configured to determine the feature sequence corresponding to the character image based on the feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
In some optional embodiments, the character category determination module includes: a first determination submodule, configured to determine, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and a second determination submodule, configured to take, for each character, the candidate character category corresponding to the largest recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
In some optional embodiments, the text recognition module includes: a third determination submodule, configured to determine, for each character, the character structure corresponding to the candidate character category with the largest recognition rate to which the character belongs, according to the correspondence between character categories and character structures; a fourth determination submodule, configured to determine, according to the candidate character category with the largest recognition rate to which the character belongs, whether the character is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text; and a fifth determination submodule, configured to take the character structure corresponding to the target character as the target text recognition result of performing text recognition on the text to be recognized.
In some optional embodiments, the fifth determination submodule includes: a first determination unit, configured to determine that the character is a target character in response to determining that the candidate character category with the largest recognition rate to which the character belongs is one of the multiple first character categories or one of the multiple second character categories; and a second determination unit, configured to determine that the character is an irrelevant character in response to determining that the candidate character category with the largest recognition rate to which the character belongs is the third character category.
在一些可选实施例中,所述多个第一字符类别包括:与第一文字语言包括的多个字符分别对应的字符类别;其中,所述第一文字语言是所述待识别文字对应的文字语言;所述多个第二字符类别包括:与多个阿拉伯数字分别对应的字符类别;所述第三字符类别包括:与多种第二文字语言包括的多个字符对应的相同的字符类别;其中,所述第二文字语言是不同于所述第一文字语言的文字语言。In some optional embodiments, the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
In some optional embodiments, the feature sequence determination module includes: a sixth determination submodule, configured to use the text image as the input of the target neural network for character category judgment and obtain the feature sequence corresponding to the text image output by the target neural network.
In some optional embodiments, the apparatus further includes: a sample text image acquisition module, configured to acquire sample text images that include both text corresponding to the first text language and text corresponding to at least one second text language, the first text language being the text language corresponding to the text to be recognized and the second text language being a text language different from the first text language; and a training module, configured to use the sample text images as the input of a preset neural network and train the preset neural network with the character category labels in the sample text images as supervision, obtaining the target neural network used for character category judgment.
In some optional embodiments, the sample text image acquisition module includes: a first acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language; a second acquisition submodule, configured to acquire a candidate text corpus corresponding to the at least one second text language; and a first generation submodule, configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
In some optional embodiments, the sample text image acquisition module includes: a third acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and a second generation submodule, configured to generate the sample text image based on the first candidate text image and the second candidate text image.
In some optional embodiments, the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels respectively corresponding to a plurality of characters included in the first text language; at least one of a plurality of second character category labels respectively corresponding to a plurality of Arabic numerals; and a same third character category label corresponding to a plurality of characters included in multiple second text languages.
In some optional embodiments, the text to be recognized includes text corresponding to a first text language, where the first text language is the text language corresponding to an electronic visa, and the text image includes a text image of target material required when applying for the electronic visa. The character category determination module includes a seventh determination sub-module, configured to determine, based on the feature sequence corresponding to the text image of the target material, that each character in the text image of the target material is a target character corresponding to the first text language or an irrelevant character corresponding to other text. The text recognition module includes an eighth determination sub-module, configured to determine a target text recognition result of performing text recognition on the target characters in the text image of the target material. The apparatus further includes an execution module, configured to issue the electronic visa based on the target text recognition result.
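A minimal end-to-end sketch of this e-visa scenario is shown below: characters judged to belong to the first text language are kept, irrelevant characters are discarded, and the recognized target text is then passed to an issuance step. The `classify_characters` and `issue_evisa` callables and the `IRRELEVANT` index are placeholders invented for this example.

```python
# Hypothetical glue code for the e-visa embodiment described above.
IRRELEVANT = -1  # placeholder index for the shared "other text" category

def recognize_target_text(char_results):
    """char_results: list of (character_structure, category_index) per character."""
    return "".join(struct for struct, cat in char_results if cat != IRRELEVANT)

def process_application(document_images, classify_characters, issue_evisa):
    """document_images: text images of the target material required for the e-visa."""
    recognized_fields = [recognize_target_text(classify_characters(img))
                         for img in document_images]
    if all(recognized_fields):            # every required field produced a non-empty result
        issue_evisa(recognized_fields)    # downstream issuance step (assumed interface)
    return recognized_fields
```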
For the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for related details. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the present disclosure, which those of ordinary skill in the art can understand and implement without creative effort.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, where the computer program is used to execute any one of the text recognition methods described above.
In some optional embodiments, an embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the text recognition method provided by any one of the above embodiments.
In some optional embodiments, an embodiment of the present disclosure further provides another computer program product for storing computer-readable instructions, where the instructions, when executed, cause a computer to execute the text recognition method provided by any one of the above embodiments.
The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
An embodiment of the present disclosure further provides a text recognition apparatus, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to invoke the executable instructions stored in the memory to implement any one of the text recognition methods described above.
FIG. 11 is a schematic diagram of a hardware structure of a text recognition apparatus provided by an embodiment of the present disclosure. The text recognition apparatus 510 includes a processor 511, and may further include an input device 512, an output device 513, and a memory 514. The input device 512, the output device 513, the memory 514, and the processor 511 are connected to one another through a bus.
The memory includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.
The input device is used for inputting data and/or signals, and the output device is used for outputting data and/or signals. The output device and the input device may be independent devices or an integrated device.
The processor may include one or more processors, for example, one or more central processing units (CPUs). Where the processor is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory is used for storing the program code and data of the network device.
The processor is configured to invoke the program code and data in the memory to execute the steps in the above method embodiments. For details, reference may be made to the description in the method embodiments, which is not repeated here.
It can be understood that FIG. 11 shows only a simplified design of a text recognition apparatus. In practical applications, the text recognition apparatus may further include other necessary components, including but not limited to any number of input/output devices, processors, controllers, and memories, and all text recognition apparatuses that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
The above descriptions are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (14)

1. A text recognition method, comprising:
    acquiring a text image including text to be recognized and other text;
    obtaining a category judgment result of each character in the text image based on a feature sequence corresponding to the text image, wherein the category judgment result is used to characterize a character category;
    determining, based on the category judgment result, a target text recognition result of performing text recognition on the text to be recognized.
2. The method according to claim 1, further comprising:
    determining a candidate region in the text image where the text to be recognized and/or the other text is located;
    dividing the candidate region into a plurality of sub-regions;
    determining the feature sequence corresponding to the text image based on feature information corresponding to at least some of the plurality of sub-regions.
3. The method according to claim 1 or 2, wherein obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image comprises:
    determining, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and a recognition rate of each candidate character category;
    for each character,
    taking the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
4. The method according to claim 3, wherein determining, based on the category judgment result, the target text recognition result of performing text recognition on the text to be recognized comprises:
    for each character,
    determining, according to a correspondence between character categories and character structures, a character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs;
    determining, according to the candidate character category with the maximum recognition rate to which the character belongs, that the character is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text;
    taking the character structure corresponding to the target character as the target text recognition result of performing text recognition on the text to be recognized.
5. The method according to claim 4, wherein determining, according to the candidate character category with the maximum recognition rate to which the character belongs, that the character is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text comprises:
    in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of a plurality of first character categories or a plurality of second character categories, determining that the character is the target character;
    in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is a third character category, determining that the character is the irrelevant character.
6. The method according to claim 5, wherein
    the plurality of first character categories comprise character categories respectively corresponding to a plurality of characters included in a first text language, wherein the first text language is the text language corresponding to the text to be recognized;
    the plurality of second character categories comprise character categories respectively corresponding to a plurality of Arabic numerals;
    the third character category comprises a same character category corresponding to a plurality of characters included in multiple second text languages, wherein a second text language is a text language different from the first text language.
7. The method according to any one of claims 2-6, wherein determining the feature sequence corresponding to the text image comprises:
    using the text image as an input of a target neural network for judging the character category of characters, and obtaining the feature sequence corresponding to the text image output by the target neural network.
8. The method according to any one of claims 1-7, further comprising:
    acquiring a sample text image that includes both text corresponding to a first text language and text corresponding to at least one second text language, wherein the first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language;
    using the sample text image as an input of a preset neural network and, with character category labels in the sample text image as supervision, training the preset neural network to obtain the target neural network for judging the character category of characters.
9. The method according to claim 8, wherein acquiring the sample text image that includes both the text corresponding to the first text language and the text corresponding to the at least one second text language comprises at least one of the following:
    acquiring a first candidate text image including the text corresponding to the first text language, acquiring a candidate text corpus corresponding to the at least one second text language, and generating the sample text image based on the candidate text corpus and the first candidate text image;
    acquiring a first candidate text image including the text corresponding to the first text language and a second candidate text image including the text corresponding to the at least one second text language, and generating the sample text image based on the first candidate text image and the second candidate text image.
10. The method according to claim 8 or 9, wherein the character category labels in the sample text image include at least one of the following:
    at least one of a plurality of first character category labels respectively corresponding to a plurality of characters included in the first text language;
    at least one of a plurality of second character category labels respectively corresponding to a plurality of Arabic numerals;
    a same third character category label corresponding to a plurality of characters included in multiple second text languages.
11. The method according to any one of claims 1-10, wherein
    the text to be recognized includes text corresponding to a first text language, the first text language being the text language corresponding to an electronic visa, and the text image includes a text image of target material required when applying for the electronic visa;
    obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image comprises:
    determining, based on the feature sequence corresponding to the text image of the target material, that each character in the text image of the target material is a target character corresponding to the first text language or an irrelevant character corresponding to other text;
    determining the target text recognition result of performing text recognition on the text to be recognized comprises:
    determining a target text recognition result of performing text recognition on the target characters in the text image of the target material;
    the method further comprising:
    issuing the electronic visa based on the target text recognition result.
12. A text recognition apparatus, comprising:
    an image acquisition module, configured to acquire a text image including text to be recognized and other text;
    a character category determination module, configured to obtain a category judgment result of each character in the text image based on a feature sequence corresponding to the text image, wherein the category judgment result is used to characterize a character category;
    a text recognition module, configured to determine, based on the category judgment result, a target text recognition result of performing text recognition on the text to be recognized.
13. A text recognition apparatus, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to invoke the executable instructions stored in the memory to implement the text recognition method according to any one of claims 1-10.
14. A computer-readable storage medium storing a computer program, wherein the computer program is used to execute the text recognition method according to any one of claims 1-10.
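As a compact illustration of the category-judgment and decoding steps recited in claims 3 to 5, the sketch below selects, for each character, the candidate category with the maximum recognition rate, maps target characters to their character structures through an assumed correspondence table, and drops irrelevant characters. All names and the shape of the correspondence table are assumptions made for this example, not the claimed implementation.

```python
import numpy as np

def decode(recognition_rates, structure_of, target_classes, third_class):
    """recognition_rates: (T, num_classes) array of per-character candidate-category rates.
    structure_of: assumed mapping from category index to character structure (a string).
    target_classes: set of first and second character category indices.
    """
    target_text = []
    for rates in recognition_rates:
        best = int(np.argmax(rates))          # candidate category with the maximum recognition rate
        if best in target_classes:            # target character (first-language character or digit)
            target_text.append(structure_of[best])
        elif best == third_class:             # irrelevant character: skip it
            continue
    return "".join(target_text)
```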
PCT/CN2021/103787 2021-01-29 2021-06-30 Text recognition method and device, and storage medium WO2022160598A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110127630.5 2021-01-29
CN202110127630.5A CN112800972A (en) 2021-01-29 2021-01-29 Character recognition method and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022160598A1 true WO2022160598A1 (en) 2022-08-04

Family

ID=75812940

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103787 WO2022160598A1 (en) 2021-01-29 2021-06-30 Text recognition method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112800972A (en)
WO (1) WO2022160598A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800972A (en) * 2021-01-29 2021-05-14 北京市商汤科技开发有限公司 Character recognition method and device, and storage medium
CN113298188A (en) * 2021-06-28 2021-08-24 深圳市商汤科技有限公司 Character recognition and neural network training method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050232495A1 (en) * 2004-04-19 2005-10-20 International Business Machines Corporation Device for Outputting Character Recognition Results, Character Recognition Device, and Method and Program Therefor
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN111178363A (en) * 2019-12-18 2020-05-19 北京旷视科技有限公司 Character recognition method and device, electronic equipment and readable storage medium
CN111563495A (en) * 2020-05-09 2020-08-21 北京奇艺世纪科技有限公司 Method and device for recognizing characters in image and electronic equipment
CN112200188A (en) * 2020-10-16 2021-01-08 北京市商汤科技开发有限公司 Character recognition method and device, and storage medium
CN112800972A (en) * 2021-01-29 2021-05-14 北京市商汤科技开发有限公司 Character recognition method and device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214386B (en) * 2018-09-14 2020-11-24 京东数字科技控股有限公司 Method and apparatus for generating image recognition model
CN109492643B (en) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 Certificate identification method and device based on OCR, computer equipment and storage medium
CN111582282B (en) * 2020-05-13 2024-04-12 科大讯飞股份有限公司 Text recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112800972A (en) 2021-05-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922195

Country of ref document: EP

Kind code of ref document: A1