WO2022160598A1 - Text recognition method and device, and storage medium - Google Patents

Text recognition method and device, and storage medium

Info

Publication number
WO2022160598A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
text
category
candidate
language
Prior art date
Application number
PCT/CN2021/103787
Other languages
French (fr)
Chinese (zh)
Inventor
蔡晓聪
侯军
伊帅
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2022160598A1 publication Critical patent/WO2022160598A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V30/10 Character recognition

Definitions

  • the present disclosure relates to the field of computer vision, and in particular, to a character recognition method and device, and a storage medium.
  • in the case where a text image includes text written in multiple text languages, the recognition accuracy is likely to drop.
  • the present disclosure provides a character recognition method and device, and a storage medium.
  • a method for character recognition, comprising: acquiring a character image including characters to be recognized and other characters; obtaining, based on a feature sequence corresponding to the character image, a category judgment result of each character in the character image, the category judgment result being used to represent the character category; and determining, based on the category judgment result, a target character recognition result of performing character recognition on the characters to be recognized.
  • the method further includes: determining a candidate area where the to-be-recognized characters and/or the other characters in the character image are located; dividing the candidate area into a plurality of sub-areas; and determining the feature sequence corresponding to the text image based on feature information corresponding to at least part of the sub-areas in the plurality of sub-areas.
  • the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and, for each character, taking the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
  • the determining, based on the category judgment result, a target character recognition result of performing character recognition on the characters to be recognized includes: for each character, determining, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs; determining, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the text to be recognized or to the irrelevant characters corresponding to the other characters; and taking the character structure corresponding to the target characters as the target character recognition result of performing character recognition on the characters to be recognized.
  • the determining, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the characters to be recognized or to the irrelevant characters corresponding to the other characters includes: in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of a plurality of first character categories or a plurality of second character categories, determining that the character belongs to the target characters; and in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is a third character category, determining that the character belongs to the irrelevant characters.
  • the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first text language, where the first text language is the text language corresponding to the text to be recognized; the plurality of second character categories include: character categories respectively corresponding to a plurality of Arabic numerals; and the third character category is the same character category corresponding to the plurality of characters included in a plurality of second text languages, where a second text language is a text language different from the first text language.
  • the determining the feature sequence corresponding to the text image includes: taking the text image as an input of a target neural network used for character category judgment, and obtaining the feature sequence corresponding to the text image output by the target neural network.
  • the method further includes: acquiring a sample text image that includes both text corresponding to the first text language and text corresponding to at least one second text language, where the first text language is the text language corresponding to the text to be recognized and the second text language is a text language different from the first text language; and taking the sample text image as the input of a preset neural network and the character category labels in the sample text image as supervision, training the preset neural network to obtain the target neural network used for character category judgment.
  • the acquiring a sample text image that includes both text corresponding to the first text language and text corresponding to at least one second text language includes: acquiring a first candidate text image including text corresponding to the first text language; acquiring a candidate text corpus corresponding to the at least one second text language; and generating the sample text image based on the candidate text corpus and the first candidate text image.
  • the acquiring a sample text image that includes both text corresponding to the first text language and text corresponding to at least one second text language includes: acquiring a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and generating the sample text image based on the first candidate text image and the second candidate text image.
  • the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels respectively corresponding to the plurality of characters included in the first text language; at least one of a plurality of second character category labels respectively corresponding to the plurality of Arabic numerals; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages.
  • the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa;
  • the text image includes the target data required when applying for the electronic visa.
  • the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining, based on the feature sequence corresponding to the text image of the target data, whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to other characters;
  • the determining the target character recognition result of performing character recognition on the characters to be recognized includes: determining the target character recognition result of performing character recognition on the target characters in the text image of the target data; and the method further includes: issuing the electronic visa based on the target character recognition result.
  • a character recognition device, comprising: an image acquisition module, configured to acquire a text image including text to be recognized and other text; a character category determination module, configured to obtain, based on the feature sequence corresponding to the text image, the category judgment result of each character in the text image, the category judgment result being used to characterize the character category; and a character recognition module, configured to determine, based on the category judgment result, the target text recognition result of performing text recognition on the text to be recognized.
  • the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized text and/or the other text in the text image is located; a division module, configured to divide the candidate region into multiple sub-regions; and a feature sequence determination module, configured to determine the feature sequence corresponding to the text image based on feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
  • the character category determination module includes: a first determination sub-module, configured to determine, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and a second determination sub-module, configured to, for each character, take the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
  • the character recognition module includes: a third determination sub-module, configured to, for each character, determine, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs; a fourth determination sub-module, configured to determine, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the characters to be recognized or to the irrelevant characters corresponding to the other characters; and a fifth determination sub-module, configured to take the character structure corresponding to the target characters as the target character recognition result of performing character recognition on the characters to be recognized.
  • the fourth determination sub-module includes: a first determination unit, configured to, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of the plurality of first character categories or the plurality of second character categories, determine that the character belongs to the target characters; and a second determination unit, configured to, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is the third character category, determine that the character belongs to the irrelevant characters.
  • the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first text language, where the first text language is the text language corresponding to the text to be recognized; the plurality of second character categories include: character categories respectively corresponding to a plurality of Arabic numerals; and the third character category is the same character category corresponding to the plurality of characters included in a plurality of second text languages, where a second text language is a text language different from the first text language.
  • the feature sequence determination module includes: a sixth determination sub-module, configured to take the text image as an input of the target neural network used for character category judgment and obtain the feature sequence corresponding to the text image output by the target neural network.
  • the apparatus further includes: a sample text image acquisition module, configured to acquire a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language;
  • the first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language;
  • a training module, configured to take the sample text image as the input of a preset neural network and the character category labels in the sample text image as supervision, and to train the preset neural network to obtain the target neural network used for character category judgment.
  • the sample text image acquisition module includes: a first acquisition sub-module, configured to acquire a first candidate text image including text corresponding to the first text language; a second acquisition sub-module, configured to acquire the candidate text corpus corresponding to the at least one second text language; and a first generation sub-module, configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
  • the sample text image acquisition module includes: a third acquisition sub-module, configured to acquire a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and a second generation sub-module, configured to generate the sample text image based on the first candidate text image and the second candidate text image.
  • the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels respectively corresponding to the plurality of characters included in the first text language; at least one of a plurality of second character category labels respectively corresponding to the plurality of Arabic numerals; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages.
  • the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa;
  • the text image includes the target data required when applying for the electronic visa.
  • the character category determination module includes: a seventh determination sub-module, configured to determine, based on the feature sequence corresponding to the text image of the target data, whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to other characters;
  • the character recognition module includes: an eighth determination sub-module, configured to determine the target character recognition result of performing character recognition on the target characters in the text image of the target data;
  • the apparatus further includes: an execution module, configured to issue the electronic visa based on the target character recognition result.
  • a computer-readable storage medium where the storage medium stores a computer program, and the computer program is configured to execute the character recognition method according to any one of the above-mentioned first aspect.
  • a character recognition device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the character recognition method described in any one of the first aspects.
  • the character category judgment can be performed on each character in the text image, so that, based on the category judgment result, the target characters corresponding to the characters to be recognized and the irrelevant characters corresponding to other characters can be determined in a text image including multiple languages; the irrelevant characters are filtered out, and character recognition is performed on the characters corresponding to the characters to be recognized to obtain the target character recognition result.
  • the present disclosure performs character category judgment on the text to be recognized and the other text, so as to filter out the irrelevant characters corresponding to the other text before performing text recognition on the text to be recognized; this reduces the probability of misjudging other text as the text to be recognized and improves the accuracy of text recognition of the text to be recognized in text images that mix multiple text languages.
  • FIG. 1 is a flowchart of a method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 2 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 3A is a schematic diagram of a scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
  • FIG. 3B is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
  • FIG. 3C is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
  • FIG. 4 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 6 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 7 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 8 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
  • FIG. 9A is a schematic structural diagram corresponding to a character recognition process according to an exemplary embodiment of the present disclosure.
  • FIG. 9B is a schematic diagram of determining a feature sequence according to an exemplary embodiment of the present disclosure.
  • FIG. 10 is a block diagram of a character recognition apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a character recognition device according to an exemplary embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms; these terms are only used to distinguish the same type of information from each other.
  • first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure.
  • word “if” as used herein may be interpreted as "at the time of” or "when” or "in response to determining.”
  • the text recognition can be performed by ignoring other text, that is, during the training of the text recognition model, only the text label corresponding to the text to be recognized is included in the sample text image.
  • in the judgment process of a text recognition model obtained in this way, it is easy to misjudge other text as the text to be recognized, and the accuracy cannot be guaranteed.
  • an embodiment of the present disclosure provides a character recognition solution.
  • character category judgment can be performed on each character in the character image, so that, based on the category judgment result, the target character recognition result of performing character recognition on the characters to be recognized can be determined.
  • FIG. 1 shows a character recognition method according to an exemplary embodiment, including the following steps:
  • step 101 a text image including the text to be recognized and other texts is acquired.
  • a text image including the text to be recognized can be acquired through cameras deployed in different application scenarios.
  • the different application scenarios include but are not limited to signboard text recognition scenarios, license plate recognition scenarios, bill recognition scenarios, and the like.
  • the acquired text images may include, but are not limited to, signboards, license plates, bills, and the like written in the first text language corresponding to the text to be recognized.
  • the acquired text image also includes text content written in a second text language corresponding to other text, and the second text language includes but is not limited to a text language different from the first text language.
  • the text content written in the second text language may be the same, at least partially the same, or different from the text content written in the first text language.
  • the acquired text image includes a signboard written in Thai, and also includes the same signboard content written in English.
  • the first text language is Thai
  • the second text language is Chinese
  • the acquired text image includes the content of the receipt written in Thai, and also includes part of the content of the receipt written in Chinese.
  • the first text language is English
  • the second text language is Chinese
  • the acquired text image includes text content written in Thai, and also includes completely different text content written in Chinese.
  • step 102 based on the feature sequence corresponding to the character image, the category judgment result of each character in the character image is obtained.
  • the number of feature sequences corresponding to the text image may be one or more, and each feature sequence may be composed of at least part of the feature information included in the candidate region where the text to be recognized and/or the other text is located in the text image.
  • the candidate area is the area where the character to be recognized and/or other characters may be located determined in the character image.
  • the candidate region can be further divided into multiple sub-regions, and at least part of the feature information included in the candidate region can be composed of the feature information corresponding to at least some of the sub-regions, where the feature information corresponding to at least some of the sub-regions refers to all of the feature information corresponding to those sub-regions.
  • the multiple sub-regions include sub-region 1, sub-region 2, and sub-region 3, and at least part of the feature information included in the candidate region may be composed of all the feature information of sub-region 1 and sub-region 2.
  • the category judgment result for each character in the text image may be determined according to the feature sequence corresponding to the text image.
  • the category judgment result can be used to characterize the character category.
  • a corresponding first character category may be determined in advance for each character included in the first text language, a corresponding second character category may be determined for each Arabic numeral, and the same single third character category may be defined for all characters included in the second text languages.
  • the first text language may be the text language corresponding to the text to be recognized, each character included in the first text language may refer to each letter element and each punctuation element included in the first text language, and the second text language is a text language different from the first text language.
  • each of the 26 letters included in English (case sensitive) and each English punctuation mark may correspond to a respective first character category.
  • Arabic numerals 0 to 9 each correspond to a second character class.
  • the second text language is any text language other than English, and it is assumed that it can include Chinese, Thai, Arabic, Korean, etc. All characters included in all the second text languages correspond to the same third character category.
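  • as an illustrative aid (not part of the disclosure), the category assignment described above could be enumerated roughly as in the Python sketch below, assuming English as the first text language; every name in it is a hypothetical placeholder.

    import string

    def build_category_map():
        """Enumerate character categories: one first character category per
        English letter (case sensitive) and per English punctuation mark, one
        second character category per Arabic numeral, and a single shared
        third character category for every character of any second text
        language (Chinese, Thai, Arabic, Korean, ...)."""
        category = {}
        next_id = 1
        for ch in string.ascii_letters + string.punctuation:  # first character categories
            category[ch] = next_id
            next_id += 1
        for ch in string.digits:                              # second character categories
            category[ch] = next_id
            next_id += 1
        third_category = next_id                              # shared third character category
        return category, third_category

    def category_of(ch, category, third_category):
        # Any character outside the first text language and the Arabic numerals
        # falls into the same third character category.
        return category.get(ch, third_category)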
  • step 103 based on the category judgment result, a target character recognition result for performing character recognition on the to-be-recognized character is determined.
  • based on the above category judgment results, it is possible to determine the target characters corresponding to the characters to be recognized and the irrelevant characters belonging to other characters, filter out the irrelevant characters, and finally obtain the character structures of only the target characters corresponding to the characters to be recognized, that is, the target character recognition result of performing character recognition on the characters to be recognized.
  • the character category judgment can be performed on each character in the text image, so that, based on the category judgment result, the target characters corresponding to the characters to be recognized and the irrelevant characters corresponding to other characters are determined in a text image including multiple languages; the irrelevant characters are filtered out, and character recognition is performed on the characters corresponding to the characters to be recognized to obtain the target character recognition result.
  • in this way, the present disclosure performs character category judgment on the text to be recognized and the other text, so as to filter out the irrelevant characters corresponding to the other text before performing text recognition on the text to be recognized; this reduces the probability of misjudging other text as the text to be recognized and improves the accuracy of text recognition of the text to be recognized in text images that mix multiple text languages.
  • the above method may further include steps 104 to 106:
  • step 104 a candidate region where the to-be-recognized text and/or the other text is located in the text image is determined.
  • the candidate area is the area where the character to be recognized and/or the other character may be located determined in the character image.
  • a Region Proposal Network (RPN) may be used to determine the candidate region where the to-be-recognized text and/or the other text may be located in the text image.
  • step 105 the candidate area is divided into a plurality of sub-areas.
  • the candidate area may be divided into a plurality of sub-areas, and the size of each sub-area may be the same or different.
  • the candidate region may be divided evenly according to a preset number to obtain multiple sub-regions with the same size. For example, as shown in FIG. 3A , the candidate region is divided into three sub-regions with the same size.
  • the candidate region may be divided according to a preset uniform size, so as to obtain N sub-regions with the same size, or (N-1) sub-regions with the same size and one sub-region whose size differs from the others, as shown in FIG. 3B.
  • the obtained sub-regions 1 to 3 have the same size, and the size of the sub-region 4 is different from that of the other three sub-regions.
  • the candidate region may be divided according to a preset sequence of multiple different sizes. For example, as shown in FIG. 3C , three sub-regions with different sizes may be obtained.
  • step 106 a feature sequence corresponding to the character image is determined based on feature information corresponding to at least part of the sub-regions in the plurality of sub-regions.
  • feature information corresponding to each sub-region included in the candidate region can be determined. Based on the feature information corresponding to at least some of the sub-regions, that is, according to all the feature information corresponding to some or all of the multiple sub-regions, the feature sequence corresponding to the text image is obtained.
  • all feature information corresponding to each sub-region may correspond to one feature sequence, or all feature information corresponding to multiple sub-regions may correspond to one feature sequence, or all feature information corresponding to each sub-region may correspond to multiple feature sequences .
  • the present disclosure does not limit this.
  • the order in which each sub-region appears in the text image may be determined first according to the writing order of the text, for example, from left to right. After the feature sequences are determined according to the feature information corresponding to at least part of the sub-regions, the feature sequences are sorted according to the order in which the corresponding sub-regions appear in the text image; for example, the feature sequence corresponding to the leftmost sub-region of the text image is ranked first, and the feature sequence corresponding to the rightmost sub-region is ranked last. After sorting and combining the multiple feature sequences, the feature sequence corresponding to the text image is obtained.
  • for example, the candidate area is divided into sub-area 1, sub-area 2 and sub-area 3, and the at least part of the sub-areas includes sub-area 2 and sub-area 3, where sub-area 2 corresponds to feature sequences 2 and 3 and sub-area 3 corresponds to feature sequence 4; then the feature sequences corresponding to the text image obtained after sorting are feature sequence 2, feature sequence 3 and feature sequence 4.
  • a corresponding feature sequence may be obtained after processing, such as pooling and/or sampling, the feature information corresponding to at least part of the sub-regions. Through pooling and/or sampling, the feature information corresponding to the parts with obvious features in each sub-region can be selected to determine the feature sequence, which improves the efficiency of determining the feature sequence corresponding to the text image while ensuring its accuracy, thereby improving the efficiency of character recognition of the characters to be recognized.
  • step 102 may then be executed to determine, based on the feature sequence corresponding to the text image, the category judgment result of character category judgment for each character in the text image.
  • the candidate area in which the character to be recognized and/or the other characters are located in the character image may be divided into multiple sub-areas, and the feature sequence corresponding to the text image may be determined based on the feature information corresponding to all or part of the multiple sub-areas, so that the category judgment result of character category judgment for each character in the text image can subsequently be determined based on the feature sequence corresponding to the text image; the implementation is simple and the usability is high.
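  • the division of a candidate region into sub-regions and the pooling of their feature information into a feature sequence could look roughly like the following sketch; it assumes a (channels, height, width) feature map for one candidate region and equal-width vertical sub-regions, which is only one of the division options mentioned above, not a requirement of the disclosure.

    import numpy as np

    def feature_sequence_from_region(region_features, num_subregions=8):
        """region_features: (C, H, W) feature map of one candidate region.
        Splits the region into vertical sub-regions from left to right,
        average-pools each sub-region, and returns the pooled feature
        vectors in reading order as the feature sequence."""
        C, H, W = region_features.shape
        bounds = np.linspace(0, W, num_subregions + 1).astype(int)
        sequence = []
        for i in range(num_subregions):
            sub = region_features[:, :, bounds[i]:bounds[i + 1]]
            if sub.size == 0:                       # skip empty slices of very narrow regions
                continue
            sequence.append(sub.mean(axis=(1, 2)))  # pooling keeps the salient feature information
        return np.stack(sequence)                   # shape: (number of kept sub-regions, C)

    # Feature sequences of several candidate regions can then be concatenated
    # in the order in which the regions appear in the text image.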
  • step 102 may include step 102-1 and step 102-2.
  • step 102-1 based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category are determined.
  • the feature sequence corresponding to the text image can be used as the input of a classifier, and the classification prediction result output by the classifier can be obtained; the classification prediction result includes, but is not limited to, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category.
  • the text image includes 2 characters, the first character corresponds to 2 candidate character categories, and the second character corresponds to 3 candidate character categories.
  • the possibility probability value of the first character belonging to candidate character category 1 is a, that is, the recognition rate corresponding to candidate character category 1 is a; the possibility probability value of the first character belonging to candidate character category 2 is b, that is, the recognition rate corresponding to candidate character category 2 is b.
  • the possibility probability values of the second character belonging to candidate character category 3, candidate character category 4 and candidate character category 5 are c, d and e respectively, that is, the recognition rates of candidate character category 3, candidate character category 4 and candidate character category 5 are c, d and e respectively.
  • step 102-2 for each character, the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs is used as the category judgment result of the character.
  • the candidate character category corresponding to the maximum recognition rate in at least one candidate character category to which a certain character belongs may be used as the category judgment result of the character.
  • a certain character included in the text image corresponds to two candidate character categories.
  • the recognition rate of the character belonging to candidate character category 1 is a
  • the recognition rate of the character belonging to candidate character category 2 is b. If a is greater than b, then candidate character category 1 can be used as the category judgment result corresponding to the character.
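  • step 102-2 amounts to taking, for each character, the candidate category with the maximum recognition rate; a minimal sketch, with purely illustrative data, is shown below.

    def category_judgment(recognition_rates):
        """recognition_rates: one dict per character, mapping each candidate
        character category to its recognition rate, e.g.
        [{1: 0.7, 2: 0.3}, {3: 0.2, 4: 0.5, 5: 0.3}].
        For each character, the candidate category with the maximum
        recognition rate is taken as the category judgment result."""
        return [max(rates, key=rates.get) for rates in recognition_rates]

    # category_judgment([{1: 0.7, 2: 0.3}, {3: 0.2, 4: 0.5, 5: 0.3}]) -> [1, 4]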
  • the candidate character categories to which each character included in the text image may belong and the recognition rate of each candidate character category may be determined based on the feature sequence corresponding to the text image, so that the candidate character category corresponding to the maximum recognition rate can be used as the category judgment result of character category judgment for the character; based on the category judgment result, the target characters belonging to the characters to be recognized and the irrelevant characters belonging to other characters can later be determined, so as to filter out the irrelevant characters and improve the accuracy of text recognition of the text to be recognized in a text image mixing multiple text languages.
  • step 103 may include steps 103-1 to 103-3.
  • step 103-1 for each character, according to the correspondence between the character category and the character structure, the character structure corresponding to the candidate character category with the highest recognition rate to which the character belongs is determined.
  • different character classes and corresponding character structures are preset, for example, the character structure corresponding to character class 1 is 'a', the character structure corresponding to character class 2 is 'b', and so on.
  • the character structure corresponding to the candidate character category with the highest recognition rate to which each character belongs may be determined based on the previously determined category judgment result and the above-mentioned corresponding relationship.
  • each character included in the first character language corresponds to a different first character category
  • each first character category corresponds to a different character structure.
  • Different Arabic numerals correspond to different second character categories, and these second character categories also correspond to different character structures, such as character structures '0', '1', and so on.
  • All characters included in multiple second script languages may correspond to the same third character category, and this third character category may correspond to the same character structure.
  • multiple second script languages include Chinese, Arabic, Thai, etc.
  • All characters included in the second script language may correspond to a third character category, assuming a character category 70, this character category 70 may correspond to the same character structure, for example, all correspond to the Chinese character structure 'ah'.
  • the above-mentioned first text language is the text language corresponding to the text to be recognized, and other text languages other than the first text language can be used as the second text language.
  • the character structure corresponding to the candidate character category with the maximum recognition rate to which each character belongs can be determined.
  • a text image includes 4 characters, and the candidate character categories with the maximum recognition rate to which the characters belong are 1, 2, 3 and 70 in sequence; the corresponding character structures are then 'a', 'b', 'c' and 'ah'.
  • step 103-2 according to the candidate character category of the maximum recognition rate to which the character belongs, it is determined that the character belongs to the target character corresponding to the character to be recognized or belongs to the irrelevant character corresponding to the other characters.
  • if the candidate character category with the highest recognition rate to which a certain character belongs is one of the multiple first character categories or the multiple second character categories, the character belongs to the target characters corresponding to the text to be recognized.
  • the plurality of first character categories include: character categories corresponding to a plurality of characters included in the first character language
  • the first character language is the character language corresponding to the character to be recognized
  • the plurality of second character categories include: character categories respectively corresponding to multiple Arabic numerals.
  • the candidate character category with the highest recognition rate to which a certain character belongs is the third character category, then it can be determined that the character belongs to the irrelevant character corresponding to the other characters.
  • the first character language is English
  • the plurality of first character classes include character classes 1 to 59
  • the plurality of second character classes corresponding to Arabic numerals include character classes 60 to 69
  • the third character class includes character class 70
  • the candidate character categories of the maximum recognition rate to which each character belongs are 1, 2, 3, and 70 in sequence, then it can be determined that the first 3 characters belong to the target character, and the last character belongs to the irrelevant character.
  • step 103-3 the character structure corresponding to the target character is used as the target character recognition result of performing character recognition on the to-be-recognized character.
  • for example, the character structures corresponding to the four characters included in the text image are 'a', 'b', 'c' and 'ah', and the last character is an irrelevant character; after the character structure corresponding to the irrelevant character is filtered out, the obtained target character recognition result is 'abc'.
  • a preset program can be called to filter out the character structure corresponding to the irrelevant characters, so as to obtain the character structure corresponding to the target character.
  • the preset program may be a pre-written program for filtering the specified character structure. For example, if the character structure is specified as 'ah', the preset program can filter the character structure 'ah', so as to obtain the character structure of the target character corresponding to the character to be recognized.
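  • steps 103-1 to 103-3 can be pictured as a lookup followed by a filter, as in the sketch below; the category numbering (1 to 59, 60 to 69, 70) follows the example above, and the truncated structure table is illustrative only.

    # Correspondence between character categories and character structures,
    # following the example above: categories 1-59 for the first text language
    # (English), 60-69 for the Arabic numerals, 70 as the shared third category.
    STRUCTURE = {1: 'a', 2: 'b', 3: 'c', 70: 'ah'}   # truncated for illustration
    FIRST_CATEGORIES = set(range(1, 60))
    SECOND_CATEGORIES = set(range(60, 70))

    def target_character_recognition(category_results):
        """Keep only target characters (first or second character categories)
        and join their character structures into the target character
        recognition result; third-category characters are irrelevant and
        are filtered out."""
        structures = []
        for category in category_results:
            if category in FIRST_CATEGORIES or category in SECOND_CATEGORIES:
                structures.append(STRUCTURE[category])
        return ''.join(structures)

    # target_character_recognition([1, 2, 3, 70]) -> 'abc'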
  • the text image can be directly used as the input of the target neural network, and the feature sequence corresponding to the text image output by the target neural network can be obtained.
  • the target neural network is a neural network used for character category judgment on characters.
  • the target neural network is obtained by training based on the preset neural network, and the corresponding feature sequence can be determined from the text image.
  • the preset neural network includes, but is not limited to, a Visual Geometry Group (VGG) network, a GoogLeNet, a residual network (ResNet), and the like.
  • the text image can be used as the input of the target neural network used for character category judgment, so as to obtain the feature sequence corresponding to the text image output by the target neural network; the character category corresponding to each character included in the text image is subsequently determined based on this feature sequence, so that text recognition can be performed on the text to be recognized in the text image, which improves the accuracy of character recognition of the text to be recognized.
  • as shown in FIG. 6 (FIG. 6 is only an exemplary illustration; in practical applications, the execution order of the following steps 100-1 to 100-2 is not limited to being executed before step 101), the above method may further include step 100-1 and step 100-2.
  • step 100-1 a sample text image including both the text corresponding to the first text language and the text corresponding to the second text language is acquired.
  • the above-mentioned sample text images can be directly obtained from the sample image database.
  • step 100-2 the sample text image is used as the input of the preset neural network, and the character category labels in the sample text image are used as supervision to train the preset neural network, so as to obtain the target neural network used for character category judgment.
  • the character category labels in the sample text image include at least one of the following: at least one of the multiple first character category labels corresponding to the multiple characters included in the first text language; at least one of the multiple second character category labels respectively corresponding to the multiple Arabic numerals; and the same third character category label corresponding to the multiple characters included in the multiple second text languages.
  • a Connectionist Temporal Classification (CTC) supervised training method may be used to train the preset neural network, thereby obtaining the target neural network.
  • the CTC supervised training method means that the neural network directly learns the input sequence without having to mark the mapping relationship between the input sequence and the output result in the training data in advance.
  • the preset neural network outputs the character categories included in the sample text image, the loss function is determined according to the difference between the output result of the preset neural network and the character category labels in the sample text image, and the preset neural network is iteratively trained through gradient updates of the network parameters to obtain the target neural network.
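  • a minimal CTC-supervised training loop might look like the PyTorch sketch below; the placeholder backbone, the number of categories, the number of timesteps and the optimizer settings are all assumptions chosen for illustration and do not reflect the actual network of the disclosure.

    import torch
    import torch.nn as nn

    NUM_CATEGORIES = 70                      # e.g. 59 + 10 + 1 shared third category

    class PresetNet(nn.Module):
        """Placeholder for the preset neural network (the disclosure mentions
        VGG, GoogLeNet and ResNet backbones); outputs per-timestep logits."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, 32)))                 # collapse height, keep 32 timesteps
            self.head = nn.Conv2d(64, NUM_CATEGORIES + 1, 1)   # +1 for the CTC blank

        def forward(self, x):                                  # x: (N, 3, H, W)
            f = self.head(self.backbone(x))                    # (N, C, 1, T)
            return f.squeeze(2).permute(2, 0, 1)               # (T, N, C), as CTCLoss expects

    preset_net = PresetNet()
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
    optimizer = torch.optim.Adam(preset_net.parameters(), lr=1e-4)

    def train_step(images, label_seqs, label_lengths):
        """One iteration: the character category labels supervise the network
        without pre-aligning input positions to output characters."""
        logits = preset_net(images)                            # (T, N, C)
        log_probs = logits.log_softmax(dim=2)
        input_lengths = torch.full((logits.size(1),), logits.size(0), dtype=torch.long)
        loss = ctc_loss(log_probs, label_seqs, input_lengths, label_lengths)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()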
  • a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language can be obtained, and the sample text image includes multiple kinds of character category labels; training with such samples yields the target neural network for character category judgment and improves the accuracy and robustness of the target neural network.
  • any one or a combination of the following methods may be used to obtain the sample text images.
  • a sample text image is generated based on a first candidate text image including text corresponding to the first text language and a candidate text corpus corresponding to the second text language.
  • step 100-1 may include the following steps 201 to 203.
  • step 201 a first candidate text image including text corresponding to the first text language is acquired.
  • a first candidate text image that only includes text corresponding to the first text language may be acquired.
  • the first text language is the text language corresponding to the text to be recognized. For example, if the text to be recognized is English, the first text language is English. If the text to be recognized is Thai, the first text language is Thai.
  • step 202 the candidate text corpus corresponding to the at least one second text language is acquired.
  • the candidate text corpus is a sample corpus corresponding to at least one second text language
  • the second text language is a text language different from the first text language.
  • for example, if the first text language is Thai, then text languages other than Thai, such as Chinese, Arabic, and Korean, can be used as the second text language.
  • the candidate text corpus includes, but is not limited to, multiple characters and multiple character strings composed of characters.
  • the candidate text corpus may also include multiple characters (each character may consist of at least one character or at least one character string), a plurality of words (each word may be composed of at least one word and/or at least one character), and a plurality of sentences (each sentence may be composed of at least one word and/or character).
  • the characters, words and/or sentences in the candidate text corpus may have semantics or no semantics, which is not limited in the present disclosure.
  • having semantics means having linguistic meaning, such as stating or describing something; having no semantics means having no linguistic meaning, for example, when multiple characters are combined to form a trademark (logo) or a license plate, the combination of characters does not have any linguistic meaning.
  • step 203 the sample text image is generated based on the candidate text corpus and the first candidate text image.
  • the foreground content and the background content included in the first candidate text image can be obtained respectively, and the candidate text corpus and the foreground content included in the first candidate text image can be combined to obtain the foreground content of the sample text image.
  • the background content included in the first candidate text image is used as the background content of the sample text image, thereby generating the sample text image.
  • the foreground content includes text written in the first text language
  • the combination of the foreground content and the candidate text corpus includes, but is not limited to, placing the two parts of text content in different relative positions while ensuring that the foreground content and the text content of the candidate text corpus do not overlap.
  • Relative positions include, but are not limited to, where one is positioned above, below, to the left, to the right of the other, and the like.
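  • a rough sketch of this first generation method is shown below, using PIL to draw a second-language candidate text corpus onto a first candidate text image; the font file, the placement heuristic and the file names are assumptions, not part of the disclosure.

    from PIL import Image, ImageDraw, ImageFont

    def make_sample_text_image(first_candidate_path, corpus_text, out_path,
                               font_path="NotoSansSC-Regular.otf", font_size=24):
        """Render second-language corpus text onto a first candidate text image.
        The image keeps its background and first-language foreground; the corpus
        text is drawn near the bottom edge so the two parts of text content
        stay in different relative positions and do not overlap."""
        image = Image.open(first_candidate_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        font = ImageFont.truetype(font_path, font_size)
        x, y = 10, image.height - font_size - 10     # placement heuristic, illustrative only
        draw.text((x, y), corpus_text, fill=(0, 0, 0), font=font)
        image.save(out_path)

    # make_sample_text_image("thai_signboard.jpg", "示例中文语料", "sample.jpg")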
  • a first candidate text image including the text corresponding to the first text language and a second candidate text image including the text corresponding to the second text language are respectively acquired, thereby generating a sample text image.
  • step 100-1 may include the following steps 301 to 302.
  • step 301 a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language are acquired.
  • step 302 the sample text image is generated based on the first candidate text image and the second candidate text image.
  • the foreground content included in the first candidate text image and the foreground content included in the second candidate text image can be obtained respectively, and the foreground content corresponding to the sample text image is obtained by combining the two foreground contents.
  • the foreground content included in the first candidate text image includes text written in the first text language
  • the foreground content included in the second candidate text image includes text written in the second text language
  • the combination of the two foreground contents includes, but is not limited to, placing the two parts of text content in different relative positions under the condition that they do not overlap.
  • the background content included in the first candidate text image or the background content included in the second candidate text image may be used as the background content corresponding to the sample text image, or a preset background image may also be used as the background content corresponding to the sample text image.
  • the background image may include, but is not limited to, different pre-set solid-color background images, background images with different background content, and the background content may be real objects, scenery, and the like.
  • the background images may be obtained in a corresponding manner based on the number of background images.
  • if the number of preset background images is large, the background images may be obtained by random sampling.
  • the manner of obtaining the background images can be determined according to the order of magnitude of the number of background images, the number interval to which the number of background images belongs, or the size relationship between the number of background images and a number threshold.
  • the order of magnitude, the division of the quantity interval, and the setting of the quantity threshold can be obtained based on the empirical value when obtaining the first candidate text image or the second candidate text image, which is not limited herein.
  • if the number of preset background images is small, a part of the background images can be randomly selected from an existing background image database, or, if there is no background image database, different areas of the existing background images can be randomly combined to obtain multiple background images, so as to ensure the diversity of the final sample text images.
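  • the small-pool case described above could be handled roughly as in the following sketch, which recombines randomly cropped regions of the existing backgrounds into new background images; the tile size and the 2x2 layout are illustrative assumptions.

    import random
    from PIL import Image

    def expand_background_pool(background_images, target_count, tile=(128, 128)):
        """Given a small pool of PIL background images, build additional
        backgrounds by pasting randomly cropped regions of existing ones
        onto a new canvas, keeping the generated sample text images diverse."""
        expanded = list(background_images)
        while len(expanded) < target_count:
            canvas = Image.new("RGB", (tile[0] * 2, tile[1] * 2))
            for i in range(2):
                for j in range(2):
                    src = random.choice(background_images)
                    x = random.randint(0, max(0, src.width - tile[0]))
                    y = random.randint(0, max(0, src.height - tile[1]))
                    patch = src.crop((x, y, x + tile[0], y + tile[1]))
                    canvas.paste(patch, (i * tile[0], j * tile[1]))
            expanded.append(canvas)
        return expanded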
  • the sample text image can be generated.
  • a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language can be obtained, which solves the problem of the difficulty of obtaining sample text images, so that the accuracy and robustness of the target neural network can subsequently be improved.
  • take the case where the first text language is Thai, the second text language is English, and the application scenario is a parking lot as an example: the license plate content corresponding to Thai and Arabic numerals belongs to the text to be recognized, and the license plate content corresponding to English belongs to the other text.
  • the collected text image of the license plate can be used as the input of the target neural network to obtain the feature sequence corresponding to the text image output by the target neural network; the feature sequence is then used as the input of the classifier, and the classifier determines at least one candidate character category to which each character included in the text image belongs and the recognition rate corresponding to each candidate character category.
  • the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category of each character is used as the category judgment result of character category judgment for that character. Further, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which each character belongs is determined; according to that candidate character category, the target characters belonging to Thai or Arabic numerals and the irrelevant characters belonging to English are determined; the character structures corresponding to the irrelevant characters are then filtered out from the above character structures to obtain the character structures corresponding to Thai and Arabic numerals, and finally the target character recognition result of performing character recognition on the Thai characters and Arabic numerals in the text image is obtained.
  • in this way, for vehicles entering and leaving the parking lot whose license plates include Thai, Arabic numerals and English, the purpose of character recognition of the Thai characters and Arabic numerals is realized, misjudgment is less likely to occur, and the recognition accuracy is improved.
  • the camera deployed at the entrance and exit of the parking lot can collect the text images of the license plates including Thai, Arabic numerals and English of the vehicles entering and leaving the entrance and exit of the parking lot.
  • the acquisition method of the text image may include, but is not limited to, frame selection of the video stream collected by the camera.
  • a periodic or aperiodic frame selection operation can be performed on the video stream to obtain a text image obtained by photographing the license plate of the same vehicle including Thai, Arabic numerals and English in one or more frames.
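  • periodic frame selection from the camera's video stream could be done as in this OpenCV sketch; the stream source and the sampling interval are assumptions chosen only for illustration.

    import cv2

    def select_frames(stream_source, every_n_frames=30):
        """Grab every Nth frame from the video stream; each selected frame is a
        candidate text image of a license plate for subsequent recognition."""
        capture = cv2.VideoCapture(stream_source)
        frames, index = [], 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % every_n_frames == 0:
                frames.append(frame)
            index += 1
        capture.release()
        return frames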
  • the text images of the same license plate including Thai, Arabic numerals and English that are input to the target neural network may include one or more images, which is not limited herein.
  • in the case of a single text image, its recognition result can be used as the final recognition result; in the case of multiple text images, the recognition results of all of them, or of some of the text images, can be comprehensively considered. The specific implementation is not limited here and may include, but is not limited to, the cases exemplified above.
  • assuming the text image is a text image of a license plate including Thai, Arabic numerals and English, the target neural network first determines the candidate regions where the Thai, Arabic numeral and/or English characters are located.
  • in FIG. 9B, it is assumed that 2 candidate regions are obtained and that candidate region 1 is divided into 8 sub-regions (the present disclosure uses 8 only as an example; in practical applications the number may be fewer or more than 8); each sub-region can correspondingly yield a feature sequence, as shown in FIG. 9B.
  • likewise, at least one feature sequence (not shown in FIG. 9B) can be obtained for candidate region 2, and the combination of the feature sequences corresponding to all the sub-regions obtained by dividing the two candidate regions is used as the feature sequence corresponding to the text image.
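  • The slicing of a candidate region into sub-region feature vectors could be sketched as follows, assuming the region's features are available as a (channels, height, width) array; the split count of 8 and the mean pooling are assumptions made only for illustration.

```python
import numpy as np

def region_to_feature_sequence(region_features, num_subregions=8):
    """Split a candidate region's feature map (C, H, W) into vertical slices and
    pool each slice into one vector, giving a left-to-right feature sequence."""
    slices = np.array_split(region_features, num_subregions, axis=2)  # split along width
    return [s.mean(axis=(1, 2)) for s in slices]  # one C-dimensional vector per sub-region

# Example: an 8-step feature sequence for candidate region 1.
sequence = region_to_feature_sequence(np.random.rand(256, 32, 128))
```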
  • At least one candidate character category corresponding to each character included in the text image and the recognition rate corresponding to each candidate character category can be obtained through the classifier.
  • the candidate character category with the maximum recognition rate may be used as the category judgment result.
  • the target characters belonging to Thai and Arabic numerals and the irrelevant characters belonging to English can then be determined, and the character structures corresponding to the target characters are used as the target character recognition result, that is, the target text recognition result of performing character recognition on the Thai characters and Arabic numerals therein.
  • the above-mentioned target neural network may be obtained after training the preset neural network.
  • the sample text images can be obtained from existing first candidate text images that include Thai text, together with an English corpus.
  • a first candidate text image including only Thai text and a second candidate text image including only English text may be separately acquired, and a sample text image is generated based on the first candidate text image and the second candidate text image.
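  • A hedged sketch of such sample generation, assuming Pillow; the font path, the layout, and the idea of drawing an English corpus line below a Thai-only candidate image are assumptions, since the disclosure does not prescribe a specific compositing method. For the two-image variant, an English-only crop could be pasted instead of rendering corpus text.

```python
from PIL import Image, ImageDraw, ImageFont  # assumption: Pillow is available

def make_sample_image(thai_candidate_path, english_corpus_line, out_path,
                      font_path="arial.ttf"):
    """Compose a sample text image containing both Thai text and an English corpus line."""
    base = Image.open(thai_candidate_path).convert("RGB")
    canvas = Image.new("RGB", (base.width, base.height + 32), "white")
    canvas.paste(base, (0, 0))
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, 24)  # placeholder font
    draw.text((4, base.height + 4), english_corpus_line, fill="black", font=font)
    canvas.save(out_path)
```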
  • the sample text image is used as the input of the preset neural network, the character category labels in the sample text image are used as supervision, and the required target neural network is obtained through CTC (Connectionist Temporal Classification) supervised training.
  • the character category labels in the sample text image include at least one of the following: at least one of the plurality of first character category labels corresponding to the plurality of characters included in the first text language, that is, the first character category labels corresponding to the respective Thai characters; at least one of the plurality of second character category labels corresponding to the respective Arabic numerals; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages, that is, the shared third character category label corresponding to the English characters.
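  • The label scheme and CTC supervision could be sketched as follows, assuming PyTorch; the truncated Thai alphabet, the `<en>` placeholder for the shared third category, and the network interface are illustrative assumptions rather than details of the disclosure.

```python
import torch
import torch.nn as nn  # assumption: PyTorch is used for the preset neural network

# Character category labels: one label per Thai character (first character category
# labels), one per Arabic numeral (second), and a single shared label "<en>" for all
# English characters (third). Index 0 is reserved for the CTC blank.
thai_characters = ["ก", "ข", "ค"]                          # truncated, illustrative
categories = thai_characters + list("0123456789") + ["<en>"]
label_of = {category: index + 1 for index, category in enumerate(categories)}

ctc_loss = nn.CTCLoss(blank=0)

def training_step(network, optimizer, images, targets, target_lengths):
    """One CTC-supervised update; `network` is assumed to map a batch of images to
    per-timestep class scores of shape (T, N, num_classes)."""
    log_probs = network(images).log_softmax(2)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```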
  • in this way, a large amount of sample training data can be obtained to meet the training accuracy requirements of the preset neural network and improve the robustness of the target neural network; the scheme is highly versatile and can be quickly deployed to any device to achieve the goal of text recognition.
  • the character recognition solution provided by the present disclosure can be used in scenarios such as signboard character recognition, bill recognition, and the above-mentioned license plate recognition.
  • the character recognition scheme can also be used to issue an electronic visa.
  • the user who needs to apply for the electronic visa uploads the target data required for applying for the electronic visa
  • the target data includes, but is not limited to, at least one of the following: electronic forms including at least one of round-trip flight information and hotel information, ticket information for round-trip flights, successful reservation confirmation issued by the hotel, passport, proof of income, medical examination information, and other information required for the electronic visa application.
  • after the user uploads the above target data, the information content needs to be checked manually in order to issue an electronic visa.
  • the user can upload the text image of the target data
  • the electronic visa system can first determine, according to the text recognition scheme provided by the present disclosure, the feature sequence corresponding to the text image of each piece of target data, and further, based on the feature sequence, determine whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to text other than the first text language.
  • the irrelevant characters in the text image of the target data are filtered out, and text recognition is performed on the text to be recognized in the text image of the target data, so as to obtain the target text recognition result.
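  • A short illustrative sketch of this filtering step; the per-character input format and the category names are assumptions and not part of the disclosure.

```python
def extract_visa_text(character_results):
    """`character_results` is assumed to be a list of (character, category) pairs
    produced by the classifier. Keep only characters of the first text language."""
    kept_categories = {"latin_letter", "digit"}  # illustrative category names
    return "".join(ch for ch, cat in character_results if cat in kept_categories)
```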
  • the characters to be recognized include characters corresponding to the first character language
  • the first character language is the character language corresponding to the electronic visa.
  • an electronic visa will be issued in English.
  • the electronic visa system can determine, in the text image of the target data uploaded by the user, that each character belongs to an English character or an irrelevant character corresponding to other characters.
  • the English characters in the text image are recognized, and the target text recognition result is obtained.
  • the electronic visa system can issue the electronic visa based on the target character recognition result. For example, the electronic visa system verifies that the user meets the conditions for issuing an electronic visa based on the target character recognition result, and automatically issues an electronic visa for the user.
  • irrelevant characters corresponding to other text can be filtered out of the text images of the target data required when applying for an electronic visa, and the text corresponding to the electronic visa in those text images can be recognized, which improves the accuracy and timeliness of electronic visa issuance and provides high availability.
  • the present disclosure also provides device embodiments.
  • FIG. 10 is a block diagram of a character recognition apparatus shown in the present disclosure according to an exemplary embodiment.
  • the apparatus includes: an image acquisition module 410, configured to acquire a text image including characters to be recognized and other characters; a character category determination module 420, configured to obtain, based on the feature sequence corresponding to the text image, the category judgment result of each character in the text image, where the category judgment result is used to characterize the character category; and a character recognition module 430, configured to determine, based on the category judgment result, the target character recognition result of performing character recognition on the characters to be recognized.
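  • The module split of FIG. 10 could be mirrored by a minimal wrapper class such as the following sketch; the injected callables stand in for the feature-sequence extractor, classifier and decoder and are assumptions rather than disclosed components.

```python
class TextRecognitionApparatus:
    """Illustrative mirror of FIG. 10; the injected callables are assumptions."""

    def __init__(self, acquire_image, extract_feature_sequence,
                 classify_characters, filter_and_decode):
        self.acquire_image = acquire_image                 # image acquisition module 410
        self.extract_feature_sequence = extract_feature_sequence
        self.classify_characters = classify_characters     # character category determination module 420
        self.filter_and_decode = filter_and_decode         # character recognition module 430

    def recognize(self, source):
        image = self.acquire_image(source)
        feature_sequence = self.extract_feature_sequence(image)
        category_results = self.classify_characters(feature_sequence)
        return self.filter_and_decode(category_results)
```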
  • the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized character and/or the other characters in the character image are located; a division module, configured to The candidate region is divided into multiple sub-regions; the feature sequence determination module is configured to determine the feature sequence corresponding to the character image based on the feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
  • the character category determination module includes: a first determination submodule, configured to determine, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and a second determination submodule, configured to, for each character, use the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
  • the character recognition module includes: a third determination submodule, configured to, for each character, determine, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs; a fourth determination submodule, configured to determine, according to the candidate character category with the maximum recognition rate to which the character belongs, whether the character belongs to the target characters corresponding to the characters to be recognized or to the irrelevant characters corresponding to the other characters; and a fifth determination submodule, configured to use the character structure corresponding to the target characters as the target character recognition result of performing character recognition on the characters to be recognized.
  • the fifth determination submodule includes: a first determination unit, configured to determine, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of a plurality of first character categories or a plurality of second character categories, that the character belongs to the target characters; and a second determination unit, configured to determine, in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is a third character category, that the character belongs to the irrelevant characters.
  • the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
  • the feature sequence determination module includes: a sixth determination submodule, configured to use the text image as an input of a target neural network for character category judgment on characters, and obtain the target neural network The feature sequence corresponding to the text image output by the network.
  • the apparatus further includes: a sample text image acquisition module, configured to acquire a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language;
  • the first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language;
  • the training module is configured to use the sample text image as the input of a preset neural network and, with the character category labels in the sample text image as supervision, train the preset neural network to obtain the target neural network for performing character category judgment on characters.
  • the sample text image acquisition module includes: a first acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language; a second acquisition submodule, configured to acquire a candidate text corpus corresponding to the at least one second text language; and a first generation submodule, configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
  • the sample text image acquisition module includes: a third acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and a second generation submodule, configured to generate the sample text image based on the first candidate text image and the second candidate text image.
  • the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels corresponding to the plurality of characters included in the first text language; at least one of a plurality of second character category labels corresponding to the plurality of Arabic numerals respectively; and the same third character category label corresponding to the plurality of characters included in the plurality of second text languages.
  • the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa;
  • the text image includes a text image of the target data required when applying for the electronic visa.
  • the character category determination module includes: a seventh determination submodule, configured to determine, based on the feature sequence corresponding to the text image of the target data, whether each character in the text image of the target data belongs to the target characters corresponding to the first text language or to the irrelevant characters corresponding to other text;
  • the character recognition module includes: an eighth determination submodule, configured to determine the target character recognition result of performing character recognition on the target characters in the text image of the target data;
  • the apparatus further includes: an execution module, configured to issue the electronic visa based on the target character recognition result.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute any one of the above-described character recognition methods.
  • embodiments of the present disclosure further provide a computer program product, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the character recognition method provided by any of the above embodiments.
  • the embodiments of the present disclosure further provide another computer program product for storing computer-readable instructions, and when the instructions are executed, the computer executes the character recognition method provided by any of the foregoing embodiments.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • An embodiment of the present disclosure further provides a character recognition device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the character recognition method described in any one of the above items.
  • FIG. 11 is a schematic diagram of a hardware structure of a character recognition device provided by an embodiment of the present disclosure.
  • the character recognition device 510 includes a processor 511 , and may further include an input device 512 , an output device 513 and a memory 514 .
  • the input device 512, the output device 513, the memory 514 and the processor 511 are connected to each other through a bus.
  • Memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.
  • Input means are used for inputting data and/or signals, and output means are used for outputting data and/or signals.
  • the output device and the input device can be independent devices or an integral device.
  • the processor may include one or more processors, such as one or more central processing units (CPUs).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • Memory is used to store program codes and data for network devices.
  • the processor is configured to call the program code and data in the memory to execute the steps in the above method embodiments. For details, refer to the description in the method embodiment, which is not repeated here.
  • FIG. 11 only shows a simplified design of a character recognition device.
  • the character recognition device may also include other necessary components, including but not limited to any number of input/output devices, processors, controllers, memories, etc., and all character recognition devices that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.

Abstract

A text recognition method and device, and a storage medium. The method comprises: obtaining a text image comprising a text to be recognized and other texts (101); obtaining a category determination result of each character of the text image on the basis of a feature sequence corresponding to the text image (102), the category determination result being used for representing a character category; and on the basis of the category determination result, determining a target text recognition result of performing text recognition on the text to be recognized (103).

Description

文字识别方法及装置、存储介质Character recognition method and device, storage medium
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本公开要求于2021年1月29日提交的、申请号为202110127630.5、发明名称为“文字识别方法及装置、存储介质”的中国专利申请的优先权,该中国专利申请公开的全部内容以引用的方式并入本文中。This disclosure claims the priority of the Chinese patent application filed on January 29, 2021 with the application number of 202110127630.5 and the invention titled "character recognition method and device, storage medium", the entire contents of which are disclosed by reference manner is incorporated herein.
技术领域technical field
本公开涉及计算机视觉领域,尤其涉及一种文字识别方法及装置、存储介质。The present disclosure relates to the field of computer vision, and in particular, to a character recognition method and device, and a storage medium.
背景技术Background technique
在不同应用场景中进行文字识别,已经成为计算机视觉以及智能视频分析的一大研究方向。Character recognition in different application scenarios has become a major research direction in computer vision and intelligent video analysis.
但是进行文字识别时,如果采集的文字图像中不止包括待识别文字,还包括了其他文字,那么识别的准确率很可能会下降。However, when performing text recognition, if the collected text images include not only the text to be recognized, but also other texts, the recognition accuracy is likely to drop.
发明内容SUMMARY OF THE INVENTION
本公开提供了一种文字识别方法及装置、存储介质。The present disclosure provides a character recognition method and device, and a storage medium.
根据本公开实施例的第一方面,提供一种文字识别方法,所述方法包括:获取包括待识别文字和其他文字的文字图像;基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,所述类别判断结果用于表征字符类别;基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果。According to a first aspect of the embodiments of the present disclosure, there is provided a method for character recognition, the method comprising: acquiring a character image including characters to be recognized and other characters; The category judgment result of each character, the category judgment result is used to represent the character category; based on the category judgment result, the target character recognition result for character recognition of the to-be-recognized character is determined.
在一些可选实施例中,所述方法还包括:确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域;将所述候选区域划分为多个子区域;基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In some optional embodiments, the method further includes: determining a candidate area where the to-be-recognized character and/or the other characters in the character image are located; dividing the candidate area into a plurality of sub-areas; feature information corresponding to at least part of the sub-regions in the plurality of sub-regions, to determine the feature sequence corresponding to the text image.
在一些可选实施例中,所述基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,包括:基于所述文字图像对应的特征序列,确定所述文字图像包括的每个字符所属的至少一个备选字符类别和每个备选字符类别的识别率;针对所述每个字符,将该字符所属的所述至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In some optional embodiments, the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining the text based on the feature sequence corresponding to the text image At least one candidate character category to which each character included in the image belongs and the recognition rate of each candidate character category; for each character, the maximum recognition rate in the at least one candidate character category to which the character belongs corresponds to The candidate character category of , as the category judgment result of the character.
在一些可选实施例中,所述基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果,包括:针对所述每个字符,根据所述字符类别和字符结构之间的对应关系,确定该字符所属的最大识别率的备选字符类别对应的字符结构;根据该字符所属的最大识别率的备选字符类别,确定该字符属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符;将所述目标字符对应的所述字符结构,作为对所述待识别文字进行文字识别的所述目标文字识别结果。In some optional embodiments, the determining a target character recognition result for performing character recognition on the character to be recognized based on the category judgment result includes: for each character, according to the character category and character structure Determine the character structure corresponding to the candidate character category of the maximum recognition rate to which the character belongs; according to the candidate character category of the maximum recognition rate to which the character belongs, determine that the character belongs to the target corresponding to the text to be recognized character or an irrelevant character corresponding to the other characters; the character structure corresponding to the target character is used as the target character recognition result of performing character recognition on the character to be recognized.
在一些可选实施例中,所述根据该字符所属的最大识别率的备选字符类别,确定该字符属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符,包括:响应于确定该字符所属的最大识别率的备选字符类别是多个第一字符类别或多个第二字符类别中的一个,确定该字符属于所述目标字符;响应于确定该字符所属的最大识别率的备选字符类别是第三字符类别,确定该字符属于所述无关字符。In some optional embodiments, determining that the character belongs to the target character corresponding to the character to be recognized or belongs to the irrelevant character corresponding to the other character according to the candidate character category with the maximum recognition rate to which the character belongs, including: In response to determining that the candidate character class with the highest recognition rate to which the character belongs is one of a plurality of first character classes or a plurality of second character classes, it is determined that the character belongs to the target character; in response to determining that the character belongs to the largest character class The candidate character class for the recognition rate is the third character class, which is determined to belong to the irrelevant character.
在一些可选实施例中,所述多个第一字符类别包括:与第一文字语言包括的多个字符分别对应的字符类别;其中,所述第一文字语言是所述待识别文字对应的文字语言;所述多个第二字符类别包括:与多个阿拉伯数字分别对应的字符类别;所述第三字符类别包括:与多种第二文字语言包括的多个字符对应的相同的字符类别;其中,所述第二文字语言是不同于所述第一文字语言的文字语言。In some optional embodiments, the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
在一些可选实施例中,所述确定所述文字图像对应的特征序列,包括:将所述文字图像作为用于对字符进行字符类别判断的目标神经网络的输入,获得所述目标神经网络输出的所述文字图像对应的特征序列。In some optional embodiments, the determining the feature sequence corresponding to the text image includes: using the text image as an input of a target neural network for character category judgment on characters, and obtaining an output of the target neural network The feature sequence corresponding to the text image.
在一些可选实施例中,所述方法还包括:获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像;其中,所述第一文字语言是所述待识别文字对应的文字语言,所述第二文字语言是不同于所述第一文字语言的文字语言;将所述样本文字图像作为预设神经网络的输入,以所述样本文字图像中的字符类别标签为监督,对所述预设神经网络进行训练,得到用于对字符进行字符类别判断的目标神经网络。In some optional embodiments, the method further includes: acquiring a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language; wherein the first text language is the Identify the text language corresponding to the text, and the second text language is a text language different from the first text language; use the sample text image as the input of the preset neural network, and use the character category label in the sample text image For supervision, the preset neural network is trained to obtain a target neural network for character category judgment on characters.
在一些可选实施例中,所述获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像,包括:获取包括所述第一文字语言对应的文字的第一备选文字图像;获取所述至少一种第二文字语言对应的备选文字语料;基于所述备选文字语料和所述第一备选文字图像,生成所述样本文字图像。In some optional embodiments, the acquiring a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language includes: acquiring a first text image that includes the text corresponding to the first text language candidate text image; acquiring candidate text corpus corresponding to the at least one second text language; generating the sample text image based on the candidate text corpus and the first candidate text image.
在一些可选实施例中,所述获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像,包括:获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述至少一种第二文字语言对应的文字的第二备选文字图像;基于所述第一备选文字图像和所述第二备选文字图像,生成所述样本文字图像。In some optional embodiments, the acquiring a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language includes: acquiring a first text image that includes the text corresponding to the first text language A candidate text image and a second candidate text image including text corresponding to the at least one second text language; based on the first candidate text image and the second candidate text image, the sample text is generated image.
在一些可选实施例中,所述样本文字图像中的字符类别标签包括以下至少一个:与所述第一文字语言包括的多个字符分别对应的多个第一字符类别标签中的至少一个;与多个阿拉伯数字分别对应的多个第二字符类别标签中的至少一个;与多种第二文字语言包括的多个字符对应的相同的第三字符类别标签。In some optional embodiments, the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels corresponding to a plurality of characters included in the first text language; and At least one of the plurality of second character class labels corresponding to the plurality of Arabic numerals respectively; the same third character class label corresponding to the plurality of characters included in the plurality of second script languages.
在一些可选实施例中,所述待识别文字包括第一文字语言对应的文字,所述第一文字语言是电子签证对应的文字语言;所述文字图像包括申请所述电子签证时需要的目标资料的文字图像;所述基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,包括:基于所述目标资料的文字图像对应的特征序列,确定所述目标资料的文字图像中每个字符属于所述第一文字语言对应的目标字符,或属于其他文字对应的无关字符;所述确定对所述待识别文字进行文字识别的目标文字识别结果,包括:确定对所述目标资料的文字图像中所述目标字符进行文字识别的目标文字识别结果;所述方法还包括:基于所述目标文字识别结果,签发所述电子签证。In some optional embodiments, the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa; the text image includes the target data required when applying for the electronic visa. text image; the obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image includes: determining the target data based on the feature sequence corresponding to the text image of the target data Each character in the text image belongs to a target character corresponding to the first text language, or belongs to an irrelevant character corresponding to other characters; the determining the target character recognition result of performing character recognition on the to-be-recognized character includes: determining the target character recognition result for the character to be recognized. The target character recognition result of the character recognition performed by the target character in the text image of the target data; the method further includes: issuing the electronic visa based on the target character recognition result.
根据本公开实施例的第二方面,提供一种文字识别装置,包括:图像获取模块,用于获取包括待识别文字和其他文字的文字图像;字符类别确定模块,用于基于所述文字图像对应的特征序列,得到所述文字图像中每个字符的类别判断结果,所述类别判断结果用于表征字符类别;文字识别模块,用于基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果。According to a second aspect of the embodiments of the present disclosure, there is provided a character recognition device, comprising: an image acquisition module for acquiring text images including characters to be recognized and other characters; a character category determination module for corresponding The feature sequence is obtained, and the category judgment result of each character in the text image is obtained, and the category judgment result is used to characterize the character category; the character recognition module is used for determining the to-be-recognized text based on the category judgment result. The target text recognition result of text recognition.
在一些可选实施例中,所述装置还包括:区域确定模块,用于确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域;划分模块,用于将所述候选区域划分为多个子区域;特征序列确定模块,用于基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In some optional embodiments, the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized character and/or the other characters in the character image are located; a division module, configured to The candidate region is divided into multiple sub-regions; the feature sequence determination module is configured to determine the feature sequence corresponding to the character image based on the feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
在一些可选实施例中,所述字符类别确定模块包括:第一确定子模块,用于基于所 述文字图像对应的特征序列,确定所述文字图像包括的每个字符所属的至少一个备选字符类别和每个备选字符类别的识别率;第二确定子模块,用于针对所述每个字符,将该字符所属的所述至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In some optional embodiments, the character category determination module includes: a first determination submodule, configured to determine at least one candidate to which each character included in the text image belongs based on a feature sequence corresponding to the text image The character category and the recognition rate of each candidate character category; the second determination submodule is used for, for each character, the candidate character corresponding to the maximum recognition rate in the at least one candidate character category to which the character belongs category, as the category judgment result of the character.
在一些可选实施例中,所述文字识别模块包括:第三确定子模块,用于针对所述每个字符,根据所述字符类别和字符结构之间的对应关系,确定该字符所属的最大识别率的备选字符类别对应的字符结构;第四确定子模块,用于根据该字符所属的最大识别率的备选字符类别,确定该字符属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符;第五确定子模块,用于将所述目标字符对应的所述字符结构,作为对所述待识别文字进行文字识别的所述目标文字识别结果。In some optional embodiments, the character recognition module includes: a third determination submodule, configured to, for each character, determine the maximum value of the character to which the character belongs according to the correspondence between the character category and the character structure. The character structure corresponding to the candidate character category of the recognition rate; the fourth determination submodule is used to determine that the character belongs to the target character corresponding to the character to be recognized or belongs to the target character corresponding to the character to be recognized according to the candidate character category of the maximum recognition rate to which the character belongs. The irrelevant characters corresponding to the other characters are described; the fifth determination sub-module is configured to use the character structure corresponding to the target character as the target character recognition result of performing character recognition on the character to be recognized.
在一些可选实施例中,所述第五确定子模块包括:第一确定单元,用于响应于确定该字符所属的最大识别率的备选字符类别是多个第一字符类别或多个第二字符类别中的一个,确定该字符属于所述目标字符;第二确定单元,用于响应于确定该字符所属的最大识别率的备选字符类别是第三字符类别,确定该字符属于所述无关字符。In some optional embodiments, the fifth determination sub-module includes: a first determination unit, configured to respond to determining that the candidate character category with the maximum recognition rate to which the character belongs is a plurality of first character categories or a plurality of first character categories One of the two character categories, determining that the character belongs to the target character; a second determining unit, configured to determine that the character belongs to the irrelevant characters.
在一些可选实施例中,所述多个第一字符类别包括:与第一文字语言包括的多个字符分别对应的字符类别;其中,所述第一文字语言是所述待识别文字对应的文字语言;所述多个第二字符类别包括:与多个阿拉伯数字分别对应的字符类别;所述第三字符类别包括:与多种第二文字语言包括的多个字符对应的相同的字符类别;其中,所述第二文字语言是不同于所述第一文字语言的文字语言。In some optional embodiments, the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
在一些可选实施例中,所述特征序列确定模块包括:第六确定子模块,用于将所述文字图像作为用于对字符进行字符类别判断的目标神经网络的输入,获得所述目标神经网络输出的所述文字图像对应的特征序列。In some optional embodiments, the feature sequence determination module includes: a sixth determination submodule, configured to use the text image as an input of a target neural network for character category judgment on characters, and obtain the target neural network The feature sequence corresponding to the text image output by the network.
在一些可选实施例中,所述装置还包括:样本文字图像获取模块,用于获取同时包括第一文字语言对应的文字和至少一种第二文字语言对应的文字的样本文字图像;其中,所述第一文字语言是所述待识别文字对应的文字语言,所述第二文字语言是不同于所述第一文字语言的文字语言;训练模块,用于将所述样本文字图像作为预设神经网络的输入,以所述样本文字图像中的字符类别标签为监督,对所述预设神经网络进行训练,得到用于对字符进行字符类别判断的目标神经网络。In some optional embodiments, the apparatus further includes: a sample text image acquisition module, configured to acquire a sample text image that includes both the text corresponding to the first text language and the text corresponding to at least one second text language; The first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language; the training module is used to use the sample text image as a preset neural network. Input, with the character category label in the sample text image as supervision, the preset neural network is trained to obtain a target neural network for judging the character category of the characters.
在一些可选实施例中,所述样本文字图像获取模块包括:第一获取子模块,用于获取包括所述第一文字语言对应的文字的第一备选文字图像;第二获取子模块,用于获取所述至少一种第二文字语言对应的备选文字语料;第一生成子模块,用于基于所述备选文字语料和所述第一备选文字图像,生成所述样本文字图像。In some optional embodiments, the sample text image acquisition module includes: a first acquisition sub-module for acquiring a first candidate text image including text corresponding to the first text language; a second acquisition sub-module for using for acquiring the candidate text corpus corresponding to the at least one second text language; a first generating submodule is configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
在一些可选实施例中,所述样本文字图像获取模块包括:第三获取子模块,用于获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述至少一种第二文字语言对应的文字的第二备选文字图像;第二生成子模块,用于基于所述第一备选文字图像和所述第二备选文字图像,生成所述样本文字图像。In some optional embodiments, the sample character image obtaining module includes: a third obtaining sub-module, configured to obtain a first candidate character image including characters corresponding to the first character language and a first candidate character image including the at least one first character image. The second candidate text image of the text corresponding to the two text languages; the second generation sub-module is configured to generate the sample text image based on the first candidate text image and the second candidate text image.
在一些可选实施例中,所述样本文字图像中的字符类别标签包括以下至少一个:与所述第一文字语言包括的多个字符分别对应的多个第一字符类别标签中的至少一个;与多个阿拉伯数字分别对应的多个第二字符类别标签中的至少一个;与多种第二文字语言包括的多个字符对应的相同的第三字符类别标签。In some optional embodiments, the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels corresponding to a plurality of characters included in the first text language; and At least one of the plurality of second character class labels corresponding to the plurality of Arabic numerals respectively; the same third character class label corresponding to the plurality of characters included in the plurality of second script languages.
在一些可选实施例中,所述待识别文字包括第一文字语言对应的文字,所述第一文字语言是电子签证对应的文字语言;所述文字图像包括申请所述电子签证时需要的目标资料的文字图像;所述字符类别确定模块包括:第七确定子模块,用于基于所述目标资料的文字图像对应的特征序列,确定所述目标资料的文字图像中每个字符属于所述第一 文字语言对应的目标字符,或属于其他文字对应的无关字符;所述文字识别模块包括:第八确定子模块,用于确定对所述目标资料的文字图像中所述目标字符进行文字识别的目标文字识别结果;所述装置还包括:执行模块,用于基于所述目标文字识别结果,签发所述电子签证。In some optional embodiments, the text to be recognized includes a text corresponding to a first text language, and the first text language is a text language corresponding to an electronic visa; the text image includes the target data required when applying for the electronic visa. text image; the character category determination module includes: a seventh determination sub-module for determining, based on the feature sequence corresponding to the text image of the target data, that each character in the text image of the target data belongs to the first text language Corresponding target characters, or irrelevant characters belonging to other characters; the character recognition module includes: an eighth determination sub-module, used to determine the target character recognition for character recognition of the target characters in the character image of the target data Result; the apparatus further includes: an execution module, configured to issue the electronic visa based on the target character recognition result.
根据本公开实施例的第三方面,提供一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于执行上述第一方面任一所述的文字识别方法。According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is configured to execute the character recognition method according to any one of the above-mentioned first aspect.
根据本公开实施例的第四方面,提供一种文字识别装置,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器中存储的可执行指令,实现第一方面任一项所述的文字识别方法。According to a fourth aspect of the embodiments of the present disclosure, there is provided a character recognition device, comprising: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the memory stored in the memory The executable instructions of the first aspect implement the character recognition method described in any one of the first aspects.
本公开的实施例提供的技术方案可以包括以下有益效果:The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
本公开实施例中,针对同时包括待识别文字和其他文字的文字图像,可以对文字图像中的每个字符进行字符类别判断,从而基于类别判断结果,在包括多种语言的文字图像中确定出待识别文字对应的字符,以及其他文字对应的无关字符,过滤掉无关字符,对待识别文字对应的字符进行文字识别,得到目标文字识别结果。本公开对待识别文字和其他文字进行字符类别判断,以便在对待识别文字进行文字识别之前,过滤掉其他文字对应的无关字符,从而降低将其他文字误判为待识别文字的概率,在混合了多种文字语言的文字图像中,提高了对其中待识别文字进行文字识别的准确率。In the embodiment of the present disclosure, for a text image that includes both the text to be recognized and other texts, the character category judgment can be performed on each character in the text image, so that based on the category judgment result, it can be determined in the text image including multiple languages. The characters corresponding to the characters to be recognized and the irrelevant characters corresponding to other characters are filtered out, and the characters corresponding to the characters to be recognized are subjected to character recognition to obtain a target character recognition result. The present disclosure performs character classification judgment on the text to be recognized and other texts, so as to filter out irrelevant characters corresponding to other texts before performing text recognition on the text to be recognized, so as to reduce the probability of misjudging other texts as texts to be recognized. In the text images of different language languages, the accuracy of text recognition of the text to be recognized is improved.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
附图说明Description of drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.
图1是本公开根据一示例性实施例示出的一种文字识别方法流程图。FIG. 1 is a flowchart of a method for character recognition according to an exemplary embodiment of the present disclosure.
图2是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 2 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图3A是本公开根据一示例性实施例示出的一种对候选区域进行划分的场景示意图。FIG. 3A is a schematic diagram of a scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
图3B是本公开根据一示例性实施例示出的另一种对候选区域进行划分的场景示意图。FIG. 3B is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
图3C是本公开根据一示例性实施例示出的另一种对候选区域进行划分的场景示意图。FIG. 3C is a schematic diagram of another scenario of dividing a candidate region according to an exemplary embodiment of the present disclosure.
图4是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 4 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图5是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 5 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图6是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 6 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图7是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 7 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图8是本公开根据一示例性实施例示出的另一种文字识别方法流程图。FIG. 8 is a flowchart of another method for character recognition according to an exemplary embodiment of the present disclosure.
图9A是本公开根据一示例性实施例示出的一种文字识别过程对应的架构示意图。FIG. 9A is a schematic structural diagram corresponding to a character recognition process according to an exemplary embodiment of the present disclosure.
图9B是本公开根据一示例性实施例示出的一种确定特征序列的示意图。FIG. 9B is a schematic diagram of a determination feature sequence according to an exemplary embodiment of the present disclosure.
图10是本公开根据一示例性实施例示出的一种文字识别装置框图。FIG. 10 is a block diagram of a character recognition apparatus according to an exemplary embodiment of the present disclosure.
图11是本公开根据一示例性实施例示出的一种文字识别装置的一结构示意图。FIG. 11 is a schematic structural diagram of a character recognition device according to an exemplary embodiment of the present disclosure.
具体实施方式Detailed ways
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.
在本公开运行的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所运行的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中运行的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "the," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所运行的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other. For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure. Depending on the context, the word "if" as used herein may be interpreted as "at the time of" or "when" or "in response to determining."
目前,如果文字图像中同时包括待识别文字和其他文字,可以采用忽略其他文字的方式进行文字识别,即在文字识别模型训练过程中,样本文字图像中只包括待识别文字对应的文字标签。但是这样得到的文字识别模型在判断过程中,容易将其他文字误判为待识别文字,准确率无法保证。At present, if the text image contains both the text to be recognized and other text, the text recognition can be performed by ignoring other text, that is, during the training of the text recognition model, only the text label corresponding to the text to be recognized is included in the sample text image. However, in the judgment process of the text recognition model obtained in this way, it is easy to misjudge other texts as the text to be recognized, and the accuracy cannot be guaranteed.
为了解决上述问题,本公开实施例提供了一种文字识别方案,针对同时包括待识别文字和其他文字的文字图像,可以对文字图像中每个字符进行字符类别判断,从而基于类别判断结果,得到对其中的待识别文字进行文字识别的目标文字识别结果。In order to solve the above problem, an embodiment of the present disclosure provides a character recognition solution. For a character image including the character to be recognized and other characters at the same time, character category judgment can be performed on each character in the character image, so that based on the category judgment result, the The target character recognition result of performing character recognition on the characters to be recognized.
例如图1所示,图1是根据一示例性实施例示出的一种文字识别方法,包括以下步骤:For example, as shown in FIG. 1, FIG. 1 shows a character recognition method according to an exemplary embodiment, including the following steps:
在步骤101中,获取包括待识别文字和其他文字的文字图像。In step 101, a text image including the text to be recognized and other texts is acquired.
在本公开实施例中,可以通过部署在不同应用场景中的摄像头,获取包括待识别文字的文字图像。该不同应用场景包括但不限于招牌文字识别场景、车牌识别场景、票据识别场景等。相应地,获取到的文字图像中可以包括但不限于用待识别文字对应的第一文字语言书写的招牌、车牌、票据等。另外,在本公开实施例中,获取到的文字图像中还同时包括用其他文字对应的第二文字语言书写的文字内容,第二文字语言包括但不限于不同于第一文字语言的文字语言。In the embodiment of the present disclosure, a text image including the text to be recognized can be acquired through cameras deployed in different application scenarios. The different application scenarios include but are not limited to signboard text recognition scenarios, license plate recognition scenarios, bill recognition scenarios, and the like. Correspondingly, the acquired text images may include, but are not limited to, signboards, license plates, bills, and the like written in the first text language corresponding to the text to be recognized. In addition, in the embodiment of the present disclosure, the acquired text image also includes text content written in a second text language corresponding to other text, and the second text language includes but is not limited to a text language different from the first text language.
在本公开实施例中,用第二文字语言书写的文字内容可以与用第一文字语言书写的文字内容相同、至少部分相同或者不同。In the embodiment of the present disclosure, the text content written in the second text language may be the same, at least partially the same, or different from the text content written in the first text language.
例如,第一文字语言为泰文,第二文字语言为英文,获取到的文字图像中包括了用泰文书写的招牌,同时还包括了用英文书写的相同的招牌内容。再例如,第一文字语言为泰文,第二文字语言为中文,获取到的文字图像中包括了用泰文书写的票据内容,同时还包括了用中文书写的该票据中的部分内容。再例如,第一文字语言为英文,第二文字语言为中文,获取到的文字图像中包括了用泰文书写的文字内容,同时还包括了用中文书写的完全不同的文字内容。For example, if the first text language is Thai, and the second text language is English, the acquired text image includes a signboard written in Thai, and also includes the same signboard content written in English. For another example, the first text language is Thai, the second text language is Chinese, and the acquired text image includes the content of the receipt written in Thai, and also includes part of the content of the receipt written in Chinese. For another example, the first text language is English, the second text language is Chinese, and the acquired text image includes text content written in Thai, and also includes completely different text content written in Chinese.
在步骤102中,基于所述文字图像对应的特征序列,得到对所述文字图像中每个字符的类别判断结果。In step 102, based on the feature sequence corresponding to the character image, a result of determining the category of each character in the character image is obtained.
在本公开实施例中,文字图像对应的特征序列的数目可以为一个或多个,每个特征序列可以由该文字图像中待识别文字和/或其他文字所在的候选区域包括的至少部分特 征信息构成。In this embodiment of the present disclosure, the number of feature sequences corresponding to the text image may be one or more, and each feature sequence may be composed of at least part of the feature information included in the candidate region where the text to be recognized and/or other texts are located in the text image constitute.
其中,候选区域是在文字图像中确定出的待识别文字和/或其他文字可能所在的区域。候选区域可以再次被划分为多个子区域,候选区域包括的至少部分特征信息可以是由至少部分子区域对应的特征信息构成,至少部分子区域对应的特征信息是指至少部分子区域对应的全部特征信息。例如,多个子区域包括子区域1、子区域2和子区域3,候选区域包括的至少部分特征信息可以由子区域1和子区域2的全部特征信息构成。Wherein, the candidate area is the area where the character to be recognized and/or other characters may be located determined in the character image. The candidate region can be divided into multiple sub-regions again, and at least part of the feature information included in the candidate region can be composed of feature information corresponding to at least some of the sub-regions, and the feature information corresponding to at least some of the sub-regions refers to all the features corresponding to at least some of the sub-regions. information. For example, the multiple sub-regions include sub-region 1, sub-region 2, and sub-region 3, and at least part of the feature information included in the candidate region may be composed of all the feature information of sub-region 1 and sub-region 2.
在本公开实施例中,进一步地,可以根据该文字图像对应的特征序列,确定对文字图像中的每个字符的类别判断结果。其中,该类别判断结果可以用于表征字符类别。In the embodiment of the present disclosure, further, the category judgment result for each character in the text image may be determined according to the feature sequence corresponding to the text image. The category judgment result can be used to characterize the character category.
在本公开实施例中,可以预先针对第一文字语言包括的每个字符,确定对应的第一字符类别,以及针对每个阿拉伯数字确定对应的第二字符类别,同时,还可以针对多种第二文字语言包括的所有字符确定相同的一个第三字符类别。其中,第一文字语言可以是待识别文字对应的文字语言,第一文字语言包括的每个字符可以指第一文字语言包括的每一个字母元素、以及每一个标点符号元素,所述第二文字语言是不同于所述第一文字语言的文字语言。In the embodiment of the present disclosure, a corresponding first character category may be determined for each character included in the first text language in advance, and a corresponding second character category may be determined for each Arabic numeral. All characters included in the literal language define the same one third character class. The first text language may be the text language corresponding to the text to be recognized, each character included in the first text language may refer to each letter element and each punctuation element included in the first text language, and the second text language is different A textual language in the first textual language.
例如,第一文字语言为英文,那么英文所包括的26个字母(区分大小写)、以及英文标点符号中的每个字母和每个标点符号可以对应一个第一字符类别。阿拉伯数字0至9分别对应一个第二字符类别。第二文字语言就是除了英文之外的任一种文字语言,假设可以包括中文、泰文、阿拉伯文、韩文等等,所有第二文字语言包括的所有字符都对应同一个第三字符类别。For example, if the first text language is English, then the 26 letters (case sensitive) included in English and each letter and each punctuation mark in the English punctuation mark may correspond to a first character category. Arabic numerals 0 to 9 each correspond to a second character class. The second text language is any text language other than English, and it is assumed that it can include Chinese, Thai, Arabic, Korean, etc. All characters included in all the second text languages correspond to the same third character category.
在步骤103中,基于所述类别判断结果,确定对所述待识别文字进行文字识别的目标文字识别结果。In step 103, based on the category judgment result, a target character recognition result for performing character recognition on the to-be-recognized character is determined.
在本公开实施例中,基于上述的类别判断结果,就可以确定其中属于待识别文字对应的目标字符,以及属于其他文字的无关字符,过滤掉其中的无关字符,最终只得到属于待识别文字对应的目标字符的字符结构,即得到对所述待识别文字进行文字识别的目标文字识别结果。In the embodiment of the present disclosure, based on the above category judgment results, it is possible to determine the target characters corresponding to the characters to be recognized and the irrelevant characters belonging to other characters, filter out the irrelevant characters, and finally obtain only the characters corresponding to the characters to be recognized. The character structure of the target character is obtained, that is, the target character recognition result of performing character recognition on the character to be recognized is obtained.
上述实施例中,针对同时包括待识别文字和其他文字的文字图像,可以对文字图像中的每个字符进行字符类别判断,从而基于类别判断结果,在包括多种语言的文字图像中确定出待识别文字对应的字符,以及其他文字对应的无关字符,过滤掉无关字符,对待识别文字对应的字符进行文字识别,得到目标文字识别结果。本公开对待识别文字和其他文字进行字符类别判断,以便在对待识别文字进行文字识别之前,过滤掉其他文字对应的无关字符,从而降低将其他文字误判为待识别文字的概率,在混合了多种文字语言的文字图像中,提高了对其中待识别文字进行文字识别的准确率。In the above-mentioned embodiment, for a text image that includes both the text to be recognized and other texts, the character category judgment can be performed on each character in the text image, so that based on the category judgment result, the text image including multiple languages is determined. Characters corresponding to characters and irrelevant characters corresponding to other characters are identified, the irrelevant characters are filtered out, and the characters corresponding to the characters to be recognized are subjected to character recognition to obtain a target character recognition result. The present disclosure performs character classification judgment on the text to be recognized and other texts, so as to filter out irrelevant characters corresponding to other texts before performing text recognition on the text to be recognized, so as to reduce the probability of misjudging other texts as texts to be recognized. In the text images of different language languages, the accuracy of text recognition of the text to be recognized is improved.
在一些可选实施例中,例如图2所示,上述方法还可以包括步骤104至步骤106:In some optional embodiments, such as shown in FIG. 2 , the above method may further include steps 104 to 106:
在步骤104中,确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域。In step 104, a candidate region where the to-be-recognized text and/or the other text is located in the text image is determined.
其中,候选区域是在文字图像中确定出的待识别文字和/或所述其他文字可能所在的区域。Wherein, the candidate area is the area where the character to be recognized and/or the other character may be located determined in the character image.
在一个示例中,可以采用区域预测网络(Region Proposal Network,RPN)来确定文字图像中所述待识别文字和/或所述其他文字可能所在的候选区域。In one example, a region prediction network (Region Proposal Network, RPN) may be used to determine a candidate region where the to-be-recognized text and/or the other text may be located in the text image.
在步骤105中,将所述候选区域划分为多个子区域。In step 105, the candidate area is divided into a plurality of sub-areas.
在本公开实施例中,在确定了待识别文字和/或所述其他文字所在的候选区域后,可以将该候选区域划分为多个子区域,每个子区域的尺寸可以相同或不同。In the embodiment of the present disclosure, after the candidate area where the character to be recognized and/or the other character is located is determined, the candidate area may be divided into a plurality of sub-areas, and the size of each sub-area may be the same or different.
在一个示例中,可以对候选区域按照预设数目进行平均划分,从而得到尺寸相同的多个子区域,例如图3A所示,将候选区域划分为3个尺寸相同的子区域。In an example, the candidate region may be divided evenly according to a preset number to obtain multiple sub-regions with the same size. For example, as shown in FIG. 3A , the candidate region is divided into three sub-regions with the same size.
In another example, the candidate region may be divided according to a preset uniform size, so that either N sub-regions of the same size are obtained, or (N-1) sub-regions of the same size plus one sub-region whose size differs from the others, as shown for example in FIG. 3B, where sub-region 1 to sub-region 3 have the same size and sub-region 4 differs in size from the other three.
在另一个示例中,可以对候选区域按照预设的多个不同尺寸顺序进行划分,例如图3C所示,可以得到3个尺寸互不相同的子区域。In another example, the candidate region may be divided according to a preset sequence of multiple different sizes. For example, as shown in FIG. 3C , three sub-regions with different sizes may be obtained.
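A minimal sketch of the three division strategies above, assuming a candidate region given as (x, y, width, height) in pixels; the widths and ordering are illustrative only:

```python
def split_evenly(region, n):
    """Split a candidate region (x, y, w, h) into n sub-regions of equal width."""
    x, y, w, h = region
    step = w / n
    return [(x + i * step, y, step, h) for i in range(n)]

def split_by_fixed_width(region, width):
    """Split into sub-regions of a preset width; the last one may be narrower."""
    x, y, w, h = region
    subs, offset = [], 0.0
    while offset < w:
        sub_w = min(width, w - offset)
        subs.append((x + offset, y, sub_w, h))
        offset += width
    return subs

def split_by_widths(region, widths):
    """Split according to a preset sequence of (possibly different) widths."""
    x, y, w, h = region
    subs, offset = [], 0.0
    for sub_w in widths:
        subs.append((x + offset, y, min(sub_w, w - offset), h))
        offset += sub_w
        if offset >= w:
            break
    return subs
```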
在步骤106中,基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In step 106, a feature sequence corresponding to the character image is determined based on feature information corresponding to at least part of the sub-regions in the plurality of sub-regions.
在本公开实施例中,基于文字图像对应的特征图,可以确定候选区域所包括的每个子区域对应的特征信息。基于其中至少部分子区域对应的特征信息,即根据多个子区域中部分或全部子区域对应的全部特征信息,得到文字图像对应的特征序列。In the embodiment of the present disclosure, based on the feature map corresponding to the text image, feature information corresponding to each sub-region included in the candidate region can be determined. Based on the feature information corresponding to at least some of the sub-regions, that is, according to all the feature information corresponding to some or all of the multiple sub-regions, the feature sequence corresponding to the text image is obtained.
In one example, all the feature information corresponding to each sub-region may correspond to one feature sequence, all the feature information corresponding to multiple sub-regions may correspond to one feature sequence, or all the feature information corresponding to each sub-region may correspond to multiple feature sequences. The present disclosure does not limit this.
In another example, the order in which each sub-region appears in the text image may first be determined according to the writing order of the text, for example from left to right. Further, after the feature sequences are determined from the feature information corresponding to at least some of the sub-regions, the feature sequences are ordered according to the order in which the corresponding sub-regions appear in the text image; for example, the feature sequence corresponding to the leftmost sub-region of the text image is placed first and the feature sequence corresponding to the rightmost sub-region is placed last, and the combination of the ordered feature sequences yields the feature sequence corresponding to the text image.
For example, in left-to-right order the candidate region is divided into sub-region 1, sub-region 2 and sub-region 3, the at least some sub-regions include sub-region 2 and sub-region 3, sub-region 2 corresponds to feature sequences 2 and 3, and sub-region 3 corresponds to feature sequence 4; after ordering, the feature sequence corresponding to the text image consists of feature sequence 2, feature sequence 3 and feature sequence 4. In another example, the feature information corresponding to at least some of the sub-regions may be processed by pooling and/or sampling to obtain the corresponding feature sequences. Through pooling and/or sampling, the feature information corresponding to the most salient parts of each sub-region can be selected to determine the feature sequences, which, while ensuring the accuracy of the obtained feature sequences, improves the efficiency of determining the feature sequence corresponding to the text image and thus the efficiency of text recognition of the text to be recognized.
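The left-to-right assembly and the pooling step could look like the following sketch (NumPy assumed; the choice of max pooling and the coordinate handling are illustrative):

```python
import numpy as np

def features_to_sequence(sub_regions, feature_map):
    """Concatenate per-sub-region features in reading order (left to right).

    sub_regions: list of (x, y, w, h) tuples
    feature_map: array of shape (C, H, W) for the whole text image
    """
    # Sort sub-regions by their horizontal position, i.e. the writing order.
    ordered = sorted(sub_regions, key=lambda r: r[0])
    sequence = []
    for x, y, w, h in ordered:
        # Crop the feature map to the sub-region (coordinates assumed to be
        # already mapped into feature-map resolution).
        crop = feature_map[:, int(y):int(y + h), int(x):int(x + w)]
        # Pool over the spatial dimensions so each sub-region contributes
        # one feature vector; max pooling keeps the most salient responses.
        sequence.append(crop.max(axis=(1, 2)))
    return np.stack(sequence)          # shape: (num_sub_regions, C)
```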
In the embodiments of the present disclosure, after the feature sequence corresponding to the text image has been determined, step 102 may be executed to determine, based on the feature sequence corresponding to the text image, the category judgment result of the character category judgment for each character in the text image.
In the above embodiment, the candidate region where the text to be recognized and/or the other text is located in the text image may be divided into multiple sub-regions, and the feature sequence corresponding to the text image is determined based on the feature information corresponding to all or some of the multiple sub-regions, so that the category judgment result of the character category judgment for each character in the text image can subsequently be determined based on that feature sequence; this is simple to implement and highly usable.
在一些可选实施例中,例如图4所示,步骤102可以包括步骤102-1和步骤102-2。In some optional embodiments, such as shown in FIG. 4 , step 102 may include step 102-1 and step 102-2.
在步骤102-1中,基于所述文字图像对应的特征序列,确定所述文字图像包括的每个字符所属的至少一个备选字符类别和每个备选字符类别的识别率。In step 102-1, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category are determined.
In one example, the feature sequence corresponding to the text image may be used as the input of a classifier, and the classification prediction result output by the classifier is obtained; the classification prediction result includes, but is not limited to, at least one candidate character category to which each character included in the text image belongs and the recognition rate corresponding to each candidate character category, i.e., the probability value that the character belongs to that candidate character category.
For example, the text image includes 2 characters, the first character corresponds to 2 candidate character categories and the second character corresponds to 3 candidate character categories. The probability value that the first character belongs to candidate character category 1 is a, i.e., the recognition rate corresponding to candidate character category 1 is a, and the probability value that it belongs to candidate character category 2 is b, i.e., the recognition rate corresponding to candidate character category 2 is b. The probability values that the second character belongs to candidate character category 3, candidate character category 4 and candidate character category 5 are c, d and e respectively, i.e., the recognition rates of candidate character categories 3, 4 and 5 are c, d and e respectively.
在步骤102-2中,针对所述每个字符,将该字符所属的所述至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In step 102-2, for each character, the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs is used as the category judgment result of the character.
在本公开实施例中,为了便于后续确定目标文字识别结果,可以将某个字符所属的至少一个备选字符类别中最大识别率对应的备选字符类别,作为该字符的所述类别判断结果。In the embodiment of the present disclosure, in order to facilitate the subsequent determination of the target character recognition result, the candidate character category corresponding to the maximum recognition rate in at least one candidate character category to which a certain character belongs may be used as the category judgment result of the character.
例如,文字图像中包括的某个字符对应2个备选字符类别。其中,该字符属于备选字符类别1的识别率为a,属于备选字符类别2的识别率为b,a大于b,那么备选字符类别1可以作为该字符对应的类别判断结果。For example, a certain character included in the text image corresponds to two candidate character categories. Wherein, the recognition rate of the character belonging to candidate character category 1 is a, and the recognition rate of the character belonging to candidate character category 2 is b. If a is greater than b, then candidate character category 1 can be used as the category judgment result corresponding to the character.
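Assuming the classifier returns, for each character position, a mapping from candidate categories to recognition rates, the category judgment result is simply the arg-max; a sketch:

```python
def judge_categories(per_char_candidates):
    """per_char_candidates: list like
       [{1: 0.8, 2: 0.2}, {3: 0.5, 4: 0.3, 5: 0.2}, ...]
    Returns, for each character, the candidate category with the largest
    recognition rate (the category judgment result)."""
    return [max(cands, key=cands.get) for cands in per_char_candidates]

# Example: first character -> category 1, second character -> category 3
print(judge_categories([{1: 0.8, 2: 0.2}, {3: 0.5, 4: 0.3, 5: 0.2}]))
```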
In the above embodiment, the candidate character categories to which each character included in the text image may belong and the recognition rate of each candidate character category can be determined based on the feature sequence corresponding to the text image, and the candidate character category with the largest recognition rate is taken as the category judgment result of the character category judgment for that character. Based on this category judgment result, the target characters belonging to the text to be recognized and the irrelevant characters belonging to the other text can subsequently be determined, so that the irrelevant characters can be filtered out, which improves the accuracy of text recognition of the text to be recognized in a text image mixing multiple text languages.
In some optional embodiments, as shown for example in FIG. 5, step 103 may include steps 103-1 to 103-3.
在步骤103-1中,针对每个字符,根据字符类别和字符结构之间的对应关系,确定该字符所属的最大识别率的备选字符类别对应的字符结构。In step 103-1, for each character, according to the correspondence between the character category and the character structure, the character structure corresponding to the candidate character category with the highest recognition rate to which the character belongs is determined.
在本公开实施例中,预先设置了不同的字符类别和对应的字符结构,例如,字符类别1对应的字符结构为‘a’,字符类别2对应的字符结构为‘b’,等等。可以基于之前确定的类别判断结果和上述对应关系,确定每个字符所属的最大识别率的备选字符类别对应的字符结构。In the embodiment of the present disclosure, different character classes and corresponding character structures are preset, for example, the character structure corresponding to character class 1 is 'a', the character structure corresponding to character class 2 is 'b', and so on. The character structure corresponding to the candidate character category with the highest recognition rate to which each character belongs may be determined based on the previously determined category judgment result and the above-mentioned corresponding relationship.
在本公开实施例中,第一文字语言包括的每个字符对应不同的第一字符类别,每个第一字符类别分别对应不同的字符结构。不同的阿拉伯数字对应不同的第二字符类别,这些第二字符类别也分别对应不同的字符结构,例如字符结构‘0’、‘1’等。而多种第二文字语言包括的所有字符可以对应同一个第三字符类别,这个第三字符类别可以对应相同的一个字符结构,例如,多种第二文字语言包括中文、阿拉伯文、泰文等,第二文字语言包括的所有的字符可以都对应一个第三字符类别,假设为字符类别70,这个字符类别70可以对应同一个字符结构,例如都对应中文的字符结构‘啊’。In the embodiment of the present disclosure, each character included in the first character language corresponds to a different first character category, and each first character category corresponds to a different character structure. Different Arabic numerals correspond to different second character categories, and these second character categories also correspond to different character structures, such as character structures '0', '1', and so on. All characters included in multiple second script languages may correspond to the same third character category, and this third character category may correspond to the same character structure. For example, multiple second script languages include Chinese, Arabic, Thai, etc. All characters included in the second script language may correspond to a third character category, assuming a character category 70, this character category 70 may correspond to the same character structure, for example, all correspond to the Chinese character structure 'ah'.
当然,上述第一文字语言是待识别文字对应的文字语言,除了第一文字语言之外的其他文字语言均可以作为第二文字语言。Of course, the above-mentioned first text language is the text language corresponding to the text to be recognized, and other text languages other than the first text language can be used as the second text language.
在本公开实施例中,根据上述对应关系,就可以确定每个字符所属的最大识别率的备选字符类别对应的字符结构。In the embodiment of the present disclosure, according to the above-mentioned correspondence, the character structure corresponding to the candidate character category with the maximum recognition rate to which each character belongs can be determined.
For example, a text image includes 4 characters whose candidate character categories with the largest recognition rate are 1, 2, 3 and 70 in sequence; according to the above correspondence, the corresponding character structures are determined to be 'a', 'b', 'c' and '啊' in sequence.
在步骤103-2中,根据该字符所属的最大识别率的备选字符类别,确定该字符 属于所述待识别文字对应的目标字符或属于所述其他文字对应的无关字符。In step 103-2, according to the candidate character category of the maximum recognition rate to which the character belongs, it is determined that the character belongs to the target character corresponding to the character to be recognized or belongs to the irrelevant character corresponding to the other characters.
In the embodiments of the present disclosure, if it is determined that the candidate character category with the largest recognition rate to which a character belongs is one of the multiple first character categories or one of the multiple second character categories, it can be determined that the character is a target character corresponding to the text to be recognized. The multiple first character categories include character categories respectively corresponding to the multiple characters included in the first text language, the first text language being the text language corresponding to the text to be recognized, and the multiple second character categories include character categories respectively corresponding to the multiple Arabic numerals.
如果确定某个字符所属的最大识别率的备选字符类别是第三字符类别,那么可以确定该字符属于所述其他文字对应的无关字符。If it is determined that the candidate character category with the highest recognition rate to which a certain character belongs is the third character category, then it can be determined that the character belongs to the irrelevant character corresponding to the other characters.
For example, the first text language is English, the multiple first character categories include character categories 1 to 59, the multiple second character categories corresponding to the Arabic numerals include character categories 60 to 69, and the third character category is character category 70. If the text image includes 4 characters whose candidate character categories with the largest recognition rate are 1, 2, 3 and 70 in sequence, it can be determined that the first 3 characters are target characters and the last character is an irrelevant character.
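Continuing this example (category IDs 1-59 for the first language, 60-69 for the digits, 70 for all other scripts; the numbers are illustrative), the target/irrelevant decision is a simple range check:

```python
FIRST_CATEGORIES = set(range(1, 60))    # first text language (e.g. English)
SECOND_CATEGORIES = set(range(60, 70))  # Arabic numerals 0-9
THIRD_CATEGORY = 70                     # all other scripts share one category

def is_target_character(category_id):
    """A character is a target character if its best category is a first or
    second character category; the third category marks an irrelevant character."""
    return category_id in FIRST_CATEGORIES or category_id in SECOND_CATEGORIES

# The four characters with best categories 1, 2, 3, 70:
print([is_target_character(c) for c in (1, 2, 3, 70)])  # [True, True, True, False]
```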
在步骤103-3中,将所述目标字符对应的所述字符结构,作为对所述待识别文字进行文字识别的所述目标文字识别结果。In step 103-3, the character structure corresponding to the target character is used as the target character recognition result of performing character recognition on the to-be-recognized character.
It has been determined above that the character structures corresponding to the 4 characters included in the text image are 'a', 'b', 'c' and '啊' in sequence, and the last character is an irrelevant character; the character structure corresponding to the irrelevant character can be filtered out, leaving only the character structures corresponding to the target characters, so that the target text recognition result is obtained, for example 'abc'.
在一个示例中,可以调用预设程序,过滤掉无关字符对应的字符结构,从而得到目标字符对应的所述字符结构。其中,预设程序可以是预先编写的用于过滤指定字符结构的程序。例如,指定字符结构为‘啊’,该预设程序可以过滤字符结构‘啊’,从而得到待识别文字对应的目标字符的字符结构。In one example, a preset program can be called to filter out the character structure corresponding to the irrelevant characters, so as to obtain the character structure corresponding to the target character. Wherein, the preset program may be a pre-written program for filtering the specified character structure. For example, if the character structure is specified as 'ah', the preset program can filter the character structure 'ah', so as to obtain the character structure of the target character corresponding to the character to be recognized.
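A sketch of such a filtering step, assuming a category-to-structure table and a placeholder structure used for the third category (both illustrative):

```python
CATEGORY_TO_STRUCTURE = {1: 'a', 2: 'b', 3: 'c', 70: '啊'}  # illustrative table
PLACEHOLDER = '啊'  # single structure shared by all third-category characters

def decode_target_text(best_categories):
    """Look up each character's structure and drop the placeholder used
    for irrelevant (other-language) characters."""
    structures = [CATEGORY_TO_STRUCTURE[c] for c in best_categories]
    return ''.join(s for s in structures if s != PLACEHOLDER)

print(decode_target_text([1, 2, 3, 70]))  # -> 'abc'
```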
In the above embodiment, based on the category judgment results, it can be determined whether each character in the text image is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text, so that the character structures corresponding to the irrelevant characters can be filtered out and only the character structures of the target characters corresponding to the text to be recognized are retained, yielding the target text recognition result of performing text recognition on the text to be recognized and improving the accuracy of text recognition of the text to be recognized in a text image mixing multiple text languages.
在一些可选实施例中,针对上述步骤102,可以将文字图像直接作为目标神经网络的输入,获得目标神经网络输出的所述文字图像对应的特征序列。其中,所述目标神经网络是用于对字符进行字符类别判断的神经网络。In some optional embodiments, for the above step 102, the text image can be directly used as the input of the target neural network, and the feature sequence corresponding to the text image output by the target neural network can be obtained. Wherein, the target neural network is a neural network used for character category judgment on characters.
In the embodiments of the present disclosure, the target neural network is obtained by training a preset neural network and can determine the corresponding feature sequence from a text image. The preset neural network includes, but is not limited to, a Visual Geometry Group (VGG) network, GoogLeNet, a residual network (ResNet) and the like.
In the above embodiment, the text image can be used as the input of the target neural network for character category judgment so as to obtain the feature sequence corresponding to the text image output by the target neural network; the character category corresponding to each character included in the text image is subsequently determined based on this feature sequence, and text recognition can then be performed on the text to be recognized in the text image, which improves the accuracy of text recognition of the text to be recognized.
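A minimal PyTorch-style sketch of such a target network (the architecture is an assumption for illustration, not the disclosed network): a small convolutional backbone whose output is collapsed vertically so that each horizontal position of the feature map becomes one element of the feature sequence.

```python
import torch
import torch.nn as nn

class TinyTextBackbone(nn.Module):
    """Toy stand-in for a VGG/ResNet-style backbone: maps a text image to a
    feature sequence with one feature vector per horizontal position."""
    def __init__(self, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # halve H and W
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # collapse height to 1
        )

    def forward(self, image):                     # image: (N, 3, H, W)
        fmap = self.features(image)               # (N, C, 1, W')
        return fmap.squeeze(2).permute(0, 2, 1)   # (N, W', C): the feature sequence

seq = TinyTextBackbone()(torch.randn(1, 3, 32, 128))
print(seq.shape)   # torch.Size([1, 64, 64])
```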
在一些可选实施例中,例如图6所示(图6仅为示例性说明,实际应用中可以不限定下列步骤100-1至100-2的执行顺序必须要在步骤101之前执行),上述方法还可以包括步骤100-1和步骤100-2。In some optional embodiments, such as shown in FIG. 6 (FIG. 6 is only an exemplary illustration, in practical applications, the execution order of the following steps 100-1 to 100-2 may not be limited to be executed before step 101), the above The method may further include step 100-1 and step 100-2.
在步骤100-1中,获取同时包括第一文字语言对应的文字和第二文字语言对应的文字的样本文字图像。In step 100-1, a sample text image including both the text corresponding to the first text language and the text corresponding to the second text language is acquired.
在本公开实施例中,可以直接从样本图像数据库中获得上述样本文字图像。In the embodiment of the present disclosure, the above-mentioned sample text images can be directly obtained from the sample image database.
In step 100-2, the sample text image is used as the input of a preset neural network, and the preset neural network is trained with the character category labels in the sample text image as supervision, obtaining a target neural network used for character category judgment of characters.
In the embodiments of the present disclosure, the character category labels in the sample text image include at least one of the following: at least one of the multiple first character category labels respectively corresponding to the multiple characters included in the first text language; at least one of the multiple second character category labels respectively corresponding to the multiple Arabic numerals; and the same third character category label corresponding to the multiple characters included in the multiple second text languages.
In the embodiments of the present disclosure, a Connectionist Temporal Classification (CTC) supervised training method may be used to train the preset neural network, thereby obtaining the target neural network. CTC-supervised training means that the neural network learns directly from the input sequence without the mapping between the input sequence and the output result having to be annotated in the training data in advance.
In the embodiments of the present disclosure, the preset neural network outputs the character categories included in the sample text image; a loss function is determined according to the difference between the output of the preset neural network and the character category labels in the sample text image, and the preset neural network is trained iteratively by back-propagating gradients to the network parameters, so as to obtain the target neural network.
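A hedged sketch of such CTC-supervised training using PyTorch's nn.CTCLoss (the backbone, classifier head and data layout are placeholders; category 0 is reserved here for the CTC blank):

```python
import torch
import torch.nn as nn

num_classes = 70 + 1                     # 70 character categories (1..70) plus the CTC blank (0)
backbone = TinyTextBackbone()            # from the sketch above (illustrative)
head = nn.Linear(64, num_classes)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
optim = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-4)

def train_step(images, targets, target_lengths):
    """images: (N, 3, H, W); targets: concatenated category labels in 1..70."""
    seq = backbone(images)                         # (N, T, C)
    log_probs = head(seq).log_softmax(-1)          # (N, T, num_classes)
    log_probs = log_probs.permute(1, 0, 2)         # CTCLoss expects (T, N, C)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optim.zero_grad()
    loss.backward()                                # gradient back-propagation
    optim.step()
    return loss.item()
```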
In the above embodiment, sample text images that include both text corresponding to the first text language and text corresponding to at least one second text language can be obtained; the sample text images carry multiple kinds of character category labels, and by training the preset neural network, a target neural network used for character category judgment is obtained, which improves the accuracy and robustness of the target neural network.
在一些可选实施例中,考虑到样本文字图像数目可能较少,为了确保目标神经网络的精度和鲁棒性,可以采用以下方式中的任一种或多种的组合来得到样本文字图像。In some optional embodiments, considering that the number of sample text images may be small, in order to ensure the accuracy and robustness of the target neural network, any one or a combination of the following methods may be used to obtain the sample text images.
第一种方式,基于包括所述第一文字语言对应的文字的第一备选文字图像,以及所述第二文字语言对应的备选文字语料,生成样本文字图像。In a first manner, a sample text image is generated based on a first candidate text image including text corresponding to the first text language and a candidate text corpus corresponding to the second text language.
例如图7所示,步骤100-1可以包括以下步骤201至步骤203。For example, as shown in FIG. 7 , step 100 - 1 may include the following steps 201 to 203 .
在步骤201中,获取包括所述第一文字语言对应的文字的第一备选文字图像。In step 201, a first candidate text image including text corresponding to the first text language is acquired.
在本公开实施例中,可以获取只包括第一文字语言对应的文字的第一备选文字图像。其中,第一文字语言是待识别文字对应的文字语言,例如待识别文字为英文,那么第一文字语言就是英文,如果待识别文字为泰文,那么第一文字语言就是泰文。In this embodiment of the present disclosure, a first candidate text image that only includes text corresponding to the first text language may be acquired. The first text language is the text language corresponding to the text to be recognized. For example, if the text to be recognized is English, the first text language is English. If the text to be recognized is Thai, the first text language is Thai.
在步骤202中,获取所述至少一种第二文字语言对应的备选文字语料。In step 202, the candidate text corpus corresponding to the at least one second text language is acquired.
备选文字语料是至少一种第二文字语言对应的样本语料,所述第二文字语言是不同于第一文字语言的文字语言,例如第一文字语言是泰文,那么除了泰文之外的中文、阿拉伯文、韩文等都可以作为第二文字语言。The candidate text corpus is a sample corpus corresponding to at least one second text language, and the second text language is a text language different from the first text language. For example, the first text language is Thai, then Chinese and Arabic other than Thai , Korean, etc. can be used as the second text language.
The candidate text corpus includes, but is not limited to, multiple characters and multiple character strings composed of characters; in addition, the candidate text corpus may also include multiple words (each composed of at least one character or at least one character string), multiple terms (each composed of at least one word and/or at least one character) and multiple sentences (each composed of at least one word and/or term).
The words, terms and/or sentences in the candidate text corpus may or may not carry semantics, which is not limited by the present disclosure. Having semantics means having linguistic meaning, for example stating a fact or describing an object; having no semantics means having no linguistic meaning, for example when multiple characters are combined into a logo or a license plate, the combination of characters does not carry any linguistic meaning.
在步骤203中,基于所述备选文字语料和所述第一备选文字图像,生成所述样本文字图像。In step 203, the sample text image is generated based on the candidate text corpus and the first candidate text image.
In the embodiments of the present disclosure, the foreground content and the background content included in the first candidate text image can be obtained separately; the candidate text corpus is combined with the foreground content included in the first candidate text image to obtain the foreground content of the sample text image, and the background content included in the first candidate text image is used as the background content of the sample text image, thereby generating the sample text image.
The foreground content includes text written in the first text language, and combining the foreground content with the candidate text corpus includes, but is not limited to, placing the two pieces of text content at different relative positions while ensuring that they do not overlap. The relative positions include, but are not limited to, one piece being above, below, to the left of or to the right of the other.
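A minimal Pillow-based sketch of this compositing step (fonts, positions and the corpus line are illustrative assumptions):

```python
from PIL import Image, ImageDraw, ImageFont

def synthesize_sample(first_candidate_path, corpus_text, out_path):
    """Paste a second-language corpus line onto a first-language candidate
    image, below the existing foreground, so the two texts do not overlap."""
    img = Image.open(first_candidate_path).convert('RGB')
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()   # placeholder; a real font file covering
                                      # the second-language glyphs would be needed
    # Draw the corpus text near the bottom of the image, assuming the original
    # first-language foreground occupies the upper part.
    draw.text((10, img.height - 20), corpus_text, fill=(0, 0, 0), font=font)
    img.save(out_path)

# synthesize_sample('thai_plate.png', '混合语料', 'sample_0001.png')
```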
第二种方式,分别获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述第二文字语言对应的文字的第二备选文字图像,从而生成样本文字图像。In the second manner, a first candidate text image including the text corresponding to the first text language and a second candidate text image including the text corresponding to the second text language are respectively acquired, thereby generating a sample text image.
例如图8所示,步骤100-1可以包括以下步骤301至步骤302。For example, as shown in FIG. 8 , step 100 - 1 may include the following steps 301 to 302 .
在步骤301中,获取包括所述第一文字语言对应的文字的第一备选文字图像和包括所述至少一种第二文字语言对应的文字的第二备选文字图像。In step 301, a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language are acquired.
在步骤302中,基于所述第一备选文字图像和所述第二备选文字图像,生成所述样本文字图像。In step 302, the sample text image is generated based on the first candidate text image and the second candidate text image.
In the embodiments of the present disclosure, the foreground content included in the first candidate text image and the foreground content included in the second candidate text image can be obtained separately, and the two foreground contents are combined to obtain the foreground content of the sample text image. The foreground content of the first candidate text image includes text written in the first text language, the foreground content of the second candidate text image includes text written in the second text language, and combining the two foreground contents includes, but is not limited to, placing the two pieces of text content at different relative positions while ensuring that they do not overlap.
可以将第一备选文字图像包括的背景内容、或第二备选文字图像包括的背景内容作为样本文字图像对应的背景内容,或者还可以将预设背景图作为样本文字图像对应的背景内容。The background content included in the first candidate text image or the background content included in the second candidate text image may be used as the background content corresponding to the sample text image, or a preset background image may also be used as the background content corresponding to the sample text image.
在本公开实施例中,背景图可以包括但不限于预先设置好的不同的纯色背景图、存在不同背景内容的背景图,背景内容可以为实物、景色等。In this embodiment of the present disclosure, the background image may include, but is not limited to, different pre-set solid-color background images, background images with different background content, and the background content may be real objects, scenery, and the like.
In one implementation, the background images may be obtained in a manner that depends on their number. For example, if the number of preset background images is large, at least one of them may be obtained by random sampling. Whether the number of background images is large or small may be determined according to the order of magnitude of the number, the interval to which the number belongs, or the relationship between the number and a quantity threshold; the order of magnitude, the division of intervals and the setting of the threshold may be derived from empirical values obtained when acquiring the first candidate text image or the second candidate text image, and are not limited here.
If the number of preset background images is small, some background images may be randomly selected from an existing background image database, or, if there is no background image database, different regions of the existing background images may be randomly recombined to obtain multiple background images, thereby ensuring the diversity of the finally obtained sample text images.
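The background selection step might look like the following sketch (the threshold and the reuse strategy are assumptions):

```python
import random

def pick_backgrounds(background_pool, k, threshold=1000):
    """Randomly pick k background images for sample synthesis.

    If the pool is large (above an assumed threshold), plain random sampling
    is enough; if it is small, backgrounds are reused with replacement and the
    caller may recombine regions of different backgrounds to keep samples diverse."""
    if len(background_pool) >= threshold:
        return random.sample(background_pool, k)
    return [random.choice(background_pool) for _ in range(k)]
```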
在本公开实施例中,确定了样本文字图像的前景内容和背景内容后,可以生成样本文字图像。In the embodiment of the present disclosure, after the foreground content and the background content of the sample text image are determined, the sample text image can be generated.
In the above embodiment, sample text images that include both text corresponding to the first text language and text corresponding to at least one second text language can be obtained, which solves the problem of sample text images being difficult to acquire and thus allows the accuracy and robustness of the target neural network to be improved subsequently.
在一些可选实施例中,以第一文字语言为泰文,第二文字语言为英文,应用场景为停车场为例。在采集了包括泰文、阿拉伯数字和英文的车牌内容的文字图像后,需 要对其中的泰文和阿拉伯数字进行文字识别。其中,泰文和阿拉伯数字对应的车牌内容就属于待识别文字,英文对应的车牌内容属于其他文字。In some optional embodiments, the first text language is Thai, the second text language is English, and the application scenario is a parking lot as an example. After collecting the text image of the license plate content including Thai, Arabic numerals and English, it is necessary to perform character recognition on the Thai and Arabic numerals. Among them, the license plate contents corresponding to Thai and Arabic numerals belong to the characters to be recognized, and the license plate contents corresponding to English belong to other characters.
In the embodiments of the present disclosure, the collected license plate text image can be used as the input of the target neural network to obtain the feature sequence corresponding to the text image output by the target neural network; the feature sequence is then used as the input of the classifier, and the classifier determines at least one candidate character category to which each character included in the text image belongs and the recognition rate corresponding to each candidate character category.
Based on the above classifier output, the candidate character category with the largest recognition rate among the at least one candidate character category of each character is taken as the category judgment result of the character category judgment for each character in the text image. Further, according to the correspondence between character categories and character structures, the character structure corresponding to the candidate character category with the largest recognition rate to which each character belongs is determined; according to these best candidate character categories, the target characters belonging to Thai text or Arabic numerals and the irrelevant characters belonging to English are determined, the character structures corresponding to the irrelevant characters are filtered out of the character structures obtained above to leave the character structures corresponding to the Thai text and Arabic numerals, finally yielding the target text recognition result of performing text recognition on the Thai text and Arabic numerals in the text image.
For the license plates of vehicles entering and leaving the parking lot, which contain Thai text, Arabic numerals and English, this achieves text recognition of the Thai text and Arabic numerals, is not prone to misjudgment, and improves the recognition accuracy.
In implementation, text images of the license plates, containing Thai text, Arabic numerals and English, of vehicles entering and leaving the parking lot can first be collected by cameras deployed at the parking lot entrances and exits. It should be noted that the text images may be acquired by, but not limited to, selecting frames from the video stream captured by the camera. For example, periodic or aperiodic frame selection may be performed on the video stream to obtain one or more frames of text images in which the license plate of the same vehicle, containing Thai text, Arabic numerals and English, is captured. During frame selection, one or more factors that affect the quality of the text image and/or the recognition accuracy, such as shooting angle, sharpness and brightness, may be taken into account to obtain the text image input to the target neural network. One or more text images of the same license plate containing Thai text, Arabic numerals and English may be input to the target neural network, which is not limited here. If there is one image, its recognition result may be taken as the final recognition result; if there are multiple images, the recognition results of all or some of the images may be considered together to obtain the final recognition result, or one image may be selected from the multiple images and the final recognition result obtained from that image. The specific implementation is not limited here and includes, but is not limited to, the cases exemplified above.
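A sketch of periodic frame selection with a simple sharpness score (OpenCV assumed; the interval and the Laplacian-variance criterion are illustrative choices):

```python
import cv2

def select_frames(video_path, every_n=25, min_sharpness=100.0):
    """Grab every n-th frame from the stream and keep only frames whose
    Laplacian variance (a rough sharpness measure) exceeds a threshold."""
    cap = cv2.VideoCapture(video_path)
    kept, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if cv2.Laplacian(gray, cv2.CV_64F).var() >= min_sharpness:
                kept.append(frame)
        index += 1
    cap.release()
    return kept
```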
For example, as shown in FIG. 9A, in the application scenario provided by the present disclosure the text image is a text image of a license plate that contains Thai text, Arabic numerals and English. The target neural network first determines the candidate regions where the Thai, Arabic-numeral and/or English characters are located; as shown for example in FIG. 9B, suppose 2 candidate regions are obtained and candidate region 1 is divided into 8 sub-regions (8 is used here only for illustration, and in practical applications the number of feature sequences obtained may be less than or greater than 8), each sub-region yielding one corresponding feature sequence, as shown in FIG. 9B. Similarly, at least one feature sequence can also be obtained for candidate region 2 (not shown in FIG. 9B), and the combination of the feature sequences corresponding to all sub-regions obtained by dividing the two candidate regions is used as the feature sequence corresponding to the text image.
在得到目标神经网络输出的对应文字图像的特征序列后,可以通过分类器得到该文字图像包括的每个字符对应的至少一个备选字符类别和每个备选字符类别对应的识别率。在本公开实施例中,可以将最大识别率的备选字符类别作为类别判断结果。After obtaining the feature sequence corresponding to the text image output by the target neural network, at least one candidate character category corresponding to each character included in the text image and the recognition rate corresponding to each candidate character category can be obtained through the classifier. In the embodiment of the present disclosure, the candidate character category with the maximum recognition rate may be used as the category judgment result.
Further, according to the category judgment results, the target characters belonging to Thai text and Arabic numerals and the irrelevant characters belonging to English can be determined, and the character structures corresponding to the target characters are taken as the target text recognition result, i.e., the target text recognition result of performing text recognition on the Thai text and Arabic numerals therein.
在本公开实施例中,可以对预设神经网络进行训练后,得到上述目标神经网络。In the embodiment of the present disclosure, the above-mentioned target neural network may be obtained after training the preset neural network.
在对目标神经网络进行训练的过程中,可以通过已有的包括泰文文字的第一备 选文字图像,以及英文语料,得到样本文字图像。In the process of training the target neural network, the sample text images can be obtained through the existing first candidate text images including Thai text and English corpus.
或者可以单独获取只包括泰文文字的第一备选文字图像,以及只包括英文文字的第二备选文字图像,基于第一备选文字图像和第二备选文字图像,生成样本文字图像。Alternatively, a first candidate text image including only Thai text and a second candidate text image including only English text may be separately acquired, and a sample text image is generated based on the first candidate text image and the second candidate text image.
The sample text images are used as the input of the preset neural network, the multiple kinds of character category labels in the sample text images are used as supervision, and the required target neural network is obtained through CTC-supervised training. The character category labels in the sample text image include at least one of the following: at least one of the multiple first character category labels respectively corresponding to the multiple characters included in the first text language, i.e., at least one of the first character category labels respectively corresponding to the Thai characters; at least one of the multiple second character category labels respectively corresponding to the multiple Arabic numerals; and the same third character category label corresponding to the multiple characters included in the multiple second text languages, i.e., the same third character category label shared by the English characters.
In the above embodiment, a large amount of sample training data can be obtained, which meets the training accuracy requirements of the preset neural network, improves the robustness of the target neural network, and offers high generality, so that the network can be quickly deployed on any device for the purpose of text recognition.
在一些可选实施例中,本公开提供的文字识别方案可以用于招牌文字识别、票据识别、上述的车牌识别等场景中。在本公开实施例中,该文字识别方案还可以用于签发电子签证。In some optional embodiments, the character recognition solution provided by the present disclosure can be used in scenarios such as signboard character recognition, bill recognition, and the above-mentioned license plate recognition. In the embodiment of the present disclosure, the character recognition scheme can also be used to issue an electronic visa.
During the issuance of an electronic visa, the user applying for the electronic visa needs to upload the target materials required for the application, which include but are not limited to at least one of the following: an electronic form containing at least one of round-trip flight information and hotel information, ticket information of the round-trip flights, booking confirmation issued by the hotel, a passport, proof of income, medical examination information, and other information required for applying for the electronic visa. After the user uploads the above target materials, the information in them needs to be checked manually so that the electronic visa can be issued.
在本公开实施例中,用户可以上传目标资料的文字图像,电子签证系统可以按照本公开提供的文字识别方案,先确定每个目标资料的文字图像对应的特征序列,进一步地,基于该特征序列,确定目标资料的文字图像中每个字符属于第一文字语言对应的目标字符,或是属于除了所述第一文字语言之外的其他文字对应的无关字符。过滤掉目标资料的文字图像中的无关字符,对目标资料的文字图像中待识别文字进行文字识别,从而得到目标文字识别结果。其中,待识别文字包括第一文字语言对应的文字,所述第一文字语言是电子签证对应的文字语言。In the embodiment of the present disclosure, the user can upload the text image of the target data, and the electronic visa system can first determine the feature sequence corresponding to the text image of each target data according to the text recognition scheme provided by the present disclosure, and further, based on the feature sequence , determining that each character in the text image of the target data belongs to the target character corresponding to the first text language or to an irrelevant character corresponding to other texts except the first text language. The irrelevant characters in the text image of the target data are filtered out, and text recognition is performed on the text to be recognized in the text image of the target data, so as to obtain the target text recognition result. Wherein, the characters to be recognized include characters corresponding to the first character language, and the first character language is the character language corresponding to the electronic visa.
For example, if the electronic visa is issued in English, the electronic visa system can determine, in the text images of the target materials uploaded by the user, whether each character is an English character or an irrelevant character corresponding to other text; after filtering out the irrelevant characters, text recognition is performed on the English characters in the text images of the target materials to obtain the target text recognition result.
进一步地,电子签证系统可以基于目标文字识别结果,签发电子签证。例如,电子签证系统基于目标文字识别结果,验证该用户符合签发电子签证的条件,自动为该用户签发电子签证。Further, the electronic visa system can issue the electronic visa based on the target character recognition result. For example, the electronic visa system verifies that the user meets the conditions for issuing an electronic visa based on the target character recognition result, and automatically issues an electronic visa for the user.
In the above embodiment, the irrelevant characters corresponding to other text can be filtered out of the text images of the target materials required when applying for an electronic visa, and text recognition is performed on the text corresponding to the electronic visa in those images, which improves the accuracy and timeliness of electronic visa issuance and offers high usability.
与前述方法实施例相对应,本公开还提供了装置的实施例。Corresponding to the foregoing method embodiments, the present disclosure also provides device embodiments.
FIG. 10 is a block diagram of a text recognition apparatus according to an exemplary embodiment of the present disclosure. The apparatus includes: an image acquisition module 410, configured to acquire a text image including text to be recognized and other text; a character category determination module 420, configured to obtain, based on a feature sequence corresponding to the text image, a category judgment result for each character in the text image, the category judgment result being used to characterize a character category; and a text recognition module 430, configured to determine, based on the category judgment results, a target text recognition result of performing text recognition on the text to be recognized.
在一些可选实施例中,所述装置还包括:区域确定模块,用于确定所述文字图像中所述待识别文字和/或所述其他文字所在的候选区域;划分模块,用于将所述候选区域划分为多个子区域;特征序列确定模块,用于基于所述多个子区域中至少部分子区域对应的特征信息,确定所述文字图像对应的特征序列。In some optional embodiments, the apparatus further includes: a region determination module, configured to determine a candidate region where the to-be-recognized character and/or the other characters in the character image are located; a division module, configured to The candidate region is divided into multiple sub-regions; the feature sequence determination module is configured to determine the feature sequence corresponding to the character image based on the feature information corresponding to at least part of the sub-regions in the multiple sub-regions.
In some optional embodiments, the character category determination module includes: a first determination submodule, configured to determine, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and the recognition rate of each candidate character category; and a second determination submodule, configured to take, for each character, the candidate character category corresponding to the largest recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
In some optional embodiments, the text recognition module includes: a third determination submodule, configured to determine, for each character, the character structure corresponding to the candidate character category with the largest recognition rate to which the character belongs, according to the correspondence between character categories and character structures; a fourth determination submodule, configured to determine, according to the candidate character category with the largest recognition rate to which the character belongs, whether the character is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text; and a fifth determination submodule, configured to take the character structure corresponding to the target character as the target text recognition result of performing text recognition on the text to be recognized.
In some optional embodiments, the fifth determination submodule includes: a first determination unit, configured to determine that the character is a target character in response to determining that the candidate character category with the largest recognition rate to which the character belongs is one of the multiple first character categories or one of the multiple second character categories; and a second determination unit, configured to determine that the character is an irrelevant character in response to determining that the candidate character category with the largest recognition rate to which the character belongs is the third character category.
在一些可选实施例中,所述多个第一字符类别包括:与第一文字语言包括的多个字符分别对应的字符类别;其中,所述第一文字语言是所述待识别文字对应的文字语言;所述多个第二字符类别包括:与多个阿拉伯数字分别对应的字符类别;所述第三字符类别包括:与多种第二文字语言包括的多个字符对应的相同的字符类别;其中,所述第二文字语言是不同于所述第一文字语言的文字语言。In some optional embodiments, the plurality of first character categories include: character categories corresponding to the plurality of characters included in the first character language; wherein the first character language is the character language corresponding to the character to be recognized ; the plurality of second character classes include: character classes corresponding to a plurality of Arabic numerals respectively; the third character class includes: the same character class corresponding to a plurality of characters included in a plurality of second script languages; wherein , the second script language is a script language different from the first script language.
In some optional embodiments, the feature sequence determination module includes: a sixth determination submodule, configured to use the text image as the input of the target neural network for character category judgment and obtain the feature sequence corresponding to the text image output by the target neural network.
In some optional embodiments, the apparatus further includes: a sample text image acquisition module, configured to acquire sample text images that include both text corresponding to the first text language and text corresponding to at least one second text language, the first text language being the text language corresponding to the text to be recognized and the second text language being a text language different from the first text language; and a training module, configured to use the sample text images as the input of a preset neural network and train the preset neural network with the character category labels in the sample text images as supervision, obtaining the target neural network used for character category judgment.
In some optional embodiments, the sample text image acquisition module includes: a first acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language; a second acquisition submodule, configured to acquire a candidate text corpus corresponding to the at least one second text language; and a first generation submodule, configured to generate the sample text image based on the candidate text corpus and the first candidate text image.
In some optional embodiments, the sample text image acquisition module includes: a third acquisition submodule, configured to acquire a first candidate text image including text corresponding to the first text language and a second candidate text image including text corresponding to the at least one second text language; and a second generation submodule, configured to generate the sample text image based on the first candidate text image and the second candidate text image.
In some optional embodiments, the character category labels in the sample text image include at least one of the following: at least one of a plurality of first character category labels respectively corresponding to a plurality of characters included in the first text language; at least one of a plurality of second character category labels respectively corresponding to a plurality of Arabic numerals; and a same third character category label corresponding to a plurality of characters included in multiple second text languages.
In some optional embodiments, the text to be recognized includes text corresponding to a first text language, where the first text language is the text language corresponding to an electronic visa, and the text image includes a text image of target material required when applying for the electronic visa. The character category determination module includes a seventh determination sub-module, configured to determine, based on the feature sequence corresponding to the text image of the target material, that each character in the text image of the target material is a target character corresponding to the first text language or an irrelevant character corresponding to other text. The text recognition module includes an eighth determination sub-module, configured to determine a target text recognition result of performing text recognition on the target characters in the text image of the target material. The apparatus further includes an execution module, configured to issue the electronic visa based on the target text recognition result.
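A minimal end-to-end sketch of this e-visa scenario is shown below: characters judged to belong to the first text language are kept, irrelevant characters are discarded, and the recognized target text is then passed to an issuance step. The `classify_characters` and `issue_evisa` callables and the `IRRELEVANT` index are placeholders invented for this example.

```python
# Hypothetical glue code for the e-visa embodiment described above.
IRRELEVANT = -1  # placeholder index for the shared "other text" category

def recognize_target_text(char_results):
    """char_results: list of (character_structure, category_index) per character."""
    return "".join(struct for struct, cat in char_results if cat != IRRELEVANT)

def process_application(document_images, classify_characters, issue_evisa):
    """document_images: text images of the target material required for the e-visa."""
    recognized_fields = [recognize_target_text(classify_characters(img))
                         for img in document_images]
    if all(recognized_fields):            # every required field produced a non-empty result
        issue_evisa(recognized_fields)    # downstream issuance step (assumed interface)
    return recognized_fields
```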
For the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for related details. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the present disclosure, which those of ordinary skill in the art can understand and implement without creative effort.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, where the computer program is used to execute any one of the text recognition methods described above.
In some optional embodiments, an embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the text recognition method provided by any one of the above embodiments.
In some optional embodiments, an embodiment of the present disclosure further provides another computer program product for storing computer-readable instructions, where the instructions, when executed, cause a computer to execute the text recognition method provided by any one of the above embodiments.
The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
An embodiment of the present disclosure further provides a text recognition apparatus, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to invoke the executable instructions stored in the memory to implement any one of the text recognition methods described above.
FIG. 11 is a schematic diagram of a hardware structure of a text recognition apparatus provided by an embodiment of the present disclosure. The text recognition apparatus 510 includes a processor 511, and may further include an input device 512, an output device 513, and a memory 514. The input device 512, the output device 513, the memory 514, and the processor 511 are connected to one another through a bus.
The memory includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.
The input device is used for inputting data and/or signals, and the output device is used for outputting data and/or signals. The output device and the input device may be independent devices or an integrated device.
The processor may include one or more processors, for example, one or more central processing units (CPUs). Where the processor is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory is used for storing the program code and data of the network device.
The processor is configured to invoke the program code and data in the memory to execute the steps in the above method embodiments. For details, reference may be made to the description in the method embodiments, which is not repeated here.
It can be understood that FIG. 11 shows only a simplified design of a text recognition apparatus. In practical applications, the text recognition apparatus may further include other necessary components, including but not limited to any number of input/output devices, processors, controllers, and memories, and all text recognition apparatuses that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
The above descriptions are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (14)

1. A text recognition method, comprising:
    acquiring a text image including text to be recognized and other text;
    obtaining a category judgment result of each character in the text image based on a feature sequence corresponding to the text image, wherein the category judgment result is used to characterize a character category;
    determining, based on the category judgment result, a target text recognition result of performing text recognition on the text to be recognized.
2. The method according to claim 1, further comprising:
    determining a candidate region in the text image where the text to be recognized and/or the other text is located;
    dividing the candidate region into a plurality of sub-regions;
    determining the feature sequence corresponding to the text image based on feature information corresponding to at least some of the plurality of sub-regions.
3. The method according to claim 1 or 2, wherein obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image comprises:
    determining, based on the feature sequence corresponding to the text image, at least one candidate character category to which each character included in the text image belongs and a recognition rate of each candidate character category;
    for each character,
    taking the candidate character category corresponding to the maximum recognition rate among the at least one candidate character category to which the character belongs as the category judgment result of the character.
4. The method according to claim 3, wherein determining, based on the category judgment result, the target text recognition result of performing text recognition on the text to be recognized comprises:
    for each character,
    determining, according to a correspondence between character categories and character structures, a character structure corresponding to the candidate character category with the maximum recognition rate to which the character belongs;
    determining, according to the candidate character category with the maximum recognition rate to which the character belongs, that the character is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text;
    taking the character structure corresponding to the target character as the target text recognition result of performing text recognition on the text to be recognized.
5. The method according to claim 4, wherein determining, according to the candidate character category with the maximum recognition rate to which the character belongs, that the character is a target character corresponding to the text to be recognized or an irrelevant character corresponding to the other text comprises:
    in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is one of a plurality of first character categories or a plurality of second character categories, determining that the character is the target character;
    in response to determining that the candidate character category with the maximum recognition rate to which the character belongs is a third character category, determining that the character is the irrelevant character.
6. The method according to claim 5, wherein
    the plurality of first character categories comprise character categories respectively corresponding to a plurality of characters included in a first text language, wherein the first text language is the text language corresponding to the text to be recognized;
    the plurality of second character categories comprise character categories respectively corresponding to a plurality of Arabic numerals;
    the third character category comprises a same character category corresponding to a plurality of characters included in multiple second text languages, wherein a second text language is a text language different from the first text language.
7. The method according to any one of claims 2-6, wherein determining the feature sequence corresponding to the text image comprises:
    using the text image as an input of a target neural network for judging the character category of characters, and obtaining the feature sequence corresponding to the text image output by the target neural network.
8. The method according to any one of claims 1-7, further comprising:
    acquiring a sample text image that includes both text corresponding to a first text language and text corresponding to at least one second text language, wherein the first text language is the text language corresponding to the text to be recognized, and the second text language is a text language different from the first text language;
    using the sample text image as an input of a preset neural network and, with character category labels in the sample text image as supervision, training the preset neural network to obtain the target neural network for judging the character category of characters.
9. The method according to claim 8, wherein acquiring the sample text image that includes both the text corresponding to the first text language and the text corresponding to the at least one second text language comprises at least one of the following:
    acquiring a first candidate text image including the text corresponding to the first text language, acquiring a candidate text corpus corresponding to the at least one second text language, and generating the sample text image based on the candidate text corpus and the first candidate text image;
    acquiring a first candidate text image including the text corresponding to the first text language and a second candidate text image including the text corresponding to the at least one second text language, and generating the sample text image based on the first candidate text image and the second candidate text image.
10. The method according to claim 8 or 9, wherein the character category labels in the sample text image include at least one of the following:
    at least one of a plurality of first character category labels respectively corresponding to a plurality of characters included in the first text language;
    at least one of a plurality of second character category labels respectively corresponding to a plurality of Arabic numerals;
    a same third character category label corresponding to a plurality of characters included in multiple second text languages.
11. The method according to any one of claims 1-10, wherein
    the text to be recognized includes text corresponding to a first text language, the first text language being the text language corresponding to an electronic visa, and the text image includes a text image of target material required when applying for the electronic visa;
    obtaining the category judgment result of each character in the text image based on the feature sequence corresponding to the text image comprises:
    determining, based on the feature sequence corresponding to the text image of the target material, that each character in the text image of the target material is a target character corresponding to the first text language or an irrelevant character corresponding to other text;
    determining the target text recognition result of performing text recognition on the text to be recognized comprises:
    determining a target text recognition result of performing text recognition on the target characters in the text image of the target material;
    the method further comprising:
    issuing the electronic visa based on the target text recognition result.
12. A text recognition apparatus, comprising:
    an image acquisition module, configured to acquire a text image including text to be recognized and other text;
    a character category determination module, configured to obtain a category judgment result of each character in the text image based on a feature sequence corresponding to the text image, wherein the category judgment result is used to characterize a character category;
    a text recognition module, configured to determine, based on the category judgment result, a target text recognition result of performing text recognition on the text to be recognized.
13. A text recognition apparatus, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to invoke the executable instructions stored in the memory to implement the text recognition method according to any one of claims 1-10.
14. A computer-readable storage medium storing a computer program, wherein the computer program is used to execute the text recognition method according to any one of claims 1-10.
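As a compact illustration of the category-judgment and decoding steps recited in claims 3 to 5, the sketch below selects, for each character, the candidate category with the maximum recognition rate, maps target characters to their character structures through an assumed correspondence table, and drops irrelevant characters. All names and the shape of the correspondence table are assumptions made for this example, not the claimed implementation.

```python
import numpy as np

def decode(recognition_rates, structure_of, target_classes, third_class):
    """recognition_rates: (T, num_classes) array of per-character candidate-category rates.
    structure_of: assumed mapping from category index to character structure (a string).
    target_classes: set of first and second character category indices.
    """
    target_text = []
    for rates in recognition_rates:
        best = int(np.argmax(rates))          # candidate category with the maximum recognition rate
        if best in target_classes:            # target character (first-language character or digit)
            target_text.append(structure_of[best])
        elif best == third_class:             # irrelevant character: skip it
            continue
    return "".join(target_text)
```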
PCT/CN2021/103787 2021-01-29 2021-06-30 Text recognition method and device, and storage medium WO2022160598A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110127630.5 2021-01-29
CN202110127630.5A CN112800972A (en) 2021-01-29 2021-01-29 Character recognition method and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022160598A1 true WO2022160598A1 (en) 2022-08-04

Family

ID=75812940

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103787 WO2022160598A1 (en) 2021-01-29 2021-06-30 Text recognition method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112800972A (en)
WO (1) WO2022160598A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800972A (en) * 2021-01-29 2021-05-14 北京市商汤科技开发有限公司 Character recognition method and device, and storage medium
CN113298188A (en) * 2021-06-28 2021-08-24 深圳市商汤科技有限公司 Character recognition and neural network training method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050232495A1 (en) * 2004-04-19 2005-10-20 International Business Machines Corporation Device for Outputting Character Recognition Results, Character Recognition Device, and Method and Program Therefor
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN111178363A (en) * 2019-12-18 2020-05-19 北京旷视科技有限公司 Character recognition method and device, electronic equipment and readable storage medium
CN111563495A (en) * 2020-05-09 2020-08-21 北京奇艺世纪科技有限公司 Method and device for recognizing characters in image and electronic equipment
CN112200188A (en) * 2020-10-16 2021-01-08 北京市商汤科技开发有限公司 Character recognition method and device, and storage medium
CN112800972A (en) * 2021-01-29 2021-05-14 北京市商汤科技开发有限公司 Character recognition method and device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214386B (en) * 2018-09-14 2020-11-24 京东数字科技控股有限公司 Method and apparatus for generating image recognition model
CN109492643B (en) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 Certificate identification method and device based on OCR, computer equipment and storage medium
CN111582282B (en) * 2020-05-13 2024-04-12 科大讯飞股份有限公司 Text recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112800972A (en) 2021-05-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922195

Country of ref document: EP

Kind code of ref document: A1