US20220012488A1 - Receipt identification method, apparatus, electronic device and computer-readable storage medium - Google Patents


Info

Publication number
US20220012488A1
Authority
US
United States
Prior art keywords
receipt
region
store name
determining
store
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/485,511
Inventor
Qingsong XU
Qing Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Glority Software Ltd
Original Assignee
Hangzhou Glority Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Glority Software Ltd filed Critical Hangzhou Glority Software Ltd
Assigned to HANGZHOU GLORITY SOFTWARE LIMITED reassignment HANGZHOU GLORITY SOFTWARE LIMITED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LI, QING, XU, Qingsong
Publication of US20220012488A1


Classifications

    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D 7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D 7/20 Testing patterns thereon
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06K 9/00469
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06K 9/325
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/418 Document matching, e.g. of document images
    • G06K 2209/01
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Definitions

  • the disclosure relates to the field of image processing technology, and particularly to a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium.
  • the purpose of the disclosure is to provide a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium to automatically identify relevant information on the receipt.
  • the specific technical solutions are as follows.
  • the disclosure provides a receipt identification method, which includes:
  • the region identification model is a model based on a neural network
  • the character identification model is a model based on the neural network
  • the method further includes: identifying a time region by using the region identification model, and labeling the time region, wherein the time region is a row region that conforms to a preset time feature;
  • the step of determining the store name on the receipt according to the character content of each row region includes:
  • the method further includes:
  • each store name pattern in the store name database is labeled with a corresponding store name.
  • the step of determining the store name on the receipt according to the character content of each row region includes:
  • the step of determining the store address based on the character content of each row region includes:
  • the step of using the store address as the store name on the receipt includes:
  • the step of determining the payment amount on the receipt according to the character content in the total payment region includes:
  • the rule for determining the amount stored in the rule database is: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt;
  • the step of formulating all the preset key phrases in the total amount region into the phrases to be queried includes:
  • the disclosure further provides a receipt identification apparatus, the receipt identification apparatus includes:
  • an acquisition module configured to obtain an image of a receipt to be identified
  • a first identification module configured to identify each row region in the image by using a region identification model, wherein the row region is the region where each line of characters on the receipt is located, and the region identification model is a model based on a neural network;
  • a second identification module configured to identify the character content in each row region by using a character identification model, wherein the character identification model is a model based on a neural network;
  • a determining module configured to determine the time information, store name, and payment amount on the receipt according to the character content of each row region
  • the first identification module is further configured to identify a time region by using the region identification model when identifying each row region on the receipt in the image, and label the time region, wherein the time region is a row region that conforms to a preset time feature;
  • the step in which the determining module determines the store name on the receipt according to the character content of each row region includes:
  • the first identification module is further configured to identify the region where the pattern in the image is located by using the region identification model;
  • the determining module is further configured to determine whether there is a matched store name pattern in the store name database according to the pattern; if there is the matched store name pattern in the store name database, the determining module is configured to determine the store name corresponding to the matched store name pattern as the store name on the receipt, if there is no matched store name pattern in the store name database, the determining module is configured to identify the characters in the pattern, and use the identified characters in the pattern as the store name on the receipt; if there are no characters in the pattern, the determining module is configured to conduct a search in the store name database according to the character content of each row region to determine the store name on the receipt;
  • each store name pattern in the store name database is labeled with a corresponding store name.
  • the step in which the determining module determines the store name on the receipt according to the character content of each row region includes: conducting a search in the store name database based on the character content of each row region, if the store name on the receipt is not obtained, determining the store address based on the character content of each row region, and using the store address as the store name on the receipt.
  • the step in which the determining module determines the store address based on the character content of each row region includes:
  • the step in which the determining module uses the store address as the store name on the receipt includes:
  • the step in which the determining module determines the payment amount on the receipt according to the character content in the total payment region includes:
  • the rule for determining the amount stored in the rule database is: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt;
  • the step in which the determining module formulates all the preset key phrases in the total amount region into phrases to be queried includes:
  • the disclosure further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
  • the memory is configured to store computer programs
  • the processor is configured to implement the steps of the receipt identification method described in the first aspect when executing the computer program stored in the memory.
  • the disclosure further provides a non-transitory computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the steps of the receipt identification method described in the first aspect are implemented.
  • the disclosure first uses the region identification model to identify each row region, then uses the character identification model to identify the character content in each row region, and finally determines the time information, store name, and payment amount in the receipt based on the character content, thereby realizing the automatic identification and display of relevant information on the receipt, which improves the efficiency of receipt processing. Further, the total amount region is determined by searching for the preset keywords in the character content to determine the payment amount on the receipt, thereby improving the accuracy and efficiency of identification of the payment amount.
  • FIG. 1 is a schematic flowchart of a receipt identification method according to an embodiment of the disclosure.
  • FIG. 2A to FIG. 2D are specific examples of receipts embodied in an embodiment of the disclosure.
  • FIG. 3 is a schematic structural diagram of a receipt identification apparatus embodied in an embodiment of the disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device embodied in an embodiment of the disclosure.
  • embodiments of the disclosure provide a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium.
  • a receipt identification method in the embodiment of the disclosure can be applied to a receipt identification apparatus in the embodiment of the disclosure, and the receipt identification apparatus can be configured on an electronic device.
  • the electronic device may be a personal computer, a mobile terminal, etc.
  • the mobile terminal may be a hardware device with various operating systems, such as a mobile phone or a tablet computer.
  • FIG. 1 is a schematic flowchart of a receipt identification method according to an embodiment of the disclosure. Please refer to FIG. 1 , a receipt identification method can include the following steps.
  • In step S101, an image of the receipt to be identified is obtained.
  • the receipt described in this embodiment can be an invoice, a bill, a tax bill, a receipt, a shopping list, a catering receipt, an insurance policy, a reimbursement form, an express order, an itinerary, a ticket, or any other document containing a payment amount.
  • the language of characters on the receipt can be Chinese, English, Japanese, Korean, German, etc., which should not be construed as a limitation to the disclosure.
  • In step S102, each row region of the receipt in the image is identified by using a region identification model, wherein the row region is the region where each line of characters is located.
  • the region identification model may be a neural network model obtained by pre-training.
  • the image of receipt is input into the region identification model, and the region identification model can identify the region where each line of characters in the receipt is located, and label each identified row region.
  • In step S103, the character content in each row region is identified by using the character identification model.
  • the character identification model may be a neural network model obtained by pre-training. After each row region is identified, the receipt image labeled with the row regions can be input into the character identification model, or each row region can be sliced directly and the sliced images input into the character identification model. The character content in each row region is then identified by the character identification model.
  • the characters in the receipt can be the characters in a printed font or a handwritten font. Since there are differences in the character set corresponding to printed fonts and handwritten fonts, if the same character model is used to identify printed fonts and handwritten fonts, the accuracy of character identification will be reduced. Therefore, in order to improve the accuracy of character identification, different character identification models are adopted for different fonts.
  • the character identification model can include an identification model for printed fonts and an identification model for handwritten fonts.
  • the identification model for printed fonts and the identification model for handwritten fonts are trained separately. For handwritten fonts and printed fonts, different character training sets can be adopted to train the corresponding character identification models.
  • In step S104, the time information, store name, and payment amount on the receipt are determined according to the character content of each row region.
  • the payment amount can be determined in the following manner: at least one row region containing at least one preset keyword in the character content is determined as the total amount region; and the payment amount on the receipt is determined according to the character content in the total amount region.
  • the preset keyword is used to indicate the item name of each payment item in the payment region.
  • the keyword can include: “subtotal”, “total”, “cash”, “change”, “discount”, etc.
  • the keywords in the international receipt can include: “subtotal”, “total”, “ttl”, “tax”, “gratuity”, “cash”, “change”, “discount”, “service”, “payment”, “visa”, etc.
  • the row region containing the preset keyword can be found, and all the row regions containing the preset keyword are determined as the total amount region, so as to find the value of the amount corresponding to the preset keyword from the character content in the total amount region, and then determine the payment amount on the receipt.
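As a concrete illustration of this keyword scan, a minimal Python sketch is given below; the function name, variable names, and sample rows are hypothetical, while the keyword list is taken from the examples above:

```python
import re

# Preset keywords from the examples above; a real system would use a
# fuller, locale-specific list.
PRESET_KEYWORDS = ["subtotal", "total", "ttl", "tax", "gratuity",
                   "cash", "change", "discount", "service", "payment", "visa"]

def find_total_amount_region(row_texts):
    """Return (index, text, matched keywords) for every row region whose
    character content contains at least one preset keyword."""
    region = []
    for i, text in enumerate(row_texts):
        lowered = text.lower()
        matched = [kw for kw in PRESET_KEYWORDS
                   if re.search(r"\b" + re.escape(kw) + r"\b", lowered)]
        if matched:
            region.append((i, text, matched))
    return region

rows = ["HUDSON NEWS", "WATER 2.39", "SUBTOTAL 2.39",
        "TOTAL 2.54", "CASH 3.00", "CHANGE 0.46"]
region = find_total_amount_region(rows)  # rows 2-5 form the total amount region
```

The word-boundary match keeps "total" from also firing inside "subtotal", which matters when the amount value corresponding to each keyword is looked up later.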
  • the step of determining the payment amount on the receipt according to the character content of the total amount region may include: formulating all the preset keywords in the total amount region into a phrase to be queried; selecting the target amount determining rule corresponding to the phrase to be queried from a rule database, wherein the rule database stores phrases composed of different preset keywords and the amount determining rules corresponding to the various phrases; and determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
  • the preset keywords in the total amount region are arranged and combined to obtain the phrase to be queried.
  • the preset keywords may be arranged and combined in alphabetical order of their initial letters.
  • the preset keywords contained in one receipt are “subtotal”, “tax”, and “total”, and the phrase to be queried is subtotal-tax-total after the preset keywords are arranged in alphabetical order of their initial letters.
  • the preset keywords contained in another receipt are “subtotal”, “tax”, “total”, and “visa”, and the phrase to be queried is subtotal-tax-total-visa after the preset keywords are arranged in alphabetical order of their initial letters.
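The arrangement into a phrase to be queried is a simple sort-and-join; a sketch with a hypothetical function name:

```python
def phrase_to_query(keywords):
    """Arrange the preset keywords found in the total amount region in
    alphabetical order and join them with '-' into the phrase to be queried."""
    return "-".join(sorted(kw.lower() for kw in keywords))

phrase_to_query(["total", "tax", "subtotal"])          # "subtotal-tax-total"
phrase_to_query(["visa", "subtotal", "total", "tax"])  # "subtotal-tax-total-visa"
```

Sorting the full strings (rather than only the initial letters) also breaks ties such as "tax" versus "total", reproducing the ordering shown in the examples above.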
  • the phrase to be queried may also be composed according to the alphabetical order of the initial letters of the Chinese pinyin of the preset keywords.
  • the preset keywords are “ (which is translated as “subtotal” in English)”, “ (which is translated as “tax” in English)”, and “ (which is translated as “total” in English)”, and the phrase to be queried obtained according to the alphabetical order of the initial letters of the Chinese pinyin of the preset keywords is “ (which is translated as “tax-subtotal-total” in English)”.
  • the amount determining rule stored in the rule database may be: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt. Therefore, the step of determining the payment amount on the receipt based on the character content in the total amount region according to the target amount determining rule is specifically: using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
  • a phrase in the rule database is subtotal-tax-total, and the corresponding amount determining rule is set to select the amount value corresponding to the preset keyword “total” as the payment amount. Then, if the phrase to be queried is also subtotal-tax-total, the target amount determining rule is to select the amount value corresponding to the preset keyword “total” as the payment amount. Therefore, the amount value corresponding to the preset keyword “total” in the total amount region is used as the payment amount.
  • a phrase in the rule database is subtotal-tax-total-visa, and the corresponding amount determining rule is set to select the amount value corresponding to the preset keyword “visa” as the payment amount.
  • the target amount determining rule is to select the amount value corresponding to the preset keyword “visa” as the payment amount. Therefore, the amount value corresponding to the preset keyword “visa” in the total amount region is used as the payment amount.
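The rule database and the application of a target amount determining rule can be sketched as follows; the database contents are the two example phrases above, and the function and variable names are illustrative:

```python
import re

# Hypothetical rule database: each phrase maps to the designated preset
# keyword whose amount value is used as the payment amount.
RULE_DATABASE = {
    "subtotal-tax-total": "total",
    "subtotal-tax-total-visa": "visa",
}

def payment_amount(phrase, total_region_rows):
    """Find the row containing the keyword designated by the target amount
    determining rule and return the amount value appearing in that row."""
    keyword = RULE_DATABASE[phrase]
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for text in total_region_rows:
        if re.search(pattern, text.lower()):
            amount = re.search(r"\d+\.\d{2}", text)
            if amount:
                return float(amount.group())
    return None

payment_amount("subtotal-tax-total",
               ["SUBTOTAL 2.39", "TAX 0.15", "TOTAL 2.54"])  # 2.54
```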
  • the following table exemplarily shows some phrases and their corresponding amount determining rules.
  • Typically, the time information is displayed on the receipt in a certain time format; that is, the time information conforms to a certain time feature, such as a date with slashes or a date with English month characters.
  • the time information displayed on receipt can be: “30 Jan′ 18”, “02/10/17”, “22/11/2017”, “Apr 06′ 18”, “Apr. 4, 2018”, “2018-02-02”, “26 Oct. 2017”, “Nov. 18. 2017”, “Mar. 24, 2018”, “01012017”, etc.
  • the region that conforms to the preset time feature can be found from the row region, that is, the region (time region) where the time information is located, and then the time information of the receipt can be determined.
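The disclosure identifies the time region with a trained neural network, as described next; purely for illustration, the kinds of preset time features listed above can be approximated with regular expressions:

```python
import re

# Rough regex stand-ins for the preset time features; the disclosure itself
# uses a neural network trained on time pictures in various formats.
TIME_FEATURES = [
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",                 # 02/10/17, 22/11/2017
    r"\b\d{4}-\d{2}-\d{2}\b",                       # 2018-02-02
    r"\b\d{1,2}\s+[A-Z][a-z]{2}\.?\s+\d{4}\b",      # 26 Oct. 2017
    r"\b[A-Z][a-z]{2}\.?\s+\d{1,2},?\s+\d{4}\b",    # Apr. 4, 2018
]

def is_time_region(text):
    """True if the row's character content conforms to a preset time feature."""
    return any(re.search(p, text) for p in TIME_FEATURES)
```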
  • the region that conforms to the preset time feature in the row region is identified through the neural network model.
  • the neural network model is established through pre-training, and the training samples are time pictures in various formats.
  • the specific method is as follows: First, in the process of identifying each row region of the receipt in the image in step S102, the step further includes: identifying a time region by using the region identification model, and labeling the time region, wherein the time region is a row region that conforms to the preset time feature.
  • In step S104, the step of determining the time information of the receipt according to the character content of each row region includes: determining the time information of the receipt according to the character content in the time region. For example, if the character content in the time region is “2018-02-02”, it can be determined that the time information of the receipt is “Feb. 2, 2018”.
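Normalizing the identified characters into the displayed time information can be sketched with a list of candidate formats; the list below covers only the example strings given earlier:

```python
from datetime import datetime

# Candidate formats for the example time strings in this disclosure.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%y", "%d %b. %Y", "%b. %d, %Y"]

def normalize_time(text):
    """Parse the time-region characters and reformat them as in the
    example above ('2018-02-02' becomes 'Feb. 2, 2018')."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(text, fmt)
            return "{}. {}, {}".format(dt.strftime("%b"), dt.day, dt.year)
        except ValueError:
            continue
    return None
```

Note that strings such as 02/10/17 are ambiguous between day-first and month-first conventions, so the order of the candidate list is itself a policy decision.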
  • the store name can be determined in the following manner: conducting a search in a store name database according to the character content of each row region to determine the store name on the receipt.
  • Various store names are pre-stored in the store name database, and the search is conducted in the store name database for the character content in each row region one by one. If the character content in a certain row region can be found in the store name database, the store name found in the store name database is used as the store name on the receipt. If the character content cannot be found through the search, the store address can be determined based on the character content of each row region, and the store address can be used as the store name on the receipt.
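A sketch of the row-by-row search against the store name database; the database contents here are hypothetical:

```python
# Hypothetical store name database; a real deployment would hold many names.
STORE_NAME_DATABASE = {"hudson news", "ingles", "target"}

def lookup_store_name(row_texts):
    """Search the store name database for the character content of each
    row region one by one; return the stored name on the first hit."""
    for text in row_texts:
        candidate = text.strip().lower()
        if candidate in STORE_NAME_DATABASE:
            return candidate
    return None

lookup_store_name(["HUDSON NEWS", "WATER 2.39", "TOTAL 2.54"])  # "hudson news"
```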
  • the store address can be determined in the following ways: 1. if a row region contains preset characters for indicating the address, such as “ (which is translated as “address” in English)”, “address”, or “add.”, it can be determined that the characters following these preset characters are address information; 2. if there are characters corresponding to an administrative region name or a street name and number, these characters are determined as address information.
  • the store address is determined as the store name.
  • the address information in the store address used to indicate a smaller region may be selected as the store name.
  • the street+number or building+floor room number information in the store address may be selected as the store name.
  • the address information for indicating a smaller region can be the address information indicating the smallest region or the second smallest region among the administrative region names, and such information is typically the characters in the last part of a Chinese address or the first part of an English address. For example, if the store address information includes No. 10 Nanjing East Road, “No. 10 Nanjing East Road” is selected as the store name. If the store address information includes Raffles Plaza 302, “Raffles Plaza 302” is selected as the store name.
  • If the store address information contains “store 601 XX mall”, “store 601 XX mall” is selected as the store name.
  • the address information in the store address information for indicating a larger region is not included in the store name, so as to keep the store name short. For example, if the store address information includes “No. 10, Nanjing East Road, Huangpu District, Shanghai”, then “Huangpu District, Shanghai” is ignored, and only “No. 10, Nanjing East Road” is selected as the store name, so that the store name can be simplified.
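For English-style addresses, dropping the larger-region tail can be sketched as follows; the marker list is a hypothetical heuristic, not from the disclosure:

```python
# Hypothetical suffixes marking larger administrative regions; in an
# English-style address the smaller region comes first, so everything
# from the first such segment onward is dropped.
LARGE_REGION_MARKERS = ("district", "city", "province", "county")

def store_name_from_address(address):
    """Keep the leading, smaller-region segments of the address as the
    store name and drop the larger-region tail."""
    kept = []
    for segment in address.split(","):
        segment = segment.strip()
        if any(segment.lower().endswith(m) for m in LARGE_REGION_MARKERS):
            break
        kept.append(segment)
    return ", ".join(kept)

store_name_from_address("No. 10, Nanjing East Road, Huangpu District, Shanghai")
# "No. 10, Nanjing East Road"
```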
  • the store name can be determined by the pattern, and the specific method is as follows:
  • If there is a matched store name pattern in the store name database, the store name corresponding to the matched store name pattern is determined as the store name on the receipt, and the store name on the receipt determined based on the character content in each row region is discarded. If there is no matched store name pattern in the store name database, but the characters in the pattern can be identified, then the identified characters in the pattern are used as the store name on the receipt. Similarly, the store name on the receipt determined based on the character content in each row region is discarded.
  • If the store name cannot be found in the store name database based on the pattern, there are no characters in the pattern, and the store name cannot be found in the store name database based on the character content of each row region, it is also possible to determine the store address based on the character content of each row region and use the store address as the store name on the receipt.
  • the region identification model can identify the time region A1 in the process of identifying each row region, and then it can be determined that the time information is 8/8/2017 according to the identification result of the character identification model.
  • the receipt includes a pattern, and the region identification model can further identify the region A2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the characters “Hudson News” in the pattern as the store name.
  • the receipt contains the preset keywords “subtotal”, “total”, “cash”, and “change”.
  • the row region A3 where these keywords are located is determined as the total amount region, and these keywords are formulated into the phrase to be queried, namely “cash-change-subtotal-total”.
  • An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “total” as the payment amount, the amount value 2.54 corresponding to the keyword “total” in the total amount region A3 is used as the payment amount.
  • the region identification model can identify the time region B1 in the process of identifying each row region, and then it can be determined that the time information is 08/03/17 according to the identification result of the character identification model.
  • the receipt includes a pattern, and the region identification model can further identify the region B2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the characters “ingles” in the pattern as the store name.
  • the receipt contains the preset keywords “TAX”, “BALANCE”, “TOTAL AMOUNT”, and “CHANGE”.
  • the row region B3 where these keywords are located is determined as the total amount region, and these keywords are formulated into the phrase to be queried, namely “BALANCE-CHANGE-TAX-TOTAL AMOUNT”.
  • An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “TOTAL AMOUNT” as the payment amount, the amount value 4.44 corresponding to the keyword “TOTAL AMOUNT” in the total amount region B3 is used as the payment amount.
  • the region identification model can identify the time region C1 in the process of identifying each row region, and then it can be determined that the time information is 08/02/17 and 10/31/17 according to the identification result of the character identification model.
  • the receipt includes a pattern, and the region identification model can further identify the region C2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the characters “TARGET” in the pattern as the store name.
  • the receipt contains the preset keywords “SUBTOTAL”, “TAX”, and “TOTAL”.
  • the row region C3 where these keywords are located is determined as the total amount region, and these keywords are formulated into the phrase to be queried, namely “SUBTOTAL-TAX-TOTAL”.
  • An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “TOTAL” as the payment amount, the amount value 4.86 corresponding to the keyword “TOTAL” in the total amount region C3 is used as the payment amount.
  • the region identification model can identify the time region D1 in the process of identifying each row region, and then it can be determined that the time information is 26/12/2017 according to the identification result of the character identification model.
  • the receipt includes a pattern, and the region identification model can further identify the region D2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the characters “RTA” in the pattern as the store name. Based on the character identification result of each row region, it can be determined that the receipt contains the preset keyword “Total Amount”.
  • the row region D3 where the keyword is located is determined as the total amount region, and the keyword is formed into the phrase to be queried, namely “Total Amount”.
  • An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “Total Amount” as the payment amount, the amount value 61.00 corresponding to the keyword “Total Amount” in the total amount region D3 is used as the payment amount.
  • the region identification model can be obtained through the following process: labeling each receipt image sample in the receipt image sample set to label each row region in each receipt image sample; training the neural network through the labeled receipt image sample set to obtain the region identification model.
  • When labeling each row region, it is possible to further label the region that conforms to the preset time feature as the time region.
  • the region identification model that is trained through a large number of various types of time region samples can identify each row region while identifying and labeling the time region.
  • the character identification model can be obtained through the following process: labeling each row region that is labeled in the training process of the region identification model to label the characters in each row region; training the neural network through each row region that is labeled to obtain the character identification model.
  • the training set of the character identification model may be different from the training set of the region identification model; the disclosure provides no limitation thereto.
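The two labeling passes described above (row-region boxes with a time flag for the region model, and per-row character content for the character model) might be represented as follows; the field names and structures are illustrative assumptions, not from the disclosure:

```python
from dataclasses import dataclass, field

# Illustrative label structures for the two training sets described above.
# Boxes are (left, top, right, bottom) pixel coordinates.

@dataclass
class RowLabel:
    box: tuple             # location of one line of characters
    is_time_region: bool   # True for rows conforming to the preset time feature
    text: str              # character content, used by the character model only

@dataclass
class ReceiptSample:
    image_path: str
    rows: list = field(default_factory=list)

def region_model_targets(sample):
    """Training targets for the region identification model: boxes + time flag."""
    return [(row.box, row.is_time_region) for row in sample.rows]

def character_model_targets(sample):
    """Training targets for the character identification model: box + row text."""
    return [(row.box, row.text) for row in sample.rows]
```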
  • the region identification model is first used to identify each row region, and then the character identification model is used to identify the character content in each row region, and finally the time information, store name, and payment amount in the receipt are determined based on the character content, thereby realizing the automatic identification and display of relevant information on the receipt, which improves the efficiency of processing receipts.
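The overall flow just summarized — region model first, character model second, then field determination — can be sketched with stand-in models. Every function shape and extraction heuristic here is an assumption for illustration only:

```python
import re

def identify_receipt(image, region_model, character_model):
    """Two-stage pipeline sketched from the description above.

    Stand-ins: region_model(image) returns a list of row boxes, and
    character_model(image, box) returns the text of one row region.
    """
    boxes = region_model(image)                            # stage 1: locate each row
    rows = [character_model(image, box) for box in boxes]  # stage 2: read each row

    # Minimal illustrative field extraction; the real rules are detailed elsewhere.
    time_info, amount = None, None
    for row in rows:
        date = re.search(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", row)
        if date and time_info is None:
            time_info = date.group()
        if "TOTAL" in row.upper():
            value = re.search(r"\d+\.\d{2}", row)
            if value:
                amount = float(value.group())
    store = rows[0] if rows else None  # naive: first row often carries the store name
    return {"time": time_info, "store": store, "amount": amount}
```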
  • FIG. 3 is a schematic structural diagram of a receipt identification apparatus embodied in an embodiment of the disclosure. Please refer to FIG. 3 .
  • a receipt identification apparatus can include:
  • an acquisition module 201 configured to obtain an image of a receipt to be identified;
  • a first identification module 202 configured to identify each row region of the receipt in the image by using a region identification model, wherein the row region is the region where each line of characters on the receipt is located;
  • a second identification module 203 configured to identify the character content in each row region by using a character identification model; and
  • a determining module 204 configured to determine the time information, store name, and payment amount on the receipt according to the character content of each row region.
  • the step in which the determining module 204 determines the payment amount on the receipt according to the character contents in each row region includes:
  • the first identification module 202 is further configured to identify a time region by using the region identification model when identifying each row region on the receipt in the image, and label the time region, wherein the time region is a row region that conforms to a preset time feature.
  • the step in which the determining module 204 determines the time information on the receipt according to the character contents in each row region is specifically as follows:
  • the step in which the determining module 204 determines the store name on the receipt according to the character content of each row region is specifically as follows:
  • the first identification module 202 is further configured to identify the region where the pattern in the image is located by using the region identification model.
  • the determining module 204 is further configured to determine whether there is a matched store name pattern in the store name database according to the pattern; if there is a matched store name pattern in the store name database, the store name corresponding to the matched store name pattern is determined as the store name on the receipt; if there is no matched store name pattern in the store name database, the characters in the pattern are identified and used as the store name on the receipt; if there are no characters in the pattern, a search is conducted in the store name database again according to the character content of each row region to determine the store name on the receipt;
  • each store name pattern in the store name database is labeled with a corresponding store name.
  • the store name corresponding to the matched store name pattern is determined as the store name on the receipt, and the store name on the receipt determined based on the character content in each row region is discarded. If there is no matched store name pattern in the store name database, but the characters in the pattern can be identified, then the identified characters in the pattern are used as the store name on the receipt. Similarly, the store name on the receipt determined based on the character content in each row region is discarded.
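The priority order described above — matched store name pattern first, then characters identified inside the pattern, then a text search of the row regions — can be sketched as follows. The inputs are illustrative stand-ins for the pattern-matching, character-identification, and database results:

```python
def determine_store_name(matched_pattern_name, pattern_chars, row_texts, store_db):
    """Priority chain for the store name, as described above.

    matched_pattern_name: store name labeled on a matched store name pattern,
        or None when pattern matching found nothing.
    pattern_chars: characters identified inside the pattern, or None.
    row_texts: character content of every row region.
    store_db: iterable of known store names (a stand-in for the database).
    """
    if matched_pattern_name:          # 1) a matched store name pattern wins
        return matched_pattern_name
    if pattern_chars:                 # 2) fall back to characters in the pattern
        return pattern_chars
    for name in store_db:             # 3) finally, search the rows for a known name
        if any(name.lower() in row.lower() for row in row_texts):
            return name
    return None
```

For the FIG. 2 examples, a receipt whose pattern matches no database entry but contains the characters "TARGET" would return "TARGET".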
  • the determining module 204 is further configured to search in the store name database according to the character content of each row region. If the store name on the receipt cannot be determined, the store address is determined based on the character content in each row region, and the store address is used as the store name on the receipt.
  • the step in which the determining module 204 determines the store address based on the character content of each row region is specifically as follows:
  • the step in which the determining module 204 uses the store address as the store name on the receipt includes:
  • the step in which the determining module 204 determines the payment amount on the receipt according to the character content in the total amount region includes:
  • the rule for determining the amount stored in the rule database is: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt.
  • the step in which the determining module 204 determines the payment amount on the receipt based on the character content in the total amount region according to the target amount determining rule is specifically as follows:
  • the step in which the determining module 204 formulates all the preset key phrases in the total amount region into phrases to be queried is specifically as follows:
  • FIG. 4 is a schematic structural diagram of an electronic device embodied in an embodiment of the disclosure. Please refer to FIG. 4. The electronic device includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, wherein the processor 301, the communication interface 302, and the memory 303 complete communication with each other through the communication bus 304.
  • the memory 303 is configured to store computer programs.
  • the processor 301 is configured to implement the following steps when executing the computer program stored in the memory 303 :
  • the step of determining the payment amount on the receipt according to the character contents in each row region includes:
  • the communication bus mentioned in the description related to electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the communication bus can be categorized into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is used in the figure to represent the communication bus, but this does not mean that there is only one bus or only one type of bus.
  • the communication interface is configured to implement communication between the electronic device and other devices.
  • the memory may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk memory.
  • the memory may also be at least one storage apparatus located far away from the processor described above.
  • the aforementioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.
  • the processor can also be a digital signal processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • An embodiment of the disclosure further provides a computer-readable storage medium in which a computer program is stored.
  • When the computer program is executed by a processor, the steps of the above-mentioned receipt identification method are implemented.

Abstract

The disclosure provides a receipt identification method, apparatus, electronic device and computer-readable storage medium. The method includes: obtaining (S101) an image of the receipt to be identified; identifying (S102) each row region of the receipt in the image by using a region identification model, wherein the row region is the region where each line of characters is located; identifying (S103) the character content in each row region by using the character identification model; determining (S104) time information, store name and payment amount on the receipt according to the character contents in each row region. The solution provided by the disclosure can automatically identify the relevant information on the receipt.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This is a continuation-in-part application of International Application No. PCT/CN2019/103848, filed on Aug. 30, 2019, which claims the priority benefits of China Application No. 201910386149.0, filed on May 9, 2019. The entirety of each of the above-mentioned patent applications is hereby incorporated by reference herein and made a part of this specification.
  • BACKGROUND Field of the Disclosure
  • The disclosure relates to the field of image processing technology, and particularly to a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium.
  • Description of Related Art
  • With the continuous development of the economy, people's consumption levels continue to rise. In order to protect consumers' rights, receipts have become proof of purchase and effective reimbursement documents for consumers. Therefore, financial personnel need to process a large number of receipts every day to obtain the information on them, such as the ticketing time, the ticketing store, the payment amount, etc. In addition, an increasing number of people keep accounting classification statistics to record their own spending habits. Currently, people usually keep accounts by manually recording the relevant information on receipts. Therefore, how to automatically identify the relevant information on a receipt is very important both for financial personnel and for individuals who keep accounting classification statistics.
  • SUMMARY OF THE DISCLOSURE
  • The purpose of the disclosure is to provide a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium to automatically identify relevant information on the receipt. The specific technical solutions are as follows.
  • In the first aspect, the disclosure provides a receipt identification method, which includes:
  • obtaining an image of a receipt to be identified;
  • identifying each row region in the image by using a region identification model, wherein the row region is the region where each line of characters on the receipt is located, and the region identification model is a model based on a neural network;
  • identifying a character content in each row region by using a character identification model, wherein the character identification model is a model based on the neural network; and
  • determining time information, store name and payment amount on the receipt according to the character content in each row region;
  • wherein the step of determining the payment amount on the receipt according to the character content in each row region includes:
  • determining at least one row region where the character content containing at least one preset keyword is located as the total amount region; and
  • determining the payment amount on the receipt according to the character content in the total amount region.
  • Optionally, when identifying each row region in the image, the method further includes: identifying a time region by using the region identification model, and labeling the time region, wherein the time region is a row region that conforms to a preset time feature;
  • the step of determining the time information on the receipt according to the character content in each row region includes:
  • determining the time information on the receipt according to the character content of the time region.
  • Optionally, the step of determining the store name on the receipt according to the character content of each row region includes:
  • conducting a search in a store name database according to the character content of each row region to determine the store name on the receipt.
  • Optionally, when the receipt includes a pattern, the method further includes:
  • identifying a region where the pattern in the image is located by using the region identification model;
  • determining whether there is a matched store name pattern in a store name database according to the pattern; if there is the matched store name pattern in the store name database, determining a store name corresponding to the matched store name pattern as the store name on the receipt; if there is no matched store name pattern in the store name database, identifying the characters in the pattern, and using the characters identified in the pattern as the store name on the receipt; if there are no characters in the pattern, conducting a search in the store name database according to the character content of each row region to determine the store name on the receipt; and
  • wherein each store name pattern in the store name database is labeled with a corresponding store name.
  • Optionally, the step of determining the store name on the receipt according to the character content of each row region includes:
  • conducting the search in the store name database based on the character content of each row region, if the store name on the receipt is not obtained, determining a store address based on the character content of each row region, and using the store address as the store name on the receipt.
  • Optionally, the step of determining the store address based on the character content of each row region includes:
  • if a preset character used to indicate an address appears in a certain row region, using the character following the preset character as the store address; and/or,
  • if characters corresponding to the administrative region name or street name appear in a row region, using these characters as the store address;
  • the step of using the store address as the store name on the receipt includes:
  • selecting the address information that represents the smaller region in the store address as the store name.
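As a hedged illustration of the two address rules above, the sketch below assumes an English-language address marker such as "Address:" and assumes address components are comma-separated and ordered from specific to general; both assumptions go beyond what the disclosure specifies:

```python
def extract_store_address(row_texts, address_markers=("ADDRESS:", "ADDR:")):
    """Return the characters following a preset address marker in a row region.

    The marker list is a hypothetical example of the "preset character used to
    indicate an address" mentioned above.
    """
    for row in row_texts:
        upper = row.upper()
        for marker in address_markers:
            if marker in upper:
                return row[upper.index(marker) + len(marker):].strip()
    return None

def address_as_store_name(address):
    """Use the component naming the smallest region as the store name.

    Assumes comma-separated components ordered specific -> general, e.g.
    "123 Main St, Springfield, IL" -> "123 Main St".
    """
    if not address:
        return None
    return address.split(",")[0].strip()
```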
  • Optionally, the step of determining the payment amount on the receipt according to the character content in the total amount region includes:
  • formulating all the preset keywords in the total amount region into a phrase to be queried, and selecting a target amount determining rule corresponding to the phrase to be queried from a rule database; wherein the rule database stores phrases composed of different preset keywords and an amount determining rule corresponding to each of the phrases; and
  • determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
  • Optionally, the rule for determining the amount stored in the rule database is: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt;
  • the step of determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule includes:
  • using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
  • Optionally, the step of formulating all the preset key phrases in the total amount region into the phrases to be queried includes:
  • formulating all the preset keywords in the total amount region into the phrase to be queried in initial alphabetical order.
  • In the second aspect, the disclosure further provides a receipt identification apparatus, the receipt identification apparatus includes:
  • an acquisition module configured to obtain an image of a receipt to be identified;
  • a first identification module configured to identify each row region in the image by using a region identification model, wherein the row region is the region where each line of characters on the receipt is located, and the region identification model is a model based on a neural network;
  • a second identification module configured to identify the character content in each row region by using a character identification model, wherein the character identification model is a model based on a neural network; and
  • a determining module configured to determine the time information, store name, and payment amount on the receipt according to the character content of each row region;
  • the step in which the determining module determines the payment amount on the receipt according to the character content in each row region includes:
  • determining at least one row region where the character content containing at least one preset keyword is located as the total amount region; and
  • determining the payment amount on the receipt according to the character content in the total amount region.
  • Optionally, the first identification module is further configured to identify a time region by using the region identification model when identifying each row region on the receipt in the image, and label the time region, wherein the time region is a row region that conforms to a preset time feature;
  • the step in which the determining module determines the time information on the receipt according to the character contents in each row region includes:
  • determining the time information on the receipt according to the character content of the time region.
  • Optionally, the step in which the determining module determines the store name on the receipt according to the character content of each row region includes:
  • conducting a search in a store name database according to the character content of each row region to determine the store name on the receipt.
  • Optionally, when the receipt includes a pattern,
  • the first identification module is further configured to identify the region where the pattern in the image is located by using the region identification model;
  • the determining module is further configured to determine whether there is a matched store name pattern in the store name database according to the pattern; if there is the matched store name pattern in the store name database, the determining module is configured to determine the store name corresponding to the matched store name pattern as the store name on the receipt; if there is no matched store name pattern in the store name database, the determining module is configured to identify the characters in the pattern, and use the identified characters in the pattern as the store name on the receipt; if there are no characters in the pattern, the determining module is configured to conduct a search in the store name database according to the character content of each row region to determine the store name on the receipt;
  • wherein each store name pattern in the store name database is labeled with a corresponding store name.
  • Optionally, the step in which the determining module determines the store name on the receipt according to the character content of each row region includes: conducting a search in the store name database based on the character content of each row region, if the store name on the receipt is not obtained, determining the store address based on the character content of each row region, and using the store address as the store name on the receipt.
  • Optionally, the step in which the determining module determines the store address based on the character content of each row region includes:
  • if a preset character used to indicate an address appears in a certain row region, using the character following the preset character as the store address; and/or,
  • if characters corresponding to the administrative region name or street name appear in a row region, using these characters as the store address;
  • the step in which the determining module uses the store address as the store name on the receipt includes:
  • selecting the address information that represents the smaller region in the store address as the store name.
  • Optionally, the step in which the determining module determines the payment amount on the receipt according to the character content in the total amount region includes:
  • formulating all the preset keywords in the total amount region into a phrase to be queried, and selecting a target amount determining rule corresponding to the phrase to be queried from a rule database; wherein the rule database stores phrases composed of different preset keywords and an amount determining rule corresponding to each of the phrases; and
  • determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
  • Optionally, the rule for determining the amount stored in the rule database is: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt;
  • the step in which the determining module determines the payment amount on the receipt based on the character content in the total amount region according to the target amount determining rule includes:
  • using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
  • Optionally, the step in which the determining module formulates all the preset key phrases in the total amount region into phrases to be queried includes:
  • formulating all the preset keywords in the total amount region into the phrase to be queried in initial alphabetical order.
  • In a third aspect, the disclosure further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
  • the memory is configured to store computer programs;
  • the processor is configured to implement the steps of the receipt identification method described in the first aspect when executing the computer program stored in the memory.
  • In a fourth aspect, the disclosure further provides a non-transitory computer-readable storage medium in which a computer program is stored. When the computer program is executed by a processor, the steps of the receipt identification method described in the first aspect are implemented.
  • Compared with the current technologies, after obtaining the image of the receipt to be identified, the disclosure first uses the region identification model to identify each row region, then uses the character identification model to identify the character content in each row region, and finally determines the time information, store name, and payment amount in the receipt based on the character content, thereby realizing the automatic identification and display of relevant information on the receipt, which improves the efficiency of receipt processing. Further, the total amount region is determined by searching for the preset keywords in the character content to determine the payment amount on the receipt, thereby improving the accuracy and efficiency of identification of the payment amount.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments will be briefly introduced below. Clearly, the drawings in the following description only relate to some embodiments of the present disclosure, rather than limit the present disclosure.
  • FIG. 1 is a schematic flowchart of a receipt identification method according to an embodiment of the disclosure.
  • FIG. 2A to FIG. 2D are specific examples of receipts embodied in an embodiment of the disclosure.
  • FIG. 3 is a schematic structural diagram of a receipt identification apparatus embodied in an embodiment of the disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device embodied in an embodiment of the disclosure.
  • DESCRIPTION OF EMBODIMENTS
  • Hereinafter, a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium provided by the disclosure will be further described in detail with reference to the accompanying drawings and specific embodiments. According to the claims and the following description, the advantages and features of the disclosure will be clearer. It should be noted that all the drawings are illustrated in a very simplified form with imprecise proportions, which are only used to conveniently and clearly assist in explaining the purpose of the embodiments of the disclosure.
  • In order to solve the problems of the current technologies, embodiments of the disclosure provide a receipt identification method, a receipt identification apparatus, an electronic device, and a computer-readable storage medium.
  • It should be noted that a receipt identification method in the embodiment of the disclosure can be applied to a receipt identification apparatus in the embodiment of the disclosure, and the receipt identification apparatus can be configured on an electronic device. Specifically, the electronic device may be a personal computer, a mobile terminal, etc., and the mobile terminal may be a hardware device with various operating systems, such as a mobile phone or a tablet computer.
  • FIG. 1 is a schematic flowchart of a receipt identification method according to an embodiment of the disclosure. Please refer to FIG. 1, a receipt identification method can include the following steps.
  • In step S101, an image of the receipt to be identified is obtained.
  • The receipt described in this embodiment can be an invoice, a bill, a tax bill, a receipt, a shopping list, a catering receipt, an insurance policy, a reimbursement form, an express order, an itinerary, a ticket, or any other document containing a payment amount. The language of the characters on the receipt can be Chinese, English, Japanese, Korean, German, etc., which should not be construed as a limitation to the disclosure.
  • In step S102, each row region of the receipt in the image is identified by using a region identification model, wherein the row region is the region where each line of characters is located.
  • The region identification model may be a neural network model obtained by pre-training. The image of the receipt is input into the region identification model, which can identify the region where each line of characters on the receipt is located and label each identified row region.
  • In step S103, the character content in each row region is identified by using the character identification model.
  • The character identification model may be a neural network model obtained by pre-training. After each row region is identified, the receipt image labeled with the row regions can be input into the character identification model, or each row region can be sliced directly and the sliced images input into the character identification model. The character content in each row region is then identified by the character identification model.
  • The characters in the receipt can be the characters in a printed font or a handwritten font. Since there are differences in the character set corresponding to printed fonts and handwritten fonts, if the same character model is used to identify printed fonts and handwritten fonts, the accuracy of character identification will be reduced. Therefore, in order to improve the accuracy of character identification, different character identification models are adopted for different fonts. The character identification model can include an identification model for printed fonts and an identification model for handwritten fonts. The identification model for printed fonts and the identification model for handwritten fonts are trained separately. For handwritten fonts and printed fonts, different character training sets can be adopted to train the corresponding character identification models.
  • In step S104, the time information, store name, and payment amount on the receipt are determined according to the character content of each row region.
  • Specifically, the payment amount can be determined in the following manner: at least one row region containing at least one preset keyword in the character content is determined as the total amount region; and the payment amount on the receipt is determined according to the character content in the total amount region.
  • The preset keywords are used to indicate the item name of each payment item in the payment region. For example, the keywords can include: “subtotal”, “total”, “cash”, “change”, “discount”, etc., and the keywords on international receipts can include: “subtotal”, “total”, “ttl”, “tax”, “gratuity”, “cash”, “change”, “discount”, “service”, “payment”, “visa”, etc. Based on the character content of each row region, the row regions containing the preset keywords can be found, and all the row regions containing a preset keyword are determined as the total amount region, so that the amount value corresponding to each preset keyword can be found in the character content of the total amount region to determine the payment amount on the receipt.
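A minimal sketch of determining the total amount region from the row texts, using the example keyword list above. Matching at word level keeps "total" from also matching inside "subtotal":

```python
# Example preset keywords taken from the description above.
PRESET_KEYWORDS = {"subtotal", "total", "ttl", "tax", "gratuity", "cash",
                   "change", "discount", "service", "payment", "visa"}

def find_total_amount_region(row_texts):
    """Return each row region whose character content contains a preset
    keyword, together with the keywords found in it."""
    region = []
    for row in row_texts:
        words = {word.strip(".,:").lower() for word in row.split()}
        hits = sorted(words & PRESET_KEYWORDS)
        if hits:
            region.append((row, hits))
    return region
```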
  • Specifically, the step of determining the payment amount on the receipt according to the character content of the total amount region may include: formulating all the preset keywords in the total amount region into a phrase to be queried, and selecting a target amount determining rule corresponding to the phrase to be queried from a rule database, wherein the rule database stores phrases composed of different preset keywords and an amount determining rule corresponding to each phrase; and determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
  • All the preset keywords in the total amount region are arranged and combined to obtain the phrase to be queried. For example, the preset keywords may be arranged and combined according to their initial alphabetical order. If the preset keywords contained in one receipt are “subtotal”, “tax”, and “total”, the phrase to be queried is subtotal-tax-total after the preset keywords are arranged in initial alphabetical order. If the preset keywords contained in another receipt are “subtotal”, “tax”, “total”, and “visa”, the phrase to be queried is subtotal-tax-total-visa. In receipts with Chinese characters, the phrase to be queried may be composed by arranging the preset keywords according to the initial letters of their Chinese pinyin abbreviations. For example, for the preset keywords [Chinese for “subtotal”], [Chinese for “tax”], and [Chinese for “total”], the phrase to be queried obtained in this way is [Chinese for “tax-subtotal-total”].
  • In this embodiment, the amount determining rule stored in the rule database may be: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt. Therefore, the step of determining the payment amount on the receipt based on the character content in the total amount region according to the target amount determining rule is specifically: using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
  • For example, a phrase in the rule database is subtotal-tax-total, and the corresponding amount determining rule is set to select the amount value corresponding to the preset keyword “total” as the payment amount. Then, if the phrase to be queried is also subtotal-tax-total, the target amount determining rule is to select the amount value corresponding to the preset keyword “total” as the payment amount. Therefore, the amount value corresponding to the preset keyword “total” in the total amount region is used as the payment amount. In another example, a phrase in the rule database is subtotal-tax-total-visa, and the corresponding amount determining rule is set to select the amount value corresponding to the preset keyword “visa” as the payment amount. Then, if the phrase to be queried is also subtotal-tax-total-visa, the target amount determining rule is to select the amount value corresponding to the preset keyword “visa” as the payment amount. Therefore, the amount value corresponding to the preset keyword “visa” in the total amount region is used as the payment amount.
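  • A minimal Python sketch of the phrase formulation and rule lookup described above follows. The two rule database entries mirror the examples in the text; all names are illustrative, and the sketch assumes one amount value per row:

```python
import re

# Illustrative two-entry rule database: phrase to be queried -> the
# preset keyword whose amount value is taken as the payment amount.
RULE_DATABASE = {
    "subtotal-tax-total": "total",
    "subtotal-tax-total-visa": "visa",
}

def payment_amount(total_region_rows, keywords_found):
    """Build the phrase to be queried (keywords in alphabetical order),
    look up the designated keyword, and return its amount value."""
    phrase = "-".join(sorted(keywords_found))
    designated = RULE_DATABASE[phrase]
    # Word boundaries keep "total" from matching inside "subtotal".
    keyword_pattern = re.compile(rf"\b{re.escape(designated)}\b")
    for row in total_region_rows:
        if keyword_pattern.search(row.lower()):
            amount = re.search(r"\d+(?:\.\d+)?", row)
            if amount:
                return float(amount.group())
    return None

rows = ["Subtotal 4.50", "Tax 0.36", "Total 4.86", "Visa 4.86"]
print(payment_amount(rows, ["subtotal", "tax", "total", "visa"]))  # 4.86
```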
  • The following table exemplarily shows some phrases and their corresponding amount determining rules.
  • phrases                                                          amount determining rules
    gratuity-purchase-total                                          total
    cash-total                                                       cash
    credit card-total                                                credit card
    credit-fuel total                                                credit
    subtotal-tax-total-visa                                          visa
    balance due-cash-change                                          balance due
    purchase-total aud                                               total aud
    amount usd                                                       amount usd
    subtotal usd-tip usd-total usd                                   total usd
    tip-total                                                        total
    cashless-change-subtotal-take out total-tax-transaction amount   transaction amount
    amount due inc gst-amount tendered c card-total parking fee      amount tendered c card
  • Next, the methods of determining the time information and the store name are described.
  • Regarding the time information, it is typically displayed on the receipt in a certain time format, that is, it conforms to a certain time feature, such as a date written with slashes or a date written with an English month abbreviation. For example, the time information displayed on a receipt can be: “30 Jan′ 18”, “02/10/17”, “22/11/2017”, “Apr 06′ 18”, “Apr. 4, 2018”, “2018-02-02”, “26 Oct. 2017”, “Nov. 18. 2017”, “Mar. 24, 2018”, “01012017”, etc.
  • Therefore, the region that conforms to the preset time feature can be found from the row region, that is, the region (time region) where the time information is located, and then the time information of the receipt can be determined. Specifically, the region that conforms to the preset time feature in the row region is identified through the neural network model. The neural network model is established through pre-training, and the training samples are time pictures in various formats. The specific method is as follows: First, in the process of identifying each row region of the receipt in the image in step S102, the step further includes: identifying a time region by using the region identification model, and labeling the time region, wherein the time region is a row region that conforms to the preset time feature. Further, in step S104, the step of determining the time information of the receipt according to the character content of each row region includes: determining the time information of the receipt according to the character content in the time region. For example, if the character in the time region is “2018-02-02”, it can be determined that the time information of receipt is “Feb. 2, 2018”.
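  • Although this embodiment identifies time regions with a trained neural network model, the preset time features can be approximated with simple patterns. A hedged Python sketch follows; the regular expressions below are illustrative and cover only a few of the example formats listed above:

```python
import re

# Hedged sketch: regular expressions approximating some of the preset
# time features listed above. The embodiment uses a trained region
# identification model for this; these patterns are illustrative only.
TIME_FEATURES = [
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",           # e.g. 02/10/17, 22/11/2017
    r"\b\d{4}-\d{2}-\d{2}\b",                 # e.g. 2018-02-02
    r"\b[A-Z][a-z]{2}\.? \d{1,2},? \d{4}\b",  # e.g. Mar. 24, 2018
]

def looks_like_time_region(row_text):
    """Return True if the row's character content matches a time feature."""
    return any(re.search(pattern, row_text) for pattern in TIME_FEATURES)

print(looks_like_time_region("Date: 2018-02-02"))  # True
print(looks_like_time_region("Cashier No. 5"))     # False
```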
  • The store name can be determined in the following manner: conducting a search in a store name database according to the character content of each row region to determine the store name on the receipt. Various store names are pre-stored in the store name database, and the search is conducted in the store name database for the character content in each row region one by one. If the character content in a certain row region can be found in the store name database, the store name found in the store name database is used as the store name on the receipt. If the character content cannot be found through the search, the store address can be determined based on the character content of each row region, and the store address can be used as the store name on the receipt.
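  • A minimal Python sketch of the row-by-row database lookup with the store-address fallback described above; the store name set and all names here are illustrative placeholders for the store name database:

```python
# Illustrative stand-in for the pre-stored store name database.
STORE_NAMES = {"hudson news", "ingles", "target"}

def store_name_from_rows(row_texts, fallback_address=None):
    """Search each row's character content in the store name database;
    if no row matches, fall back to the store address."""
    for text in row_texts:
        if text.strip().lower() in STORE_NAMES:
            return text.strip()
    return fallback_address

print(store_name_from_rows(["TARGET", "123 Example St"]))  # TARGET
```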
  • In this embodiment, the store address can be determined in the following ways: 1. if a row region contains a preset character string used to indicate an address, such as the Chinese word for “address”, “address”, or “add.”, it can be determined that the characters following the preset string are the address information; 2. if a row region contains characters corresponding to an administrative region name or a street name and number, these characters are determined as the address information.
  • The store address is determined as the store name as follows. The address information in the store address that indicates a smaller region may be selected as the store name; for example, the street and number, or the building, floor and room number, in the store address may be selected. The address information indicating a smaller region can be the address information indicating the smallest or second smallest region among the administrative region names, and such information is typically found in the last part of a Chinese address or the first part of an English address. For example, if the store address information includes “No. 10 Nanjing East Road”, then “No. 10 Nanjing East Road” is selected as the store name. If the store address information includes “Raffles Plaza 302”, then “Raffles Plaza 302” is selected as the store name. If the store address information contains “store 601 XX mall”, then “store 601 XX mall” is selected as the store name. The address information indicating a larger region is not included in the store name, so as to keep the store name short. For example, if the store address information includes “No. 10, Nanjing East Road, Huangpu District, Shanghai”, then “Huangpu District, Shanghai” is ignored and only “No. 10, Nanjing East Road” is selected as the store name.
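  • For an English-style address, selecting the smaller-region part can be sketched as keeping the leading comma-separated component. This is a simplistic, illustrative Python helper; real receipts would need the administrative-region analysis described above:

```python
# Simplistic sketch: for an English-style address, keep the leading
# comma-separated component (the smallest region) as the store name.
# Illustrative only; not the embodiment's full address analysis.
def simplify_address(address):
    return address.split(",")[0].strip()

print(simplify_address("No. 10 Nanjing East Road, Huangpu District, Shanghai"))
# No. 10 Nanjing East Road
```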
  • In addition, when the receipt includes a pattern (usually a logo), the store name can be determined by the pattern, and the specific method is as follows:
  • identifying the region where the pattern in the image is located by using the region identification model;
  • determining whether there is a matched store name pattern in the store name database according to the pattern; if there is the matched store name pattern in the store name database, determining the store name corresponding to the matched store name pattern as the store name on the receipt; if there is no matched store name pattern in the store name database, identifying the characters in the pattern as the store name on the receipt; if there are no characters in the pattern, performing the step of conducting a search again in the store name database according to the character content of each row region to determine the store name on the receipt, wherein each store name pattern in the store name database is labeled with a corresponding store name.
  • That is, in the embodiment of the disclosure, if there is a matched store name pattern in the store name database, the store name corresponding to the matched store name pattern is determined as the store name on the receipt, and the store name on the receipt determined based on the character content in each row region is discarded. If there is no matched store name pattern in the store name database, but the characters in the pattern can be identified, then the identified characters in the pattern are used as the store name on the receipt. Similarly, the store name on the receipt determined based on the character content in each row region is discarded. If there is no matched store name pattern in the store name database, and there is no character in the pattern or the characters in the pattern are not identified, a search is conducted in the store name database according to the character content of each row region to determine the store name on the receipt, which can further improve the reliability of identification on store name.
  • Furthermore, if the store name cannot be found in the store name database based on the pattern, and there are no characters in the pattern, and the store name cannot be found in the store name database based on the character content of each row region, it is also possible to determine the store address based on the character content of each row region, and the store address is used as the store name on the receipt.
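  • The full fallback chain for determining the store name can be summarized in a small Python sketch. The four parameters stand in for a matched store name pattern from the database, characters identified inside the logo, a row-text hit in the store name database, and the store address; none of these names come from the embodiment itself:

```python
# Hedged sketch of the store name fallback chain described above.
# Parameter and function names are invented for illustration.
def determine_store_name(pattern_match, pattern_chars, text_match, address):
    if pattern_match:          # 1. matched store name pattern in database
        return pattern_match
    if pattern_chars:          # 2. characters identified in the pattern
        return pattern_chars
    if text_match:             # 3. row-text lookup in store name database
        return text_match
    return address             # 4. final fallback: the store address

print(determine_store_name(None, "Hudson News", None, None))  # Hudson News
```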
  • The method described in this embodiment is explained below with some specific examples of receipts.
  • In the receipt shown in FIG. 2A, the region identification model can identify the time region A1 in the process of identifying each row region, and then it can be determined that the time information is 8/8/2017 according to the identification result of the character identification model. In the meantime, the receipt includes a pattern, and the region identification model can further identify the region A2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the character “Hudson News” in the pattern as the store name. Based on the character identification result of each row region, it can be determined that the receipt contains the preset keywords “subtotal”, “total”, “cash”, and “change”. The row region A3 where these keywords are located is determined as the total amount region, and these keywords are formulated into the phrase to be queried, namely “cash-change-subtotal-total”. An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “total” as the payment amount, the amount value 2.54 corresponding to the keyword “total” in the total amount region A3 is used as the payment amount.
  • In the receipt shown in FIG. 2B, the region identification model can identify the time region B1 in the process of identifying each row region, and then it can be determined that the time information is 08/03/17 according to the identification result of the character identification model. In the meantime, the receipt includes a pattern, and the region identification model can further identify the region B2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the character “ingles” in the pattern as the store name. Based on the character identification result of each row region, it can be determined that the receipt contains the preset keywords “TAX”, “BALANCE”, “TOTAL AMOUNT”, and “CHANGE”. The row region B3 where these keywords are located is determined as the total amount region, and these keywords are formulated into the phrase to be queried, namely “BALANCE-CHANGE-TAX-TOTAL AMOUNT”. An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “TOTAL AMOUNT” as the payment amount, the amount value 4.44 corresponding to the keyword “TOTAL AMOUNT” in the total amount region B3 is used as the payment amount.
  • In the receipt shown in FIG. 2C, the region identification model can identify the time region C1 in the process of identifying each row region, and then it can be determined that the time information is 08/02/17 and 10/31/17 according to the identification result of the character identification model. In the meantime, the receipt includes a pattern, and the region identification model can further identify the region C2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the character “TARGET” in the pattern as the store name. Based on the character identification result of each row region, it can be determined that the receipt contains the preset keywords “SUBTOTAL”, “TAX”, and “TOTAL”. The row region C3 where these keywords are located is determined as the total amount region, and these keywords are formulated into the phrase to be queried, namely “SUBTOTAL-TAX-TOTAL”. An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “TOTAL” as the payment amount, the amount value 4.86 corresponding to the keyword “TOTAL” in the total amount region C3 is used as the payment amount.
  • In the receipt shown in FIG. 2D, the region identification model can identify the time region D1 in the process of identifying each row region, and then it can be determined that the time information is 26/12/2017 according to the identification result of the character identification model. In the meantime, the receipt includes a pattern, and the region identification model can further identify the region D2 where the pattern is located, and search for the matched store name pattern in the store name database according to the pattern, so as to determine the store name on the receipt. If there is no matched store name pattern in the store name database, the character identification model is adopted to identify the character “RTA” in the pattern as the store name. Based on the character identification result of each row region, it can be determined that the receipt contains a preset keyword “Total Amount”. The row region D3 where the keyword is located is determined as the total amount region, and the keyword is formulated into the phrase to be queried, namely “Total Amount”. An amount determining rule corresponding to the phrase to be queried is searched in the rule database. If the found amount determining rule is to use the amount value corresponding to the keyword “Total Amount” as the payment amount, the amount value 61.00 corresponding to the keyword “Total Amount” in the total amount region D3 is used as the payment amount.
  • The training process of the region identification model and the character identification model will be briefly described below.
  • The region identification model can be obtained through the following process: labeling each receipt image sample in the receipt image sample set to label each row region in each receipt image sample; training the neural network through the labeled receipt image sample set to obtain the region identification model. When labeling each row region, it is possible to further label the region that conforms to the preset time feature as the time region. In this way, the region identification model that is trained through a large number of various types of time region samples can identify each row region while identifying and labeling the time region.
  • The character identification model can be obtained through the following process: labeling each row region that is labeled in the training process of the region identification model to label the characters in each row region; training the neural network through each row region that is labeled to obtain the character identification model.
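  • The relationship between the two labeled training sets can be illustrated with hypothetical label records. The field names and coordinates below are invented for illustration; the embodiment does not prescribe any particular label format:

```python
# Hypothetical label records illustrating the two-stage training data
# described above: region labels (bounding boxes, with time regions
# flagged) for the region identification model, and per-region
# transcriptions for the character identification model.
region_labels = [
    {"box": (12, 40, 300, 62), "is_time_region": False},
    {"box": (12, 70, 300, 92), "is_time_region": True},   # e.g. "2018-02-02"
]

character_labels = [
    {"box": (12, 40, 300, 62), "text": "Subtotal 2.00"},
    {"box": (12, 70, 300, 92), "text": "2018-02-02"},
]

# Every transcription refers back to a labeled row region.
region_boxes = {label["box"] for label in region_labels}
print(all(label["box"] in region_boxes for label in character_labels))  # True
```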
  • Certainly, the training set of the character identification model may be different from the training set of the region identification model; the disclosure provides no limitation thereto.
  • To sum up, in this embodiment, after obtaining the image of the receipt to be identified, the region identification model is first used to identify each row region, and then the character identification model is used to identify the character content in each row region, and finally the time information, store name, and payment amount in the receipt are determined based on the character content, thereby realizing the automatic identification and display of relevant information on the receipt, which improves the efficiency of processing receipts.
  • Corresponding to the foregoing method embodiment, an embodiment of the disclosure further provides a receipt identification apparatus. FIG. 3 is a schematic structural diagram of a receipt identification apparatus embodied in an embodiment of the disclosure. Please refer to FIG. 3. A receipt identification apparatus can include:
  • an acquisition module 201 configured to obtain an image of a receipt to be identified;
  • a first identification module 202 configured to identify each row region of the receipt in the image by using a region identification model, wherein the row region is the region where each line of characters on the receipt is located;
  • a second identification module 203 configured to identify the character content in each row region by using a character identification model; and
  • a determining module 204 configured to determine the time information, store name, and payment amount on the receipt according to the character content of each row region.
  • Specifically, the step in which the determining module 204 determines the payment amount on the receipt according to the character content in each row region includes:
  • determining at least one row region where the character content containing at least one preset keyword is located as the total amount region; and
  • determining the payment amount on the receipt according to the character content in the total amount region.
  • Optionally, the first identification module 202 is further configured to identify a time region by using the region identification model when identifying each row region on the receipt in the image, and label the time region, wherein the time region is a row region that conforms to a preset time feature.
  • The step in which the determining module 204 determines the time information on the receipt according to the character content in each row region is specifically as follows:
  • determining the time information on the receipt according to the character content of the time region.
  • Optionally, the step in which the determining module 204 determines the store name on the receipt according to the character content of each row region is specifically as follows:
  • conducting a search in a store name database according to the character content of each row region to determine the store name on the receipt.
  • Optionally, when the receipt includes a pattern, the first identification module 202 is further configured to identify the region where the pattern in the image is located by using the region identification model.
  • The determining module 204 is further configured to determine whether there is a matched store name pattern in the store name database according to the pattern; if there is the matched store name pattern in the store name database, the store name corresponding to the matched store name pattern is determined as the store name on the receipt; if there is no matched store name pattern in the store name database, the characters in the pattern are identified and used as the store name on the receipt; if there are no characters in the pattern, a search is conducted in the store name database again according to the character content of each row region to determine the store name on the receipt;
  • wherein each store name pattern in the store name database is labeled with a corresponding store name.
  • That is, in the embodiment of the disclosure, if there is a matched store name pattern in the store name database, the store name corresponding to the matched store name pattern is determined as the store name on the receipt, and the store name on the receipt determined based on the character content in each row region is discarded. If there is no matched store name pattern in the store name database, but the characters in the pattern can be identified, then the identified characters in the pattern are used as the store name on the receipt. Similarly, the store name on the receipt determined based on the character content in each row region is discarded. If there is no matched store name pattern in the store name database, and there is no character in the pattern or the characters in the pattern are not identified, a search is conducted in the store name database according to the character content of each row region to determine the store name on the receipt, which can further improve the reliability of identification on store name.
  • Optionally, the determining module 204 is further configured to conduct a search in the store name database according to the character content of each row region; if the store name on the receipt cannot be determined, the store address is determined based on the character content in each row region, and the store address is used as the store name on the receipt.
  • Optionally, the step in which the determining module 204 determines the store address based on the character content of each row region is specifically as follows:
  • if a preset character used to indicate an address appears in a certain row region, the character following the preset character is used as the store address;
  • if characters corresponding to the administrative region name or street name appear in a row region, these characters are used as the store address.
  • The step in which the determining module 204 uses the store address as the store name on the receipt includes:
  • selecting the address information that represents the smaller region in the store address as the store name.
  • Optionally, the step in which the determining module 204 determines the payment amount on the receipt according to the character content in the total amount region includes:
  • formulating all the preset keywords in the total amount region into a phrase to be queried, and selecting the target amount determining rule corresponding to the phrase to be queried from a rule database, wherein the rule database stores phrases composed of different preset keywords and the amount determining rules corresponding to the various phrases; and
  • determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
  • Optionally, the rule for determining the amount stored in the rule database is: designating a preset keyword in the phrase, so as to use the amount value corresponding to the preset keyword in the total amount region as the payment amount on the receipt.
  • The step in which the determining module 204 determines the payment amount on the receipt based on the character content in the total amount region according to the target amount determining rule is specifically as follows:
  • using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
  • Optionally, the step in which the determining module 204 formulates all the preset keywords in the total amount region into the phrase to be queried is specifically as follows:
  • formulating all the preset keywords in the total amount region into the phrase to be queried in an initial alphabetical order.
  • An embodiment of the disclosure further provides an electronic device. FIG. 4 is a schematic structural diagram of an electronic device embodied in an embodiment of the disclosure. Please refer to FIG. 4, the electronic device includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, wherein the processor 301, the communication interface 302, and the memory 303 complete communication with each other through the communication bus 304.
  • The memory 303 is configured to store computer programs.
  • The processor 301 is configured to implement the following steps when executing the computer program stored in the memory 303:
  • obtaining an image of the receipt to be identified;
  • adopting a region identification model to identify each row region of the receipt in the image; wherein the row region is the region where each line of characters is located;
  • adopting a character identification model to identify the character content in each row region;
  • determining the time information, store name and payment amount on the receipt according to the character content of each row region;
  • wherein the step of determining the payment amount on the receipt according to the character content in each row region includes:
  • determining at least one row region where the character content containing at least one preset keyword is located as the total amount region; and
  • determining the payment amount on the receipt according to the character content in the total amount region.
  • Please refer to the method embodiment shown in FIG. 1 above for the specific implementation of each step of the method and related content, and no further description will be incorporated herein.
  • In addition, other implementations of the receipt identification method implemented by the processor 301 executing the program stored in the memory 303 are the same as the implementations mentioned in the foregoing method embodiments, and no further description will be incorporated herein.
  • The communication bus mentioned in the description related to electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus can be categorized into address bus, data bus, control bus, etc. For ease of illustration, only a thick line is adopted in the figure to represent the communication bus, which does not mean that there is only one bus or only one type of bus.
  • The communication interface is configured to implement communication between the electronic device and other devices.
  • The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage apparatus located far away from the processor described above.
  • The aforementioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc. The processor can also be a digital signal processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • An embodiment of the disclosure further provides a computer-readable storage medium in which a computer program is stored. When the computer program is executed by a processor, the steps of the above-mentioned receipt identification method are realized.
  • It should be noted that the various embodiments in this specification are described in a related manner, and the same or similar parts between the various embodiments can serve as cross-reference for each other. Each embodiment focuses on the differences from other embodiments. In particular, in the embodiments related to the apparatus, electronic device, and computer-readable storage medium, since they are basically similar to the method embodiments, the description is relatively simple. For related parts, please refer to the description of the method embodiments.
  • In this specification, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that these entities or operations are in any actual relationship or order. Moreover, the terms “include”, “contain” or any other alternatives thereof are intended to involve non-exclusive inclusion, so that a process, method, article or device including a series of elements not only includes those elements, but also includes other elements that are not explicitly listed, or also include elements inherent to this process, method, article or device. If there are no further limitations, the element defined by the sentence “including a . . . ” does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
  • The foregoing description is only a description of the preferred embodiments of the disclosure and does not limit the scope of the disclosure in any way. Any changes or modifications made by persons of ordinary skill in the field of the disclosure based on the foregoing disclosure shall fall within the protection scope of the claims.

Claims (20)

What is claimed is:
1. A receipt identification method, characterized in comprising:
obtaining an image of a receipt to be identified;
identifying each row region in the image by using a region identification model, wherein the row region is a region where each line of characters on the receipt is located, and the region identification model is a model based on a neural network;
identifying a character content in each of the row regions by using a character identification model, wherein the character identification model is a model based on the neural network;
determining time information, a store name and a payment amount on the receipt according to the character content in each of the row regions;
wherein step of determining the payment amount on the receipt according to the character content in each of the row regions comprises:
determining at least one of the row regions where the character content containing at least one preset keyword is located as a total amount region;
determining the payment amount on the receipt according to the character content in the total amount region.
2. The receipt identification method according to claim 1, wherein when identifying each of the row regions in the image, the method further comprises: identifying a time region by using the region identification model, and labeling the time region, wherein the time region is a row region that conforms to a preset time feature;
step of determining the time information on the receipt according to the character content in each of the row regions comprises:
determining the time information on the receipt according to the character content of the time region.
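Claim 2's "preset time feature" is left open; a plausible reading is a set of date/time regular expressions that a row must match to be labeled a time region. The patterns below are illustrative assumptions, not the patent's.

```python
import re

# Assumed preset time features: common date/time layouts on printed receipts.
TIME_PATTERNS = [
    re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),  # e.g. 05/09/2019
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),              # e.g. 2019-05-09
    re.compile(r"\b\d{1,2}:\d{2}(:\d{2})?\b"),         # e.g. 14:30 or 14:30:05
]

def is_time_region(row_text):
    """Label a row region as a time region when it matches any preset time feature."""
    return any(p.search(row_text) for p in TIME_PATTERNS)
```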
3. The receipt identification method according to claim 1, wherein the step of determining the store name on the receipt according to the character content of each of the row regions comprises:
conducting a search in a store name database according to the character content of each of the row regions to determine the store name on the receipt.
4. The receipt identification method according to claim 1, wherein when the receipt contains a pattern, the method further comprises:
identifying a region where the pattern in the image is located by using the region identification model;
determining whether there is a matched store name pattern in a store name database according to the pattern; if there is the matched store name pattern in the store name database, determining a store name corresponding to the matched store name pattern as the store name on the receipt; if there is no matched store name pattern in the store name database, identifying characters in the pattern, and using the characters identified in the pattern as the store name on the receipt; and if there are no characters in the pattern, conducting a search in the store name database according to the character content of each of the row regions to determine the store name on the receipt;
wherein each of the store name patterns in the store name database is labeled with a corresponding store name.
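Claim 4 describes a three-step fallback for the store name when the receipt carries a pattern (e.g. a logo). Below is a sketch of the control flow only; the pattern matcher, in-pattern OCR, and database search are passed in as assumed callables, since none of those interfaces are specified by the patent.

```python
def store_name_from_pattern(pattern, match_pattern, read_chars, search_rows, rows):
    """Fallback chain of claim 4:
    1. look for a matched, pre-labeled store name pattern in the database;
    2. otherwise OCR any characters inside the pattern itself;
    3. otherwise search the store name database with the row-region texts."""
    name = match_pattern(pattern)   # labeled store name, or None if no match
    if name is not None:
        return name
    chars = read_chars(pattern)     # characters identified in the pattern, or ""
    if chars:
        return chars
    return search_rows(rows)        # store name found from row contents, or None
```

For example, with a matcher that recognizes the logo, the database label wins; with no match and no readable characters, the row-text search is the last resort.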
5. The receipt identification method according to claim 3, wherein the step of determining the store name on the receipt according to the character content of each of the row regions comprises:
conducting the search in the store name database according to the character content of each of the row regions; and if the store name on the receipt is not obtained, determining a store address based on the character content of each of the row regions, and using the store address as the store name on the receipt.
6. The receipt identification method according to claim 5, wherein the step of determining the store address based on the character content of each of the row regions comprises at least one of the following:
if a preset character used to indicate an address appears in one of the row regions, using a character following the preset character as the store address; and
if characters corresponding to an administrative region name or a street name appear in one of the row regions, using the characters as the store address;
wherein the step of using the store address as the store name on the receipt comprises:
selecting address information that represents a smaller region in the store address as the store name.
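Claims 5 and 6 fall back to a store address when the name lookup fails. The markers, street-name cues, and comma-separated address layout below are assumptions; the claims only require a preset address indicator, an administrative-region or street cue, and a preference for the smaller region.

```python
import re

ADDRESS_MARKERS = ("ADDRESS:", "ADDR:")  # assumed preset characters indicating an address
STREET_RE = re.compile(r"\b(ST|AVE|RD|BLVD|STREET|AVENUE|ROAD)\b")  # assumed street cues

def extract_store_address(row_texts):
    """Return the first row fragment that looks like an address, or None."""
    for text in row_texts:
        upper = text.upper()
        for marker in ADDRESS_MARKERS:
            if marker in upper:
                # preset character found: use the characters following it
                return text[upper.index(marker) + len(marker):].strip()
        if STREET_RE.search(upper):
            # street-name cue found: use the whole row
            return text.strip()
    return None

def address_as_store_name(address):
    """Prefer the smaller region: assuming a "street, city, state" layout,
    keep the most specific (street-level) part as the store name."""
    return address.split(",")[0].strip()
```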
7. The receipt identification method according to claim 1, wherein the step of determining the payment amount on the receipt according to the character content in the total amount region comprises:
formulating all preset keywords in the total amount region into a phrase to be queried, and selecting a target amount determining rule corresponding to the phrase to be queried from a rule database; wherein the rule database stores phrases composed of different preset keywords and a rule for determining an amount corresponding to each of the phrases;
determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
8. The receipt identification method according to claim 7, wherein the rule for determining the amount stored in the rule database is: designating one of the preset keywords in the phrase, and using an amount value corresponding to the preset keyword as the payment amount on the receipt;
wherein the step of determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule comprises:
using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
9. The receipt identification method according to claim 7, wherein the step of formulating all the preset keywords in the total amount region into the phrase to be queried comprises:
formulating all the preset keywords in the total amount region into the phrase to be queried in alphabetical order of their initial letters.
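Claims 7 through 9 can be combined into one small lookup flow. The rule database contents below are invented for illustration (the patent does not publish them): keys are the preset keywords found in the total amount region, joined in alphabetical order, and each value names the keyword whose adjacent number is taken as the payment amount.

```python
import re

PRESET_KEYWORDS = ("AMOUNT", "BALANCE", "TOTAL")  # assumed presets
RULE_DB = {                                       # assumed rule database contents
    "BALANCE|TOTAL": "TOTAL",
    "AMOUNT|TOTAL": "TOTAL",
    "BALANCE": "BALANCE",
    "TOTAL": "TOTAL",
}

def payment_amount(region_rows):
    """Build the phrase to be queried from the keywords present (alphabetical order),
    select the target amount determining rule, then read the amount value adjacent
    to the keyword the rule designates."""
    found = sorted({kw for row in region_rows for kw in PRESET_KEYWORDS if kw in row.upper()})
    target = RULE_DB.get("|".join(found))
    if target is None:
        return None
    for row in region_rows:
        if target in row.upper():
            nums = re.findall(r"\d+\.\d{2}", row)
            if nums:
                return float(nums[-1])
    return None
```

With both "BALANCE" and "TOTAL" present, the assumed rule designates "TOTAL", so the value on the TOTAL row is returned.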
10. A receipt identification apparatus, comprising:
an acquisition module configured to obtain an image of a receipt to be identified;
a first identification module configured to identify each row region in the image by using a region identification model, wherein the row region is a region where each line of characters on the receipt is located, and the region identification model is a model based on a neural network;
a second identification module configured to identify a character content in each of the row regions by using a character identification model, wherein the character identification model is a model based on the neural network;
a determining module configured to determine time information, a store name, and a payment amount on the receipt according to the character content of each of the row regions;
wherein the determining module is configured to determine the payment amount on the receipt according to the character content in each of the row regions by:
determining, as a total amount region, at least one of the row regions whose character content contains at least one preset keyword; and
determining the payment amount on the receipt according to the character content in the total amount region.
11. The receipt identification apparatus according to claim 10, wherein the first identification module is further configured to identify a time region by using the region identification model when identifying each of the row regions on the receipt in the image, and label the time region, wherein the time region is a row region that conforms to a preset time feature;
the determining module is configured to determine the time information on the receipt according to the character content in each of the row regions by:
determining the time information on the receipt according to the character content of the time region.
12. The receipt identification apparatus according to claim 10, wherein the determining module is configured to determine the store name on the receipt according to the character content of each of the row regions by:
conducting a search in a store name database according to the character content of each of the row regions to determine the store name on the receipt.
13. The receipt identification apparatus according to claim 10, wherein when the receipt contains a pattern, the first identification module is further configured to identify a region where the pattern in the image is located by using the region identification model;
the determining module is further configured to determine whether there is a matched store name pattern in a store name database according to the pattern; if there is the matched store name pattern in the store name database, the determining module is configured to determine a store name corresponding to the matched store name pattern as the store name on the receipt; if there is no matched store name pattern in the store name database, the determining module is configured to identify characters in the pattern, and use the characters identified in the pattern as the store name on the receipt; and if there are no characters in the pattern, the determining module is configured to conduct a search in the store name database according to the character content of each of the row regions to determine the store name on the receipt;
wherein each of the store name patterns in the store name database is labeled with a corresponding store name.
14. The receipt identification apparatus according to claim 12, wherein the determining module is configured to determine the store name on the receipt according to the character content of each of the row regions by:
conducting the search in the store name database according to the character content of each of the row regions; and if the store name on the receipt is not obtained, determining a store address based on the character content of each of the row regions, and using the store address as the store name on the receipt.
15. The receipt identification apparatus according to claim 14, wherein the determining module is configured to determine the store address based on the character content of each of the row regions by at least one of the following:
if a preset character used to indicate an address appears in one of the row regions, using a character following the preset character as the store address; and
if characters corresponding to an administrative region name or a street name appear in one of the row regions, using the characters as the store address;
wherein the determining module is configured to use the store address as the store name on the receipt by:
selecting address information that represents a smaller region in the store address as the store name.
16. The receipt identification apparatus according to claim 10, wherein the determining module is configured to determine the payment amount on the receipt according to the character content in the total amount region by:
formulating all preset keywords in the total amount region into a phrase to be queried, and selecting a target amount determining rule corresponding to the phrase to be queried from a rule database; wherein the rule database stores phrases composed of different preset keywords and a rule for determining an amount corresponding to each of the phrases;
determining the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule.
17. The receipt identification apparatus according to claim 16, wherein the rule for determining the amount stored in the rule database is: designating one of the preset keywords in the phrase, and using an amount value corresponding to the preset keyword as the payment amount on the receipt;
the determining module is configured to determine the payment amount on the receipt based on the character content of the total amount region according to the target amount determining rule by:
using the amount value corresponding to the preset keyword specified by the target amount determining rule in the total amount region as the payment amount on the receipt.
18. The receipt identification apparatus according to claim 16, wherein the determining module is configured to formulate all the preset keywords in the total amount region into the phrase to be queried by:
formulating all the preset keywords in the total amount region into the phrase to be queried in alphabetical order of their initial letters.
19. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
wherein the memory is configured to store a computer program; and
the processor is configured to implement the steps in the method claimed in claim 1 when executing the computer program stored in the memory.
20. A non-transitory computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to claim 1 are implemented.
US17/485,511 2019-05-09 2021-09-27 Receipt identification method, apparatus, electronic device and computer-readable storage medium Pending US20220012488A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910386149.0A CN110956739A (en) 2019-05-09 2019-05-09 Bill identification method and device
CN201910386149.0 2019-05-09
PCT/CN2019/103848 WO2020224131A1 (en) 2019-05-09 2019-08-30 Receipt recognition method and apparatus, electronic device, and computer readable storage medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103848 Continuation-In-Part WO2020224131A1 (en) 2019-05-09 2019-08-30 Receipt recognition method and apparatus, electronic device, and computer readable storage medium

Publications (1)

Publication Number Publication Date
US20220012488A1 true US20220012488A1 (en) 2022-01-13

Family

ID=69976161

Family Applications (2)

Application Number Title Priority Date Filing Date
US17/216,669 Active US11361570B2 (en) 2019-05-09 2021-03-29 Receipt identification method, apparatus, device and storage medium
US17/485,511 Pending US20220012488A1 (en) 2019-05-09 2021-09-27 Receipt identification method, apparatus, electronic device and computer-readable storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US17/216,669 Active US11361570B2 (en) 2019-05-09 2021-03-29 Receipt identification method, apparatus, device and storage medium

Country Status (3)

Country Link
US (2) US11361570B2 (en)
CN (3) CN110956739A (en)
WO (1) WO2020224131A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11275934B2 (en) * 2019-11-20 2022-03-15 Sap Se Positional embeddings for document processing
CN111814779A (en) * 2020-07-08 2020-10-23 重庆农村商业银行股份有限公司 Bill text recognition method, device, equipment and storage medium
CN111860450A (en) * 2020-08-03 2020-10-30 理光图像技术(上海)有限公司 Ticket recognition device and ticket information management system
CN112101995A (en) * 2020-09-11 2020-12-18 北京市商汤科技开发有限公司 Data processing method, device, equipment and storage medium
CN112685414B (en) * 2020-12-29 2023-04-25 勤智数码科技股份有限公司 Method and device for associating information resource catalog with data resource
CN113626466B (en) * 2021-08-10 2022-04-15 深圳市玄羽科技有限公司 Material management method and system based on industrial internet and computer storage medium
US11921676B2 (en) * 2021-11-29 2024-03-05 International Business Machines Corporation Analyzing deduplicated data blocks associated with unstructured documents
CN117152778B (en) * 2023-10-31 2024-01-16 安徽省立医院(中国科学技术大学附属第一医院) Medical instrument registration certificate identification method, device and medium based on OCR

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070237427A1 (en) * 2006-04-10 2007-10-11 Patel Nilesh V Method and system for simplified recordkeeping including transcription and voting based verification
US20140064618A1 (en) * 2012-08-29 2014-03-06 Palo Alto Research Center Incorporated Document information extraction using geometric models

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030050891A1 (en) * 2001-09-07 2003-03-13 James Cohen Method and system for registration and tracking of items
JP2011227543A (en) * 2010-04-15 2011-11-10 Panasonic Corp Form processing device and method and recording medium
JP5202677B2 (en) * 2011-04-08 2013-06-05 株式会社富士通マーケティング Receipt data recognition device and program thereof
JP5216890B2 (en) * 2011-04-15 2013-06-19 株式会社富士通マーケティング Receipt data recognition device and program thereof
US20140268250A1 (en) * 2013-03-15 2014-09-18 Mitek Systems, Inc. Systems and methods for receipt-based mobile image capture
US9230547B2 (en) * 2013-07-10 2016-01-05 Datascription Llc Metadata extraction of non-transcribed video and audio streams
CN104573735A (en) * 2015-01-05 2015-04-29 广东小天才科技有限公司 Method for optimizing positioning based on image shooting, intelligent terminal and server
CN104915114B (en) * 2015-05-29 2018-10-19 小米科技有限责任公司 Information recording method and device, intelligent terminal
JP2017004154A (en) * 2015-06-08 2017-01-05 ローレル精機株式会社 Paper money processor
JP2019061293A (en) * 2016-02-02 2019-04-18 日立オムロンターミナルソリューションズ株式会社 Bill processing device and bill handling method
CN107424000A (en) * 2017-04-11 2017-12-01 阿里巴巴集团控股有限公司 A kind of data capture method and device
CN107798299B (en) * 2017-10-09 2020-02-07 平安科技(深圳)有限公司 Bill information identification method, electronic device and readable storage medium
CN107808154B (en) * 2017-12-08 2021-03-30 上海慧银信息科技有限公司 Method and device for extracting cash register bill information
CN108229463A (en) * 2018-02-07 2018-06-29 众安信息技术服务有限公司 Character recognition method based on image
CN108564035B (en) * 2018-04-13 2020-09-25 杭州睿琪软件有限公司 Method and system for identifying information recorded on document
CN108717543B (en) * 2018-05-14 2022-01-14 北京市商汤科技开发有限公司 Invoice identification method and device and computer storage medium
CN109241857A (en) * 2018-08-13 2019-01-18 杭州睿琪软件有限公司 A kind of recognition methods and system of document information
CN109284750A (en) * 2018-08-14 2019-01-29 北京市商汤科技开发有限公司 Bank slip recognition method and device, electronic equipment and storage medium
CN109491623A (en) * 2018-11-14 2019-03-19 北京三快在线科技有限公司 Print data processing method and device, electronic invoice generation method and server
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device
CN109711402B (en) * 2018-12-14 2021-06-04 杭州睿琪软件有限公司 Medical document identification method and computer-readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210256288A1 (en) * 2019-02-27 2021-08-19 Hangzhou Glority Software Limited Bill identification method, device, electronic device and computer-readable storage medium
US11966890B2 (en) * 2019-02-27 2024-04-23 Hangzhou Glority Software Limited Bill identification method, device, electronic device and computer-readable storage medium

Also Published As

Publication number Publication date
CN110956739A (en) 2020-04-03
US11361570B2 (en) 2022-06-14
WO2020224131A1 (en) 2020-11-12
CN111489487B (en) 2021-12-24
US20210216765A1 (en) 2021-07-15
CN111275880A (en) 2020-06-12
CN111275880B (en) 2021-08-31
CN111489487A (en) 2020-08-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: HANGZHOU GLORITY SOFTWARE LIMITED, CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:XU, QINGSONG;LI, QING;REEL/FRAME:057696/0794

Effective date: 20210922

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS