WO2019200781A1 - Ticket identification method, device and storage medium - Google Patents

Ticket identification method, device and storage medium

Info

Publication number
WO2019200781A1
Authority
WO
WIPO (PCT)
Prior art keywords: ticket, key, picture, type, identification
Application number
PCT/CN2018/100156
Other languages
English (en)
French (fr)
Inventor
李佳琳
刘鹏
赵�怡
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019200781A1

Classifications

    • G06V 30/413 Classification of content, e.g. text, photographs or tables
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; identifying elements of the document, e.g. authors
    • G06V 30/10 Character recognition

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a ticket identification method, device, and storage medium.
  • a ticket identification method comprising:
  • the identifying device acquires a picture to be identified;
  • the identifying device extracts a ticket picture from the to-be-identified picture;
  • the identification device detects at least one text position from the ticket picture
  • the identification device identifies a key field at each of the at least one text position, and obtains a machine recognition result of the key field at each text position and a confidence of the machine recognition result of the key field at each text position;
  • the identifying device acquires, according to the confidence of the machine recognition result of the key field at each text position, a first type of key fields that meet a condition, and sends a picture of each key field in the first type of key fields to the crowdsourcing platform processing device;
  • the crowdsourcing platform processing device sends a picture of each key field in the first type of key fields to multiple users, so that multiple users check the pictures of the same key field in the first type of key fields;
  • the crowdsourcing platform processing device determines a recognition result of each key field in the first type of key fields according to the check results of the multiple users corresponding to each key field in the first type of key fields;
  • the identifying device acquires a second type of key fields that do not meet the condition, and determines the machine recognition result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields;
  • the identifying device summarizes the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields in each ticket picture, and outputs the recognition result of each ticket picture.
  • a ticket identification device comprising an identification device and a crowdsourcing platform processing device
  • the identifying device acquires a picture to be identified
  • the identifying device extracts a ticket picture from the to-be-identified picture
  • the identification device detects at least one text position from the ticket picture
  • the identification device identifies a key field at each of the at least one text position, and obtains a machine recognition result of the key field at each text position and a confidence of the machine recognition result of the key field at each text position;
  • the identifying device acquires, according to the confidence of the machine recognition result of the key field at each text position, a first type of key fields that meet a condition, and sends a picture of each key field in the first type of key fields to the crowdsourcing platform processing device;
  • the crowdsourcing platform processing device sends a picture of each key field in the first type of key fields to multiple users, so that multiple users check the pictures of the same key field in the first type of key fields;
  • the crowdsourcing platform processing device determines a recognition result of each key field in the first type of key fields according to the check results of the multiple users corresponding to each key field in the first type of key fields;
  • the identifying device acquires a second type of key fields that do not meet the condition, and determines the machine recognition result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields;
  • the identifying device summarizes the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields in each ticket picture, and outputs the recognition result of each ticket picture.
  • a non-volatile readable storage medium storing at least one instruction, the at least one instruction being executed by a processor to implement the ticket identification method of any of the embodiments.
  • the present application first detects and identifies key fields using an intelligent recognition algorithm and obtains a confidence for the machine recognition result of each key field; key fields whose confidence is lower than the threshold are sent to the crowdsourcing platform for verification, where the same key field is sent to multiple users for checking; the check results of the multiple users for the same key field are obtained, and finally the recognition result of the ticket picture is output.
  • FIG. 1 is an application environment diagram of a preferred embodiment of a ticket identification method of the present application.
  • FIG. 2 is a flow chart of a preferred embodiment of the ticket identification method of the present application.
  • FIG. 3 is a block diagram showing the program of a preferred embodiment of the ticket identifying apparatus of the present application.
  • FIG. 4 is a schematic structural view of a preferred embodiment of a ticket identifying apparatus in at least one example of the present application.
  • FIG. 1 is an application environment diagram of a preferred embodiment of a ticket identification method of the present application.
  • the application environment map includes an identification device and a crowdsourcing platform processing device.
  • the identification device is configured to: acquire a picture to be identified; extract a ticket picture from the picture to be identified; perform text detection on the ticket picture to determine text positions; identify the key field at each text position and determine the machine recognition result of the key field and the confidence of that machine recognition result; based on the machine recognition result and its confidence, acquire the first type of key fields that meet the condition (for example, key fields whose machine recognition confidence is lower than or equal to the confidence threshold); and send the first type of key fields that meet the condition to the crowdsourcing platform processing device.
  • the crowdsourcing platform processing device sends the same key segment to multiple users of the crowdsourcing platform.
  • the multiple users check each key field in the first type of key fields and provide their check results; the check result agreed on by more than a threshold number of users is taken as the recognition result of that key field and is sent to the identification device.
  • the machine recognition result of each key field in the second type of key fields that do not meet the condition (for example, key fields whose machine recognition confidence is higher than the confidence threshold) is taken as the recognition result of each key field in the second type of key fields.
  • the identification device outputs a recognition result of each ticket picture in the picture to be identified.
  • the application combines the advantages of the intelligent recognition algorithm and the crowdsourcing platform: the recognition algorithm cleans the data of the ticket picture, locates text positions, and cuts out and identifies key fields, while the crowdsourcing platform corrects the results of key fields that are difficult for the intelligent recognition algorithm, thereby improving the accuracy of ticket identification and the efficiency of ticket entry.
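The division of labor described above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the patent's implementation: the function names, the threshold value, and the stand-in `crowdsource_check` lookup are all assumptions.

```python
# Hypothetical sketch: fields whose machine-recognition confidence is at or
# below a threshold (the "first type") are routed to a crowdsourced check;
# the rest (the "second type") keep their machine results as-is.

CONFIDENCE_THRESHOLD = 0.9  # example pre-configured threshold

def crowdsource_check(field_name, machine_text):
    # Stand-in for the crowdsourcing platform: in the described system, a
    # picture of the field is checked by multiple users and a majority wins.
    corrections = {"date": "2018-04-16"}  # illustrative corrected answer
    return corrections.get(field_name, machine_text)

def recognize_ticket(machine_results):
    """machine_results: {field_name: (machine_text, confidence)}."""
    output = {}
    for name, (text, conf) in machine_results.items():
        if conf <= CONFIDENCE_THRESHOLD:      # first type: needs checking
            output[name] = crowdsource_check(name, text)
        else:                                 # second type: trusted as-is
            output[name] = text
    return output

results = recognize_ticket({
    "hospital_name": ("People's Hospital", 0.98),
    "date": ("2018-04-1b", 0.62),  # low confidence, goes to crowdsourcing
})
```

The key design point is that only low-confidence fields incur the cost and latency of human checking, while high-confidence fields pass through untouched.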
  • the ticket identification method is implemented using the ticket identification device in conjunction with the following embodiments.
  • FIG. 2 is a flow chart of a first preferred embodiment of the ticket identification method of the present application.
  • the order of the steps in the flowchart may be changed according to different requirements, and some steps may be omitted.
  • the identification device acquires a picture to be identified.
  • the identification device includes, but is not limited to, a server or the like.
  • the identification device can communicate with a plurality of terminal devices, and provides a user interface to the user. For example, a user who needs to be reimbursed uploads the hospital ticket to be reimbursed to the identification device through the user interface provided by the identification device.
  • the identification device extracts a ticket picture from the picture to be identified.
  • the picture to be identified includes at least one ticket picture, that is, one or more ticket pictures.
  • the identifying device extracts each ticket picture of the at least one ticket picture from the to-be-identified picture, determines whether each ticket picture is tilted, and performs position correction on tilted ticket pictures so that each ticket picture is in the standard position.
  • each bill picture can be under the same standard, which is convenient for subsequent matching with the ticket template, and improves the accuracy of text position detection.
  • each ticket picture in the at least one ticket picture is extracted using the trained ticket extraction model, wherein each ticket picture belongs to a category of training samples that train the ticket extraction model.
  • the ticket extraction model can extract pictures of bills of various shapes and sizes from the to-be-identified picture, so that each bill picture can be extracted.
  • the training samples for training the ticket extraction model are various types of ticket samples, such as bill list categories, hospital bill categories, catering bill categories, and the like.
  • the ticket extraction model learns the characteristics of the various types of ticket samples, so that the trained ticket extraction model can identify, in the to-be-identified pictures, bill images of the various types present in the training samples, while pictures of unrelated bill categories will not be extracted. This can improve the accuracy of ticket recognition.
  • the ticket extraction model is a deep convolutional neural network model, including but not limited to: SSD (Single Shot MultiBox Detector) model.
  • the SSD algorithm is an object detection algorithm that directly predicts the coordinates and categories of bounding boxes. To detect objects of different sizes, the traditional approach converts the image into different sizes, processes each separately, and finally combines the results; the SSD algorithm achieves the same effect by using the feature maps of different convolution layers.
  • the main network structure of the algorithm is VGG16, in which two fully connected layers are changed into convolution layers and four further convolutional layers are appended to the network structure.
  • the outputs of five different convolutional layers are each convolved with two 3*3 convolution kernels: one output is used for classification, where each default box generates a first number (such as 5) of confidences (this is for a VOC data set containing a second number, such as 4, of object categories); the other output is a regression for localization, where each default box generates 4 coordinate values (x, y, w, h).
  • the five convolutional layers also generate default boxes (generated coordinates) through a prior box layer, the number of default boxes for each of the five convolutional layers being given. Finally, the three calculation results above are combined and passed to the loss layer.
  • the process of training the ticket extraction model includes:
  • a first preset number of bill picture samples is configured separately for each preset bill picture category, and the bill picture samples are divided into a training set of a first ratio and a validation set of a second ratio.
  • the preset bill picture categories include a plurality of types, for example outpatient bills and inpatient bills; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%.
  • the sum of the first ratio and the second ratio is less than or equal to 1.
  • the ticket extraction model is trained using the training set in the ticket picture sample of each ticket picture category.
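The sample split described above can be sketched as follows; the 75%/25% ratios come from the text, while the file names and the fixed shuffle seed are illustrative assumptions.

```python
# Minimal sketch of dividing each bill picture category's samples into a
# training set (first ratio, e.g. 75%) and a validation set (second ratio,
# e.g. 25%), as described for training the ticket extraction model.
import random

def split_samples(samples, train_ratio=0.75, seed=0):
    shuffled = samples[:]                      # do not mutate the input
    random.Random(seed).shuffle(shuffled)      # deterministic shuffle
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Illustrative: 1000 samples of one category (e.g. outpatient bills).
outpatient = [f"outpatient_{i}.jpg" for i in range(1000)]
train_set, val_set = split_samples(outpatient)
```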
  • the identification device detects at least one text position from the ticket picture.
  • the detecting at least one text position from the ticket picture comprises:
  • the ticket surface color filtering technology is prior art, and is not described in detail herein.
  • the character strokes of the filtered ticket picture are clearer and more prominent, and the edges of the ticket are more complete, so that accuracy can be improved in subsequent detection and identification operations.
  • the training samples for training the text position detection model are various types of bill samples, such as bill list categories, hospital bill categories, catering bill categories, and the like.
  • the text position detection model learns the position of the key segments in the various types of ticket samples, so that the trained text position detection model can identify all the key segments from each type of ticket sample.
  • the locations of the key fields of the hospital ticket category include, but are not limited to, the location of the hospital name field, the location of the user name field, the location of the drug list field, the location of the date field, the location of the ticket number field, and so on.
  • the text position detection model includes, but is not limited to, a CTPN (Connectionist Text Proposal Network) model.
  • the process of training the text position detection model includes:
  • a first preset number of bill picture samples is configured separately for each preset bill picture category, and the bill picture samples are divided into a training set of a first ratio and a validation set of a second ratio.
  • the preset bill picture categories include a plurality of types, for example outpatient bills and inpatient bills; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%.
  • the sum of the first ratio and the second ratio is less than or equal to 1.
  • the text position detection model is trained using the bill picture samples marked in each bill picture category.
  • the identification device identifies the key field at each text position of the at least one text position, and obtains the machine recognition result of the key field at each text position and the confidence of the machine recognition result of the key field at each text position.
  • the key fields at each text location are identified using the Warp-CTC algorithm.
  • Warp-CTC is based on an improved Recurrent Neural Network (RNN) model. Baidu's Silicon Valley Artificial Intelligence Lab has open-sourced Warp-CTC, a key piece of code that allows artificial intelligence software to run more efficiently.
  • the Warp-CTC algorithm is written in C and easy to integrate. It solves the supervision problem in mapping an input sequence to an output sequence, and is applied in recognition technology.
  • the Warp-CTC algorithm requires little storage space and is hundreds of times faster than a normal CTC (Connectionist Temporal Classification) implementation.
  • the key field at each text position is input into the trained improved RNN model, which processes the key field at each text position and outputs the machine recognition result of the key field at each text position and the confidence of the machine recognition result of the key field at each text position.
  • training the improved RNN model includes:
  • different tickets serve different uses, and the key fields of the tickets also differ accordingly.
  • the key fields include, but are not limited to, a hospital name field, a user name field, a medicine and drug field, a date field, and the like.
  • the identification device acquires a first type of key field that meets the condition according to a confidence level of a machine identification result of the key segment at each text position.
  • the eligible first type key field includes but is not limited to any one or a combination of the following:
  • a key field whose machine recognition confidence is lower than or equal to the confidence threshold is used as a part of the first type of key fields.
  • the confidence threshold may be a pre-configured threshold, such as 0.9.
  • the confidence threshold may also be configured based on the confidence of the machine identification results of all key fields, for example, the average of the confidence of the machine identification result of the key field as the confidence threshold or the like. In this way, the confidence threshold can be determined based on the actual data, so that the configuration of the confidence threshold is more in line with actual needs.
  • with the first type of key fields removed, the remaining key fields that do not meet the condition are the second type of key fields.
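The two threshold configurations described above, and the resulting split into first and second types, can be sketched as follows. The field names and confidence values are illustrative.

```python
# Split key fields into the first type (confidence <= threshold, sent for
# crowdsourced checking) and the second type (confidence > threshold,
# machine result trusted). The threshold is either pre-configured or, as
# described, derived from the data as the mean of all confidences.
def split_key_fields(confidences, threshold=None):
    """confidences: {field_name: confidence}. Returns (first_type, second_type)."""
    if threshold is None:  # derive the threshold from the actual data
        threshold = sum(confidences.values()) / len(confidences)
    first = [f for f, c in confidences.items() if c <= threshold]
    second = [f for f, c in confidences.items() if c > threshold]
    return first, second

conf = {"hospital_name": 0.98, "date": 0.60, "amount": 0.95, "drug_list": 0.55}
first_type, second_type = split_key_fields(conf)  # mean threshold = 0.77
```

Deriving the threshold from the mean adapts it to each batch: on a cleanly scanned batch the threshold rises and fewer fields are escalated; on a noisy batch it falls accordingly.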
  • the identifying device sends a picture of each key field in the first type of key fields to the crowdsourcing platform processing device.
  • the crowdsourcing platform typically operates as a large public network in which each user can register as a member user of the crowdsourcing platform freely and voluntarily; the crowdsourcing platform processing device is used to process the crowdsourcing platform's data.
  • the crowdsourcing platform processing device sends a picture of each key field in the first type of key fields to multiple users, so that multiple users check the pictures of the same key field in the first type of key fields.
  • the crowdsourcing platform distributes each key field of the first type of key fields as a task to multiple users for verification, so that multiple users check the picture of the same key field.
  • the crowdsourcing platform processing device determines the recognition result of each key field in the first type of key fields according to the check results of the multiple users corresponding to each key field in the first type of key fields.
  • a check result agreed on by more than a threshold number of the users is taken as the recognition result of the key field. For example, suppose the date field is sent to three users: if the three users give three different answers, it is determined that there is no correct result; if two users give the same answer, that answer is taken as the check result of the date field.
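The head-count rule in the example above (2 of 3 users agreeing wins; no agreement means no result) can be sketched as a simple majority vote. The date strings and the minimum count are illustrative.

```python
# Majority vote over the check results returned by multiple users: the
# answer given by at least min_count users is taken as the recognition
# result; if no answer reaches that count, no result is determined.
from collections import Counter

def majority_answer(check_results, min_count=2):
    answer, count = Counter(check_results).most_common(1)[0]
    return answer if count >= min_count else None

result_agree = majority_answer(["2018-04-16", "2018-04-16", "2018-04-18"])
result_split = majority_answer(["2018-04-16", "2018-04-17", "2018-04-18"])
```

When `majority_answer` returns `None`, the system described here would prompt the user to re-upload the ticket picture rather than guess.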
  • if no recognition result can be determined, the crowdsourcing platform processing device sends a prompt to the identification device, so that the identification device sends a prompt to the terminal device asking the user to re-upload the ticket picture, thereby ensuring the accuracy of the identification.
  • the present application first detects and identifies key fields using an intelligent recognition algorithm and obtains the confidence of the machine recognition result of each key field; key fields whose confidence is lower than the threshold are sent to the crowdsourcing platform for verification.
  • through the crowdsourcing platform, the same key field is sent to multiple users for checking, and the check results of the multiple users for the same key field are obtained, thereby improving the accuracy of ticket identification and enabling files to be established quickly.
  • the identifying device acquires the second type of key fields that do not meet the condition, and determines the machine recognition result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields.
  • the identifying device summarizes the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields in each ticket picture, and outputs the recognition result of each ticket picture.
  • the picture to be identified may include one or more bill pictures, so a summarized output is required.
  • for example, a user's reimbursement form has multiple bills, all attached to one picture to be identified; if only the recognition result of one bill picture were returned, the subsequent reimbursement calculation could not be performed.
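The final summarizing step can be sketched as a per-ticket merge of the two result sources. The field names and values are illustrative.

```python
# Merge, for every ticket picture found in the picture to be identified,
# the crowdsourced results (first-type fields) with the machine-trusted
# results (second-type fields), then collect one record per ticket.
def summarize(tickets):
    """tickets: list of (first_type_results, second_type_results) dict pairs."""
    return [{**first, **second} for first, second in tickets]

report = summarize([
    ({"date": "2018-04-16"}, {"hospital_name": "People's Hospital"}),
    ({"amount": "320.00"}, {"hospital_name": "City Clinic"}),
])
```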
  • the present application first detects and identifies key fields using an intelligent recognition algorithm and obtains the confidence of the machine recognition result of each key field; key fields whose confidence is lower than the threshold are sent to the crowdsourcing platform for verification.
  • through the crowdsourcing platform, the same key field is sent to multiple users for checking, the check results of the multiple users for the same key field are obtained, and finally the recognition result of the ticket picture is output, thereby improving the accuracy of ticket identification and enabling files to be established quickly.
  • FIG. 3 is a block diagram showing the program of the first preferred embodiment of the ticket identifying apparatus of the present application.
  • the ticket identification device 4 includes, but is not limited to, one or more of the following program modules: an acquisition module 40, an extraction module 41, a training module 42, a detection module 43, an identification module 44, a sending module 45, a data sending module 46, and a determination module 47.
  • a program module as referred to in the present application refers to a series of computer readable instruction segments that can be executed by a processor of the ticket identification device 4 and that are capable of performing a fixed function, which are stored in a memory. The function of each module will be detailed in the subsequent embodiments.
  • the memory of the identification device is used to store one or more of the following program modules: an acquisition module 40, an extraction module 41, a training module 42, a detection module 43, an identification module 44, a sending module 45, and an output module 49; and the processor of the identification device executes the one or more program modules.
  • the memory of the crowdsourcing platform processing device is configured to store one or more of the following program modules: a data sending module 46, a determination module 47, and a prompting module 48; and the processor of the crowdsourcing platform processing device executes the one or more program modules: the data sending module 46, the determination module 47, and the prompting module 48.
  • the obtaining module 40 acquires a picture to be identified.
  • the identification device includes, but is not limited to, a server or the like.
  • the identification device can communicate with a plurality of terminal devices, and provides a user interface to the user. For example, a user who needs to be reimbursed uploads the hospital ticket to be reimbursed to the identification device through the user interface provided by the identification device.
  • the extraction module 41 extracts a ticket picture from the picture to be identified.
  • the picture to be identified includes at least one ticket picture, i.e., one or more ticket pictures.
  • the extraction module 41 extracts each ticket picture of the at least one ticket picture from the to-be-identified picture, determines whether each ticket picture is tilted, and performs position correction on tilted ticket pictures so that each ticket picture is in the standard position.
  • each bill picture can be under the same standard, which is convenient for subsequent matching with the ticket template, and improves the accuracy of text position detection.
  • the extraction module 41 extracts each of the at least one ticket picture using the trained ticket extraction model, wherein each ticket picture belongs to a category of training samples that train the ticket extraction model.
  • the ticket extraction model can extract pictures of bills of various shapes and sizes from the to-be-identified picture, so that each bill picture can be extracted.
  • the training samples used by the training module 42 to train the ticket extraction model are various types of ticket samples, such as bill list categories, hospital bill categories, catering bill categories, and the like.
  • the ticket extraction model learns the characteristics of the various types of ticket samples, so that the trained ticket extraction model can identify, in the to-be-identified pictures, bill images of the various types present in the training samples, while pictures of unrelated bill categories will not be extracted. This can improve the accuracy of ticket recognition.
  • the ticket extraction model is a deep convolutional neural network model, including but not limited to: SSD (Single Shot MultiBox Detector) model.
  • the SSD algorithm is an object detection algorithm that directly predicts the coordinates and categories of bounding boxes. To detect objects of different sizes, the traditional approach converts the image into different sizes, processes each separately, and finally combines the results; the SSD algorithm achieves the same effect by using the feature maps of different convolution layers.
  • the main network structure of the algorithm is VGG16, in which two fully connected layers are changed into convolution layers and four further convolutional layers are appended to the network structure.
  • the outputs of five different convolutional layers are each convolved with two 3*3 convolution kernels: one output is used for classification, where each default box generates a first number (such as 5) of confidences (this is for a VOC data set containing a second number, such as 4, of object categories); the other output is a regression for localization, where each default box generates 4 coordinate values (x, y, w, h).
  • the five convolutional layers also generate default boxes (generated coordinates) through a prior box layer, the number of default boxes for each of the five convolutional layers being given. Finally, the three calculation results above are combined and passed to the loss layer.
  • the training module 42 training the ticket extraction model includes:
  • a first preset number of bill picture samples is configured separately for each preset bill picture category, and the bill picture samples are divided into a training set of a first ratio and a validation set of a second ratio.
  • the preset bill picture categories include a plurality of types, for example outpatient bills and inpatient bills; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%.
  • the sum of the first ratio and the second ratio is less than or equal to 1.
  • the ticket extraction model is trained using the training set in the ticket picture sample of each ticket picture category.
  • the detection module 43 detects at least one text location from the ticket picture.
  • the detecting module 43 detects at least one text position from the ticket picture, including:
  • the ticket surface color filtering technology is prior art, and is not described in detail herein.
  • the character strokes of the filtered ticket picture are clearer and more prominent, and the edges of the ticket are more complete, so that accuracy can be improved in subsequent detection and identification operations.
  • the training samples for training the text position detection model are various types of ticket samples, such as bill list categories, hospital bill categories, catering bill categories, and the like.
  • the text position detection model learns the position of the key segments in the various types of ticket samples, so that the trained text position detection model can identify all the key segments from each type of ticket sample.
  • the locations of the key fields of the hospital ticket category include, but are not limited to, the location of the hospital name field, the location of the user name field, the location of the drug list field, the location of the date field, the location of the ticket number field, and so on.
  • the text position detection model includes, but is not limited to, a CTPN (Connectionist Text Proposal Network) model.
  • the training module 42 training the text position detection model includes:
  • a bill picture sample set is separately configured for each bill picture category, and the bill picture samples are divided into a training set of a first ratio and a verification set of a second ratio.
  • the preset bill picture categories include a plurality of types, for example an outpatient bill and an inpatient bill; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%.
  • the sum of the first ratio and the second ratio is less than or equal to 1.
  • the text position detection model is trained using the bill picture samples marked in each bill picture category.
  • the identification module 44 identifies the key field at each of the at least one text position, and obtains a machine identification result for the key field at each text position and a confidence of the machine identification result of the key field at each text position.
  • the key fields at each text location are identified using the Warp-CTC algorithm.
  • Warp-CTC is an improved Recurrent Neural Network (RNN) model; it is key code that Baidu's Silicon Valley Artificial Intelligence Lab has open-sourced to make artificial-intelligence software run more efficiently.
  • the Warp-CTC algorithm is compiled in C and packaged as an integrated library. It solves the supervision problem in mapping an input sequence to an output sequence, and is applied in recognition technology.
  • the Warp-CTC algorithm requires little storage space and is hundreds of times faster than ordinary CTC (Connectionist Temporal Classification).
  • the key field at each text position is input into the trained improved RNN model, which processes the key field at each text position and outputs the machine recognition result of the key field at each text position and the confidence of the machine recognition result of the key field at each text position.
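The application does not publish its decoding code; as a minimal sketch of the CTC idea behind the recognition step, a greedy decode picks the best label per frame, collapses repeats, drops blanks, and yields both a string and a rough confidence. This is generic CTC decoding, not Warp-CTC's actual C implementation, and the alphabet below is illustrative.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Greedy CTC decoding sketch: take the best label per frame,
    collapse repeated labels, drop blanks, and report the product of
    the chosen frame probabilities as a rough confidence score."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    confidence = 1.0
    for p, k in zip(frame_probs, best):
        confidence *= p[k]
    chars, prev = [], blank
    for k in best:
        if k != blank and k != prev:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars), confidence
```

The confidence obtained this way is what the later steps compare against the confidence threshold.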
  • training the improved RNN model includes: obtaining key field samples and dividing them into a training set of a first ratio and a verification set of a second ratio; training the improved RNN model using the key field samples in the training set; and verifying the accuracy of the trained improved RNN model using the verification set, the training ending if the accuracy is greater than or equal to a preset accuracy, or the number of key field samples being increased and the training and verification repeated until the accuracy is greater than or equal to the preset accuracy.
  • tickets with different uses have different key fields.
  • the key fields include, but are not limited to, a hospital name field, a user name field, a medicine and drug field, a date field, and the like.
  • the obtaining module 40 acquires a first type of key field that meets the condition according to the confidence of the machine identification result of the key segment at each text position.
  • the eligible first type key field includes but is not limited to any one or a combination of the following:
  • a key segment having a confidence level of the machine identification result lower than or equal to the confidence threshold is used as a part of the first type of key field.
  • the confidence threshold may be a pre-configured threshold, such as 0.9.
  • the confidence threshold may also be configured based on the confidences of the machine identification results of all key fields, for example by taking the average of the confidences of the machine identification results of the key fields as the confidence threshold. In this way, the confidence threshold can be determined from the actual data, so that the configuration of the confidence threshold better matches actual needs.
  • among all key fields, the key fields remaining after the first type of key fields are removed, i.e. the key fields that do not meet the condition, are the second type of key fields.
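The threshold-based selection just described (with either a fixed threshold or the average confidence) can be sketched as follows; the field names and the `(result, confidence)` structure are illustrative assumptions, not the application's actual data format.

```python
def select_first_type(fields, threshold=None):
    """Split key fields into the first type (to be sent to the
    crowdsourcing platform) and the second type (machine result kept).

    `fields` maps a field name to a (machine_result, confidence) pair.
    If no threshold is given, the average confidence is used, which is
    one of the configurations described above."""
    if threshold is None:
        threshold = sum(c for _, c in fields.values()) / len(fields)
    first = {k: v for k, v in fields.items() if v[1] <= threshold}
    second = {k: v for k, v in fields.items() if v[1] > threshold}
    return first, second
```

With a fixed threshold of 0.9, a date field recognized with confidence 0.6 would land in the first type and be sent out for crowd verification, while a hospital-name field at 0.98 would keep its machine result.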
  • the sending module 45 sends a picture of each key field in the first type of key fields to the crowdsourcing platform processing device.
  • the crowdsourcing platform typically takes the form of a large public network; each user can freely and voluntarily register as a member user on the crowdsourcing platform, and the crowdsourcing platform processing device handles the data of the crowdsourcing platform.
  • the data sending module 46 sends a picture of each key field in the first type of key fields to multiple users, so that multiple users check the pictures of the same key field in the first type of key fields.
  • the crowdsourcing platform distributes each key field of the first type of key fields as a task to multiple users for verification, so that multiple users check the picture of the same key field.
  • the determining module 47 determines, according to the test result of the multiple users corresponding to each key segment in the first type of key segments, the recognition result of each key segment in the first type of key segments.
  • for each key field in the first type of key fields, the determining module 47 takes the check result that exceeds a head-count threshold among the check results provided by the plurality of users as the recognition result of that key field. For example, the date field is sent to three users; if the three users give three different answers when checking the date field, it is determined that there is no correct result, and if two of the users give the same answer, the answer of those two users is taken as the check result of the date field.
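The head-count rule in the three-user example above can be sketched as a simple majority count; the function name and the threshold value are illustrative assumptions.

```python
from collections import Counter

def crowd_verdict(answers, headcount_threshold=1):
    """Return the answer given by more than `headcount_threshold`
    users, or None when no answer exceeds the threshold (in which
    case the user is prompted to re-upload the ticket picture)."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > headcount_threshold else None
```

With three users and a threshold of 1, two matching answers win, while three distinct answers yield no result.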
  • for each key field in the first type of key fields, when there is no check result exceeding the head-count threshold among the check results provided by the plurality of users, the prompting module 48 sends a prompt that the field cannot be verified to the identification device, so that the identification device sends a prompt to the terminal device asking the user to re-upload the ticket picture, thereby ensuring the accuracy of the identification.
  • the present application first uses an intelligent recognition algorithm to detect and identify the key fields, obtains the confidence of the machine identification result of each key field, and sends the key fields whose confidence is below the threshold to the crowdsourcing platform for verification.
  • through the crowdsourcing platform, the same key field is sent to multiple users for verification, and the check results of the multiple users for the same key field are obtained, thereby improving the accuracy of ticket identification and enabling files to be established quickly.
  • the obtaining module 40 obtains a second type of key fields that do not meet the condition, and determines the machine identification result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields.
  • the output module 49 summarizes the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields in each ticket picture, and outputs the recognition result of each ticket picture.
  • the picture to be identified includes one or more bill pictures, so a summary output is required for the convenience of subsequent calculation.
  • for example, a user's reimbursement form may carry multiple bills, all attached to one picture to be identified; if only the recognition result of one bill picture were returned, the subsequent reimbursement calculation could not be performed.
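The per-ticket summary described above can be sketched as a merge of the crowd-checked first-type results with the machine second-type results; the data structures here are illustrative assumptions.

```python
def summarize_tickets(tickets):
    """Merge, for each ticket picture, the crowd-checked results of the
    first type of key fields with the machine results of the second
    type, returning one recognition result per ticket picture."""
    results = {}
    for ticket_id, (first_type, second_type) in tickets.items():
        merged = dict(second_type)   # machine results kept as-is
        merged.update(first_type)    # crowd-verified results take precedence
        results[ticket_id] = merged
    return results
```

All tickets attached to one picture to be identified are then returned together, so a reimbursement covering several bills can be computed from a single summarized result.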
  • the present application first uses an intelligent recognition algorithm to detect and identify the key fields, obtains the confidence of the machine identification result of each key field, and sends the key fields whose confidence is below the threshold to the crowdsourcing platform for verification.
  • through the crowdsourcing platform, the same key field is sent to multiple users for verification, the check results of the multiple users for the same key field are obtained, and finally the recognition result of the ticket picture is output, thereby improving the accuracy of ticket identification and enabling files to be established quickly.
  • the above-described integrated unit implemented in the form of a software function module can be stored in a non-volatile readable storage medium.
  • the above software function module is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute part of the steps of the method described in each embodiment of the present application.
  • the ticket identification device 4 includes at least one transmitting device 51, at least one memory 52, at least one processor 53, at least one receiving device 54, an identification device 55, a crowdsourcing platform processing device 56, and at least one communication bus.
  • the communication bus is used to implement connection communication between these components.
  • the identification device 55 and the crowdsourcing platform processing device 56 are not integrated in the ticket identification device 4, the identification device 55 being in communication with the crowdsourcing platform processing device 56 over a network.
  • the identification device 55 and the crowdsourcing platform processing device 56 may also be integrated into one device, such as in the ticket identification device 4, without network communication or the like.
  • the present application does not impose any limitation on the existence form of the identification device 55 and the crowdsourcing platform processing device 56 in the ticket identification device 4.
  • the identification device 55 and the crowdsourcing platform processing device 56 are devices capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; their hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and so on.
  • the ticket identification device 4 may also include a network device and/or a user device.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the identification device 55 and the crowdsourcing platform processing device 56 may be, but are not limited to, any electronic product capable of interacting with a user through a keyboard, a touch pad, or a voice control device, such as a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a smart wearable device, a camera device, a monitoring device, or another terminal.
  • the network in which the identification device 55 and the crowdsourcing platform processing device 56 are located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
  • the receiving device 54 and the sending device 51 may be wired transmission ports, or may be wireless devices, for example, including antenna devices, for performing data communication with other devices.
  • the memory 52, the memory of the identification device 55, and the memory of the crowdsourcing platform processing device 56 are used to store program code.
  • the memory 52, the memory of the identification device 55, and the memory of the crowdsourcing platform processing device 56 may be circuits with a storage function that have no physical form in an integrated circuit, such as RAM (Random-Access Memory) or FIFO (First In First Out) memory.
  • alternatively, they may be memories with a physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card, a secure digital card, a flash card, or another storage device.
  • the processor 53, the processor of the identification device 55, and the processor of the crowdsourcing platform processing device 56 may each include one or more microprocessors or digital processors.
  • the processor of the identification device 55 can invoke program code stored in the memory of the identification device 55 to perform related functions, and the processor of the crowdsourcing platform processing device 56 can invoke program code stored in the memory of the crowdsourcing platform processing device 56 to perform related functions.
  • the modules described in FIGS. 2 and 3 are program codes stored in the memory of the identification device 55 and the memory of the crowdsourcing platform processing device 56 and executed by the processor of the identification device 55 and the processor of the crowdsourcing platform processing device 56 to implement the ticket identification method.
  • the processor of the identification device 55 and the processor of the crowdsourcing platform processing device 56, also known as central processing units (CPUs), are very large-scale integrated circuits serving as the operation core (Core) and control core (Control Unit).
  • the processor 53 may invoke program code stored in the memory 52 to perform related functions.
  • the various modules described in Figures 2 and 3 are program code stored in the memory 52 and executed by the processor 53 to implement a ticket identification method.
  • Embodiments of the present application also provide a non-volatile readable storage medium having stored thereon computer instructions that, when executed by a ticket identification device including one or more processors, cause the ticket identification device to perform the ticket identification method described in the method embodiments above.
  • the memory of the identification device 55 and the memory of the crowdsourcing platform processing device 56 store a plurality of instructions to implement the ticket identification method.
  • the processor of the identification device 55 can execute the plurality of instructions to: obtain a picture to be identified; extract a ticket picture from the picture to be identified; detect at least one text position from the ticket picture; identify the key field at each text position in the at least one text position, and obtain the machine recognition result of the key field at each text position and the confidence of the machine recognition result of the key field at each text position; obtain, according to the confidence of the machine recognition result of the key field at each text position, a first type of key fields that meet a condition; and send a picture of each key field in the first type of key fields to the crowdsourcing platform processing device;
  • the processor of the crowdsourcing platform processing device 56 can execute the plurality of instructions to: send a picture of each key field in the first type of key fields to multiple users, so that the multiple users check the pictures of the same key field in the first type of key fields; and determine, according to the check results of the multiple users corresponding to each key field in the first type of key fields, the recognition result of each key field in the first type of key fields;
  • the processor of the identification device 55 can execute the plurality of instructions to: obtain a second type of key fields that do not meet the condition, and determine the machine identification result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields;
  • the processor of the identification device executable to execute the plurality of instructions further includes:
  • Each of the at least one ticket picture is extracted using the trained ticket extraction model, wherein each ticket picture belongs to a category of training samples that train the ticket extraction model.
  • the plurality of instructions executable by the processor of the identification device further comprise: before detecting at least one text position from the ticket picture, determining whether the position of each ticket picture is tilted, and performing position correction on any positionally tilted ticket picture so that each ticket picture is in a standard position.
  • the processor of the identification device executable to execute the plurality of instructions further includes:
  • the ticket picture is processed using a ticket-surface background-color filtering technology to obtain a filtered ticket picture.
  • the eligible first type of key fields include, but are not limited to, any one or more of the following combinations:
  • the confidences of the machine recognition results of all key fields are sorted from largest to smallest, and the key fields ranked in the last preset number of positions (e.g. the last 10) are selected as part of the first type of key fields.
  • the processor of the crowdsourcing platform processing device executable to execute the plurality of instructions further includes:
  • the check result exceeding a head-count threshold among the check results provided by the plurality of users is taken as the recognition result of each key field.
  • the crowdsourcing platform processing device further:
  • for each key field of the first type of key fields, when there is no check result exceeding the head-count threshold among the check results provided by the plurality of users, sends a prompt that the field cannot be verified to the identification device, so that the identification device prompts the user to re-upload the ticket picture.
  • the processor of the identification device executable to execute the plurality of instructions further includes:
  • the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields in each ticket picture are summarized by the identification device, and the recognition result of each ticket picture is output.
  • the processor of the identification device executable to execute the plurality of instructions further comprises configuring the confidence threshold based on a confidence of a machine identification result of a key segment at each text location.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a logical functional division; in actual implementation there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a non-volatile readable storage medium for execution by a computer device (which may be a personal computer, a server, a network device, or the like).
  • the foregoing storage medium includes: a U disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.


Abstract

The present application provides a ticket identification method: for an input ticket picture, an intelligent recognition algorithm is first used to detect and identify the key fields and to obtain the confidence of the machine identification result of each key field; key fields whose confidence is below a threshold are sent to a crowdsourcing platform for verification, where the same key field is sent to multiple users for checking and the users' check results for that key field are collected; finally the recognition result of the ticket picture is output. The present application also provides a ticket identification device and a storage medium. The present application can improve the accuracy of ticket identification and thus enable files to be established quickly.

Description

Ticket identification method, device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 18, 2018, with application number 201810351126.1 and the invention title "Ticket identification method, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a ticket identification method, device, and storage medium.
Background
Large enterprises, institutions, hospital physical examinations, the insurance industry, and the like all have massive numbers of tickets whose information needs to be collected, entered, and archived electronically. At present, the degree of digital ticket management in China is still relatively low; the commonly used manual entry and manual filing methods are labor-intensive, inefficient, costly, and error-prone. Although machine learning methods can currently be used for ticket identification, the recognition accuracy is not high, which causes errors in various kinds of ticket information and makes it impossible to establish files quickly and improve work efficiency.
Summary
In view of the above, it is necessary to provide a ticket identification method, device, and storage medium that can improve the accuracy of ticket identification and thus enable files to be established quickly.
A ticket identification method, the method comprising:
an identification device obtaining a picture to be identified;
the identification device extracting a ticket picture from the picture to be identified;
the identification device detecting at least one text position from the ticket picture;
the identification device identifying the key field at each text position in the at least one text position, and obtaining the machine identification result of the key field at each text position and the confidence of the machine identification result of the key field at each text position;
the identification device obtaining, according to the confidence of the machine identification result of the key field at each text position, a first type of key fields that meet a condition;
the identification device sending a picture of each key field in the first type of key fields to a crowdsourcing platform processing device;
the crowdsourcing platform processing device sending the picture of each key field in the first type of key fields to multiple users, so that the multiple users check the pictures of the same key field in the first type of key fields;
the crowdsourcing platform processing device determining, according to the check results of the multiple users corresponding to each key field in the first type of key fields, the recognition result of each key field in the first type of key fields;
the identification device obtaining a second type of key fields that do not meet the condition, and determining the machine identification result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields;
the identification device summarizing, for each ticket picture, the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields, and outputting the recognition result of each ticket picture.
A ticket identification device, the ticket identification device comprising an identification device and a crowdsourcing platform processing device;
the identification device obtains a picture to be identified;
the identification device extracts a ticket picture from the picture to be identified;
the identification device detects at least one text position from the ticket picture;
the identification device identifies the key field at each text position in the at least one text position, and obtains the machine identification result of the key field at each text position and the confidence of the machine identification result of the key field at each text position;
the identification device obtains, according to the confidence of the machine identification result of the key field at each text position, a first type of key fields that meet a condition;
the identification device sends a picture of each key field in the first type of key fields to the crowdsourcing platform processing device;
the crowdsourcing platform processing device sends the picture of each key field in the first type of key fields to multiple users, so that the multiple users check the pictures of the same key field in the first type of key fields;
the crowdsourcing platform processing device determines, according to the check results of the multiple users corresponding to each key field in the first type of key fields, the recognition result of each key field in the first type of key fields;
the identification device obtains a second type of key fields that do not meet the condition, and determines the machine identification result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields;
the identification device summarizes, for each ticket picture, the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields, and outputs the recognition result of each ticket picture.
A non-volatile readable storage medium storing at least one instruction that, when executed by a processor, implements the ticket identification method described in any embodiment.
As can be seen from the above technical solutions, for an input ticket picture the present application first uses an intelligent recognition algorithm to detect and identify the key fields and obtains the confidence of the machine identification result of each key field; key fields whose confidence is below a threshold are sent to a crowdsourcing platform for verification, where the same key field is sent to multiple users for checking and the users' check results for that key field are collected; finally the recognition result of the ticket picture is output. This improves the accuracy of ticket identification and thus enables files to be established quickly.
Brief Description of the Drawings
FIG. 1 is an application environment diagram of a preferred embodiment implementing the ticket identification method of the present application.
FIG. 2 is a flowchart of a preferred embodiment of the ticket identification method of the present application.
FIG. 3 is a program module diagram of a preferred embodiment of the ticket identification device of the present application.
FIG. 4 is a schematic structural diagram of a preferred embodiment of the ticket identification device in at least one example of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the term "include" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
As shown in FIG. 1, FIG. 1 is an application environment diagram of a preferred embodiment implementing the ticket identification method of the present application. The application environment includes an identification device and a crowdsourcing platform processing device. The identification device is used to: obtain a picture to be identified; extract a ticket picture from the picture to be identified; perform text detection on the ticket picture to determine the text positions; identify the key fields at the text positions and determine the machine identification results of the key fields and the confidences of those results; obtain, based on the machine identification results of the key fields and their confidences, a first type of key fields that meet a condition (for example, key fields whose machine identification result confidence is lower than or equal to a confidence threshold); and send the qualifying first type of key fields to the crowdsourcing platform processing device. The crowdsourcing platform processing device sends the same key field to multiple users of the crowdsourcing platform. The multiple users assigned to each key field in the first type of key fields check that key field, and the check result exceeding a head-count threshold among the check results provided by the multiple users for each key field is taken as the recognition result of that key field and sent to the identification device. The machine identification result of each key field in the second type of key fields, which do not meet the condition (for example, key fields whose machine identification result confidence is higher than the confidence threshold), is taken as the recognition result of each key field in the second type of key fields. The identification device outputs the recognition result of each ticket picture in the picture to be identified. The present application combines the advantages of intelligent recognition algorithms and a crowdsourcing platform: the recognition algorithm performs data cleaning on the ticket picture, locates the text positions, and cuts out and recognizes the key fields, while the crowdsourcing platform corrects the results of complex fields that the intelligent recognition algorithm cannot recognize, thereby improving the accuracy of ticket identification and the efficiency of ticket entry.
The ticket identification method implemented by the ticket identification device is described in detail in connection with the following embodiments.
FIG. 2 is a flowchart of a first preferred embodiment of the ticket identification method of the present application. According to different needs, the order of the steps in the flowchart may be changed, and some steps may be omitted.
S20: the identification device obtains a picture to be identified. The identification device includes, but is not limited to, a server or the like. The identification device can communicate with multiple terminal devices and provides a user interface to users. For example, a user who needs reimbursement uploads the hospital tickets to be reimbursed to the identification device through the user interface provided by the identification device.
S21: the identification device extracts a ticket picture from the picture to be identified. The ticket picture includes at least one ticket picture, i.e., one or more ticket pictures.
Preferably, the identification device extracts each ticket picture of the at least one ticket picture from the picture to be identified, determines whether the position of each ticket picture is tilted, and performs position correction on any positionally tilted ticket picture so that each ticket picture is in a standard position. In this way, every ticket picture is under the same standard, which facilitates subsequent matching with ticket templates and improves the accuracy of text position detection.
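The application does not specify how tilt is detected or corrected. As one minimal sketch only (assuming a binarized foreground mask of the ticket content is available, which the application does not state), the skew angle can be estimated from the principal axis of the content pixels:

```python
import numpy as np

def estimate_skew_angle(mask):
    """Estimate the tilt in degrees of the dominant content in a binary
    mask via PCA on the foreground pixel coordinates. Hypothetical
    helper; the application only states that tilted tickets are
    corrected to a standard position."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    # The principal axis of the pixel cloud follows the ticket's long edge.
    cov = pts.T @ pts
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]
    angle = np.degrees(np.arctan2(major[1], major[0]))
    # Fold into (-45, 45] so a rectangle reports its deviation from horizontal.
    while angle <= -45:
        angle += 90
    while angle > 45:
        angle -= 90
    return angle
```

A rotation by the negative of this angle about the image center (with any image library's rotate routine) would then bring the ticket to the standard position.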
Further, a trained ticket extraction model is used to extract each ticket picture of the at least one ticket picture, where each ticket picture belongs to one of the categories of the training samples used to train the ticket extraction model. Using the ticket extraction model, ticket pictures of various shapes and sizes can be extracted from the picture to be identified, so that every ticket picture can be extracted.
Further, the training samples for training the ticket extraction model are ticket samples of various categories, for example a bill list category, a hospital ticket category, a catering ticket category, and so on. During training, the ticket extraction model learns the features of the ticket samples of the various categories, so that the trained ticket extraction model can identify ticket pictures of the various training-sample categories from the picture to be identified; pictures unrelated to these categories of ticket pictures are not extracted. This improves the accuracy of ticket identification.
Specifically, the ticket extraction model is a deep convolutional neural network model including, but not limited to, an SSD (Single Shot MultiBox Detector) model. The SSD algorithm is an object detection algorithm that directly predicts the coordinates and category of each bounding box. To detect objects of different sizes, the traditional approach is to transform the image into different sizes, process each separately, and combine the results, whereas the SSD algorithm achieves the same effect by combining the feature maps of different convolutional layers. The backbone network of the algorithm is VGG16; two fully connected layers are converted into convolutional layers and four more convolutional layers are added to construct the network. The outputs of five of these convolutional layers are each convolved with two 3*3 convolution kernels: one outputs the confidence used for classification, each default box generating a first number (e.g. 5) of confidences (this is for a VOC data set containing a second number (e.g. 4) of object categories); the other outputs the localization used for regression, each default box generating 4 coordinate values (x, y, w, h). In addition, these five convolutional layers also pass through a prior box layer to generate the default boxes (what is generated are coordinates). The number of default boxes for each of the five convolutional layers is given. Finally the preceding three computation results are merged and passed to the loss layer.
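As an illustration of the default-box generation just described, a prior-box layer can be sketched as follows; the feature-map size, scales, and aspect ratios are illustrative placeholders, not the exact SSD configuration.

```python
def default_boxes(feature_size, scales, aspect_ratios):
    """Generate center-form default boxes (cx, cy, w, h), normalized to
    [0, 1], for one square feature map, as a prior-box layer does: one
    set of boxes per feature-map cell, per scale, per aspect ratio."""
    boxes = []
    for i in range(feature_size):
        for j in range(feature_size):
            cx = (j + 0.5) / feature_size
            cy = (i + 0.5) / feature_size
            for s in scales:
                for ar in aspect_ratios:
                    # Width grows and height shrinks with the aspect ratio
                    # so the box area stays roughly s * s.
                    boxes.append((cx, cy, s * ar ** 0.5, s / ar ** 0.5))
    return boxes
```

Each default box would then receive per-class confidences from one 3*3 convolution and four coordinate offsets from the other, as described above.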
In an optional embodiment, the process of training the ticket extraction model includes:
(1) For each ticket picture category, configuring a set of ticket picture samples of that category, and dividing the ticket picture samples into a training set of a first ratio and a verification set of a second ratio.
The preset ticket picture categories include a plurality of types, for example outpatient tickets and inpatient tickets; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%, where the sum of the first ratio and the second ratio is less than or equal to 1.
(2) Training the ticket extraction model using the training set in the ticket picture samples of each ticket picture category.
(3) Verifying the accuracy of the trained ticket extraction model using the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, the number of ticket picture samples of each ticket picture category is increased and the above steps (2) and (3) are re-executed until the accuracy of the trained ticket extraction model is greater than or equal to the preset accuracy.
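Steps (2) and (3) above form a train-and-verify loop. A minimal sketch follows; the `train`, `evaluate`, and `add_samples` callables are hypothetical stand-ins for the unspecified training framework, and the ratios mirror the 75%/25% example given above.

```python
import random

def split_samples(samples, first_ratio=0.75, second_ratio=0.25, seed=0):
    """Divide samples into a training set (first ratio) and a
    verification set (second ratio); their sum may be at most 1."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * first_ratio)
    n_val = int(len(shuffled) * second_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

def train_until_accurate(samples_by_category, train, evaluate, add_samples,
                         preset_accuracy=0.95, max_rounds=10):
    """Repeat steps (2)-(3): train, verify, and grow the sample pool
    until the verification accuracy reaches the preset accuracy."""
    model, accuracy = None, 0.0
    for _ in range(max_rounds):
        train_sets, val_sets = {}, {}
        for category, samples in samples_by_category.items():
            train_sets[category], val_sets[category] = split_samples(samples)
        model = train(train_sets)             # step (2)
        accuracy = evaluate(model, val_sets)  # step (3)
        if accuracy >= preset_accuracy:
            break
        samples_by_category = add_samples(samples_by_category)
    return model, accuracy
```

The same loop shape applies to the text position detection model and the improved RNN model described later, with an extra annotation step inserted where required.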
S22: the identification device detects at least one text position from the ticket picture.
Preferably, detecting at least one text position from the ticket picture includes:
(a) processing the ticket picture using a ticket-surface background-color filtering technology to obtain a filtered ticket picture.
Specifically, the ticket-surface background-color filtering technology is prior art and is not described in detail here. In the filtered ticket picture the character strokes are clearer and more prominent and the edges of the ticket are more complete, which improves the accuracy of the subsequent detection and identification operations.
(b) detecting at least one text position in the filtered ticket picture using a trained text position detection model.
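The background-color filtering in step (a) is treated as prior art and left unspecified. As a minimal illustration only (assuming a light pre-printed background and dark character strokes, which the application does not state), bright background pixels can be pushed to pure white so that strokes and edge lines stand out:

```python
import numpy as np

def filter_background(gray, threshold=160):
    """Keep dark character strokes and ticket edge lines; push the
    light pre-printed background to pure white. `gray` is a 2-D
    uint8 grayscale image array; the threshold is illustrative."""
    out = gray.copy()
    out[out > threshold] = 255
    return out
```

Real ticket backgrounds are colored rather than merely light, so a production filter would threshold in a color space rather than on grayscale intensity; this sketch only shows the suppress-background, keep-strokes idea.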
Further, the training samples for training the text position detection model are ticket samples of various categories, for example a bill list category, a hospital ticket category, a catering ticket category, and so on. During training, the text position detection model learns the positions of the key fields in the ticket samples of the various categories, so that the trained text position detection model can identify the positions of all key fields in ticket samples of each category. For example, the key-field positions of the hospital ticket category include, but are not limited to, the position of the hospital name field, the position of the user name field, the position of the drug list field, the position of the date field, the position of the ticket number field, and so on.
The text position detection model includes, but is not limited to, a CTPN (Connectionist Text Proposal Network) model.
In an optional embodiment, the process of training the text position detection model includes:
(1) For each ticket picture category, configuring a set of ticket picture samples of that category, and dividing the ticket picture samples into a training set of a first ratio and a verification set of a second ratio.
The preset ticket picture categories include a plurality of types, for example outpatient tickets and inpatient tickets; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%, where the sum of the first ratio and the second ratio is less than or equal to 1.
(2) Annotating the position of each key field in each ticket picture sample of each ticket picture category.
(3) Training the text position detection model using the annotated ticket picture samples of each ticket picture category.
(4) Verifying the accuracy of the trained text position detection model using the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, the number of ticket picture samples of each ticket picture category is increased and the above steps (3) and (4) are re-executed until the accuracy of the trained text position detection model is greater than or equal to the preset accuracy.
S23: the identification device identifies the key field at each text position in the at least one text position, and obtains the machine identification result of the key field at each text position and the confidence of the machine identification result of the key field at each text position.
Optionally, the key field at each text position is identified using the Warp-CTC algorithm. Warp-CTC is an improved Recurrent Neural Network (RNN) model; it is key code that Baidu's Silicon Valley Artificial Intelligence Lab has open-sourced to make artificial-intelligence software run more efficiently. The Warp-CTC algorithm is compiled in C and packaged as an integrated library. It solves the supervision problem in mapping an input sequence to an output sequence, and is applied in recognition technology. The Warp-CTC algorithm requires little storage space and is hundreds of times faster than ordinary CTC (Connectionist Temporal Classification).
Further, the key field at each text position is input into the trained improved RNN model, which processes the key field at each text position and outputs the machine recognition result of the key field at each text position and the confidence of that machine recognition result.
Specifically, training the improved RNN model includes:
(1) Obtaining key field samples, and dividing the key field samples into a training set of a first ratio and a verification set of a second ratio.
(2) Training the improved RNN model using the key field samples in the training set.
(3) Verifying the accuracy of the trained improved RNN model using the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, the number of key field samples is increased and the above steps (2) and (3) are re-executed until the accuracy of the trained improved RNN model is greater than or equal to the preset accuracy.
Preferably, tickets with different uses have different key fields. The key fields include, but are not limited to, a hospital name field, a user name field, a medicine/drug field, a date field, and so on.
S24: the identification device obtains, according to the confidence of the machine identification result of the key field at each text position, a first type of key fields that meet a condition.
Preferably, the qualifying first type of key fields includes, but is not limited to, any one or a combination of the following:
(1) Key fields whose machine identification result confidence is lower than or equal to a confidence threshold are taken as part of the first type of key fields.
Further, the confidence threshold may be a pre-configured threshold, for example 0.9. The confidence threshold may also be configured based on the confidences of the machine identification results of all key fields, for example by taking the average of those confidences as the confidence threshold. In this way the confidence threshold is determined from the actual data, so that its configuration better matches actual needs.
(2) The confidences of the machine identification results of all key fields are sorted from largest to smallest, and the key fields ranked in the last preset number of positions (e.g. the last 10) are taken as part of the first type of key fields.
Preferably, among all key fields, those remaining after the first type of key fields are removed, i.e. the key fields that do not meet the condition, are the second type of key fields.
S25: the identification device sends a picture of each key field in the first type of key fields to the crowdsourcing platform processing device.
Optionally, the crowdsourcing platform typically takes the form of a large public network; each user can freely and voluntarily register as a member user on the crowdsourcing platform, and the crowdsourcing platform processing device handles the data of the crowdsourcing platform.
S26: the crowdsourcing platform processing device sends the picture of each key field in the first type of key fields to multiple users, so that the multiple users check the pictures of the same key field in the first type of key fields.
Optionally, the crowdsourcing platform distributes each key field in the first type of key fields as a task to multiple users for verification, so that multiple users check the picture of the same key field.
S27: the crowdsourcing platform processing device determines, according to the check results of the multiple users corresponding to each key field in the first type of key fields, the recognition result of each key field in the first type of key fields.
Preferably, for each key field in the first type of key fields, the check result exceeding a head-count threshold among the check results provided by the multiple users is taken as the recognition result of that key field. For example, the date field is sent to three users; if the three users give three different answers when checking the date field, it is determined that there is no correct result, and if two of the three users give the same answer, that answer is taken as the check result of the date field.
Preferably, for each key field in the first type of key fields, when there is no check result exceeding the head-count threshold among the check results provided by the multiple users, a prompt that the field cannot be verified is sent to the identification device, so that the identification device sends a prompt to the terminal device asking the user to re-upload the ticket picture, thereby ensuring the accuracy of the identification.
Through the above implementation, the present application first uses an intelligent recognition algorithm to detect and identify the key fields, obtains the confidence of the machine identification result of each key field, sends the key fields whose confidence is below the threshold to the crowdsourcing platform for verification, sends the same key field to multiple users through the crowdsourcing platform, and obtains the check results of the multiple users for the same key field, thereby improving the accuracy of ticket identification and enabling files to be established quickly.
S28: the identification device obtains a second type of key fields that do not meet the condition, and determines the machine identification result of each key field in the second type of key fields as the recognition result of each key field in the second type of key fields.
S29: the identification device summarizes, for each ticket picture, the recognition result of each key field in the first type of key fields and the recognition result of each key field in the second type of key fields, and outputs the recognition result of each ticket picture.
The picture to be identified includes one or more ticket pictures; for the convenience of subsequent calculation, a summary output is required. For example, a user's reimbursement form may carry multiple tickets, all attached to one picture to be identified; if only the recognition result of one ticket picture were returned, the subsequent reimbursement calculation could not be performed.
Through the above implementation, the present application first uses an intelligent recognition algorithm to detect and identify the key fields, obtains the confidence of the machine identification result of each key field, sends the key fields whose confidence is below the threshold to the crowdsourcing platform for verification, sends the same key field to multiple users through the crowdsourcing platform, obtains the check results of the multiple users for the same key field, and finally outputs the recognition result of the ticket picture, thereby improving the accuracy of ticket identification and enabling files to be established quickly.
FIG. 3 is a program module diagram of a first preferred embodiment of the ticket identification device of the present application. The ticket identification device 4 includes, but is not limited to, one or more of the following program modules: an obtaining module 40, an extraction module 41, a training module 42, a detection module 43, an identification module 44, a sending module 45, a data sending module 46, a determining module 47, a prompting module 48, and an output module 49. A program module as referred to in the present application is a series of computer-readable instruction segments that can be executed by the processor of the ticket identification device 4 and can perform fixed functions, and is stored in a memory. The functions of the modules are detailed in the following embodiments.
In a preferred embodiment, the memory of the identification device is used to store the following one or more program modules: the obtaining module 40, the extraction module 41, the training module 42, the detection module 43, the identification module 44, the sending module 45, and the output module 49, which are executed by the processor of the identification device. The memory of the crowdsourcing platform processing device is used to store the following one or more program modules: the data sending module 46, the determining module 47, and the prompting module 48, which are executed by the processor of the crowdsourcing platform processing device.
The obtaining module 40 obtains a picture to be identified. The identification device includes, but is not limited to, a server or the like. The identification device can communicate with multiple terminal devices and provides a user interface to users. For example, a user who needs reimbursement uploads the hospital tickets to be reimbursed to the identification device through the user interface provided by the identification device.
The extraction module 41 extracts a ticket picture from the picture to be identified. The ticket picture includes at least one ticket picture, i.e., one or more ticket pictures.
Preferably, the extraction module 41 extracts each ticket picture of the at least one ticket picture from the picture to be identified, determines whether the position of each ticket picture is tilted, and performs position correction on any positionally tilted ticket picture so that each ticket picture is in a standard position. In this way, every ticket picture is under the same standard, which facilitates subsequent matching with ticket templates and improves the accuracy of text position detection.
Further, the extraction module 41 uses a trained ticket extraction model to extract each ticket picture of the at least one ticket picture, where each ticket picture belongs to one of the categories of the training samples used to train the ticket extraction model. Using the ticket extraction model, ticket pictures of various shapes and sizes can be extracted from the picture to be identified, so that every ticket picture can be extracted.
Further, the training samples used by the training module 42 to train the ticket extraction model are ticket samples of various categories, for example a bill list category, a hospital ticket category, a catering ticket category, and so on. During training, the ticket extraction model learns the features of the ticket samples of the various categories, so that the trained ticket extraction model can identify ticket pictures of the various training-sample categories from the picture to be identified; pictures unrelated to these categories of ticket pictures are not extracted. This improves the accuracy of ticket identification.
Specifically, the ticket extraction model is a deep convolutional neural network model including, but not limited to, an SSD (Single Shot MultiBox Detector) model. The SSD algorithm is an object detection algorithm that directly predicts the coordinates and category of each bounding box. To detect objects of different sizes, the traditional approach is to transform the image into different sizes, process each separately, and combine the results, whereas the SSD algorithm achieves the same effect by combining the feature maps of different convolutional layers. The backbone network of the algorithm is VGG16; two fully connected layers are converted into convolutional layers and four more convolutional layers are added to construct the network. The outputs of five of these convolutional layers are each convolved with two 3*3 convolution kernels: one outputs the confidence used for classification, each default box generating a first number (e.g. 5) of confidences (this is for a VOC data set containing a second number (e.g. 4) of object categories); the other outputs the localization used for regression, each default box generating 4 coordinate values (x, y, w, h). In addition, these five convolutional layers also pass through a prior box layer to generate the default boxes (what is generated are coordinates). The number of default boxes for each of the five convolutional layers is given. Finally the preceding three computation results are merged and passed to the loss layer.
In an optional embodiment, the process by which the training module 42 trains the ticket extraction model includes:
(1) For each ticket picture category, configuring a set of ticket picture samples of that category, and dividing the ticket picture samples into a training set of a first ratio and a verification set of a second ratio.
The preset ticket picture categories include a plurality of types, for example outpatient tickets and inpatient tickets; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%, where the sum of the first ratio and the second ratio is less than or equal to 1.
(2) Training the ticket extraction model using the training set in the ticket picture samples of each ticket picture category.
(3) Verifying the accuracy of the trained ticket extraction model using the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, the number of ticket picture samples of each ticket picture category is increased and the above steps (2) and (3) are re-executed until the accuracy of the trained ticket extraction model is greater than or equal to the preset accuracy.
The detection module 43 detects at least one text position from the ticket picture.
Preferably, the detection module 43 detecting at least one text position from the ticket picture includes:
(a) processing the ticket picture using a ticket-surface background-color filtering technology to obtain a filtered ticket picture.
Specifically, the ticket-surface background-color filtering technology is prior art and is not described in detail here. In the filtered ticket picture the character strokes are clearer and more prominent and the edges of the ticket are more complete, which improves the accuracy of the subsequent detection and identification operations.
(b) detecting at least one text position in the filtered ticket picture using a trained text position detection model.
Further, the training samples for training the text position detection model are ticket samples of various categories, for example a bill list category, a hospital ticket category, a catering ticket category, and so on. During training, the text position detection model learns the positions of the key fields in the ticket samples of the various categories, so that the trained text position detection model can identify the positions of all key fields in ticket samples of each category. For example, the key-field positions of the hospital ticket category include, but are not limited to, the position of the hospital name field, the position of the user name field, the position of the drug list field, the position of the date field, the position of the ticket number field, and so on.
The text position detection model includes, but is not limited to, a CTPN (Connectionist Text Proposal Network) model.
In an optional embodiment, the process by which the training module 42 trains the text position detection model includes:
(1) For each ticket picture category, configuring a set of ticket picture samples of that category, and dividing the ticket picture samples into a training set of a first ratio and a verification set of a second ratio.
The preset ticket picture categories include a plurality of types, for example outpatient tickets and inpatient tickets; the first preset number is, for example, 1000 sheets, the first ratio is, for example, 75%, and the second ratio is, for example, 25%, where the sum of the first ratio and the second ratio is less than or equal to 1.
(2) Annotating the position of each key field in each ticket picture sample of each ticket picture category.
(3) Training the text position detection model using the annotated ticket picture samples of each ticket picture category.
(4) Verifying the accuracy of the trained text position detection model using the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, the number of ticket picture samples of each ticket picture category is increased and the above steps (3) and (4) are re-executed until the accuracy of the trained text position detection model is greater than or equal to the preset accuracy.
The recognition module 44 recognizes the key field at each of the at least one text positions, and obtains the machine recognition result of the key field at each text position and the confidence of that machine recognition result. Optionally, the key field at each text position is recognized with the Warp-CTC algorithm. Warp-CTC is an improved recurrent neural network (RNN, Recurrent Neural Networks) model; it is the key code, open-sourced by Baidu's Silicon Valley AI Lab, that lets artificial-intelligence software run more efficiently. The Warp-CTC algorithm is compiled in the C language and has been integrated. It can solve the supervision problem in mapping an input sequence to an output sequence, and is applied in recognition technology. The Warp-CTC algorithm requires little storage space and is hundreds of times faster than plain CTC (Connectionist Temporal Classification).
Further, the key field at each text position is input into the trained improved RNN model, which processes the key field at each text position and outputs the machine recognition result of the key field at each text position and the confidence of that machine recognition result.
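The CTC family of algorithms mentioned above maps a per-frame label sequence emitted by the RNN to an output string by collapsing consecutive repeats and removing blank labels. A minimal greedy decoder (a sketch of the idea, not the Warp-CTC implementation itself) looks like:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse runs of identical labels, then
    drop the blank label, yielding the recognized label sequence."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

In practice the per-frame label is the argmax of the RNN's softmax output, and the product or mean of those per-frame probabilities can serve as the recognition confidence used in the following steps.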
Specifically, training the improved RNN model includes:
(1) Obtaining key-field samples, and dividing the key-field samples into a training set of a first proportion and a validation set of a second proportion.
(2) Training the improved RNN model with the key-field samples in the training set.
(3) Verifying the accuracy of the trained improved RNN model with the validation set. If the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, the number of key-field samples is increased and steps (2) and (3) above are re-executed, until the accuracy of the trained improved RNN model is greater than or equal to the preset accuracy.
Preferably, tickets with different uses have different key fields. The key fields include, but are not limited to: the hospital name field, the user name field, the medicine field, the date field, and so on.
The obtaining module 40 obtains qualified first-class key fields according to the confidence of the machine recognition result of the key field at each text position.
Preferably, the qualified first-class key fields include, but are not limited to, any one or a combination of the following:
(1) Taking key fields whose machine-recognition-result confidence is lower than or equal to a confidence threshold as part of the first-class key fields.
Further, the confidence threshold may be a preconfigured threshold, for example 0.9. The confidence threshold may also be configured from the confidences of the machine recognition results of all key fields, for example by taking the average of those confidences as the confidence threshold. In this way the confidence threshold can be determined from actual data, so that its configuration better fits actual needs.
(2) Sorting the confidences of the machine recognition results of all key fields in descending order, and taking the key fields ranked in the last preset number of places (e.g., the last 10) as part of the first-class key fields.
Preferably, among all key fields, the key fields remaining after the first-class key fields are removed, that is, the unqualified key fields, are the second-class key fields.
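The two selection rules above can be combined in a small sketch. The field names, the mean-as-default-threshold choice, and the `bottom_n` guard are illustrative assumptions:

```python
def select_first_class_fields(fields, threshold=None, bottom_n=10):
    """fields: list of (field_name, confidence) pairs from machine recognition.

    A field is 'first class' (routed to crowd verification) when its
    confidence is at or below the threshold, or when it ranks among the
    bottom `bottom_n` fields by confidence. The remaining fields are the
    second-class key fields, whose machine results are accepted as-is."""
    if threshold is None:
        # One configuration from the text: use the mean confidence as threshold.
        threshold = sum(conf for _, conf in fields) / len(fields)
    ranked = sorted(fields, key=lambda f: f[1], reverse=True)
    bottom = {name for name, _ in ranked[-bottom_n:]} if bottom_n else set()
    return [name for name, conf in fields if conf <= threshold or name in bottom]
```

Either rule may be used alone by passing `bottom_n=0` or a very high threshold.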
The sending module 45 sends the picture of each key field among the first-class key fields to the crowdsourcing platform processing device.
Optionally, the crowdsourcing platform usually takes the form of a large public network; every user may freely and voluntarily register as a member user on the crowdsourcing platform, and the crowdsourcing platform processing device is used to process the data of the crowdsourcing platform.
The data sending module 46 sends the picture of each key field among the first-class key fields to multiple users, so that the multiple users verify the picture of the same key field among the first-class key fields.
Optionally, the crowdsourcing platform distributes each key field among the first-class key fields as a task to multiple users for verification, so that multiple users verify the picture of the same key field.
The determination module 47 determines the recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to that key field.
Preferably, for each key field among the first-class key fields, the determination module 47 takes, among the verification results provided by the multiple users, the verification result exceeding a headcount threshold as the recognition result of that key field. For example, the date field is sent to three users. If the three users return three different answers for the date field, it is determined that there is no correct result; if two of the three users give the same answer, that shared answer is taken as the verification result of the date field.
Preferably, for each key field among the first-class key fields, when no verification result exceeding the headcount threshold exists among the verification results provided by the multiple users, the prompting module 48 sends a cannot-verify prompt to the recognition processing device, so that the recognition device sends a prompt to the terminal device asking the user to re-upload the ticket picture, thereby guaranteeing the recognition accuracy.
Through the above implementation, the present application first detects and recognizes the key fields with an intelligent recognition algorithm and obtains the confidence of the machine recognition result of each key field, sends the key fields whose confidence is below the threshold to the crowdsourcing platform for verification, distributes the same key field to multiple users for verification through the crowdsourcing platform, and obtains the multiple users' verification results for the same key field, thereby improving the accuracy of ticket recognition and enabling fast filing.
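The headcount-threshold rule of the date-field example can be sketched as a simple agreement check (the threshold value of 2 users is taken from the three-user example):

```python
from collections import Counter

def crowd_verdict(answers, min_agreement=2):
    """Return the answer given by at least `min_agreement` users,
    or None when no answer reaches that headcount (no correct result)."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= min_agreement else None
```

A `None` verdict corresponds to the cannot-verify case handled by the prompting module below.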
In a preferred embodiment, the obtaining module 40 obtains the unqualified second-class key fields, and determines the machine recognition result of each key field among the second-class key fields as the recognition result of that key field.
The output module 49 summarizes the recognition result of each key field among the first-class key fields and the recognition result of each key field among the second-class key fields in each ticket picture, and outputs the recognition result of each ticket picture.
The picture to be recognized includes one or more ticket pictures; for the convenience of subsequent computation, a summarized output is required. For example, one user's reimbursement form carries multiple tickets, all pasted into one picture to be recognized; if the recognition result of only a single ticket picture were returned, the subsequent reimbursement computation would be impossible.
Through the above implementation, the present application first detects and recognizes the key fields with an intelligent recognition algorithm and obtains the confidence of the machine recognition result of each key field, sends the key fields whose confidence is below the threshold to the crowdsourcing platform for verification, distributes the same key field to multiple users for verification through the crowdsourcing platform, obtains the multiple users' verification results for the same key field, and finally outputs the recognition result of the ticket picture, thereby improving the accuracy of ticket recognition and enabling fast filing.
The above integrated unit implemented in the form of software function modules may be stored in a non-volatile readable storage medium. The above software function modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute part of the steps of the method described in each embodiment of the present application.
As shown in FIG. 4, the ticket recognition apparatus 4 includes at least one sending device 51, at least one memory 52, at least one processor 53, at least one receiving device 54, a recognition device 55, a crowdsourcing platform processing device 56, and at least one communication bus. The communication bus is used to realize connection and communication among these components.
In a preferred embodiment, the recognition device 55 and the crowdsourcing platform processing device 56 are not integrated into the ticket recognition apparatus 4, and the recognition device 55 and the crowdsourcing platform processing device 56 communicate through a network. In other embodiments, the recognition device 55 and the crowdsourcing platform processing device 56 may also be integrated into one device, such as the ticket recognition apparatus 4, without network communication. The present application places no restriction on the form in which the recognition device 55 and the crowdsourcing platform processing device 56 exist in the ticket recognition apparatus 4.
The recognition device 55 and the crowdsourcing platform processing device 56 are devices capable of automatically performing numerical computation and/or information processing according to preset or stored instructions; their hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and so on. The ticket recognition apparatus 4 may further include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The recognition device 55 and the crowdsourcing platform processing device 56 may be, but are not limited to, any electronic product that can interact with a user through a keyboard, a touch pad, a voice-control device, or the like, for example a tablet computer, a smartphone, a personal digital assistant (Personal Digital Assistant, PDA), a smart wearable device, a camera device, a monitoring device, or another such terminal.
The network in which the recognition device 55 and the crowdsourcing platform processing device 56 are located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and so on.
The receiving device 54 and the sending device 51 may be wired sending ports, or may be wireless devices, for example including antenna devices, for data communication with other devices.
The memory 52, the memory of the recognition device 55, and the memory of the crowdsourcing platform processing device 56 are used to store program code. They may be circuits with a storage function that have no physical form within an integrated circuit, such as RAM (Random-Access Memory) or FIFO (First In First Out) memory. Alternatively, they may be memories with a physical form, such as memory modules, TF cards (Trans-flash Card), smart media cards, secure digital cards, flash cards, and other storage devices.
The processor 53, the processor of the recognition device 55, and the processor of the crowdsourcing platform processing device 56 may include one or more microprocessors or digital processors. The processor of the recognition device 55 may call the program code stored in the memory of the recognition device 55 to execute the related functions, and the processor of the crowdsourcing platform processing device 56 may call the program code stored in the memory of the crowdsourcing platform processing device 56 to execute the related functions. For example, the modules described in FIG. 2 and FIG. 3 are program code stored in the memory of the recognition device 55 and the memory of the crowdsourcing platform processing device 56 and executed by the processors of those devices to implement a ticket recognition method. The processor of the recognition device 55 and the processor of the crowdsourcing platform processing device 56, also called central processing units (CPU, Central Processing Unit), are very-large-scale integrated circuits serving as the computation core (Core) and the control unit (Control Unit).
In other embodiments, the processor 53 may call the program code stored in the memory 52 to execute the related functions. For example, the modules described in FIG. 2 and FIG. 3 are program code stored in the memory 52 and executed by the processor 53 to implement a ticket recognition method.
An embodiment of the present application further provides a non-volatile readable storage medium on which computer instructions are stored; when the instructions are executed by a ticket recognition apparatus including one or more processors, the ticket recognition apparatus executes the ticket recognition method described in the above method embodiments.
Preferably, as shown in FIG. 2, the memory of the recognition device 55 and the memory of the crowdsourcing platform processing device 56 store multiple instructions for implementing a ticket recognition method; the processor of the recognition device 55 can execute the multiple instructions to implement: obtaining a picture to be recognized; extracting a ticket picture from the picture to be recognized; detecting at least one text position from the ticket picture; recognizing the key field at each of the at least one text positions, and obtaining the machine recognition result of the key field at each text position and the confidence of that machine recognition result; obtaining qualified first-class key fields according to the confidence of the machine recognition result of the key field at each text position; and sending the picture of each key field among the first-class key fields to the crowdsourcing platform processing device;
the processor of the crowdsourcing platform processing device 56 can execute the multiple instructions to implement: sending the picture of each key field among the first-class key fields to multiple users so that the multiple users verify the picture of the same key field among the first-class key fields; and determining the recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to that key field;
the processor of the recognition device 55 can execute the multiple instructions to implement: obtaining the unqualified second-class key fields, and determining the machine recognition result of each key field among the second-class key fields as the recognition result of that key field;
summarizing the recognition result of each key field among the first-class key fields and the recognition result of each key field among the second-class key fields in each ticket picture, and outputting the recognition result of each ticket picture.
According to a preferred embodiment of the present application, the multiple instructions executable by the processor of the recognition device further include:
extracting each of the at least one ticket picture with a trained ticket extraction model, wherein each ticket picture belongs to one category of the training samples used to train the ticket extraction model.
According to a preferred embodiment of the present application, the multiple instructions executable by the processor of the recognition device further include: before detecting at least one text position from the ticket picture, judging whether the position of each ticket picture is skewed, and performing position correction on skewed ticket pictures so that every ticket picture is in a standard position.
According to a preferred embodiment of the present application, the multiple instructions executable by the processor of the recognition device further include:
processing the ticket picture with a ticket background-color filtering technique to obtain a filtered ticket picture;
detecting at least one text position in the filtered ticket picture with a trained text position detection model, wherein the training samples used to train the text position detection model are ticket samples of various categories.
According to a preferred embodiment of the present application, the qualified first-class key fields include, but are not limited to, any one or a combination of the following:
taking key fields whose machine-recognition-result confidence is lower than or equal to a confidence threshold as part of the first-class key fields;
sorting the confidences of the machine recognition results of all key fields in descending order, and selecting the key fields ranked in the last preset number of places.
According to a preferred embodiment of the present application, the multiple instructions executable by the processor of the crowdsourcing platform processing device further include:
for each key field among the first-class key fields, taking, among the verification results provided by the multiple users, the verification result exceeding a headcount threshold as the recognition result of that key field.
According to a preferred embodiment of the present application, the crowdsourcing platform processing device further performs:
for each key field among the first-class key fields, when no verification result exceeding the headcount threshold exists among the verification results provided by the multiple users, sending a cannot-verify prompt through the crowdsourcing processing device to the recognition processing device, so that the recognition device prompts the user to re-upload the ticket picture.
According to a preferred embodiment of the present application, the multiple instructions executable by the processor of the recognition device further include:
obtaining, through the recognition device, the unqualified second-class key fields, and determining the machine recognition result of each key field among the second-class key fields as the recognition result of that key field;
summarizing, through the recognition device, the recognition result of each key field among the first-class key fields and the recognition result of each key field among the second-class key fields in each ticket picture, and outputting the recognition result of each ticket picture.
According to a preferred embodiment of the present application, the multiple instructions executable by the processor of the recognition device further include: configuring the confidence threshold according to the confidence of the machine recognition result of the key field at each text position.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a non-volatile readable storage medium. Based on such an understanding, the technical solution of the present application — in essence, the part contributing to the prior art, or all or part of the technical solution — may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A ticket recognition method, characterized in that the method comprises:
    a recognition device obtaining a picture to be recognized;
    the recognition device extracting a ticket picture from the picture to be recognized;
    the recognition device detecting at least one text position from the ticket picture;
    the recognition device recognizing a key field at each of the at least one text positions, and obtaining a machine recognition result of the key field at each text position and a confidence of the machine recognition result of the key field at each text position;
    the recognition device obtaining qualified first-class key fields according to the confidence of the machine recognition result of the key field at each text position;
    the recognition device sending a picture of each key field among the first-class key fields to a crowdsourcing platform processing device;
    the crowdsourcing platform processing device sending the picture of each key field among the first-class key fields to multiple users so that the multiple users verify the picture of the same key field among the first-class key fields;
    the crowdsourcing platform processing device determining a recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to each key field among the first-class key fields;
    the recognition device obtaining unqualified second-class key fields, and determining the machine recognition result of each key field among the second-class key fields as the recognition result of each key field among the second-class key fields;
    the recognition device summarizing the recognition result of each key field among the first-class key fields and the recognition result of each key field among the second-class key fields in each ticket picture, and outputting a recognition result of each ticket picture.
  2. The ticket recognition method according to claim 1, characterized in that the recognition device extracting a ticket picture from the picture to be recognized comprises:
    the recognition device extracting each of the at least one ticket picture with a trained ticket extraction model, wherein each ticket picture belongs to one category of the training samples used to train the ticket extraction model.
  3. The ticket recognition method according to claim 1, characterized in that, before the recognition device detects at least one text position from the ticket picture, the method further comprises:
    the recognition device judging whether the position of each ticket picture is skewed, and performing position correction on skewed ticket pictures so that every ticket picture is in a standard position.
  4. The ticket recognition method according to claim 1, characterized in that the recognition device detecting at least one text position from the ticket picture comprises:
    the recognition device processing the ticket picture with a ticket background-color filtering technique to obtain a filtered ticket picture;
    the recognition device detecting at least one text position in the filtered ticket picture with a trained text position detection model, wherein the training samples used to train the text position detection model are ticket samples of various categories.
  5. The ticket recognition method according to claim 1, characterized in that the qualified first-class key fields include, but are not limited to, any one or a combination of the following:
    taking key fields whose machine-recognition-result confidence is lower than or equal to a confidence threshold as part of the first-class key fields;
    sorting the confidences of the machine recognition results of all key fields in descending order, and selecting the key fields ranked in the last preset number of places.
  6. The ticket recognition method according to claim 1, characterized in that the crowdsourcing platform processing device determining the recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to each key field among the first-class key fields comprises:
    for each key field among the first-class key fields, the crowdsourcing platform processing device taking, among the verification results provided by the multiple users, the verification result exceeding a headcount threshold as the recognition result of each key field.
  7. The ticket recognition method according to claim 1, characterized in that the method further comprises:
    for each key field among the first-class key fields, when no verification result exceeding the headcount threshold exists among the verification results provided by the multiple users, the crowdsourcing processing device sending a cannot-verify prompt to the recognition processing device, so that the recognition device prompts the user to re-upload the ticket picture.
  8. The ticket recognition method according to claim 5, characterized in that the method further comprises:
    the recognition device configuring the confidence threshold according to the confidence of the machine recognition result of the key field at each text position.
  9. A ticket recognition apparatus, characterized in that the ticket recognition apparatus comprises a recognition device and a crowdsourcing platform processing device;
    the recognition device obtains a picture to be recognized;
    the recognition device extracts a ticket picture from the picture to be recognized;
    the recognition device detects at least one text position from the ticket picture;
    the recognition device recognizes a key field at each of the at least one text positions, and obtains a machine recognition result of the key field at each text position and a confidence of the machine recognition result of the key field at each text position;
    the recognition device obtains qualified first-class key fields according to the confidence of the machine recognition result of the key field at each text position;
    the recognition device sends a picture of each key field among the first-class key fields to the crowdsourcing platform processing device;
    the crowdsourcing platform processing device sends the picture of each key field among the first-class key fields to multiple users so that the multiple users verify the picture of the same key field among the first-class key fields;
    the crowdsourcing platform processing device determines a recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to each key field among the first-class key fields;
    the recognition device obtains unqualified second-class key fields, and determines the machine recognition result of each key field among the second-class key fields as the recognition result of each key field among the second-class key fields;
    the recognition device summarizes the recognition result of each key field among the first-class key fields and the recognition result of each key field among the second-class key fields in each ticket picture, and outputs a recognition result of each ticket picture.
  10. The ticket recognition apparatus according to claim 9, characterized in that the recognition device extracting a ticket picture from the picture to be recognized comprises:
    the recognition device extracting each of the at least one ticket picture with a trained ticket extraction model, wherein each ticket picture belongs to one category of the training samples used to train the ticket extraction model.
  11. The ticket recognition apparatus according to claim 9, characterized in that, before the recognition device detects at least one text position from the ticket picture, the recognition device judges whether the position of each ticket picture is skewed, and performs position correction on skewed ticket pictures so that every ticket picture is in a standard position.
  12. The ticket recognition apparatus according to claim 9, characterized in that the recognition device detecting at least one text position from the ticket picture comprises:
    the recognition device processing the ticket picture with a ticket background-color filtering technique to obtain a filtered ticket picture;
    the recognition device detecting at least one text position in the filtered ticket picture with a trained text position detection model, wherein the training samples used to train the text position detection model are ticket samples of various categories.
  13. A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium stores at least one instruction, and when the at least one instruction is executed by a processor, the following steps are implemented:
    a recognition device obtaining a picture to be recognized;
    the recognition device extracting a ticket picture from the picture to be recognized;
    the recognition device detecting at least one text position from the ticket picture;
    the recognition device recognizing a key field at each of the at least one text positions, and obtaining a machine recognition result of the key field at each text position and a confidence of the machine recognition result of the key field at each text position;
    the recognition device obtaining qualified first-class key fields according to the confidence of the machine recognition result of the key field at each text position;
    the recognition device sending a picture of each key field among the first-class key fields to a crowdsourcing platform processing device;
    the crowdsourcing platform processing device sending the picture of each key field among the first-class key fields to multiple users so that the multiple users verify the picture of the same key field among the first-class key fields;
    the crowdsourcing platform processing device determining a recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to each key field among the first-class key fields;
    the recognition device obtaining unqualified second-class key fields, and determining the machine recognition result of each key field among the second-class key fields as the recognition result of each key field among the second-class key fields;
    the recognition device summarizing the recognition result of each key field among the first-class key fields and the recognition result of each key field among the second-class key fields in each ticket picture, and outputting a recognition result of each ticket picture.
  14. The storage medium according to claim 13, characterized in that the recognition device extracting a ticket picture from the picture to be recognized comprises:
    the recognition device extracting each of the at least one ticket picture with a trained ticket extraction model, wherein each ticket picture belongs to one category of the training samples used to train the ticket extraction model.
  15. The storage medium according to claim 13, characterized in that, before the recognition device detects at least one text position from the ticket picture, the following step is further implemented when the at least one instruction is executed by the processor:
    the recognition device judging whether the position of each ticket picture is skewed, and performing position correction on skewed ticket pictures so that every ticket picture is in a standard position.
  16. The storage medium according to claim 13, characterized in that the recognition device detecting at least one text position from the ticket picture comprises:
    the recognition device processing the ticket picture with a ticket background-color filtering technique to obtain a filtered ticket picture;
    the recognition device detecting at least one text position in the filtered ticket picture with a trained text position detection model, wherein the training samples used to train the text position detection model are ticket samples of various categories.
  17. The storage medium according to claim 13, characterized in that the qualified first-class key fields include, but are not limited to, any one or a combination of the following:
    taking key fields whose machine-recognition-result confidence is lower than or equal to a confidence threshold as part of the first-class key fields;
    sorting the confidences of the machine recognition results of all key fields in descending order, and selecting the key fields ranked in the last preset number of places.
  18. The storage medium according to claim 13, characterized in that the crowdsourcing platform processing device determining the recognition result of each key field among the first-class key fields according to the verification results of the multiple users corresponding to each key field among the first-class key fields comprises:
    for each key field among the first-class key fields, the crowdsourcing platform processing device taking, among the verification results provided by the multiple users, the verification result exceeding a headcount threshold as the recognition result of each key field.
  19. The storage medium according to claim 13, characterized in that the following step is further implemented when the at least one instruction is executed by the processor:
    for each key field among the first-class key fields, when no verification result exceeding the headcount threshold exists among the verification results provided by the multiple users, the crowdsourcing processing device sending a cannot-verify prompt to the recognition processing device, so that the recognition device prompts the user to re-upload the ticket picture.
  20. The storage medium according to claim 17, characterized in that the following step is further implemented when the at least one instruction is executed by the processor:
    the recognition device configuring the confidence threshold according to the confidence of the machine recognition result of the key field at each text position.
PCT/CN2018/100156 2018-04-18 2018-08-13 Ticket recognition method, apparatus and storage medium WO2019200781A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810351126.1A CN108664897A (zh) 2018-04-18 2018-04-18 Ticket recognition method, apparatus and storage medium
CN201810351126.1 2018-04-18

Publications (1)

Publication Number Publication Date
WO2019200781A1 true WO2019200781A1 (zh) 2019-10-24

Family

ID=63780286

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100156 WO2019200781A1 (zh) 2018-04-18 2018-08-13 Ticket recognition method, apparatus and storage medium

Country Status (2)

Country Link
CN (1) CN108664897A (zh)
WO (1) WO2019200781A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942066A (zh) * 2019-11-27 2020-03-31 中国银行股份有限公司 Ticket checking method and device
CN110991456A (zh) * 2019-12-05 2020-04-10 北京百度网讯科技有限公司 Ticket recognition method and device
CN111046886A (zh) * 2019-12-12 2020-04-21 吉林大学 Automatic number plate recognition method, device, equipment and computer-readable storage medium
CN112232336A (zh) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate recognition method, device, equipment and storage medium
CN116992496A (zh) * 2023-09-28 2023-11-03 武汉彤新科技有限公司 Data resource security supervision system for enterprise service management

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109461247A (zh) * 2018-10-29 2019-03-12 北京慧流科技有限公司 Ticket verification method and device, electronic device and storage medium
CN109858420A (zh) * 2019-01-24 2019-06-07 国信电子票据平台信息服务有限公司 Ticket processing system and processing method
CN109977957A (zh) * 2019-03-04 2019-07-05 苏宁易购集团股份有限公司 Invoice recognition method and system based on deep learning
CN110135409B (zh) * 2019-04-04 2023-11-03 平安科技(深圳)有限公司 Optimization method and device for recognition model
CN110110123B (zh) * 2019-04-04 2023-07-25 平安科技(深圳)有限公司 Training set update method and device for detection model
CN110188755B (zh) * 2019-05-30 2021-09-07 北京百度网讯科技有限公司 Image recognition method, device and computer-readable storage medium
CN110263694A (zh) * 2019-06-13 2019-09-20 泰康保险集团股份有限公司 Ticket recognition method and device
CN110399875A (zh) * 2019-07-31 2019-11-01 山东浪潮人工智能研究院有限公司 General table information extraction method based on deep learning and pixel projection
CN111160142B (zh) * 2019-12-14 2023-07-11 上海交通大学 Certificate and ticket positioning detection method based on numerical prediction regression model
CN111160188A (zh) * 2019-12-20 2020-05-15 中国建设银行股份有限公司 Financial ticket recognition method, apparatus, device and storage medium
CN111444792B (zh) * 2020-03-13 2023-05-09 安诚迈科(北京)信息技术有限公司 Ticket recognition method, electronic device, storage medium and apparatus
CN111428599B (zh) * 2020-03-17 2023-10-20 北京子敬科技有限公司 Ticket recognition method, device and equipment
CN111461097A (zh) * 2020-03-18 2020-07-28 北京大米未来科技有限公司 Method, device, electronic device and medium for recognizing image information
CN111461099A (zh) * 2020-03-27 2020-07-28 重庆农村商业银行股份有限公司 Ticket recognition method, system, device and readable storage medium
CN111428725A (zh) * 2020-04-13 2020-07-17 北京令才科技有限公司 Data structuring processing method, device and electronic device
CN112837466B (zh) * 2020-12-18 2023-04-07 北京百度网讯科技有限公司 Ticket recognition method, apparatus, device and storage medium
CN112861782B (zh) * 2021-03-07 2023-06-20 上海大学 System and method for extracting key information from ticket photos
CN112989990B (zh) * 2021-03-09 2023-08-04 平安科技(深圳)有限公司 Medical ticket recognition method, apparatus, device and storage medium
CN113963149A (zh) * 2021-10-29 2022-01-21 平安科技(深圳)有限公司 Blur judgment method, system, device and medium for medical ticket pictures

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186781A (zh) * 2011-12-31 2013-07-03 北京新媒传信科技有限公司 Text recognition method
CN105005742A (zh) * 2015-07-30 2015-10-28 四川长虹电器股份有限公司 Data processing method and data processing system
CN105243365A (zh) * 2015-09-28 2016-01-13 四川长虹电器股份有限公司 Data processing method and data processing system
US20170351913A1 (en) * 2016-06-07 2017-12-07 The Neat Company, Inc. d/b/a Neatreceipts, Inc. Document Field Detection And Parsing
CN107766809A (zh) * 2017-10-09 2018-03-06 平安科技(深圳)有限公司 Electronic device, ticket information recognition method and computer-readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103996239B (zh) * 2014-06-13 2016-08-24 广州广电运通金融电子股份有限公司 Ticket positioning and recognition method and system based on multi-cue fusion
CN105095919A (zh) * 2015-09-08 2015-11-25 北京百度网讯科技有限公司 Image recognition method and apparatus
CN106530528B (zh) * 2016-10-11 2020-02-18 上海慧银信息科技有限公司 Cash register receipt information recognition method and device
CN107798299B (zh) * 2017-10-09 2020-02-07 平安科技(深圳)有限公司 Ticket information recognition method, electronic device and readable storage medium


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942066A (zh) * 2019-11-27 2020-03-31 中国银行股份有限公司 Ticket checking method and device
CN110942066B (zh) * 2019-11-27 2023-07-25 中国银行股份有限公司 Ticket checking method and device
CN110991456A (zh) * 2019-12-05 2020-04-10 北京百度网讯科技有限公司 Ticket recognition method and device
CN110991456B (zh) * 2019-12-05 2023-07-07 北京百度网讯科技有限公司 Ticket recognition method and device
CN111046886A (zh) * 2019-12-12 2020-04-21 吉林大学 Automatic number plate recognition method, device, equipment and computer-readable storage medium
CN111046886B (zh) * 2019-12-12 2023-05-12 吉林大学 Automatic number plate recognition method, device, equipment and computer-readable storage medium
CN112232336A (zh) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate recognition method, device, equipment and storage medium
CN116992496A (zh) * 2023-09-28 2023-11-03 武汉彤新科技有限公司 Data resource security supervision system for enterprise service management
CN116992496B (zh) * 2023-09-28 2023-12-29 武汉彤新科技有限公司 Data resource security supervision system for enterprise service management

Also Published As

Publication number Publication date
CN108664897A (zh) 2018-10-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18915397

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18915397

Country of ref document: EP

Kind code of ref document: A1