CN111178219A - Bill identification management method and device, storage medium and electronic equipment - Google Patents

Bill identification management method and device, storage medium and electronic equipment

Info

Publication number
CN111178219A
CN111178219A CN201911343761.6A CN201911343761A CN111178219A CN 111178219 A CN111178219 A CN 111178219A CN 201911343761 A CN201911343761 A CN 201911343761A CN 111178219 A CN111178219 A CN 111178219A
Authority
CN
China
Prior art keywords
bill
historical
image
current
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911343761.6A
Other languages
Chinese (zh)
Inventor
梁爽
李夫路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd
Priority to CN201911343761.6A
Publication of CN111178219A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a bill identification management method, a bill identification management device, a storage medium, and electronic equipment, and belong to the technical field of computers. The bill identification management method comprises the following steps: acquiring a historical bill image and its label, wherein the label is one of a real bill, a fake bill, and a falsely issued real bill (the bill itself is genuine but was issued improperly); carrying out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; acquiring a current bill image; carrying out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; and processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.

Description

Bill identification management method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for bill identification management, a computer-readable storage medium, and an electronic device.
Background
Invoices are an important basis for personal reimbursement and for accounting by financial staff. If the tax authority finds false invoices during an inspection, the enterprise may be deemed to have evaded tax and be penalized. To avoid this, financial staff need to check the authenticity of a large number of invoices.
In the related art, invoice authenticity is mainly checked in three ways. The first is to log in to a website and query it, which requires manually entering all of the invoice information into a web page. The second is to call 12366 to make an inquiry, which also requires entering the invoice information manually; the inquiry process has many steps and the wait for the voice prompts is long. The third is on-site verification at the service hall of the tax bureau, where a great deal of time is spent travelling to and from the service hall and queuing.
All three approaches are cumbersome and inefficient, and they require financial staff to spend a large amount of time.
Therefore, a new bill identification management method, apparatus, medium, and electronic device are needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a method, an apparatus, a medium, and an electronic device for bill identification management, so as to overcome at least to some extent the problem of low efficiency in bill authenticity identification in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to one aspect of the present disclosure, there is provided a bill identification management method, including: acquiring a historical bill image and its label, wherein the label is one of a real bill, a fake bill, and a falsely issued real bill; carrying out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; acquiring a current bill image; carrying out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; and processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
According to an aspect of the present disclosure, there is provided a bill identification management apparatus including: a historical bill information acquisition unit, configured to acquire a historical bill image and its label, wherein the label is one of a real bill, a fake bill, and a falsely issued real bill; a historical bill vector acquisition unit, configured to carry out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; a machine classification model training unit, configured to train a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; a current bill image acquisition unit, configured to acquire a current bill image; a current bill vector acquisition unit, configured to carry out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; and a bill identification result obtaining unit, configured to process the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
According to an aspect of the present disclosure, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the bill identification management method according to any one of the embodiments described above.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the bill identification management method of any of the above embodiments.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in the technical scheme provided by some embodiments of the present disclosure, a machine learning classification model for automatically identifying the bill authenticity can be trained based on a large amount of historical bill information, when a new current bill image is received, the current bill image is subjected to optical character recognition to extract the current bill feature vector of the current bill image, and the current bill feature vector is input into the machine learning classification model after the training is completed, so that the authenticity identification result of the current bill image can be obtained, the automation of bill authenticity identification is realized, and the efficiency and the accuracy of bill authenticity identification are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a ticket identification management method according to one embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of one embodiment of step S120 in FIG. 1;
FIG. 3 schematically shows a flow chart of another embodiment of step S120 in FIG. 1;
FIG. 4 schematically illustrates a flow diagram of a ticket identification management method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a ticket recognition management device according to one embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a ticket recognition management device according to one embodiment of the present disclosure;
FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware units or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 schematically shows a flowchart of a ticket recognition management method according to an embodiment of the present disclosure, and an execution subject of the ticket recognition management method may be a device having a calculation processing function, such as a server and/or a mobile terminal.
As shown in fig. 1, the method for identifying and managing a ticket provided by the embodiment of the present disclosure may include the following steps.
In step S110, a historical bill image and its label are acquired, the label being one of a real bill, a fake bill, and a falsely issued real bill.
The bill mentioned in the embodiments of the present disclosure may be any type of bill, such as an invoice, a receipt, or a check, and the present disclosure does not limit this. The following embodiments mainly take an invoice as the example of a bill.
False invoices fall into two cases. The first is a fake invoice: the invoice itself is forged and was not produced under the supervision of the tax authority. The second is a falsely issued real invoice: the invoice itself is genuine, but the economic transaction it reflects is fictitious or inconsistent with the facts; or the invoice is genuine and the transaction it reflects is real, but the issuer did not purchase the invoice from the tax authority and instead bought it from another unit or individual.
For example, according to these two cases of invoice fraud, the labels of the historical invoice images can be divided into three categories: a real invoice (which may be labeled "0", for example), a fake invoice (which may be labeled "1", for example), and a falsely issued real invoice (which may be labeled "2", for example).
In an exemplary embodiment, the method may further include: and storing the historical bill images, the historical bill feature vectors and the labels thereof to a historical bill library in a block chain. The historical bill images, the historical bill feature vectors and the labels thereof can be stored into the block chain as part of the historical bill information.
The Blockchain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. The consensus mechanism is a mathematical algorithm for establishing trust and obtaining rights and interests among different nodes in the blockchain system.
A blockchain is essentially a decentralized database. It is a series of data blocks associated with one another by cryptographic methods. Each data block contains information about network transactions (in the Bitcoin network, for example, information about Bitcoin transactions), which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
In a narrow sense, the blockchain is a distributed account book which is a chain data structure formed by combining data blocks in a sequential connection mode according to a time sequence and is guaranteed in a cryptographic mode and cannot be tampered and forged.
Broadly, blockchain technology is a new distributed infrastructure and computing paradigm that uses blockchain data structures to verify and store data, uses distributed node consensus algorithms to generate and update data, uses cryptography to secure data transmission and access, and uses smart contracts composed of automated script code to program and manipulate data.
Generally, a blockchain system consists of a data layer, a network layer, a consensus layer, an incentive layer, a contract layer, and an application layer. The data layer encapsulates the underlying data blocks together with basic data and algorithms such as data encryption and timestamps; the network layer comprises a distributed networking mechanism, a data transmission mechanism, a data verification mechanism, and the like; the consensus layer mainly encapsulates the various consensus algorithms of the network nodes; the incentive layer integrates economic factors into the blockchain technology system and mainly comprises mechanisms for issuing and distributing economic incentives; the contract layer mainly encapsulates various scripts, algorithms, and smart contracts and is the basis of the programmable nature of the blockchain; the application layer encapsulates the various application scenarios and cases of the blockchain. In this model, the timestamp-based chained block structure, the consensus mechanism of distributed nodes, economic incentives based on consensus computing power, and flexible, programmable smart contracts are the most representative innovations of blockchain technology.
In the embodiment of the present disclosure, the method may further include a step of constructing the blockchain nodes and the blockchain network, which covers their construction, update, and maintenance. For example, one or more groups or companies take part in building the blockchain network for the enterprise supplier transaction invoice verification information sharing service and management, with the basic business organization of each company serving as the smallest node.
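The chained, tamper-evident data structure described above can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of the disclosure: the field names (index, timestamp, record, prev_hash), the use of SHA-256, and the sample invoice numbers are all assumptions, and consensus, networking, and digital signatures are left out.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's canonical JSON form (the hash field is excluded)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, record, timestamp):
    """Append a block that stores `record` and points at the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "timestamp": timestamp,
             "record": record, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Recompute every hash and check each block's link to its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
    return True


# Hypothetical bill records; labels follow the 0/1/2 convention used in this document.
chain = []
append_block(chain, {"invoice_number": "A-001", "label": 0}, timestamp=1)
append_block(chain, {"invoice_number": "A-002", "label": 2}, timestamp=2)
print(chain_is_valid(chain))      # True
chain[0]["record"]["label"] = 1   # tamper with a stored record
print(chain_is_valid(chain))      # False
```

Because each block stores the hash of its predecessor, changing any stored record invalidates the recomputed hash of that block, which is what makes tampering detectable.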
In the embodiment of the present disclosure, the method may further include defining an information storage and information authentication data format in advance, that is, storing and authenticating the shared information according to the data structure manner, the information storage manner, and the protocol defined in the embodiment of the present disclosure, so as to ensure high efficiency of information storage and information processing.
In the embodiment of the present disclosure, an enterprise or individual registered in the system uploads the following to the blockchain: cases of the enterprise supplier transaction invoice verification and confirmation information sharing service and management; electronic certificate issuer information (e.g., full name, account number, bank of deposit); electronic certificate receiver information (e.g., full name, account number, bank of deposit); certificate number information and the like; and update information for the bill verification and confirmation information sharing service and management. Supporting materials such as audio, video, and images may also be uploaded to the blockchain. The information stored in the blockchain therefore has the characteristics of privacy protection (e.g., through technical means such as authority management, picture or video watermarking, and encryption), openness and transparency, traceability, and tamper resistance.
In the related art, many companies, enterprises, and public institutions use a database to store electronic invoices or photographed images of paper invoices as backups of the paper invoices. However, this centralized storage mode is vulnerable to attack, its data storage structure is simple, and the invoice information is easy to tamper with and easy to leak. The blockchain-based bill identification management method can effectively realize bill authenticity identification and bill management in a blockchain network. The method can use the hash-pointer transaction chain data structure of the blockchain, together with cryptographic hash calculation and cryptographic digital signatures, to realize multi-level evidence confirmation during a transaction, thereby solving the problem of trust among different transacting parties. Meanwhile, bill information stored in the blockchain has the characteristics of privacy protection, traceability, and tamper resistance.
In step S120, Optical Character Recognition (OCR) is performed on the historical ticket image, and a historical ticket feature vector of the historical ticket image is acquired.
Specific implementation processes can be referred to the following description of the embodiments of fig. 2 and 3.
In step S130, a machine learning classification model is trained according to the historical ticket feature vector and the label corresponding to the historical ticket image.
In an exemplary embodiment, the machine-learned classification model may include a logistic regression classification model, but the present disclosure is not so limited and any other suitable machine-learned model may also be employed.
For example, for the historical electronic invoices of enterprise suppliers stored in the blockchain, eight features are extracted from the data recognized by OCR: invoice type x1, invoice number x2, invoice code x3, invoicing date x4, supplier name x5, supplier taxpayer identification number x6, invoice amount x7, and invoice tax amount x8. Non-quantitative features, such as the invoice type, are label-encoded according to their categories. Continuously varying quantitative features, such as the invoice amount, are first discretized, that is, divided into grades or levels in ascending order of value, and then label-encoded. After each feature value has been preprocessed, a historical bill feature vector X = [x1, x2, x3, x4, x5, x6, x7, x8] is formed. Assuming that n historical electronic invoices are stored in the blockchain, where n is a positive integer greater than or equal to 1, the historical bill feature vectors of the n invoices form a historical bill feature matrix M = [X1; X2; …; Xn]. The historical bill feature vector of each electronic invoice is marked 0 if the invoice is a real invoice, 1 if it is a fake invoice, and 2 if it is a falsely issued real invoice. These marks form the target vector label corresponding to the historical bill feature matrix.
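The preprocessing of categorical and continuous features can be sketched as follows. This is a minimal illustration under stated assumptions: the OCR output is represented by hand-written dictionaries, only three of the eight features are shown, and the field names, sample values, and grade boundaries are hypothetical rather than taken from the disclosure.

```python
from bisect import bisect_right


def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def discretize(value, boundaries):
    """Return the index of the grade that `value` falls into.

    `boundaries` must be sorted ascending; grade 0 lies below the first boundary.
    """
    return bisect_right(boundaries, value)


# Hypothetical OCR output for three invoices (field names are illustrative).
invoices = [
    {"type": "special VAT", "supplier": "Supplier A", "amount": 850.0},
    {"type": "general VAT", "supplier": "Supplier B", "amount": 12500.0},
    {"type": "special VAT", "supplier": "Supplier B", "amount": 99.9},
]
AMOUNT_BOUNDARIES = [100.0, 1000.0, 10000.0]  # assumed grade boundaries

type_codes = fit_label_encoder(inv["type"] for inv in invoices)
supplier_codes = fit_label_encoder(inv["supplier"] for inv in invoices)

# One row per invoice: [encoded type, encoded supplier, amount grade].
feature_matrix = [
    [type_codes[inv["type"]], supplier_codes[inv["supplier"]],
     discretize(inv["amount"], AMOUNT_BOUNDARIES)]
    for inv in invoices
]
print(feature_matrix)  # [[1, 0, 1], [0, 1, 3], [1, 1, 0]]
```

In practice the encoders and boundaries would be fitted on the historical bill library and reused unchanged for each current bill, so that the same category always maps to the same code.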
The logistic regression classification model is selected as a machine learning classification model, the historical bill feature matrix M is used as the input of the logistic regression classification model, and the model is trained and the model parameters are learned according to the corresponding target vector label.
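As a rough illustration of this training step, the sketch below fits a three-class (multinomial) logistic regression by batch gradient descent in plain Python. The disclosure does not specify the optimizer, the hyperparameters, or the multi-class formulation, so the softmax form, the learning rate, the epoch count, and the toy data are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities for one feature vector x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(M, labels, n_classes=3, lr=0.1, epochs=500):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_features = len(M[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(M, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class-k score.
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / len(M)
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / len(M)
    return weights, biases


# Toy, made-up data: two encoded features per invoice, labels 0/1/2.
M = [[0, 0], [0, 1], [2, 0], [2, 1], [0, 3], [1, 3]]
labels = [0, 0, 1, 1, 2, 2]
weights, biases = train(M, labels)
predictions = [max(range(3), key=lambda k: predict_proba(weights, biases, x)[k])
               for x in M]
print(predictions)
```

A production system would more likely rely on an established library implementation; the hand-written loop is used here only to keep the example self-contained.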
In step S140, a current ticket image is acquired.
For example, the current ticket image may be a scanned image of the invoice currently to be identified.
In step S150, performing optical character recognition on the current bill image, and acquiring a current bill feature vector of the current bill image.
Specific implementation processes can be referred to the following description of the embodiments of fig. 2 and 3.
In step S160, the current bill feature vector is processed through the machine learning classification model, and a bill identification result of the current bill image is obtained, where the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
In an exemplary embodiment, processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image may include: inputting the current bill feature vector into the trained machine learning classification model, and predicting the probabilities that the current bill image is a real bill, a fake bill, and a falsely issued real bill; and taking the category with the highest of these three probabilities as the bill identification result of the current bill image.
For example, if in the prediction result the probability that the current invoice is a real invoice is 0.5, the probability that it is a fake invoice is 0.3, and the probability that it is a falsely issued real invoice is 0.2, the output bill identification result for the current invoice is a real invoice.
In the embodiment of the present disclosure, a preset threshold for determining whether the current invoice is a true invoice may also be set, and only when the probability of predicting to be true is greater than the preset threshold, the current invoice is determined to be a true invoice. The size of the preset threshold value can be set according to actual requirements. Generally, if the application has a higher requirement on safety, the preset threshold may be set higher, for example, 0.9; if the application has a low requirement for safety, the preset threshold may be set to be lower, for example, 0.5, which is not limited by the present disclosure.
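The decision rule, with and without the preset threshold, can be sketched as below. The fallback behaviour when the "real" probability is the largest but does not exceed the threshold is not stated in the source; returning the more likely of the two remaining labels is an assumption made here for illustration, as are the function and constant names.

```python
LABELS = {0: "real bill", 1: "fake bill", 2: "falsely issued real bill"}


def identify(probs, real_threshold=None):
    """Pick the label with the highest probability.

    If `real_threshold` is set, "real bill" is returned only when its
    probability exceeds the threshold; otherwise the more likely of the
    two non-real labels is returned (an assumption, see the lead-in).
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    if real_threshold is not None and best == 0 and probs[0] <= real_threshold:
        best = max((1, 2), key=lambda k: probs[k])
    return LABELS[best]


print(identify([0.5, 0.3, 0.2]))                      # real bill
print(identify([0.5, 0.3, 0.2], real_threshold=0.9))  # fake bill
```

An alternative design would route below-threshold bills to manual review instead of assigning one of the other two labels.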
In an exemplary embodiment, the method may further include: and correcting the credit score of the current object corresponding to the current bill image according to the bill identification result.
In the embodiment of the present disclosure, the current object may be any one or more of an operator who uploads the current ticket information, an issuing organization or an individual who issues the current ticket, a purchasing organization or an individual who purchases a commodity corresponding to the current ticket, and the like.
If the bill identification result is that the current bill corresponding to the current bill image is real, the credit score of the current object may be raised. If the bill identification result is that the current bill is a fake bill or a falsely issued real bill, the credit score of the current object may be lowered.
Further, the method may further include: counting, for each object, the number of times it has submitted fake bills or falsely issued real bills and/or the total amount involved; and if the counted number and/or total amount exceeds a preset threshold, sending out early warning information and adding the corresponding object to a blacklist.
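A minimal sketch of the credit score correction and blacklist logic follows. The starting score, the score adjustments, both thresholds, and the class and identifier names are hypothetical values chosen for illustration; the source does not specify them.

```python
from collections import defaultdict

SCORE_DELTA = {0: +1, 1: -5, 2: -5}   # assumed adjustment per label
MAX_VIOLATIONS = 3                     # assumed count threshold
MAX_VIOLATION_AMOUNT = 100000.0        # assumed total-amount threshold


class CreditLedger:
    def __init__(self):
        self.scores = defaultdict(lambda: 100)   # assumed starting score
        self.violation_count = defaultdict(int)
        self.violation_amount = defaultdict(float)
        self.blacklist = set()

    def record(self, obj_id, label, amount):
        """Apply one identification result; return True if a warning is raised."""
        self.scores[obj_id] += SCORE_DELTA[label]
        if label == 0:
            return False
        self.violation_count[obj_id] += 1
        self.violation_amount[obj_id] += amount
        exceeded = (self.violation_count[obj_id] > MAX_VIOLATIONS
                    or self.violation_amount[obj_id] > MAX_VIOLATION_AMOUNT)
        if exceeded:
            self.blacklist.add(obj_id)
        return exceeded


ledger = CreditLedger()
ledger.record("supplier-A", 0, 500.0)
warnings = [ledger.record("supplier-B", 1, 1000.0) for _ in range(4)]
print(ledger.scores["supplier-A"], ledger.scores["supplier-B"])  # 101 80
print(warnings)                                                  # [False, False, False, True]
```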
In an exemplary embodiment, the method may further include: and storing the current bill image, the current bill feature vector and the bill identification result thereof to the historical bill library so as to update the historical bill library. Namely, the current bill image, the current bill feature vector and the bill identification result are used as new historical bill information, and the bill identification result is used as a label of the new historical bill feature vector.
In an exemplary embodiment, the method may further include: and storing the machine learning classification model and the trained model parameters thereof into the block chain. When the historical bill library is updated, namely new historical bill information is stored, the machine learning classification model can be continuously trained according to the historical bill feature vector and the label of the new historical bill image, so that the model parameters of the machine learning classification model can be updated, and the updated model parameters of the machine learning classification model are stored in the block chain. Therefore, the classification accuracy of the machine learning classification model can be continuously improved along with the continuous accumulation of data quantity in the historical bill library.
The bill identification management method provided by the embodiment of the disclosure can train a machine learning classification model for automatically identifying the authenticity of bills based on a large amount of historical bill information, and when a new current bill image is received, the optical character identification is carried out on the current bill image, the current bill feature vector of the current bill image is extracted, and the current bill feature vector is input into the machine learning classification model after the training is completed, so that the authenticity identification result of the current bill image can be obtained, the automation of bill authenticity identification is realized, and the efficiency and the accuracy of bill authenticity identification are improved.
The embodiment of the disclosure provides a system for implementing, in a blockchain network, the sharing service and management of enterprise supplier transaction invoice verification and confirmation information. The specific transaction information is shown in Table 1 below:
TABLE 1
[Table image: specific transaction information]
According to the technical scheme provided by the embodiment of the disclosure, on the one hand, historical bill information is stored using blockchain technology. This provides decentralized storage with privacy protection, traceability, and tamper resistance, and ensures the security and reliability of the stored data, so that leakage of bill data during bill authenticity identification can be prevented and the security of bill data storage is improved. On the other hand, a machine learning classification model (described below) for automatically identifying bill authenticity can be trained on the large amount of historical bill information stored in the blockchain. When a new block containing a current bill image is generated in the blockchain, the current bill feature vector of the current bill image is extracted and input into the trained machine learning classification model to obtain the bill identification result of the current bill image. This automates bill authenticity identification and improves its efficiency and accuracy.
Fig. 2 schematically shows a flow chart of an embodiment of step S120 in fig. 1.
As shown in fig. 2, in the embodiment of the present disclosure, the step S120 may further include the following steps.
In step S121, noise removal, inclination correction, and binarization are performed on the historical bill image to obtain a preprocessed image.
Image noise refers to unnecessary or redundant interference information present in the image data. Different noise removal methods can be used for different types of image noise, so the type of image noise can be identified first and the corresponding noise removal method selected afterwards. For example, Gaussian filtering may be used to remove Gaussian noise, and a nonlinear filter can be used to process impulse noise.
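The selection of a filter by noise type can be sketched as follows. This is a minimal pure-Python illustration on a grayscale image stored as a list of rows; the 3x3 window size, the border clamping, the choice of a median filter as the nonlinear filter, and the function names are assumptions, and identifying the noise type itself is not shown.

```python
def _neighborhood(img, r, c):
    """Collect the 3x3 neighbourhood of (r, c), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def median_filter(img):
    """3x3 median filter: a nonlinear filter suited to impulse noise."""
    return [[sorted(_neighborhood(img, r, c))[4] for c in range(len(img[0]))]
            for r in range(len(img))]


GAUSS_3X3 = [1, 2, 1, 2, 4, 2, 1, 2, 1]  # integer kernel, weights sum to 16


def gaussian_filter(img):
    """3x3 Gaussian smoothing: a linear filter suited to Gaussian noise."""
    return [[sum(w * v for w, v in zip(GAUSS_3X3, _neighborhood(img, r, c))) // 16
             for c in range(len(img[0]))]
            for r in range(len(img))]


def denoise(img, noise_type):
    """Dispatch to a filter according to the (already identified) noise type."""
    if noise_type == "gaussian":
        return gaussian_filter(img)
    if noise_type == "impulse":
        return median_filter(img)
    raise ValueError("unsupported noise type: " + noise_type)


# A flat gray patch with one impulse ("salt") pixel in the middle.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
cleaned = denoise(noisy, "impulse")
```

In this example the median filter removes the isolated bright pixel entirely, whereas a linear filter would only spread it over its neighbours.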
When a bill is optically scanned, the scanned image may be skewed for practical reasons, which affects subsequent image processing, so the image can be corrected. The tilt direction and tilt angle of the image are detected automatically from image features; for example, the tilt angle may be detected by a projection-based method, by a Hough transform, by linear fitting, or by applying a Fourier transform and working in the frequency domain. The tilt of the image is then corrected based on the detected direction and angle.
Image binarization is then performed: the gray value of each pixel in the image is set to 0 or 255, so that the whole image shows a clear black-and-white effect.
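Binarization can be sketched as below. The source does not say how the threshold is chosen; Otsu's method, which picks the threshold that maximizes the between-class variance of the gray-level histogram, is used here only as one common choice. The function names and the convention that pixels above the threshold become 255 are assumptions.

```python
def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]          # pixels with gray level <= t
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0        # pixels with gray level > t
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold=None):
    """Set every pixel to 0 or 255; use Otsu's threshold unless one is given."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[255 if px > threshold else 0 for px in row] for row in gray]


gray = [[10, 200], [200, 10]]
binary = binarize(gray)
```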
In step S122, the text area in the preprocessed image is identified, the text area is split into lines, and each line of text is divided into individual characters.
Layout analysis and character cutting are performed on the preprocessed image: the text areas in the layout are identified and split into lines, and each line is divided into individual characters one by one.
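One simple way to split a binarized text area into lines and then into characters is to use projection profiles: rows without ink separate lines, and columns without ink separate characters within a line. The sketch below assumes ink pixels are 0 and background pixels are 255, and that characters do not touch; the source does not prescribe this particular cutting method, and the function names are hypothetical.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(binary):
    """Split a binarized image into character boxes (top, bottom, left, right)."""
    boxes = []
    # Horizontal projection: count ink pixels per row to find text lines.
    row_profile = [sum(px == 0 for px in row) for row in binary]
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        # Vertical projection inside the line: count ink pixels per column.
        col_profile = [sum(row[c] == 0 for row in line)
                       for c in range(len(line[0]))]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


W = 255
page = [
    [W, W, W, W, W, W],
    [W, 0, W, 0, 0, W],
    [W, 0, W, 0, 0, W],
    [W, W, W, W, W, W],
    [0, 0, W, W, W, 0],
]
boxes = segment(page)
```

Each returned box uses half-open ranges, so `page[top:bottom]` with columns `left:right` crops one character.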
In step S123, the character feature vector of each character is extracted one by one.
Character features are then extracted: the normalized features of each character are extracted one by one and used as character feature vectors for the subsequent character recognition.
In step S124, the text feature vector of each text is input into a text classifier, and a text recognition result in the preprocessed image is obtained.
Character recognition is then performed: the character feature vector extracted from the current character is input into a pre-trained character classifier, which outputs the recognized character. The character classifier may be any machine learning model, neural network model, or deep learning model. During training, the character feature vectors of sample characters in the training data set are input into the character classifier, which outputs predicted characters; a loss function is computed from the sample characters and the predicted characters, and its gradient is back-propagated to optimize the classifier, yielding the trained character classifier.
In some embodiments, the trained character classifier and its model parameters may be stored in the blockchain.
In step S125, the language context of the character recognition result in the preprocessed image is analyzed by a language model, and the character recognition result output by the character classifier is corrected to obtain the historical bill feature vector.
The character classifier recognizes each character in isolation, without considering the language context between characters, so the recognition result may be implausible. In the embodiment of the present disclosure, a Recurrent Neural Network Language Model (RNNLM) may be used to post-process the recognition result of the character classifier, correcting it and generating the final historical bill feature vector. An RNNLM trains a language model with an RNN or one of its variants; its task is to predict the next word from the preceding text. The RNN removes the limitation of a fixed context window by summarizing all preceding context in the hidden-layer state, so it can capture longer dependencies; it also has few hyperparameters and is broadly applicable.
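The idea of correcting per-character classifier output with language context can be sketched as follows. To keep the example short, a small bigram probability table stands in for the RNN language model; this is a deliberate simplification and not the RNNLM itself. The candidate probabilities, the bigram values, and the function name `correct` are all hypothetical.

```python
import math


def correct(candidates, bigram, floor=1e-6):
    """Viterbi search over per-character candidates.

    candidates: list of dicts {char: classifier probability}, one per position.
    bigram: dict {(prev_char, char): probability}; unseen pairs use `floor`.
    Returns the string maximizing classifier and bigram log-probabilities.
    """
    if not candidates:
        return ""
    # best[ch] = (score, path) for the best sequence ending in ch
    best = {ch: (math.log(p), [ch]) for ch, p in candidates[0].items()}
    for options in candidates[1:]:
        nxt = {}
        for ch, p in options.items():
            score, path = max(
                (prev_score + math.log(bigram.get((prev, ch), floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            nxt[ch] = (score + math.log(p), path + [ch])
        best = nxt
    return "".join(max(best.values())[1])


# The classifier slightly prefers "4" over "A" in the middle position.
candidates = [
    {"T": 0.9, "7": 0.1},
    {"4": 0.55, "A": 0.45},
    {"X": 0.95, "K": 0.05},
]
bigram = {("T", "A"): 0.3, ("A", "X"): 0.3}
corrected = correct(candidates, bigram)
```

Taking the top candidate at each position independently would give "T4X"; with the context scores the search prefers the sequence whose adjacent characters are plausible together.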
In an exemplary embodiment, step S150 may further include the following steps: performing noise removal, inclination correction, and binarization on the current bill image to obtain a preprocessed image; identifying the text area in the preprocessed image, splitting the text area into lines, and dividing each line of text into individual characters; extracting the character feature vector of each character one by one; inputting the character feature vector of each character into a character classifier to obtain a character recognition result for the preprocessed image; and analyzing the language context of the character recognition result through a language model, correcting the character recognition result output by the character classifier, and obtaining the current bill feature vector. For the specific implementation, reference can be made to the generation process of the historical bill feature vector.
Fig. 3 schematically shows a flow chart of another embodiment of step S120 in fig. 1.
As shown in fig. 3, step S120 in the embodiment of fig. 1 may further include the following steps in the embodiment of the present disclosure.
In step S126, the optical character recognition is performed on the historical bill image, and the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image are obtained.
In step S127, the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount, and the bill tax amount of the history bill image are processed to generate the history bill feature vector.
In an exemplary embodiment, processing the ticket type, the ticket number, the ticket code, the billing date, the supplier name, the supplier taxpayer identification number, the ticket amount, and the ticket tax amount of the historical ticket image to generate the historical ticket feature vector may include: respectively carrying out quantitative representation on the bill type and the supplier name; and discretizing the bill amount and the bill tax amount.
In an exemplary embodiment, the method may further include: identifying the current bill image based on an OCR technology to obtain the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the current bill image; and processing the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the current bill image to generate the current bill feature vector.
Taking as an example the case where the current bill is an electronic invoice and the current bill image includes an electronic invoice image, the electronic invoice image may be recognized based on OCR. The enterprise supplier transaction invoice verification and confirmation information is acquired from the blockchain. The electronic invoice image is automatically sliced and cropped, its type is determined, its regions are located, and it is preprocessed; all fields are then recognized to derive information such as the invoice type, number, code, date, purchaser name, purchaser taxpayer identification number, seller name, seller taxpayer identification number, amount, tax amount, and total of amount and tax.
In an exemplary embodiment, processing the ticket type, the ticket number, the ticket code, the billing date, the supplier name, the supplier taxpayer identification number, the ticket amount, and the ticket tax amount of the current ticket image to generate the current ticket feature vector may include: respectively carrying out quantitative representation on the bill type and the supplier name; and discretizing the bill amount and the bill tax amount.
It should be noted that, in different application scenarios, if the types of the tickets are different, the extracted historical ticket feature vector and the current ticket feature vector may be changed accordingly, which is not limited in this disclosure.
Taking an electronic invoice of an enterprise supplier as an example, eight features are extracted from the data recognized by OCR: invoice type x1, invoice number x2, invoice code x3, invoicing date x4, supplier name x5, supplier taxpayer identification number x6, invoice amount x7, and invoice tax amount x8.
For features that are not described quantitatively, such as the invoice type, labels can be assigned according to the different categories. For example, if the invoice types include special value-added tax invoices, general invoices, special motor vehicle invoices, machine-printed invoices, quota invoices, cut-out invoices, and the like, their label values may be set to 1, 2, 3, 4, 5, and 6, respectively. Supplier names can be processed similarly.
Continuously varying quantitative features, such as the invoice amount, are first discretized, that is, divided into grades or levels in ascending order of value, and then labeled. For example, an invoice amount of 0-100 yuan falls into the first grade, 100-1000 yuan into the second grade, 1000-2000 yuan into the third grade, and so on, where the first to m-th grades correspond to label values 1 to m, respectively.
After each feature value is preprocessed, the feature vector of the electronic invoice is formed as X = [x1, x2, x3, x4, x5, x6, x7, x8].
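The labeling and discretization described above can be sketched as follows. The dictionary keys, the field names of the `invoice` record, the treatment of boundary values (an amount equal to an upper edge stays in the lower grade), the catch-all grade above 2000 yuan, and the reuse of the same grade edges for the tax amount are all assumptions; the source does not specify how the remaining fields are encoded, so they are passed through unchanged here.

```python
import bisect

# Label values 1..6 for the invoice types listed in the text (keys are hypothetical).
INVOICE_TYPE_LABELS = {
    "vat_special": 1, "general": 2, "motor_vehicle": 3,
    "machine_printed": 4, "quota": 5, "cut_out": 6,
}

# Upper edges of grades 1..3 from the text; larger amounts fall into grade 4.
AMOUNT_EDGES = [100, 1000, 2000]


def grade(amount, edges=AMOUNT_EDGES):
    """Map a continuous amount to a 1-based grade label."""
    return bisect.bisect_left(edges, amount) + 1


def encode_invoice(invoice, supplier_ids):
    """Build X = [x1, ..., x8]; supplier names get incrementing label values."""
    supplier_label = supplier_ids.setdefault(
        invoice["supplier_name"], len(supplier_ids) + 1)
    return [
        INVOICE_TYPE_LABELS[invoice["type"]],   # x1: invoice type label
        invoice["number"],                      # x2: invoice number
        invoice["code"],                        # x3: invoice code
        invoice["date"],                        # x4: invoicing date
        supplier_label,                         # x5: supplier name label
        invoice["supplier_tax_id"],             # x6: supplier taxpayer ID
        grade(invoice["amount"]),               # x7: amount grade
        grade(invoice["tax"]),                  # x8: tax amount grade
    ]


# Made-up example record for illustration only.
supplier_ids = {}
invoice = {
    "type": "general", "number": "00012345", "code": "044001900111",
    "date": "2019-12-20", "supplier_name": "Example Supplier Co.",
    "supplier_tax_id": "EXAMPLE-TAX-ID", "amount": 560.0, "tax": 72.8,
}
X = encode_invoice(invoice, supplier_ids)
```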
It should be noted that, in other embodiments, the feature vector of the electronic invoice does not necessarily need all eight features described above, and a subset of them may be selected, which is not limited by the present disclosure.
FIG. 4 schematically shows a flow diagram of a ticket recognition management method according to another embodiment of the present disclosure. In this embodiment, the bill is taken as an invoice, and the machine learning prediction model is taken as a logistic regression prediction model for example.
As shown in fig. 4, the difference from the above embodiment is that the bill identifying and managing method provided by the embodiment of the present disclosure may further include the following steps.
In step S410, historical object information corresponding to the historical bill image is obtained according to the historical bill feature vector.
For example, based on the historical bill information stored in the blockchain, information on the taxpayer and related enterprise corresponding to each collected invoice is queried, including the name of the invoicing unit, its industry, the enterprise's legal representative, registered address, financial staff information, tax-handling staff information, purchase and sale items, changes in the amount of invoices issued over a period of time, whether its tax clerks simultaneously serve as tax clerks of other enterprises in the same industry, and the like.
In step S420, a history object feature vector is obtained according to the history object information.
For example, the information is preprocessed (classified and labeled) to form a history object feature vector, and history object feature vectors of taxpayers corresponding to all invoices form a history object feature matrix.
In step S430, the label of the historical object feature vector is determined according to the historical bill image and its label, where the label is either "no false invoicing behavior" or "false invoicing behavior".
For example, according to the historical bill information, it is determined whether each taxpayer and related enterprise has engaged in false invoicing: the number of invoices related to the taxpayer that are identified as a virtual opening of a real ticket (that is, a genuine invoice that was falsely issued) is counted as N; if N is greater than 0, the label of the historical object feature vector corresponding to the taxpayer is set to 1, otherwise it is set to 0. The label values of all taxpayers together form a target vector.
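The counting rule for taxpayer labels can be sketched as below. Which bill labels count toward N is passed in as a parameter; in the example it is assumed to be the falsely issued label. The record layout, the label strings, and the function name are hypothetical.

```python
def label_taxpayers(records, counted_labels):
    """Return {taxpayer: 1 if N > 0 else 0}.

    records: iterable of (taxpayer, bill_label) pairs from the historical library.
    counted_labels: set of bill labels that count toward N.
    """
    counts = {}
    for taxpayer, bill_label in records:
        counts.setdefault(taxpayer, 0)
        if bill_label in counted_labels:
            counts[taxpayer] += 1
    return {taxpayer: 1 if n > 0 else 0 for taxpayer, n in counts.items()}


# Made-up historical records for illustration only.
records = [
    ("taxpayer_A", "real"),
    ("taxpayer_A", "falsely_issued"),
    ("taxpayer_B", "real"),
    ("taxpayer_C", "fake"),
]
labels = label_taxpayers(records, counted_labels={"falsely_issued"})
# Target vector in a fixed taxpayer order.
target = [labels[t] for t in sorted(labels)]
```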
In step S440, a machine learning prediction model is trained according to the historical object feature vectors and their labels.
And training a logistic regression prediction model according to the historical object feature matrix and the target vector of the taxpayer.
In step S450, current object information is acquired.
In step S460, a current object feature vector of the current object is extracted according to the current object information.
In step S470, the current object feature vector is processed by the machine learning prediction model to obtain an object prediction result for the current object, where the object prediction result is either "no false invoicing behavior" or "false invoicing behavior".
For example, for a taxpayer to be identified, the current object feature vector of the taxpayer is input into the trained logistic regression prediction model, which outputs a predicted probability that the taxpayer has engaged in false invoicing. The object prediction result is obtained from this probability: if the predicted probability exceeds a set threshold, the taxpayer is judged to have false invoicing behavior; if the predicted probability is smaller than the set threshold, the taxpayer is judged not to have false invoicing behavior.
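Training a logistic regression model on a historical object feature matrix and target vector, and then thresholding the predicted probability, can be sketched as follows. This is a minimal pure-Python version using batch gradient descent; the learning rate, epoch count, threshold of 0.5, and the two toy binary features are assumptions, not values from the disclosure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return (probability, result); result 1 means false invoicing behavior."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return p, (1 if p > threshold else 0)


# Toy feature matrix: [shares a tax clerk with a peer, sudden surge in invoiced amount].
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]  # target vector of taxpayer labels
w, b = train_logistic(X, y)
probability, result = predict(w, b, [1, 1])
```

In practice the threshold would be tuned to the desired balance between missed cases and false alarms rather than fixed at 0.5.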
The method provided by the embodiment of the disclosure can not only identify the authenticity of the bill, but also further identify whether the taxpayer has the behavior of making a false invoice according to the authenticity identification result of the bill.
In an exemplary embodiment, the method may further include: correcting the credit score of the current object according to the bill identification result and the object prediction result.
For example, if a taxpayer submits a fake invoice or a falsely issued genuine invoice, the credit score of the taxpayer is lowered, and the updated credit score is stored in the blockchain.
In an exemplary embodiment, the method may further include: storing the historical object information, the historical object feature vector, the label of the historical object feature vector, and the model parameters of the trained machine learning prediction model in a historical object library in the blockchain.
In an exemplary embodiment, the method may further include: storing the current object information, the current object feature vector, and the object prediction result of the current object in the historical object library so as to update the historical object library.
In an exemplary embodiment, the method may further include: the reputation score of the current object is stored in a historical object repository.
For the content not described in detail in the embodiments of the present disclosure, reference may be made to the other embodiments described above, which are not described again.
In the embodiment of the present disclosure, the method may further include: evaluating the timeliness, effectiveness, and accuracy of the enterprise supplier transaction invoice verification and confirmation information sharing service and management, and continuously adjusting and optimizing system parameters based on automatic image and character recognition technology and on a verification method in which a majority of blockchain nodes participate, using cross random combination authentication of bill frames and characters.
The bill identification management method provided by the embodiment of the disclosure exploits the characteristics of the enterprise supplier transaction invoice verification and confirmation information sharing service and management information stored in the blockchain, such as privacy protection, openness and transparency, traceability, and tamper resistance. It provides a verification method in which a majority of blockchain nodes participate, based on automatic image and character recognition technology and cross random combination authentication of bill frames and characters. Based on the historical data of that information in the blockchain, the system automatically analyzes and dynamically identifies, in real time, possible problems with an invoice, and updates the corresponding invoice status and the credit scores of the relevant operators and institutions, thereby promoting the application of blockchain technology to enterprise supplier transaction invoice verification and confirmation information sharing service and management. As blockchain technology is applied more widely in fields such as enterprise supplier transaction invoice verification and confirmation information sharing service and management, medical care, elderly care, insurance, finance, and logistics, the invention is expected to bring considerable economic and social benefits.
The following describes embodiments of the disclosed apparatus, which can be used to implement the above-mentioned bill identification management method of the present disclosure.
Fig. 5 schematically shows a block diagram of a ticket recognition management apparatus according to one embodiment of the present disclosure.
As shown in fig. 5, the bill identifying management apparatus 500 provided by the embodiment of the present disclosure may include a history bill information acquiring unit 510, a history bill vector acquiring unit 520, a machine classification model training unit 530, a current bill image acquiring unit 540, a current bill vector acquiring unit 550, and a bill identification result acquiring unit 560.
The historical ticket information acquiring unit 510 may be configured to acquire a historical ticket image and a label thereof, where the label is one of a real ticket, a fake ticket, and a virtual opening of a real ticket.
The historical bill vector acquiring unit 520 may be configured to perform optical character recognition on the historical bill image, and acquire a historical bill feature vector of the historical bill image.
The machine classification model training unit 530 may be configured to train a machine learning classification model according to the historical ticket feature vectors and the labels corresponding to the historical ticket images.
The current ticket image acquiring unit 540 may be used to acquire a current ticket image.
The current bill vector acquiring unit 550 may be configured to perform optical character recognition on the current bill image to acquire a current bill feature vector of the current bill image.
The bill identification result obtaining unit 560 may be configured to process the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, where the bill identification result is one of a real bill, a fake bill, and a virtual opening of a real bill.
In an exemplary embodiment, the history ticket vector acquiring unit 520 may include: an image preprocessing unit, which may be used to perform noise removal, inclination correction, and binarization on the historical bill image to obtain a preprocessed image; a layout analysis and character cutting unit, which may be used to identify the text area in the preprocessed image, split the text area into lines, and divide each line of text into individual characters; a character feature extraction unit, which may be used to extract the character feature vector of each character one by one; a character recognition unit, which may be used to input the character feature vector of each character into the character classifier to obtain a character recognition result for the preprocessed image; and a post-processing correction unit, which may be used to analyze the language context of the character recognition result through a language model, correct the character recognition result output by the character classifier, and obtain the historical bill feature vector.
In an exemplary embodiment, the history ticket vector acquiring unit 520 may include: the optical character recognition unit can be used for carrying out optical character recognition on the historical bill image to obtain the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image; and the historical bill vector generating unit can be used for processing the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image to generate the historical bill feature vector.
In an exemplary embodiment, the history ticket vector generating unit may include: the quantification unit can be used for respectively carrying out quantification representation on the bill type and the supplier name; and the discrete unit can be used for carrying out discretization representation on the bill amount and the bill tax amount.
In an exemplary embodiment, the ticket recognition management apparatus 500 may further include: the historical bill information storage unit can be used for storing the historical bill images, the historical bill feature vectors and the labels thereof to a historical bill library in a block chain; and the historical bill information updating unit can be used for storing the current bill image, the current bill feature vector and the bill identification result thereof to the historical bill library so as to update the historical bill library.
Fig. 6 schematically shows a block diagram of a bill identification management apparatus according to another embodiment of the present disclosure.
As shown in fig. 6, the bill identifying management apparatus 600 provided in the embodiment of the present disclosure differs from the above-described embodiment in that it may further include a history object information obtaining unit 610, a history object vector obtaining unit 620, a history object label determining unit 630, a machine prediction model training unit 640, a current object information obtaining unit 650, a current object vector extracting unit 660, and an object prediction result obtaining unit 670.
The history object information obtaining unit 610 may be configured to obtain history object information corresponding to the history ticket image according to the history ticket feature vector.
The history object vector obtaining unit 620 may be configured to obtain a history object feature vector according to the history object information.
The history object label determining unit 630 may be configured to determine, according to the history ticket image and the label thereof, a label of the history object feature vector, where the label is either "no false invoicing behavior" or "false invoicing behavior".
The machine prediction model training unit 640 may be configured to train a machine learning prediction model according to the historical object feature vectors and their labels.
The current object information acquisition unit 650 may be used to acquire current object information.
The current object vector extraction unit 660 may be configured to extract a current object feature vector of the current object according to the current object information.
The object prediction result obtaining unit 670 may be configured to process the current object feature vector through the machine learning prediction model to obtain an object prediction result of the current object, where the object prediction result is either "no false invoicing behavior" or "false invoicing behavior".
With continued reference to fig. 6, the ticket recognition management apparatus 600 can further include an object reputation modification unit 680 that can be configured to modify the reputation score of the current object based on the ticket recognition result and the object prediction result.
For details not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the bill identification management method of the present disclosure described above.
Referring now to FIG. 7, shown is a block diagram of a computer system 800 suitable for use in implementing the electronic devices of embodiments of the present disclosure. The computer system 800 of the electronic device shown in fig. 7 is only an example, and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for system operation. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware, and the described units may also be disposed in a processor. The names of the units do not in any way limit the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the bill identification management method described in the above embodiments.
For example, the electronic device may implement the following steps, as shown in fig. 1: step S110, acquiring a historical bill image and a label thereof, wherein the label is one of a real bill, a fake bill and a virtual opening of the real bill (that is, a genuine bill that was falsely issued); step S120, carrying out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; step S130, training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; step S140, acquiring a current bill image; step S150, carrying out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; step S160, processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill and a virtual opening of the real bill.
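As an illustration only, the training and classification steps S130 and S160 can be sketched in a few lines of Python. The disclosure does not say which machine learning classification model is used, so this sketch substitutes a nearest-centroid classifier as a stand-in. The function names `train_centroids` and `classify` and the two-dimensional feature vectors are hypothetical, and the optical character recognition of steps S120 and S150 is assumed to have already produced the vectors.

```python
from collections import defaultdict
from math import dist

# The three labels named in step S110, copied from the text.
LABELS = ("real bill", "fake bill", "virtual opening of the real bill")


def train_centroids(vectors, labels):
    """Stand-in for step S130: average the feature vectors of each label."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(model, vector):
    """Stand-in for step S160: return the label with the nearest centroid."""
    return min(model, key=lambda label: dist(model[label], vector))


# Hypothetical historical bill feature vectors and labels (steps S110 and S120).
history_vectors = [
    [0.0, 0.1], [0.2, 0.0],      # real bills
    [5.0, 5.1], [4.9, 5.3],      # fake bills
    [9.8, 0.2], [10.1, 0.0],     # virtual openings of real bills
]
history_labels = [
    LABELS[0], LABELS[0],
    LABELS[1], LABELS[1],
    LABELS[2], LABELS[2],
]

model = train_centroids(history_vectors, history_labels)

# Hypothetical current bill feature vector (steps S140 and S150).
result = classify(model, [9.5, 0.4])
print(result)
```

Any classifier that maps a feature vector to one of the three labels could take the place of the centroid model here; the sketch only shows how the historical data and the current bill relate to each other.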
As another example, the electronic device may implement the steps shown in fig. 2 to 4.
It should be noted that although several units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided among a plurality of units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A bill identification management method, characterized by comprising the following steps:
acquiring a historical bill image and a label thereof, wherein the label is one of a real bill, a fake bill and a virtual opening of the real bill;
carrying out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image;
training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image;
acquiring a current bill image;
carrying out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image;
and processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill and a virtual opening of the real bill.
2. The method of claim 1, wherein performing optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image comprises:
carrying out noise removal, skew correction and binarization processing on the historical bill image to obtain a preprocessed image;
recognizing a character area in the preprocessed image, dividing the character area into lines, and dividing each line of characters into single characters one by one;
extracting character feature vectors of each character one by one;
inputting the character feature vector of each character into a character classifier to obtain a character recognition result in the preprocessed image;
and analyzing the language context relationship of the character recognition result in the preprocessed image through a language model, correcting the character recognition result output by the character classifier, and obtaining the historical bill feature vector.
3. The method of claim 1, wherein performing optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image comprises:
carrying out optical character recognition on the historical bill image to obtain the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image;
and processing the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image to generate the historical bill feature vector.
4. The method of claim 3, wherein processing the bill type, bill number, bill code, billing date, supplier name, supplier taxpayer identification number, bill amount, and bill tax amount of the historical bill image to generate the historical bill feature vector comprises:
respectively carrying out quantitative representation on the bill type and the supplier name;
and discretizing the bill amount and the bill tax amount.
5. The method of claim 1, further comprising:
storing the historical bill image, the historical bill feature vector and the label thereof to a historical bill library in a block chain;
and storing the current bill image, the current bill feature vector and the bill identification result thereof to the historical bill library so as to update the historical bill library.
6. The method of claim 1, further comprising:
obtaining historical object information corresponding to the historical bill image according to the historical bill feature vector;
obtaining a historical object feature vector according to the historical object information;
determining a label of the historical object feature vector according to the historical bill image and the label thereof, wherein the label is either having no virtual billing behavior or having a virtual billing behavior;
training a machine learning prediction model according to the historical object feature vector and the label thereof;
acquiring current object information;
extracting a current object feature vector of a current object according to the current object information;
and processing the current object feature vector through the machine learning prediction model to obtain an object prediction result of the current object, wherein the object prediction result is either having no virtual billing behavior or having a virtual billing behavior.
7. The method of claim 6, further comprising:
and correcting the credit score of the current object according to the bill identification result and the object prediction result.
8. A bill identifying and managing apparatus, comprising:
the historical bill information acquisition unit is used for acquiring a historical bill image and a label thereof, wherein the label is one of a real bill, a fake bill and a virtual opening of the real bill;
the historical bill vector acquisition unit is used for carrying out optical character recognition on the historical bill image to acquire a historical bill feature vector of the historical bill image;
the machine classification model training unit is used for training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image;
the current bill image acquisition unit is used for acquiring a current bill image;
the current bill vector acquisition unit is used for carrying out optical character recognition on the current bill image to acquire a current bill feature vector of the current bill image;
and the bill identification result obtaining unit is used for processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill and a virtual opening of the real bill.
9. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the bill identification management method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the bill identification management method according to any one of claims 1 to 7.
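As an illustrative note on claims 3 and 4, the following Python sketch shows one possible way to turn recognized bill fields into a feature vector. The claims do not specify how the quantitative representation or the discretization is performed, so everything here is an assumption: the integer coding of categories, the bucket edges in `AMOUNT_EDGES`, the field names, and the helper names `CategoryIndex`, `discretize` and `bill_to_vector` are all hypothetical. Only the four fields named in claim 4 are encoded.

```python
from bisect import bisect_right

# Hypothetical bucket edges for amounts; the claims give no concrete values.
AMOUNT_EDGES = [100.0, 1_000.0, 10_000.0, 100_000.0]


class CategoryIndex:
    """Map each distinct string to a stable integer code.

    This is one simple form of quantitative representation, assumed here
    for the bill type and the supplier name.
    """

    def __init__(self):
        self._codes = {}

    def encode(self, value):
        # A new value receives the next unused code; a known value keeps its code.
        return self._codes.setdefault(value, len(self._codes))


def discretize(amount, edges=AMOUNT_EDGES):
    """Return the index of the bucket that contains `amount`."""
    return bisect_right(edges, amount)


def bill_to_vector(fields, type_index, supplier_index):
    """Build a feature vector from the four fields named in claim 4."""
    return [
        type_index.encode(fields["bill_type"]),
        supplier_index.encode(fields["supplier_name"]),
        discretize(fields["amount"]),
        discretize(fields["tax_amount"]),
    ]


type_index = CategoryIndex()
supplier_index = CategoryIndex()

# Hypothetical fields, as they might come out of optical character recognition.
bill_a = {"bill_type": "VAT special invoice", "supplier_name": "Supplier A",
          "amount": 2500.0, "tax_amount": 325.0}
bill_b = {"bill_type": "VAT special invoice", "supplier_name": "Supplier B",
          "amount": 50.0, "tax_amount": 6.5}

vec_a = bill_to_vector(bill_a, type_index, supplier_index)
vec_b = bill_to_vector(bill_b, type_index, supplier_index)
print(vec_a, vec_b)
```

The remaining fields of claim 3 (bill number, bill code, billing date, supplier taxpayer identification number) would need their own encodings, which are not described in the claims and are left out of this sketch.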

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911343761.6A CN111178219A (en) 2019-12-24 2019-12-24 Bill identification management method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911343761.6A CN111178219A (en) 2019-12-24 2019-12-24 Bill identification management method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111178219A true CN111178219A (en) 2020-05-19

Family

ID=70652064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911343761.6A Pending CN111178219A (en) 2019-12-24 2019-12-24 Bill identification management method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111178219A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780001A (en) * 2016-12-26 2017-05-31 税友软件集团股份有限公司 A kind of invoice writes out falsely enterprise supervision recognition methods and system
CN107122777A (en) * 2017-04-25 2017-09-01 云南省交通科学研究所 A kind of vehicle analysis system and analysis method based on video file
CN108876166A (en) * 2018-06-27 2018-11-23 平安科技(深圳)有限公司 Financial risk authentication processing method, device, computer equipment and storage medium
CN109255113A (en) * 2018-09-04 2019-01-22 郑州信大壹密科技有限公司 Intelligent critique system
CN109472918A (en) * 2018-10-12 2019-03-15 深圳壹账通智能科技有限公司 Invoice validation method, financing checking method, device, equipment and medium
CN109583978A (en) * 2018-11-30 2019-04-05 税友软件集团股份有限公司 The method, device and equipment of invoice enterprise is write out falsely in a kind of identification
CN109637000A (en) * 2018-10-23 2019-04-16 深圳壹账通智能科技有限公司 The invoice method of inspection and device, storage medium, electric terminal
CN110188714A (en) * 2019-06-04 2019-08-30 言图科技有限公司 A kind of method, system and storage medium for realizing financial management under chat scenario
CN110415119A (en) * 2019-07-30 2019-11-05 中国工商银行股份有限公司 Model training, bill business prediction technique, device, storage medium and equipment
CN110532542A (en) * 2019-07-15 2019-12-03 西安交通大学 It is a kind of that recognition methods and system are write out falsely with the invoice for not marking study based on positive example

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
审计署审计科研所课题组编: "《审计技术创新发展报告及案例选编 2013 下》", vol. 978, 中国铁道出版社, pages: 151 - 686 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612964A (en) * 2020-05-21 2020-09-01 广东乐佳印刷有限公司 Bill certificate anti-counterfeiting detection method and device based on block chain
CN111625649A (en) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 Text processing method and device, electronic equipment and medium
CN111683202A (en) * 2020-06-01 2020-09-18 北京惠朗时代科技有限公司 Bill stamping method, device, equipment and storage medium
CN111753841A (en) * 2020-06-28 2020-10-09 中国银行股份有限公司 Bill identification method and device based on routing distribution
CN111753841B (en) * 2020-06-28 2023-09-19 中国银行股份有限公司 Bill identification method and device based on route distribution
CN111768546A (en) * 2020-06-30 2020-10-13 新奥(中国)燃气投资有限公司 Method, device and system for automatically early warning abnormal enterprise invoices
CN112287828A (en) * 2020-10-29 2021-01-29 平安普惠企业管理有限公司 Financial statement generation method and device based on machine learning
CN112232305A (en) * 2020-11-19 2021-01-15 中国银联股份有限公司 Image detection method, image detection device, electronic device, and medium
CN112446346A (en) * 2020-12-10 2021-03-05 国网辽宁省电力有限公司丹东供电公司 Image data scanning processing method
CN112749639A (en) * 2020-12-29 2021-05-04 中电金信软件有限公司 Model training method and device, computer equipment and storage medium
CN112749639B (en) * 2020-12-29 2022-01-14 中电金信软件有限公司 Model training method and device, computer equipment and storage medium
CN112861841A (en) * 2021-03-22 2021-05-28 北京百度网讯科技有限公司 Bill confidence value model training method and device, electronic equipment and storage medium
CN112861841B (en) * 2021-03-22 2023-06-13 北京百度网讯科技有限公司 Training method and device for bill confidence value model, electronic equipment and storage medium
CN113435439A (en) * 2021-06-30 2021-09-24 青岛海尔科技有限公司 Document auditing method and device, storage medium and electronic device
CN113435439B (en) * 2021-06-30 2023-11-28 青岛海尔科技有限公司 Document auditing method and device, storage medium and electronic device
CN114240407A (en) * 2021-11-17 2022-03-25 广东电网有限责任公司 Bill risk conduction quantitative evaluation system and method based on block chain
CN114049192B (en) * 2022-01-12 2022-04-12 广东企数标普科技有限公司 Invoice data processing method and device based on intelligent algorithm
CN114049192A (en) * 2022-01-12 2022-02-15 广东企数标普科技有限公司 Invoice data processing method and device based on intelligent algorithm
CN114358659A (en) * 2022-03-10 2022-04-15 广东粤海集团企业服务有限公司 Document verification information processing method and system
CN114358659B (en) * 2022-03-10 2022-06-03 广东粤海集团企业服务有限公司 Document verification information processing method and system
CN117114910A (en) * 2023-09-22 2023-11-24 浙江河马管家网络科技有限公司 Automatic ticket business accounting system and method based on machine learning

Similar Documents

Publication Publication Date Title
CN111178219A (en) Bill identification management method and device, storage medium and electronic equipment
CN110365489B (en) Business auditing method, device and storage medium
CN107945024B (en) Method for identifying internet financial loan enterprise operation abnormity, terminal equipment and storage medium
US9646058B2 (en) Methods, systems, and computer program products for generating data quality indicators for relationships in a database
JP2022528839A (en) Personal information protection system
CA3063580A1 (en) Classifier training method and apparatus, electronic device and computer readable medium
CN108268593B (en) Method, device, server and storage medium for processing credit card insurance information
US20170262852A1 (en) Database monitoring system
WO2019243848A1 (en) Container tracking
CN117314424B (en) Block chain transaction system and method for big financial data
CA3103315A1 (en) System and process for electronic payments
CN115018513A (en) Data inspection method, device, equipment and storage medium
CN111145031B (en) Insurance business customization method, device and system
CN113159796A (en) Trade contract verification method and device
CN115564591A (en) Financing product determination method and related equipment
WO2020039173A1 (en) Transaction system and method
KR102416998B1 (en) Appatus for automatically collecting and classification tax related documents and method thereof
CN114331105A (en) Electronic draft processing system, method, electronic device and storage medium
Oliverio et al. A hybrid model for fraud detection on purchase orders
CN107993155A (en) Policy information processing method, device, server and storage medium
US11561963B1 (en) Method and system for using time-location transaction signatures to enrich user profiles
CN114880369A (en) Risk credit granting method and system based on weak data technology
CN113094595A (en) Object recognition method, device, computer system and readable storage medium
CN114022166B (en) Information processing method, device, computer equipment and storage medium
Sury Digitization of Tax Administration in India

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200519