CN111178219A - Bill identification management method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN111178219A CN201911343761.6A
- Authority
- CN
- China
- Prior art keywords
- bill
- historical
- image
- current
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Economics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Technology Law (AREA)
- Bioinformatics & Computational Biology (AREA)
- Entrepreneurship & Innovation (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present disclosure provide a bill identification management method, apparatus, medium, and electronic device, and belong to the field of computer technology. The bill identification management method comprises: acquiring a historical bill image and its label, wherein the label is one of a real bill, a fake bill, and a falsely issued real bill; performing optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; acquiring a current bill image; performing optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; and processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for bill identification management, a computer-readable storage medium, and an electronic device.
Background
Invoices are an important basis for personal reimbursement and for bookkeeping by financial staff. If the tax authority finds false invoices during an inspection, the enterprise may be deemed to have evaded tax and be penalized; to avoid this, financial staff need to check the authenticity of a large number of invoices.
In the related art, invoice authenticity is checked mainly in three ways: first, querying a website, which requires manually entering all of the invoice information into a web page; second, querying by calling 12366, which also requires manual entry of the invoice information, involves cumbersome steps, and entails a long wait for the voice prompts; and third, on-site verification at a tax bureau service hall, which costs a great deal of time travelling to and from the hall and queuing.
All three approaches are cumbersome and inefficient, and they require financial staff to spend a great deal of time.
Therefore, a new bill identification management method, apparatus, medium, and electronic device are needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a method, an apparatus, a medium, and an electronic device for bill identification management, so as to overcome at least to some extent the problem of low efficiency in bill authenticity identification in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to one aspect of the present disclosure, there is provided a bill identification management method, including: acquiring a historical bill image and its label, wherein the label is one of a real bill, a fake bill, and a falsely issued real bill; performing optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; acquiring a current bill image; performing optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; and processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
According to an aspect of the present disclosure, there is provided a bill identification management apparatus, including: a historical bill information acquisition unit, configured to acquire a historical bill image and its label, wherein the label is one of a real bill, a fake bill, and a falsely issued real bill; a historical bill vector acquisition unit, configured to perform optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; a machine classification model training unit, configured to train a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; a current bill image acquisition unit, configured to acquire a current bill image; a current bill vector acquisition unit, configured to perform optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; and a bill identification result obtaining unit, configured to process the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
According to an aspect of the present disclosure, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the bill identification management method according to any one of the embodiments described above.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the bill identification management method of any of the above embodiments.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In the technical solutions provided by some embodiments of the present disclosure, a machine learning classification model for automatically identifying bill authenticity can be trained on a large amount of historical bill information. When a new current bill image is received, optical character recognition is performed on it to extract its current bill feature vector, and that vector is input into the trained machine learning classification model to obtain the authenticity identification result of the current bill image. This automates bill authenticity identification and improves its efficiency and accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a bill identification management method according to one embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of one embodiment of step S120 in FIG. 1;
FIG. 3 schematically shows a flow chart of another embodiment of step S120 in FIG. 1;
FIG. 4 schematically illustrates a flow diagram of a bill identification management method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a bill identification management apparatus according to one embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a bill identification management apparatus according to one embodiment of the present disclosure;
FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware units or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 schematically shows a flowchart of a bill identification management method according to an embodiment of the present disclosure. The method may be executed by any device with computing capability, such as a server and/or a mobile terminal.
As shown in Fig. 1, the bill identification management method provided by the embodiment of the present disclosure may include the following steps.
In step S110, a historical bill image and its label are acquired, the label being one of a real bill, a fake bill, and a falsely issued real bill.
The bill mentioned in the embodiments of the present disclosure may be any type of bill, such as an invoice, a receipt, or a check, and the present disclosure does not limit this. In the following embodiments, an invoice is mainly used as the example bill.
False invoices fall into two cases. The first is a fake invoice: the invoice itself is counterfeit, forged outside the supervision of the tax authority. The second is a falsely issued real invoice: the invoice itself is genuine, but the economic transaction it reflects is fictitious or inconsistent with reality; or the invoice is genuine and the transaction it reflects is real, but the issuer did not purchase the invoice from the tax authority and instead obtained it from another unit or individual.
For example, according to these two cases of invoice fraud, the labels of the historical invoice images can be divided into three categories: a real invoice (which may be labeled "0", for example), a fake invoice (which may be labeled "1", for example), and a falsely issued real invoice (which may be labeled "2", for example).
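The three-way labelling can be sketched as a small enumeration. The numeric codes 0, 1, and 2 follow the example in the text, with the third category taken to be the falsely issued real invoice described in the two cases above; the names `BillLabel` and `encode_label` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class BillLabel(IntEnum):
    """Label codes for historical bill images (codes follow the text's example)."""
    REAL = 0            # genuine invoice reflecting a genuine transaction
    FAKE = 1            # forged invoice
    FALSELY_ISSUED = 2  # genuine invoice that was falsely issued


def encode_label(name: str) -> int:
    """Convert a label name such as 'real' into its integer code."""
    return int(BillLabel[name.upper()])


codes = [encode_label(n) for n in ("real", "fake", "falsely_issued")]
```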
In an exemplary embodiment, the method may further include: storing the historical bill images, the historical bill feature vectors, and their labels in a historical bill library in a blockchain. The historical bill images, the historical bill feature vectors, and their labels can be stored in the blockchain as part of the historical bill information.
A blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A consensus mechanism is a mathematical algorithm for establishing trust and obtaining rights and interests among different nodes in a blockchain system.
A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods. Each data block contains information on Bitcoin network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
In a narrow sense, a blockchain is a distributed ledger: a chained data structure in which data blocks are connected sequentially in chronological order, and which is cryptographically guaranteed against tampering and forgery.
Broadly, blockchain technology is a new distributed infrastructure and computing paradigm that uses the blockchain data structure to verify and store data, uses distributed node consensus algorithms to generate and update data, uses cryptography to secure data transmission and access, and uses smart contracts composed of automated script code to program and manipulate data.
Generally, a blockchain system consists of a data layer, a network layer, a consensus layer, an incentive layer, a contract layer, and an application layer. The data layer encapsulates the underlying data blocks together with basic data and algorithms such as data encryption and timestamps; the network layer comprises the distributed networking mechanism, the data transmission mechanism, the data verification mechanism, and the like; the consensus layer mainly encapsulates the various consensus algorithms of the network nodes; the incentive layer integrates economic factors into the blockchain technology system and mainly comprises the mechanisms for issuing and distributing economic incentives; the contract layer mainly encapsulates various scripts, algorithms, and smart contracts and is the basis of the blockchain's programmability; and the application layer encapsulates the various application scenarios and cases of the blockchain. In this model, the timestamp-based chained block structure, the consensus mechanism of distributed nodes, economic incentives based on consensus computing power, and flexible, programmable smart contracts are the most representative innovations of blockchain technology.
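The chained, timestamped block structure described above can be illustrated with a minimal single-process sketch. It shows only how each block stores the hash of its predecessor so that later tampering is detectable; it has no network, consensus, or incentive layer, and the function names and block fields are assumptions made for illustration, not the data format of the disclosure.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index, timestamp, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, timestamp, data):
    """Append a block that points back to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(index, timestamp, data, prev_hash)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every stored hash and every back-pointer in the chain."""
    expected_prev = GENESIS_PREV_HASH
    for b in chain:
        if b["prev_hash"] != expected_prev:
            return False
        if b["hash"] != block_hash(b["index"], b["timestamp"], b["data"], b["prev_hash"]):
            return False
        expected_prev = b["hash"]
    return True
```

Changing the data of any stored block changes its recomputed hash, so `is_valid` fails for that block and the tampering is exposed.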
In the embodiment of the present disclosure, the method may further include a step of constructing the blockchain nodes and the blockchain network, which covers the construction, update, and maintenance of the blockchain nodes and the blockchain network. For example, one or more groups or companies participate in building a blockchain network for sharing and managing verification information on enterprise supplier transaction invoices, with the basic business unit of each company serving as the smallest node.
In the embodiment of the present disclosure, the method may further include defining the data formats for information storage and information authentication in advance; that is, shared information is stored and authenticated according to the data structure, information storage manner, and protocol defined in the embodiment of the present disclosure, so as to ensure efficient information storage and processing.
In the embodiment of the disclosure, an enterprise or individual registered in the system uploads to the blockchain the cases handled by the service for sharing and managing enterprise supplier transaction invoice verification and confirmation information, the electronic certificate issuer information (e.g., full name, account number, bank of deposit), the electronic certificate receiver information (e.g., full name, account number, bank of deposit), the certificate number information, and the like, together with updates to the bill verification and confirmation information. Supporting materials such as audio, video, and images can also be uploaded to the blockchain. The information stored in the blockchain thus has the characteristics of privacy protection (e.g., through technical means such as permission management, image or video watermarking, and encryption), openness and transparency, traceability, and tamper resistance.
In the related art, many companies, enterprises, and public institutions use a database to store electronic invoices or photographed images of paper invoices as backups of the paper invoices. However, this centralized storage is vulnerable to attack, its data storage structure is simple, and the invoice information is easily tampered with and leaked. The blockchain-based bill identification management method can effectively realize bill authenticity identification and bill management in the blockchain network. The method can use the transaction chain data structure built on blockchain hash pointers, together with cryptographic hash computation and digital signatures, to realize multi-level evidence confirmation in the transaction process, thereby solving the problem of trust between different individual transaction parties. Meanwhile, storing the bill information in the blockchain provides privacy protection, traceability, tamper resistance, and the like.
In step S120, optical character recognition (OCR) is performed on the historical bill image, and a historical bill feature vector of the historical bill image is acquired.
For the specific implementation process, refer to the description of the embodiments of Figs. 2 and 3 below.
In step S130, a machine learning classification model is trained according to the historical bill feature vector and the label corresponding to the historical bill image.
In an exemplary embodiment, the machine-learned classification model may include a logistic regression classification model, but the present disclosure is not so limited and any other suitable machine-learned model may also be employed.
For example, for the historical electronic invoices of enterprise suppliers stored in the blockchain, eight features are extracted from the data recognized by OCR: invoice type x1, invoice number x2, invoice code x3, invoicing date x4, supplier name x5, supplier taxpayer identification x6, invoice amount x7, and invoice tax amount x8. Features that are not described quantitatively, such as invoice type, are encoded as labels according to their categories. Continuously varying quantitative features, such as invoice amount, are first discretized, that is, divided into grades or levels in ascending order of value, and then encoded as labels. After each feature value is preprocessed, a historical bill feature vector X = [x1, x2, x3, x4, x5, x6, x7, x8] is formed. Assuming that n historical electronic invoices are stored in the blockchain, the historical bill feature vectors of the n invoices form a historical bill feature matrix M = [X1; X2; …; Xn], where n is a positive integer greater than or equal to 1. The historical bill feature vector of each electronic invoice is labeled 0 if the invoice is a real invoice, 1 if it is a fake invoice, and 2 if it is a falsely issued real invoice. This forms the target vector label corresponding to the historical bill feature matrix.
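The preprocessing described above can be sketched as follows. For brevity the sketch encodes only two of the eight features (invoice type and invoice amount); the bin edges in `AMOUNT_BINS`, the sample invoices, and the helper names are hypothetical and serve only to show category encoding and discretization into ascending levels.

```python
from bisect import bisect_right


def make_label_encoder(values):
    """Map each distinct category to an integer code in first-seen order."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


def discretize(amount, bin_edges):
    """Return the level (0, 1, 2, ...) that `amount` falls into, in ascending order."""
    return bisect_right(bin_edges, amount)


# Hypothetical level boundaries for the invoice amount.
AMOUNT_BINS = [100.0, 1000.0, 10000.0]

# Hypothetical OCR output for three historical invoices.
invoices = [
    {"type": "special VAT", "amount": 850.0},
    {"type": "ordinary VAT", "amount": 12000.0},
    {"type": "special VAT", "amount": 50.0},
]

type_codes = make_label_encoder(inv["type"] for inv in invoices)

# Each row is one (shortened) feature vector X; the rows together form the matrix M.
M = [
    [type_codes[inv["type"]], discretize(inv["amount"], AMOUNT_BINS)]
    for inv in invoices
]
```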
A logistic regression classification model is selected as the machine learning classification model. The historical bill feature matrix M is used as the model's input, and the model is trained, learning its parameters, against the corresponding target vector label.
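As an illustration of this training step, the sketch below fits a three-class logistic regression by plain batch gradient descent. It assumes the multinomial (softmax) form of logistic regression, which the text does not specify; the toy matrix `M` with two binary features, the learning rate, and the epoch count are likewise hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    """Class probabilities for x; each weight row is [bias, w1, ..., wd]."""
    scores = [row[0] + sum(w * xi for w, xi in zip(row[1:], x)) for row in weights]
    return softmax(scores)


def train(M, labels, num_classes=3, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean cross-entropy loss."""
    d, n = len(M[0]), len(M)
    weights = [[0.0] * (d + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (d + 1) for _ in range(num_classes)]
        for x, y in zip(M, labels):
            p = predict_proba(weights, x)
            for k in range(num_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                grads[k][0] += err
                for j, xj in enumerate(x):
                    grads[k][j + 1] += err * xj
        for k in range(num_classes):
            for j in range(d + 1):
                weights[k][j] -= lr * grads[k][j] / n
    return weights


# Toy, linearly separable data: labels are 0 (real), 1 (fake), 2 (falsely issued).
M = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
labels = [0, 0, 1, 1, 2, 2]
weights = train(M, labels)
```

In practice a library implementation of logistic regression would normally replace this hand-written loop; the sketch is kept dependency-free only so that it is self-contained.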
In step S140, a current bill image is acquired.
For example, the current bill image may be a scanned image of the invoice currently to be identified.
In step S150, optical character recognition is performed on the current bill image, and a current bill feature vector of the current bill image is acquired.
For the specific implementation process, refer to the description of the embodiments of Figs. 2 and 3 below.
In step S160, the current bill feature vector is processed through the machine learning classification model, and a bill identification result of the current bill image is obtained, where the bill identification result is one of a real bill, a fake bill, and a falsely issued real bill.
In an exemplary embodiment, processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image may include: inputting the current bill feature vector into the trained machine learning classification model, and predicting the probabilities that the current bill image is a real bill, a fake bill, and a falsely issued real bill, respectively; and taking the result corresponding to the largest of these three probabilities as the bill identification result of the current bill image.
For example, if the prediction result gives a probability of 0.5 that the current invoice is real, 0.3 that it is fake, and 0.2 that it is a falsely issued real invoice, the output bill identification result is a real invoice.
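Selecting the result with the largest predicted probability can be sketched in a few lines. The ordering of `LABELS` (real, fake, falsely issued) and the function name `identify` are assumptions for illustration.

```python
# Assumed class order: index 0 = real, 1 = fake, 2 = falsely issued real bill.
LABELS = ("real", "fake", "falsely issued")


def identify(probabilities):
    """Return the label whose predicted probability is largest."""
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return LABELS[best]


result = identify([0.5, 0.3, 0.2])
```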
In the embodiment of the present disclosure, a preset threshold for determining whether the current invoice is a real invoice may also be set, so that the current invoice is determined to be real only when the predicted probability of being real is greater than the preset threshold. The preset threshold can be set according to actual requirements. Generally, if the application has a higher security requirement, the preset threshold may be set higher, for example, 0.9; if the application has a lower security requirement, the preset threshold may be set lower, for example, 0.5. The present disclosure does not limit this.
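A thresholded decision of this kind might look like the sketch below. The text does not say what result is output when the probability of being real does not exceed the threshold; falling back to the likelier of the two remaining outcomes is an assumption made here, as is the function name.

```python
def identify_with_threshold(p_real, p_fake, p_falsely_issued, threshold=0.9):
    """Accept a bill as real only if its 'real' probability exceeds the threshold."""
    if p_real > threshold:
        return "real"
    # Assumption: otherwise report the likelier of the two non-real outcomes.
    return "fake" if p_fake >= p_falsely_issued else "falsely issued"


strict = identify_with_threshold(0.5, 0.3, 0.2, threshold=0.9)
lenient = identify_with_threshold(0.5, 0.3, 0.2, threshold=0.4)
```

With the same probabilities, the stricter threshold rejects the invoice while the more lenient one accepts it, which is the trade-off between security and convenience that the threshold controls.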
In an exemplary embodiment, the method may further include: and correcting the credit score of the current object corresponding to the current bill image according to the bill identification result.
In the embodiment of the present disclosure, the current object may be any one or more of an operator who uploads the current ticket information, an issuing organization or an individual who issues the current ticket, a purchasing organization or an individual who purchases a commodity corresponding to the current ticket, and the like.
If the bill identification result is that the current bill corresponding to the current bill image is true, the credit score of the current object can be increased. If the bill identification result indicates that the current bill is false or is a true bill falsely issued, the credit score of the current object can be reduced.
Further, the method may further include: counting, for each object, the number of times and/or the total amount of false bills and of true bills falsely issued; and, if the counted number and/or total amount exceeds a preset threshold, sending out early warning information and adding the corresponding object to a blacklist.
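As a non-limiting illustration, the credit score correction and the blacklist check can be sketched as follows. The record structure `ObjectRecord`, the score step of 5, and the count and amount thresholds are hypothetical values chosen for this sketch and are not given in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    """Hypothetical per-object bookkeeping (operator, issuer, or purchaser)."""
    credit_score: float = 100.0
    bad_count: int = 0          # number of false / falsely issued bills
    bad_amount: float = 0.0     # total amount of such bills
    blacklisted: bool = False


def apply_result(rec: ObjectRecord, result: str, amount: float,
                 step: float = 5.0, max_count: int = 3,
                 max_amount: float = 100_000.0) -> bool:
    """Update the record for one identification result.

    Returns True when early warning information should be sent,
    i.e. when the count and/or amount exceeds its preset threshold.
    """
    if result == "true":
        rec.credit_score += step
    else:  # "false" or "falsely_issued"
        rec.credit_score -= step
        rec.bad_count += 1
        rec.bad_amount += amount
    if rec.bad_count > max_count or rec.bad_amount > max_amount:
        rec.blacklisted = True
        return True
    return False
```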
In an exemplary embodiment, the method may further include: storing the current bill image, the current bill feature vector and the bill identification result thereof in the historical bill library so as to update the historical bill library. That is, the current bill image, the current bill feature vector and the bill identification result are used as new historical bill information, and the bill identification result is used as the label of the new historical bill feature vector.
In an exemplary embodiment, the method may further include: storing the machine learning classification model and its trained model parameters in the blockchain. When the historical bill library is updated, that is, when new historical bill information is stored, the machine learning classification model can be further trained on the historical bill feature vector and label of the new historical bill image, so that the model parameters are updated, and the updated model parameters are stored in the blockchain. The classification accuracy of the machine learning classification model can therefore improve continuously as data accumulates in the historical bill library.
The bill identification management method provided by the embodiment of the disclosure can train a machine learning classification model for automatically identifying the authenticity of bills based on a large amount of historical bill information, and when a new current bill image is received, the optical character identification is carried out on the current bill image, the current bill feature vector of the current bill image is extracted, and the current bill feature vector is input into the machine learning classification model after the training is completed, so that the authenticity identification result of the current bill image can be obtained, the automation of bill authenticity identification is realized, and the efficiency and the accuracy of bill authenticity identification are improved.
The embodiment of the disclosure provides a system for effectively realizing enterprise supplier transaction invoice verification and confirmation information sharing service and management in a blockchain network, wherein specific transaction information is shown in the following table 1:
TABLE 1
According to the technical scheme provided by the embodiment of the disclosure, on one hand, historical bill information is stored by using a block chain technology, a decentralized storage mode can be realized, the characteristics of privacy protection, traceability, tamper resistance and the like are achieved, and the safety and reliability of stored data are ensured, so that the information leakage of the bill data in the bill authenticity identification process can be prevented, and the safety of bill data storage is improved; on the other hand, a machine learning classification model (refer to the following) for automatically identifying the bill authenticity can be trained based on a large amount of historical bill information stored in the block chain, when a new block of the current bill image is generated in the block chain, the current bill feature vector of the current bill image is extracted, and the current bill feature vector is input into the machine learning classification model after training, so that the bill identification result of the current bill image can be obtained, the automation of bill authenticity identification is realized, and the efficiency and the accuracy of bill authenticity identification are improved.
Fig. 2 schematically shows a flow chart of an embodiment of step S120 in fig. 1.
As shown in fig. 2, in the embodiment of the present disclosure, the step S120 may further include the following steps.
In step S121, noise removal, inclination correction, and binarization processing are performed on the history bill image to obtain a preprocessed image.
Image noise refers to unnecessary or redundant interference information present in the image data. Different noise removal methods can be adopted for different types of image noise. Thus, the image noise type may first be identified and the corresponding noise removal method then selected. For example, Gaussian filtering may be used to remove Gaussian noise, and a nonlinear filter can be used to handle impulse noise.
When a bill is optically scanned, the scanned image may be skewed for objective reasons, which affects later image processing, so the image can be corrected. The tilt direction and tilt angle of the image are automatically detected from image features; for example, the tilt angle may be detected by a projection-based method, by a Hough transform, by linear fitting, or by a Fourier transform to the frequency domain. The inclination of the image is then corrected based on the detected tilt direction and tilt angle.
Image binarization is then performed: the gray value of each pixel in the image is set to 0 or 255, so that the whole image presents a clear black-and-white effect.
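As a non-limiting illustration, impulse noise removal with a nonlinear filter and binarization can be sketched on a grayscale image stored as a list of rows. The choice of a 3x3 median filter as the nonlinear filter and the fixed threshold of 128 are assumptions made for this sketch; inclination correction is omitted.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # grayscale values in 0..255


def median_filter3(img: Image) -> Image:
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def binarize(img: Image, threshold: int = 128) -> Image:
    """Set every pixel to 0 or 255 depending on the threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


# A bright patch with a single dark impulse in the centre.
noisy = [[200, 200, 200],
         [200,   0, 200],
         [200, 200, 200]]
preprocessed = binarize(median_filter3(noisy))
```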
In step S122, the character areas in the preprocessed image are identified, the character areas are split into lines, and each line of characters is divided into individual characters one by one.
Layout analysis and character cutting are performed on the preprocessed image: the character areas in the layout are identified and split into lines, and each line is divided into individual characters.
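As a non-limiting illustration, line division and character cutting can be sketched with projection profiles on a binarized image. The use of projection profiles, and the convention that 0 denotes ink and 255 denotes background, are assumptions made for this sketch; the disclosure does not name a specific segmentation technique.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized: 0 = ink, 255 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(binary: Image) -> List[List[Tuple[int, int, int, int]]]:
    """Split into lines, then characters.

    Each character box is (top, bottom, left, right), end-exclusive.
    """
    width = len(binary[0])
    row_profile = [sum(1 for p in row if p == 0) for row in binary]
    lines = []
    for top, bottom in runs(row_profile):
        col_profile = [sum(1 for y in range(top, bottom) if binary[y][x] == 0)
                       for x in range(width)]
        lines.append([(top, bottom, left, right)
                      for left, right in runs(col_profile)])
    return lines
```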
In step S123, the character feature vector of each character is extracted one by one.
Character features are then extracted: the normalized features of each character are extracted one by one to serve as character feature vectors for subsequent character recognition.
In step S124, the text feature vector of each text is input into a text classifier, and a text recognition result in the preprocessed image is obtained.
Character recognition is then performed: the character feature vector extracted from the current character is input into a pre-trained character classifier, which outputs the recognized character. The character classifier can adopt any machine learning model, neural network model or deep learning model. To train it, the character feature vectors of the sample characters in the training data set are input to the character classifier, which outputs predicted characters; the loss function of the character classifier is calculated from the sample characters and the predicted characters, and its gradient is backpropagated to optimize the classifier, yielding the trained character classifier.
In some embodiments, the trained character classifier and its model parameters may be stored in a blockchain.
In step S125, the language context of the character recognition result in the preprocessed image is analyzed by a language model, and the character recognition result output by the character classifier is corrected to obtain the historical bill feature vector.
The character classifier recognizes each character in isolation, without considering the language context between characters, so the character recognition result may be unreasonable. In the embodiment of the present disclosure, a Recurrent Neural Network Language Model (RNNLM) may be used to post-process the character recognition result, so as to correct the result output by the character classifier and generate the final historical bill feature vector. An RNNLM trains a language model using an RNN or one of its variant networks, with the task of predicting the next word from the preceding text. The RNN removes the limitation of a fixed context window and summarizes all preceding context in its hidden-layer state, so it can capture longer dependencies; the RNNLM also has few hyperparameters and generalizes well.
In an exemplary embodiment, the step S150 may further include the steps of: performing noise removal, inclination correction and binarization processing on the current bill image to obtain a preprocessed image; identifying the character areas in the preprocessed image, splitting the character areas into lines, and dividing each line of characters into individual characters one by one; extracting the character feature vector of each character one by one; inputting the character feature vector of each character into a character classifier to obtain a character recognition result for the preprocessed image; and analyzing the language context of the character recognition result through a language model, and correcting the character recognition result output by the character classifier, to obtain the current bill feature vector. For the specific implementation, reference can be made to the generation process of the historical bill feature vector.
Fig. 3 schematically shows a flow chart of another embodiment of step S120 in fig. 1.
As shown in fig. 3, step S120 in the embodiment of fig. 1 may further include the following steps in the embodiment of the present disclosure.
In step S126, the optical character recognition is performed on the historical bill image, and the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image are obtained.
In step S127, the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount, and the bill tax amount of the history bill image are processed to generate the history bill feature vector.
In an exemplary embodiment, processing the ticket type, the ticket number, the ticket code, the billing date, the supplier name, the supplier taxpayer identification number, the ticket amount, and the ticket tax amount of the historical ticket image to generate the historical ticket feature vector may include: respectively carrying out quantitative representation on the bill type and the supplier name; and discretizing the bill amount and the bill tax amount.
In an exemplary embodiment, the method may further include: identifying the current bill image based on an OCR technology to obtain the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the current bill image; and processing the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the current bill image to generate the current bill feature vector.
Taking as an example the case where the current bill is an electronic invoice and the current bill image includes an electronic invoice image, the electronic invoice image may be recognized based on OCR, and the verification confirmation information of the enterprise supplier transaction invoice may be acquired from the blockchain. The electronic invoice image is automatically sliced and cropped, its type is determined, its regions are located, and it is preprocessed; all fields are then recognized, and information such as the invoice type, number, code, date, purchaser name, purchaser taxpayer identification number, seller name, seller taxpayer identification number, amount, tax amount, and total of amount and tax is derived.
In an exemplary embodiment, processing the ticket type, the ticket number, the ticket code, the billing date, the supplier name, the supplier taxpayer identification number, the ticket amount, and the ticket tax amount of the current ticket image to generate the current ticket feature vector may include: respectively carrying out quantitative representation on the bill type and the supplier name; and discretizing the bill amount and the bill tax amount.
It should be noted that, in different application scenarios, if the types of the tickets are different, the extracted historical ticket feature vector and the current ticket feature vector may be changed accordingly, which is not limited in this disclosure.
Taking an electronic invoice of an enterprise supplier as an example, 8 features are extracted from the data recognized from the image by OCR: invoice type x1, invoice number x2, invoice code x3, invoicing date x4, supplier name x5, supplier taxpayer identification number x6, invoice amount x7, and invoice tax amount x8.
For features that are not described quantitatively, such as the invoice type, a labeling operation can be performed according to the different categories. For example, if the invoice types include special value-added tax invoices, general invoices, special motor vehicle invoices, machine-printed invoices, quota invoices, cut-out invoices, and the like, their label values may be set to 1, 2, 3, 4, 5, and 6, respectively. Supplier names can be processed similarly.
Continuously varying quantitative features, such as the invoice amount, are first discretized, that is, divided into grades or levels in ascending order of value, and then labeled. For example, an invoice amount of 0-100 yuan is assigned to the first grade, 100-1000 yuan to the second grade, 1000-2000 yuan to the third grade, and so on, where the first to m-th grades correspond to label values 1 to m, respectively.
After each feature value is preprocessed, the feature vector of the electronic invoice is formed as X = [x1, x2, x3, x4, x5, x6, x7, x8].
It should be noted that, in other embodiments, the feature vector of the electronic invoice does not necessarily need all 8 of the features described above, and only some of them may be selected, which is not limited by the present disclosure.
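As a non-limiting illustration, the labeling of categorical features and the discretization of amounts can be sketched as follows. The dictionary keys, the record field names, the rule that a boundary value falls into the upper grade, and the pass-through of the number, code, date and taxpayer identification number are all assumptions made for this sketch; only the grade boundaries 100, 1000 and 2000 come from the example above.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical keys for the six invoice types, labelled 1..6.
INVOICE_TYPE_LABELS = {
    "vat_special": 1, "general": 2, "motor_vehicle": 3,
    "machine_printed": 4, "quota": 5, "cut_out": 6,
}

# Grade boundaries in yuan; further boundaries are not given in the text.
AMOUNT_EDGES = [100, 1000, 2000]


def amount_grade(amount: float, edges: List[float] = AMOUNT_EDGES) -> int:
    """Map an amount to a grade label 1..m (boundary values go upward)."""
    return bisect_right(edges, amount) + 1


def encode_invoice(rec: dict, vendor_labels: Dict[str, int]) -> list:
    """Build X = [x1, ..., x8] from one OCR-recognized invoice record."""
    x1 = INVOICE_TYPE_LABELS[rec["type"]]
    # Supplier names receive integer labels in order of first appearance.
    x5 = vendor_labels.setdefault(rec["vendor"], len(vendor_labels) + 1)
    x7 = amount_grade(rec["amount"])
    x8 = amount_grade(rec["tax"])
    # Assumption: x2, x3, x4 and x6 are passed through unchanged.
    return [x1, rec["number"], rec["code"], rec["date"],
            x5, rec["taxpayer_id"], x7, x8]
```

A record with hypothetical values such as type `"vat_special"`, amount 1500.0 and tax 195.0 would yield x1 = 1, x7 = 3 and x8 = 2 under these assumptions.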
FIG. 4 schematically shows a flow diagram of a ticket recognition management method according to another embodiment of the present disclosure. In this embodiment, the bill is taken as an invoice, and the machine learning prediction model is taken as a logistic regression prediction model for example.
As shown in fig. 4, the difference from the above embodiment is that the bill identifying and managing method provided by the embodiment of the present disclosure may further include the following steps.
In step S410, obtaining historical object information corresponding to the historical bill image according to the historical bill feature vector.
For example, from the historical bill information stored in the blockchain, the taxpayer and enterprise information corresponding to each collected invoice is queried, including the name of the invoicing unit, its industry, the enterprise legal representative, the registered address, financial staff information, tax handling staff information, purchase and sale items, changes in the invoiced amount over a period of time, whether the tax handling staff simultaneously act as tax handling staff for other enterprises in the same industry, and the like.
In step S420, a history object feature vector is obtained according to the history object information.
For example, the information is preprocessed (classified and labeled) to form a history object feature vector, and history object feature vectors of taxpayers corresponding to all invoices form a history object feature matrix.
In step S430, the label of the historical object feature vector is determined according to the historical bill image and its label, where the label is either "no false invoicing behavior" or "false invoicing behavior".
For example, according to the historical bill information, it is counted whether each taxpayer and its related enterprises have engaged in false invoicing: the number of invoices related to the taxpayer that are identified as true invoices falsely issued is counted as N; if N is greater than 0, the label of the historical object feature vector corresponding to the taxpayer is set to 1, and otherwise to 0. The set of label values corresponding to all taxpayers forms the target vector.
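As a non-limiting illustration, deriving the taxpayer labels and the target vector from labelled historical invoices can be sketched as follows. The sketch assumes that N counts invoices labelled as falsely issued, which is the reading consistent with step S430; the label strings and the `counted_labels` parameter are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def label_taxpayers(history: Iterable[Tuple[str, str]],
                    counted_labels=("falsely_issued",)) -> Dict[str, int]:
    """Map each taxpayer to 1 (false invoicing behavior) or 0 (none).

    `history` yields (taxpayer_id, bill_label) pairs. N is the number of
    the taxpayer's invoices whose label is in `counted_labels`.
    """
    counts: Dict[str, int] = {}
    for taxpayer, bill_label in history:
        counts.setdefault(taxpayer, 0)
        if bill_label in counted_labels:
            counts[taxpayer] += 1
    return {taxpayer: 1 if n > 0 else 0 for taxpayer, n in counts.items()}


history = [("A", "true"), ("A", "falsely_issued"),
           ("B", "true"), ("C", "false")]
labels = label_taxpayers(history)
target_vector: List[int] = list(labels.values())
```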
In step S440, a machine learning prediction model is trained according to the historical object feature vectors and their labels.
A logistic regression prediction model is trained according to the historical object feature matrix and the target vector of the taxpayers.
In step S450, current object information is acquired.
In step S460, a current object feature vector of the current object is extracted according to the current object information.
In step S470, the current object feature vector is processed by the machine learning prediction model to obtain an object prediction result of the current object, where the object prediction result is either "no false invoicing behavior" or "false invoicing behavior".
For example, for a taxpayer to be identified, the current object feature vector of the taxpayer is input into the trained logistic regression prediction model, which outputs the predicted probability that the taxpayer has false invoicing behavior; the object prediction result is then obtained from this predicted probability. For example, if the predicted probability exceeds a set threshold, the taxpayer is determined to have false invoicing behavior; if the predicted probability is smaller than the set threshold, the taxpayer is determined not to have false invoicing behavior.
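As a non-limiting illustration, training a logistic regression prediction model and thresholding its predicted probability can be sketched as follows. Batch gradient descent, the learning rate, the number of epochs, the threshold of 0.5 and the one-dimensional toy data are assumptions made for this sketch; the disclosure does not specify how the model is fitted.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float],
            threshold: float = 0.5) -> Tuple[float, int]:
    """Return (probability, label); label 1 means false invoicing behavior."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return p, 1 if p > threshold else 0


# Toy historical object feature matrix and target vector (hypothetical).
X_hist = [[0.0], [1.0], [2.0], [3.0]]
y_hist = [0, 0, 1, 1]
w, b = train_logreg(X_hist, y_hist)
```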
The method provided by the embodiment of the disclosure can not only identify the authenticity of the bill, but also further identify whether the taxpayer has the behavior of making a false invoice according to the authenticity identification result of the bill.
In an exemplary embodiment, the method may further include: correcting the credit score of the current object according to the bill identification result and the object prediction result.
For example, if a taxpayer submits a false invoice or a true invoice falsely issued, the credit score of the taxpayer is lowered, and the updated credit score is stored in the blockchain.
In an exemplary embodiment, the method may further include: storing the historical object information, the historical object feature vector, the label of the historical object feature vector and the model parameters of the trained machine learning prediction model in a historical object library in the blockchain.
In an exemplary embodiment, the method may further include: storing the current object information, the current object feature vector and the object prediction result thereof in the historical object library, so as to update the historical object library.
In an exemplary embodiment, the method may further include: the reputation score of the current object is stored in a historical object repository.
For the content not described in detail in the embodiments of the present disclosure, reference may be made to the other embodiments described above, which are not described again.
In the embodiment of the present disclosure, the method may further include: evaluating the timeliness, effectiveness and accuracy of the enterprise supplier transaction invoice issuing and receipt verification confirmation information sharing service and management, and continuously adjusting and optimizing system parameters, based on automatic image and character recognition technology and on a verification method in which a majority of blockchain nodes participate through cross random combination authentication of receipt frames and characters.
The bill identification management method provided by the embodiment of the present disclosure makes use of the privacy protection, transparency, traceability and tamper resistance of the enterprise supplier transaction invoice issuing and receipt verification confirmation information sharing service and management information stored in the blockchain. It provides a verification method, based on automatic image and character recognition technology and on cross random combination authentication of receipt frames and characters, in which a majority of blockchain nodes participate. Based on the historical data of this information in the blockchain, the system automatically analyzes and dynamically identifies, in real time, possible problems with invoice receipts, and updates the corresponding invoice receipt status and the credit scores of the relevant operators and institutions, thereby effectively promoting the application of blockchain technology to enterprise supplier transaction invoice issuing and receipt verification confirmation information sharing service and management. With the wide application of blockchain technology in fields such as enterprise supplier transaction invoicing, receipt verification confirmation information sharing service and management, medical treatment, elderly care, insurance, finance and logistics, the present disclosure is expected to bring considerable economic and social benefits.
The following describes embodiments of the disclosed apparatus, which can be used to implement the above-mentioned bill identification management method of the present disclosure.
Fig. 5 schematically shows a block diagram of a ticket recognition management apparatus according to one embodiment of the present disclosure.
As shown in fig. 5, the bill identifying management apparatus 500 provided by the embodiment of the present disclosure may include a history bill information acquiring unit 510, a history bill vector acquiring unit 520, a machine classification model training unit 530, a current bill image acquiring unit 540, a current bill vector acquiring unit 550, and a bill identification result acquiring unit 560.
The historical bill information acquiring unit 510 may be configured to acquire a historical bill image and a label thereof, where the label is one of a true bill, a false bill, and a true bill falsely issued.
The historical bill vector acquiring unit 520 may be configured to perform optical character recognition on the historical bill image, and acquire a historical bill feature vector of the historical bill image.
The machine classification model training unit 530 may be configured to train a machine learning classification model according to the historical ticket feature vectors and the labels corresponding to the historical ticket images.
The current ticket image acquiring unit 540 may be used to acquire a current ticket image.
The current bill vector acquiring unit 550 may be configured to perform optical character recognition on the current bill image to acquire a current bill feature vector of the current bill image.
The bill identification result obtaining unit 560 may be configured to process the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, where the bill identification result is one of a true bill, a false bill, and a true bill falsely issued.
In an exemplary embodiment, the historical bill vector acquiring unit 520 may include: an image preprocessing unit, which can be used for performing noise removal, inclination correction and binarization processing on the historical bill image to obtain a preprocessed image; a layout analysis and character cutting unit, which can be used for identifying the character areas in the preprocessed image, splitting the character areas into lines, and dividing each line of characters into individual characters one by one; a character feature extraction unit, which can be used for extracting the character feature vector of each character one by one; a character recognition unit, which can be used for inputting the character feature vector of each character into the character classifier to obtain a character recognition result for the preprocessed image; and a post-processing correction unit, which can be used for analyzing the language context of the character recognition result through a language model and correcting the character recognition result output by the character classifier, to obtain the historical bill feature vector.
In an exemplary embodiment, the history ticket vector acquiring unit 520 may include: the optical character recognition unit can be used for carrying out optical character recognition on the historical bill image to obtain the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image; and the historical bill vector generating unit can be used for processing the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image to generate the historical bill feature vector.
In an exemplary embodiment, the history ticket vector generating unit may include: the quantification unit can be used for respectively carrying out quantification representation on the bill type and the supplier name; and the discrete unit can be used for carrying out discretization representation on the bill amount and the bill tax amount.
In an exemplary embodiment, the ticket recognition management apparatus 500 may further include: the historical bill information storage unit can be used for storing the historical bill images, the historical bill feature vectors and the labels thereof to a historical bill library in a block chain; and the historical bill information updating unit can be used for storing the current bill image, the current bill feature vector and the bill identification result thereof to the historical bill library so as to update the historical bill library.
Fig. 6 schematically shows a block diagram of a bill identification management apparatus according to another embodiment of the present disclosure.
As shown in fig. 6, unlike the above-described embodiment, the bill identification management apparatus 600 provided in the embodiment of the present disclosure may further include a history object information obtaining unit 610, a history object vector obtaining unit 620, a history object label determining unit 630, a machine prediction model training unit 640, a current object information obtaining unit 650, a current object vector extracting unit 660, and an object prediction result obtaining unit 670.
The history object information obtaining unit 610 may be configured to obtain history object information corresponding to the history ticket image according to the history ticket feature vector.
The history object vector obtaining unit 620 may be configured to obtain a history object feature vector according to the history object information.
The history object label determining unit 630 may be configured to determine, according to the historical bill image and the label thereof, the label of the history object feature vector, where the label is either "no false invoicing behavior" or "false invoicing behavior".
The machine prediction model training unit 640 may be configured to train a machine learning prediction model according to the historical object feature vectors and their labels.
The current object information acquisition unit 650 may be used to acquire current object information.
The current object vector extraction unit 660 may be configured to extract a current object feature vector of the current object according to the current object information.
The object prediction result obtaining unit 670 may be configured to process the current object feature vector through the machine learning prediction model to obtain an object prediction result of the current object, where the object prediction result is either "no false invoicing behavior" or "false invoicing behavior".
With continued reference to fig. 6, the ticket recognition management apparatus 600 can further include an object reputation modification unit 680 that can be configured to modify the reputation score of the current object based on the ticket recognition result and the object prediction result.
For details not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the bill identification management method of the present disclosure described above.
Referring now to FIG. 7, shown is a block diagram of a computer system 800 suitable for use in implementing the electronic devices of embodiments of the present disclosure. The computer system 800 of the electronic device shown in fig. 7 is only an example, and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for system operation are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the bill identification management method as described in the above embodiments.
For example, the electronic device may implement the following steps as shown in fig. 1: step S110, acquiring a historical bill image and a label thereof, wherein the label is one of a real bill, a fake bill and a virtual opening of the real bill (that is, a genuine bill that was falsely issued); step S120, carrying out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image; step S130, training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image; step S140, acquiring a current bill image; step S150, carrying out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image; step S160, processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill and a virtual opening of the real bill.
As another example, the electronic device may implement the steps shown in fig. 2 to 4.
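Steps S110 to S160 of fig. 1 can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the disclosure does not name a specific classifier, so a nearest-centroid classifier stands in for the machine learning classification model; the hypothetical `extract_feature_vector` function stands in for optical character recognition and simply reads fields from a dictionary; and the sample bills and label strings are invented.

```python
from collections import defaultdict
from math import dist


def extract_feature_vector(bill):
    """Stand-in for the OCR steps (S120 / S150).

    A real system would recognise these fields from a bill image; here the
    "image" is assumed to be a dict that already holds the recognised fields.
    """
    return [float(bill["amount"]), float(bill["tax"]), float(bill["vendor_id"])]


class NearestCentroidClassifier:
    """Illustrative stand-in for the machine learning classification model."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for vector, label in zip(vectors, labels):
            grouped[label].append(vector)
        # One centroid (per-feature mean) for each label.
        self.centroids = {
            label: [sum(column) / len(members) for column in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, vector):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda label: dist(vector, self.centroids[label]))


# S110: historical bills with labels (invented sample data).
HISTORICAL_BILLS = [
    ({"amount": 100, "tax": 13, "vendor_id": 1}, "real"),
    ({"amount": 200, "tax": 26, "vendor_id": 1}, "real"),
    ({"amount": 100, "tax": 0, "vendor_id": 9}, "fake"),
    ({"amount": 150, "tax": 0, "vendor_id": 9}, "fake"),
    ({"amount": 9000, "tax": 1170, "vendor_id": 2}, "virtual_opening"),
    ({"amount": 8000, "tax": 1040, "vendor_id": 2}, "virtual_opening"),
]


def train(historical):
    # S120 + S130: extract feature vectors, then fit the model on them.
    vectors = [extract_feature_vector(bill) for bill, _ in historical]
    labels = [label for _, label in historical]
    return NearestCentroidClassifier().fit(vectors, labels)


def identify(model, bill):
    # S150 + S160: extract the current feature vector and classify it.
    return model.predict(extract_feature_vector(bill))


model = train(HISTORICAL_BILLS)
# S140: a current bill (invented).
result = identify(model, {"amount": 160, "tax": 20.8, "vendor_id": 1})
```

The features here are left unscaled for brevity; a practical model would likely normalise them so that large amounts do not dominate the distance.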
It should be noted that although several units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit described above may be further divided among a plurality of units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. A bill identification management method is characterized by comprising the following steps:
acquiring a historical bill image and a label thereof, wherein the label is one of a real bill, a fake bill and a virtual opening of the real bill;
carrying out optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image;
training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image;
acquiring a current bill image;
carrying out optical character recognition on the current bill image to obtain a current bill feature vector of the current bill image;
and processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill and a virtual opening of the real bill.
2. The method of claim 1, wherein performing optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image comprises:
carrying out noise removal, skew correction and binarization processing on the historical bill image to obtain a preprocessed image;
recognizing a character area in the preprocessed image, performing line division processing on the character area, and dividing each line of characters into single characters one by one;
extracting character feature vectors of each character one by one;
inputting the character feature vector of each character into a character classifier to obtain a character recognition result in the preprocessed image;
and analyzing the language context relationship of the character recognition result in the preprocessed image through a language model, correcting the character recognition result output by the character classifier, and obtaining the historical bill feature vector.
3. The method of claim 1, wherein performing optical character recognition on the historical bill image to obtain a historical bill feature vector of the historical bill image comprises:
carrying out optical character recognition on the historical bill image to obtain the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image;
and processing the bill type, the bill number, the bill code, the billing date, the supplier name, the supplier taxpayer identification number, the bill amount and the bill tax amount of the historical bill image to generate the historical bill feature vector.
4. The method of claim 3, wherein processing the bill type, bill number, bill code, billing date, supplier name, supplier taxpayer identification number, bill amount, and bill tax amount of the historical bill image to generate the historical bill feature vector comprises:
respectively carrying out quantitative representation on the bill type and the supplier name;
and discretizing the bill amount and the bill tax amount.
5. The method of claim 1, further comprising:
storing the historical bill image, the historical bill feature vector and the label thereof to a historical bill library in a block chain;
and storing the current bill image, the current bill feature vector and the bill identification result thereof to the historical bill library so as to update the historical bill library.
6. The method of claim 1, further comprising:
obtaining historical object information corresponding to the historical bill image according to the historical bill feature vector;
obtaining a historical object feature vector according to the historical object information;
determining the label of the historical object feature vector according to the historical bill image and the label thereof, wherein the label is either having no virtual billing behavior or having virtual billing behavior;
training a machine learning prediction model according to the historical object feature vector and the label thereof;
acquiring current object information;
extracting a current object feature vector of a current object according to the current object information;
and processing the current object feature vector through the machine learning prediction model to obtain an object prediction result of the current object, wherein the object prediction result is either having no virtual billing behavior or having virtual billing behavior.
7. The method of claim 6, further comprising:
and correcting the credit score of the current object according to the bill identification result and the object prediction result.
8. A bill identifying and managing apparatus, comprising:
the historical bill information acquisition unit is used for acquiring a historical bill image and a label thereof, wherein the label is one of a real bill, a fake bill and a virtual opening of the real bill;
the historical bill vector acquisition unit is used for carrying out optical character recognition on the historical bill image to acquire a historical bill feature vector of the historical bill image;
the machine classification model training unit is used for training a machine learning classification model according to the historical bill feature vector and the label corresponding to the historical bill image;
the current bill image acquisition unit is used for acquiring a current bill image;
the current bill vector acquisition unit is used for carrying out optical character recognition on the current bill image to acquire a current bill feature vector of the current bill image;
and the bill identification result obtaining unit is used for processing the current bill feature vector through the machine learning classification model to obtain a bill identification result of the current bill image, wherein the bill identification result is one of a real bill, a fake bill and a virtual opening of the real bill.
9. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the bill identification management method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the bill identification management method according to any one of claims 1 to 7.
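The feature-vector construction recited in claims 3 and 4 can be illustrated with a short sketch. It covers only the four fields that claim 4 names (bill type, supplier name, bill amount, bill tax amount). The claims do not say how the quantitative representation or the discretization is carried out, so the integer category index, the bin edges, the field names and the sample values below are all assumptions chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical bin edges for amounts; the claims do not specify any.
AMOUNT_BIN_EDGES = [100.0, 1_000.0, 10_000.0, 100_000.0]


class CategoryIndex:
    """Maps each distinct text value (bill type, supplier name) to an integer."""

    def __init__(self):
        self._ids = {}

    def encode(self, value):
        # A new value gets the next free integer; a known value keeps its id.
        return self._ids.setdefault(value, len(self._ids))


def discretize(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def bill_feature_vector(fields, type_index, supplier_index):
    """Build a numeric vector from OCR-recognised fields (assumed dict layout)."""
    return [
        type_index.encode(fields["bill_type"]),
        supplier_index.encode(fields["supplier_name"]),
        discretize(fields["amount"], AMOUNT_BIN_EDGES),
        discretize(fields["tax"], AMOUNT_BIN_EDGES),
    ]


type_index = CategoryIndex()
supplier_index = CategoryIndex()

first = bill_feature_vector(
    {"bill_type": "special VAT invoice", "supplier_name": "Supplier A",
     "amount": 2500.0, "tax": 325.0},
    type_index, supplier_index,
)
second = bill_feature_vector(
    {"bill_type": "ordinary VAT invoice", "supplier_name": "Supplier A",
     "amount": 50.0, "tax": 6.5},
    type_index, supplier_index,
)
```

Sharing one `CategoryIndex` per field across historical and current bills keeps the encoding stable, so the same supplier always maps to the same integer at training time and at identification time.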
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911343761.6A CN111178219A (en) | 2019-12-24 | 2019-12-24 | Bill identification management method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111178219A (en) | 2020-05-19 |
Family
ID=70652064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911343761.6A Pending CN111178219A (en) | 2019-12-24 | 2019-12-24 | Bill identification management method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178219A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780001A (en) * | 2016-12-26 | 2017-05-31 | 税友软件集团股份有限公司 | A kind of invoice writes out falsely enterprise supervision recognition methods and system |
CN107122777A (en) * | 2017-04-25 | 2017-09-01 | 云南省交通科学研究所 | A kind of vehicle analysis system and analysis method based on video file |
CN108876166A (en) * | 2018-06-27 | 2018-11-23 | 平安科技(深圳)有限公司 | Financial risk authentication processing method, device, computer equipment and storage medium |
CN109255113A (en) * | 2018-09-04 | 2019-01-22 | 郑州信大壹密科技有限公司 | Intelligent critique system |
CN109472918A (en) * | 2018-10-12 | 2019-03-15 | 深圳壹账通智能科技有限公司 | Invoice validation method, financing checking method, device, equipment and medium |
CN109637000A (en) * | 2018-10-23 | 2019-04-16 | 深圳壹账通智能科技有限公司 | The invoice method of inspection and device, storage medium, electric terminal |
CN109583978A (en) * | 2018-11-30 | 2019-04-05 | 税友软件集团股份有限公司 | The method, device and equipment of invoice enterprise is write out falsely in a kind of identification |
CN110188714A (en) * | 2019-06-04 | 2019-08-30 | 言图科技有限公司 | A kind of method, system and storage medium for realizing financial management under chat scenario |
CN110532542A (en) * | 2019-07-15 | 2019-12-03 | 西安交通大学 | It is a kind of that recognition methods and system are write out falsely with the invoice for not marking study based on positive example |
CN110415119A (en) * | 2019-07-30 | 2019-11-05 | 中国工商银行股份有限公司 | Model training, bill business prediction technique, device, storage medium and equipment |
Non-Patent Citations (1)
Title |
---|
审计署审计科研所课题组编: "《审计技术创新发展报告及案例选编 2013 下》", vol. 978, 中国铁道出版社, pages: 151 - 686 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612964A (en) * | 2020-05-21 | 2020-09-01 | 广东乐佳印刷有限公司 | Bill certificate anti-counterfeiting detection method and device based on block chain |
CN111625649A (en) * | 2020-05-28 | 2020-09-04 | 北京字节跳动网络技术有限公司 | Text processing method and device, electronic equipment and medium |
CN111683202A (en) * | 2020-06-01 | 2020-09-18 | 北京惠朗时代科技有限公司 | Bill stamping method, device, equipment and storage medium |
CN111753841A (en) * | 2020-06-28 | 2020-10-09 | 中国银行股份有限公司 | Bill identification method and device based on routing distribution |
CN111753841B (en) * | 2020-06-28 | 2023-09-19 | 中国银行股份有限公司 | Bill identification method and device based on route distribution |
CN111768546A (en) * | 2020-06-30 | 2020-10-13 | 新奥(中国)燃气投资有限公司 | Method, device and system for automatically early warning abnormal enterprise invoices |
CN112287828A (en) * | 2020-10-29 | 2021-01-29 | 平安普惠企业管理有限公司 | Financial statement generation method and device based on machine learning |
CN112232305A (en) * | 2020-11-19 | 2021-01-15 | 中国银联股份有限公司 | Image detection method, image detection device, electronic device, and medium |
CN112446346A (en) * | 2020-12-10 | 2021-03-05 | 国网辽宁省电力有限公司丹东供电公司 | Image data scanning processing method |
CN112749639A (en) * | 2020-12-29 | 2021-05-04 | 中电金信软件有限公司 | Model training method and device, computer equipment and storage medium |
CN112749639B (en) * | 2020-12-29 | 2022-01-14 | 中电金信软件有限公司 | Model training method and device, computer equipment and storage medium |
CN112861841A (en) * | 2021-03-22 | 2021-05-28 | 北京百度网讯科技有限公司 | Bill confidence value model training method and device, electronic equipment and storage medium |
CN112861841B (en) * | 2021-03-22 | 2023-06-13 | 北京百度网讯科技有限公司 | Training method and device for bill confidence value model, electronic equipment and storage medium |
CN113435439A (en) * | 2021-06-30 | 2021-09-24 | 青岛海尔科技有限公司 | Document auditing method and device, storage medium and electronic device |
CN113435439B (en) * | 2021-06-30 | 2023-11-28 | 青岛海尔科技有限公司 | Document auditing method and device, storage medium and electronic device |
CN114240407A (en) * | 2021-11-17 | 2022-03-25 | 广东电网有限责任公司 | Bill risk conduction quantitative evaluation system and method based on block chain |
CN114049192B (en) * | 2022-01-12 | 2022-04-12 | 广东企数标普科技有限公司 | Invoice data processing method and device based on intelligent algorithm |
CN114049192A (en) * | 2022-01-12 | 2022-02-15 | 广东企数标普科技有限公司 | Invoice data processing method and device based on intelligent algorithm |
CN114358659A (en) * | 2022-03-10 | 2022-04-15 | 广东粤海集团企业服务有限公司 | Document verification information processing method and system |
CN114358659B (en) * | 2022-03-10 | 2022-06-03 | 广东粤海集团企业服务有限公司 | Document verification information processing method and system |
CN117114910A (en) * | 2023-09-22 | 2023-11-24 | 浙江河马管家网络科技有限公司 | Automatic ticket business accounting system and method based on machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 |