CN116049358A - Invoice information approximation degree detection method, storage medium and computer equipment - Google Patents

Publication number
CN116049358A
Authority
CN
China
Prior art keywords
vector
information
name
invoice
comprehensive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310330626.8A
Other languages
Chinese (zh)
Inventor
孙元臻
王宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Score Digital Technology Zhuhai Co ltd
Original Assignee
Score Digital Technology Zhuhai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Score Digital Technology Zhuhai Co ltd filed Critical Score Digital Technology Zhuhai Co ltd
Priority to CN202310330626.8A priority Critical patent/CN116049358A/en
Publication of CN116049358A publication Critical patent/CN116049358A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an invoice information approximation degree detection method, a storage medium and computer equipment, and belongs to the technical field of electronic information processing. In the invoice information approximation degree detection method, the item vectors of at least two core items in the invoice information are integrated, so that a comprehensive vector containing comprehensive information can be generated for each invoice; because the comprehensive vector contains semantic information, semantically similar contents produce similar vectors. The information approximation degree of invoices is then determined by comparing the distances between the comprehensive vectors of different invoices, thereby providing information support for invoice approximation degree queries and for detecting illegal invoicing.

Description

Invoice information approximation degree detection method, storage medium and computer equipment
Technical Field
The invention belongs to the technical field of electronic information processing, and particularly relates to an invoice information approximation degree detection method, a storage medium and computer equipment.
Background
Existing invoice information processing mainly consists of identifying and extracting certain core information in an invoice and then simply storing that information in database fields for queries, statistics and the like. No related technology exists for detecting the approximation degree of invoice information. Specifically, regarding the extraction of invoice information, the prior art extracts each item separately based on image recognition or semantic recognition; for example, the buyer, seller, business content, amount and invoicing date on an invoice are each extracted and stored. The prior art neither processes the information of the core items of an invoice as a whole nor uses such a processing result to detect invoice approximation degree.
Disclosure of Invention
The invention aims to provide an invoice information approximation degree detection method, a storage medium and computer equipment, which can detect invoice approximation degree. The invention is realized by the following technical scheme:
An invoice information approximation degree detection method comprises the following steps:
(1) Pre-designating at least two core items in invoice information;
(2) Acquiring corresponding information of each core item in the detected invoice, and respectively converting the corresponding information of each core item into an item vector;
(3) Splicing the item vectors corresponding to at least two core items to form a comprehensive vector;
(4) Determining the information approximation degree of the invoices to each other by comparing the relative distances of the comprehensive vectors of different invoices.
According to this technical scheme, the item vectors of at least two core items in the invoice information are integrated to generate a comprehensive vector containing comprehensive information, and the information approximation degree of invoices is then determined by comparing the distances between the comprehensive vectors of different invoices, so that information support is provided for invoice approximation degree queries and for detecting illegal invoicing.
As a specific scheme, the core items in step (1) comprise a buyer name, a seller name and business content; in step (2), the buyer name is divided into a words, the seller name is divided into b words, and the business content is divided into c words; by means of a word vector algorithm, the buyer name information is converted into a N-dimensional buyer name vectors V_{1-1}, the seller name information into b N-dimensional seller name vectors V_{1-2}, and the business content information into c N-dimensional business content vectors V_2.
As a specific scheme, step (3) specifically includes: S3-1, averaging the buyer name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional average value vector; S3-2, taking the maximum of the buyer name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional maximum value vector; S3-3, splicing the average value vector and the maximum value vector together to obtain a 2N-dimensional purchase-sale combination vector V_1; S3-4, splicing V_1 and V_2 to form the comprehensive vector V.
In the above specific scheme, the buyer name vectors and the seller name vectors are combined in a direction-independent manner, so that the purchase-sale combination vector V_1 obtained when the buyer and the seller are exchanged is the same, which makes it possible to detect whether an enterprise is falsely inflating its business income.
As a specific scheme, the core items in step (1) further comprise an invoicing amount and an invoicing date; step (2) further comprises: converting the invoicing amount information into a 1-dimensional invoicing amount vector V_3 through a first preset algorithm, and converting the invoicing date information into a 1-dimensional invoicing date vector V_4 through a second preset algorithm.
As a specific scheme, step (3) specifically includes: S3-1, averaging the buyer name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional average value vector; S3-2, taking the maximum of the buyer name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional maximum value vector; S3-3, splicing the average value vector and the maximum value vector together to obtain a 2N-dimensional purchase-sale combination vector V_1; S3-4, splicing V_1, V_2, V_3 and V_4 to form the comprehensive vector V.
In the above specific scheme, the buyer name vectors and the seller name vectors are again combined in a direction-independent manner, so that the purchase-sale combination vector V_1 is the same when the buyer and the seller are exchanged and false inflation of business income can be detected; in addition, the newly introduced invoicing amount vector and invoicing date vector are spliced in, so that the comprehensive vector carries richer information.
As a specific scheme, the first preset algorithm adopts the following formula:
Figure SMS_1
wherein β is the invoicing amount of the invoice.
The above specific scheme mainly considers that the total price including tax (the invoicing amount) may differ greatly between invoices. Processing the invoicing amount with the above formula weakens its influence on the invoicing amount vector V_3, and thereby on the comprehensive vector V, so that this single item does not unduly dominate the final similarity judgment.
As a specific scheme, the second preset algorithm adopts the following formula:
Figure SMS_2
wherein d denotes the number of days between the invoicing date and a preset standard start date S.
In the above specific scheme, the invoicing date vector V_4 is calculated relative to the preset standard start date S, so it encodes the relative date of the invoice; this makes it easier to compare invoice information within a relative time period.
As a specific scheme, in step (4), the specific method for determining the information approximation degree of invoices to each other is as follows:
Assume that the comprehensive vectors V of two invoices are [Figure SMS_3] and [Figure SMS_4].
The cosine distance between [Figure SMS_5] and [Figure SMS_6] is calculated as:
Figure SMS_7
The Euclidean distance between [Figure SMS_8] and [Figure SMS_9] is calculated as:
Figure SMS_10
The comprehensive distance between [Figure SMS_11] and [Figure SMS_12] is calculated as:
Figure SMS_13
The information approximation degree of the invoices to each other is determined by at least one of the cosine distance, the Euclidean distance and the comprehensive distance.
As a specific scheme, in step (2), before the buyer name is divided into a words and the seller name into b words, the place name descriptions and general description information in the buyer name and the seller name are removed; before the business content is divided into c words, the general description information and the specification and model description information in the business content are removed.
In the above specific scheme, by removing some general description words (such as 'company', 'limited responsibility', 'Beijing', and the like) and some specification and model description words (such as 'X-2S', 'diameter 2 cm', and the like), the resulting vectors better highlight the main characteristics of the text and are more targeted and representative.
The invention also provides a computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps described above.
The invention also provides a computer device, which is characterized by comprising a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
Drawings
Fig. 1 is a diagram of the system architecture on which the invoice information approximation degree detection method provided by an embodiment of the present invention is based.
Fig. 2 is a flowchart of an invoice information proximity detection method according to an embodiment of the present invention.
Fig. 3 is a specific flowchart of step (3) in the invoice information proximity detection method according to the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings. It should be noted that, the invoice information proximity detection method provided in the present application is generally executed by a computer device.
FIG. 1 illustrates an exemplary system architecture to which the invoice information proximity detection method of the present application may be applied. As shown in fig. 1, the system architecture may include a computer device 101 and a server 102. Communication between the computer device 101 and the server 102 may be through a network, which may include various types of wired or wireless communication links; for example, the wired communication link includes an optical fiber, a twisted pair, a coaxial cable, or the like, and the wireless communication link includes a Bluetooth communication link, a Wireless Fidelity (Wi-Fi) communication link, a microwave communication link, or the like.
The electronic invoice information, for example an electronic invoice, or an electronic document produced by scanning a paper invoice and recognizing its information, is stored in the computer device 101 and/or the server 102, and the computer device 101 may obtain the invoice information to be detected from the server 102 or locally.
The computer device 101 and the server 102 may be hardware or software. When the computer device 101 and the server 102 are hardware, they may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the computer device 101 and the server 102 are software, they may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, which is not specifically limited herein.
The method for detecting the invoice information approximation degree provided in the embodiment of the present application will be described in detail with reference to fig. 2 and 3. Referring to fig. 2, the invoice information proximity detection method provided in this embodiment includes the following steps:
(1) Pre-designating at least two core items in invoice information;
(2) Acquiring corresponding information of each core item in the detected invoice, and respectively converting the corresponding information of each core item into an item vector;
(3) Splicing the item vectors corresponding to at least two core items to form a comprehensive vector;
(4) Determining the information approximation degree of the invoices to each other by comparing the relative distances of the comprehensive vectors of different invoices.
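The four steps above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the implementation of this embodiment: `embed_item` is a hypothetical stand-in for the item-vector conversion of step (2) (a toy character-based embedding with no semantic content), and the field names `buyer`, `seller` and `content` are likewise assumed for illustration.

```python
import math

# Step (1): core items designated in advance (field names are illustrative).
CORE_ITEMS = ("buyer", "seller", "content")


def embed_item(text, dim=4):
    """Hypothetical stand-in for step (2): map text to a fixed-length vector.

    A real system would use a trained word-vector model instead.
    """
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def comprehensive_vector(invoice):
    """Step (3): splice (concatenate) the item vectors of all core items."""
    out = []
    for item in CORE_ITEMS:
        out.extend(embed_item(invoice[item]))
    return out


def euclidean(u, v):
    """Step (4): one possible distance between two comprehensive vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


inv1 = {"buyer": "AB online", "seller": "CD appliance",
        "content": "technology development"}
inv2 = dict(inv1)  # an identical invoice
v1 = comprehensive_vector(inv1)
v2 = comprehensive_vector(inv2)
print(len(v1), euclidean(v1, v2))
```

Identical invoices map to identical comprehensive vectors, so their distance is zero; the later steps of the embodiment refine how each item vector is built.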
In step (1), the core items pre-designated in the invoice information comprise a buyer name, a seller name, business content, an invoicing amount and an invoicing date. The business content corresponds to the "name of goods or taxable services" item in the invoice, and the invoicing amount corresponds to the "total of price and tax" item in the invoice. In addition, it should be noted that, in this embodiment, suitable core items may be selected according to the emphasis of the invoice information comparison; generally, the buyer name, seller name and business content are the primary items of interest, and the invoicing amount and invoicing date may be primary or secondary items of interest; of course, more core items may also be pre-designated.
In step (2), the buyer name is first divided into a words, the seller name into b words, and the business content into c words; then, by means of a word vector algorithm, the buyer name information is converted into a N-dimensional buyer name vectors V_{1-1}, the seller name information into b N-dimensional seller name vectors V_{1-2}, and the business content information into c N-dimensional business content vectors V_2. In addition, the invoicing amount information is converted into a 1-dimensional invoicing amount vector V_3 through a first preset algorithm, and the invoicing date information is converted into a 1-dimensional invoicing date vector V_4 through a second preset algorithm.
Typically, the unified social credit code is the unique identifier of an enterprise or organization, but since that code cannot reflect potential relationships between companies, this embodiment uses the Chinese name of the buyer when processing the buyer information. The generation of the buyer name vector V_{1-1} is illustrated below by way of an example:
S210, remove the punctuation marks, place name descriptions and general description information from the buyer and seller names, as follows:
S211, remove all Chinese punctuation marks. For example: "AB online network technology (Beijing) Limited company" is converted into "AB online network technology Beijing Limited company". To avoid naming any real organization, the actual Chinese characters are replaced here by "AB"; the same applies to "CD" below.
S212, remove general description words such as "share", "limited", "responsibility", "group" and "company". For example: "AB online network technology Beijing Limited company" is converted into "AB online network technology Beijing".
S213, remove place name descriptions (province, city). For example: "AB online network technology Beijing" is converted into "AB online network technology".
S220, divide the buyer name into a words using a Chinese word segmentation algorithm. For example: "AB online network technology" is divided into 3 words, "AB/online/network technology".
S230, process the segmented buyer name "AB/online/network technology" with a word vector algorithm to generate three 64-dimensional vectors V_{1-1} (in this embodiment, N is an integer multiple of 16, for example 32, 64 or 128, preferably 64), as follows:
Figure SMS_14
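The cleaning steps S211 to S213 can be sketched as follows. This is a hedged illustration only: it works on the English rendering of the example name, the word lists `GENERIC_WORDS` and `PLACE_NAMES` are hypothetical and far from complete, and whitespace splitting stands in for the Chinese word segmentation algorithm of S220, which is not reproduced here.

```python
import re

# Hypothetical word lists; a production system would use fuller dictionaries.
GENERIC_WORDS = {"share", "limited", "responsibility", "group", "company"}
PLACE_NAMES = {"beijing", "zhuhai"}


def clean_name(name):
    """Apply S211-S213 to an English-rendered company name."""
    # S211: replace every punctuation character with a space.
    no_punct = re.sub(r"[^\w\s]", " ", name)
    tokens = no_punct.split()
    # S212 and S213: drop generic description words and place names.
    kept = [t for t in tokens
            if t.lower() not in GENERIC_WORDS and t.lower() not in PLACE_NAMES]
    return " ".join(kept)


cleaned = clean_name("AB online network technology (Beijing) Limited company")
print(cleaned)  # AB online network technology
```

Keeping the word lists as plain sets makes it easy to extend them as new generic terms or place names are encountered.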
This embodiment also uses the Chinese name of the seller when processing the seller information. The generation of the seller name vector V_{1-2} is similar to that of the buyer name vector V_{1-1} described above, and is illustrated by the following example:
For example: the seller name is "Zhuhai CD electric appliance stock Co., ltd"; removing the place name description and the general description information gives "CD electric appliance", which is divided into 2 words, "CD/electric appliance", and finally converted into two 64-dimensional vectors V_{1-2}, as follows:
Figure SMS_15
In the above scheme, removing some general description words (such as 'company', 'limited responsibility', 'Beijing', and the like) makes the resulting vectors highlight the main characteristics of the buyer or seller name, so that they are more targeted and representative.
When processing the business content information, this embodiment uses a generation process similar to that of the seller name vector V_{1-2} and the buyer name vector V_{1-1}, described below with examples:
First, the general description information and the specification and model description information in the business content are removed, common words that carry no meaning are removed, and the business content is then divided into c words. For example: "technology development cost" is converted into "technology/development"; for another example: "aviation grade screw 12cm x 2cm" is converted into "aviation/grade/screw".
Then, a word vector algorithm is used to generate multiple 32-dimensional vectors (one per segmented word). In actual use, the names of goods or services are often filled in with fairly general words, such as service fees, meal fees and office supplies. Therefore, to reduce the amount of computation, the complexity and the amount of data stored, 32-dimensional vectors may be generated.
Finally, the vectors of the words of the business content are averaged and maximized dimension by dimension, and the two results are spliced to generate a 64-dimensional vector, denoted V_2. This averaging, maximization and splicing follows the same procedure as for the buyer name vectors V_{1-1} and the seller name vectors V_{1-2} (i.e., steps S3-1 to S3-3 described below).
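The averaging, maximization and splicing described here can be sketched as a small pooling function. The example below is illustrative only: the function name `mean_max_pool` is assumed, and toy 3-dimensional vectors stand in for the 32-dimensional word vectors, so the output has 6 dimensions rather than 64.

```python
def mean_max_pool(vectors):
    """Pool equal-length word vectors into one vector of twice the length.

    The first half is the per-dimension mean, the second half the
    per-dimension maximum.
    """
    if not vectors:
        raise ValueError("at least one word vector is required")
    columns = list(zip(*vectors))  # one tuple per dimension
    mean_part = [sum(col) / len(col) for col in columns]
    max_part = [max(col) for col in columns]
    return mean_part + max_part


# Two toy word vectors, e.g. for "technology" and "development".
words = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5]]
v2 = mean_max_pool(words)
print(v2)  # [2.0, -1.0, 0.0, 3.0, 0.0, 0.5]
```

Because the output length depends only on the word-vector dimension, business content with any number of words yields a vector of the same size.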
In addition, since the invoicing amount β (total price including tax) may differ greatly between invoices, the invoicing amount information is converted into a 1-dimensional invoicing amount vector V_3 through a first preset algorithm, which adopts the following formula:
Figure SMS_16
Processing the invoicing amount with the above formula weakens its influence on the invoicing amount vector V_3, and thereby on the comprehensive vector V, so that this single item does not unduly dominate the final similarity judgment.
Finally, the invoicing date information is converted into a 1-dimensional invoicing date vector V_4 through a second preset algorithm, which adopts the following formula:
Figure SMS_17
wherein d denotes the number of days between the invoicing date and a preset standard start date S. Assuming that S = 01/01/2020 and the invoicing date is 2021/01, d = 366.
From the above, the invoicing date vector V_4 is calculated relative to the preset standard start date S, so it encodes the relative date of the invoice; this makes it easier to compare invoice information within a relative time period.
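The day difference d can be computed with standard date arithmetic. The sketch below assumes, for illustration, that the invoicing date in the example is 1 January 2021, which is consistent with d = 366 because 2020 is a leap year; the helper name `days_since` is hypothetical, and the formula that maps d to V_4 is the one given in the figure and is not reproduced here.

```python
from datetime import date


def days_since(start, invoicing_date):
    """Number of days between the standard start date S and the invoicing date."""
    return (invoicing_date - start).days


S = date(2020, 1, 1)                 # preset standard start date
d = days_since(S, date(2021, 1, 1))  # assumed invoicing date
print(d)  # 366
```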
Referring to fig. 3, step (3) specifically includes:
S3-1, average the buyer name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional average value vector, where N = 64. In the example described above, the buyer and the seller have a total of five 64-dimensional vectors:
Figure SMS_18
After averaging, a 64-dimensional average value vector is obtained as follows:
Figure SMS_19
S3-2, take the maximum of the buyer name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional maximum value vector, where N = 64.
For dimension 1, calculate:
Figure SMS_20
Then the maximum of every other dimension is calculated in turn, giving a 64-dimensional maximum value vector as follows:
Figure SMS_21
S3-3, splice the average value vector and the maximum value vector together to obtain a 128-dimensional purchase-sale combination vector V_1, as follows:
Figure SMS_22
S3-4, splice V_1, V_2, V_3 and V_4 to obtain a 194-dimensional final result vector (128 + 64 + 1 + 1 = 194):
Figure SMS_23
The vector is stored in a database for later use.
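The dimension count of step S3-4 can be checked with a short sketch. The vectors below are zero-filled placeholders used only to show the shapes involved, and the function name `splice` is assumed for illustration.

```python
N = 64  # word-vector dimension used for buyer and seller names


def splice(v1, v2, v3, v4):
    """S3-4: concatenate the item vectors into the comprehensive vector V."""
    return list(v1) + list(v2) + list(v3) + list(v4)


v1 = [0.0] * (2 * N)  # purchase-sale combination vector: mean (64) + max (64)
v2 = [0.0] * 64       # business content vector: mean (32) + max (32)
v3 = [0.0]            # invoicing amount vector, 1 dimension
v4 = [0.0]            # invoicing date vector, 1 dimension

V = splice(v1, v2, v3, v4)
print(len(V))  # 194
```

Every invoice therefore yields a vector of the same fixed length, which is what allows vectors of different invoices to be compared directly in step (4).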
It should be noted that, as described above, suitable core items may be selected according to the focus of attention when comparing invoice information; in this embodiment, V_1 and V_2 alone may be spliced to form a reasonably representative comprehensive vector V. Of course, once the invoicing amount vector and the invoicing date vector are introduced and spliced in, the comprehensive vector carries richer information.
In the above scheme, the buyer name vectors and the seller name vectors are combined in a direction-independent manner (direction-independent means that the combined vector should be the same whether AB sells to CD or CD sells to AB), so that the purchase-sale combination vector V_1 is identical when the buyer and the seller are exchanged; this can be used to identify enterprises that falsely inflate their revenue.
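The direction independence follows from the fact that both the mean and the maximum are taken over the union of the buyer and seller word vectors, so the order of the two parties does not matter. A minimal sketch, with an assumed function name `purchase_sale_vector` and toy 2-dimensional vectors chosen so that the floating-point sums are exact:

```python
def purchase_sale_vector(buyer_vecs, seller_vecs):
    """S3-1 to S3-3: pool buyer and seller word vectors together.

    Mean and max are computed over all vectors of both parties, so
    swapping the two arguments leaves the result unchanged.
    """
    pooled = buyer_vecs + seller_vecs
    columns = list(zip(*pooled))
    mean_part = [sum(col) / len(col) for col in columns]
    max_part = [max(col) for col in columns]
    return mean_part + max_part


buyer = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]]  # 3 word vectors (e.g. AB/online/network technology)
seller = [[7.0, 6.0], [9.0, 8.0]]             # 2 word vectors (e.g. CD/electric appliance)

forward = purchase_sale_vector(buyer, seller)
swapped = purchase_sale_vector(seller, buyer)
print(forward, forward == swapped)  # [5.0, 4.0, 9.0, 8.0] True
```

With arbitrary real-valued embeddings the two means can differ in the last floating-point bit because the summation order changes, so a tolerance-based comparison such as `math.isclose` is the safer check in practice.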
In step (4) of the invoice information approximation degree detection method of this embodiment, the specific method for determining the information approximation degree of invoices to each other is as follows:
Assume that the comprehensive vectors V of two invoices are [Figure SMS_24] and [Figure SMS_25].
The cosine distance between [Figure SMS_26] and [Figure SMS_27] is calculated as:
Figure SMS_28
The Euclidean distance between [Figure SMS_29] and [Figure SMS_30] is calculated as:
Figure SMS_31
The comprehensive distance between [Figure SMS_32] and [Figure SMS_33] is calculated as:
Figure SMS_34
The information approximation degree of the invoices to each other is determined by at least one of the cosine distance, the Euclidean distance and the comprehensive distance; the closer the distance, the higher the approximation degree.
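The distance comparison can be sketched as follows. This is a hedged illustration: the cosine distance is taken here as 1 minus the cosine similarity, which is the common definition and an assumption with respect to the formula in the figure; the comprehensive distance is not reproduced because its formula is given only in the figure; and the helper `nearest` is a hypothetical brute-force search over stored vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity (common definition, assumed here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query, candidates, distance):
    """Index of the stored vector closest to the query under `distance`."""
    return min(range(len(candidates)),
               key=lambda i: distance(query, candidates[i]))


query = [3.0, 4.0]
stored = [[4.0, -3.0], [3.0, 4.5], [-3.0, -4.0]]
best = nearest(query, stored, euclidean_distance)
print(best)  # 1
```

A linear scan like this is enough to show the principle; at very large scale a vector index would normally replace the brute-force loop.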
The present embodiment also provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps. The embodiment also provides a computer device, comprising a processor and a memory; the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
With the invoice information approximation degree detection method of this embodiment, the item vectors of at least two core items in the invoice information are integrated, so that a comprehensive vector containing comprehensive information can be generated for each invoice; because the comprehensive vector contains semantic information, semantically similar contents produce similar vectors. The information approximation degree of invoices is then determined by comparing the distances between the comprehensive vectors of different invoices, thereby providing information support for invoice approximation degree queries and for detecting illegal invoicing. For example: the buyer of invoice 1 is AB online network technology (Beijing) Limited company and the buyer of invoice 2 is AB times network technology (Beijing) Limited company; after vector encoding, the vectors of invoice 1 and invoice 2 are very similar. Based on the invention, similar invoices can be quickly found in massive invoice data according to their comprehensive vectors, which can greatly improve the efficiency of anti-fraud and anti-money-laundering work. Moreover, the similarity calculation requires little computation and can support analysis at the scale of hundreds of millions of invoices.
The above embodiments are intended only to fully disclose the present invention, not to limit it; any substitution of equivalent technical features that is based on the gist of the present invention and can be obtained without inventive effort shall be considered to fall within the scope of the present disclosure.

Claims (10)

1. The invoice information approximation degree detection method is characterized by comprising the following steps of:
(1) Pre-designating at least two core items in invoice information;
(2) Acquiring corresponding information of each core item in the detected invoice, and respectively converting the corresponding information of each core item into an item vector;
(3) Splicing the item vectors corresponding to at least two core items to form a comprehensive vector;
(4) Determining the information approximation degree of the invoices to each other by comparing the relative distances of the comprehensive vectors of different invoices.
2. The invoice information proximity detection method according to claim 1, wherein the core item in step (1) includes a purchaser name, a seller name, a business content; step (2)) Dividing the name of the purchaser into a words, dividing the name of the seller into b words, and dividing the business content into c words; converting the buyer name information into a N-dimensional buyer name vectors V by means of a word vector algorithm 1-1 Converting vendor name information into b N-dimensional vendor name vectors V 1-2 Converting service content information into c N-dimensional service content vectors V 2
3. The invoice information proximity detection method according to claim 2, wherein step (3) specifically includes: s3-1, carrying out dimension averaging on the buyer name vector and the seller name vector to obtain 1N-dimensional average value vector; s3-2, solving the maximum value of the purchaser name vector and the seller name vector according to the dimension, and solving 1N-dimensional maximum value vector; s3-3, splicing the average value vector and the maximum value vector together to obtain a purchase-sale combination vector V with 2N dimensions 1 The method comprises the steps of carrying out a first treatment on the surface of the S3-4, will V 1 and V2 And splicing to form the comprehensive vector V.
4. The invoice information proximity detection method according to claim 2, wherein the core item in step (1) further includes an invoicing amount and an invoicing date; step (2) further comprises: converting the billing amount information into 1-dimensional billing amount vector V by a first preset algorithm 3 Converting the billing date information into 1-dimensional billing date vector V by a second preset algorithm 4
5. The invoice information proximity detection method according to claim 4, wherein step (3) specifically comprises: S3-1, averaging the purchaser name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional average vector; S3-2, taking the maximum of the purchaser name vectors and the seller name vectors dimension by dimension to obtain one N-dimensional maximum vector; S3-3, splicing the average vector and the maximum vector together to obtain a 2N-dimensional purchase-sale combination vector V_1; S3-4, splicing V_1, V_2, V_3, and V_4 to form the comprehensive vector V.
6. The invoice information proximity detection method according to claim 4, wherein the first preset algorithm adopts the following formula:
Figure QLYQS_1
wherein, beta is the invoicing amount of the invoice.
7. The invoice information proximity detection method according to claim 4, wherein the second preset algorithm adopts the following formula:
Figure QLYQS_2
wherein d denotes the number of days between the invoicing date and a preset standard start date S.
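The quantity d in claim 7 is a plain day count, which can be computed as in the sketch below. The formula that maps d to the 1-dimensional date vector is given only as a figure and is not reproduced here. The start date used for S is a hypothetical value chosen for illustration, as is the function name.

```python
from datetime import date


def days_since_start(invoice_date: date, standard_start: date) -> int:
    """Return d, the number of days from the standard start date S to the invoicing date."""
    return (invoice_date - standard_start).days


# Hypothetical standard start date S; the claim does not fix a value.
STANDARD_START = date(2000, 1, 1)

d = days_since_start(date(2023, 3, 31), STANDARD_START)
```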
8. The invoice information proximity detection method according to claim 3 or 5, wherein in step (4), the specific method for determining the information proximity of invoices to each other is:
assume that the comprehensive vectors V of two invoices are [Figure QLYQS_3] and [Figure QLYQS_4], respectively;
the cosine distance between [Figure QLYQS_5] and [Figure QLYQS_6] is calculated as:
Figure QLYQS_7
the Euclidean distance between [Figure QLYQS_8] and [Figure QLYQS_9] is calculated as:
Figure QLYQS_10
the comprehensive distance between [Figure QLYQS_11] and [Figure QLYQS_12] is calculated as:
Figure QLYQS_13
and the information approximation degree between the invoices is determined from at least one of the cosine distance, the Euclidean distance, and the comprehensive distance.
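The two named distances in claim 8 can be sketched with the standard textbook definitions. The claim's own formulas survive only as figures, so this is an assumption about their form; in particular, cosine distance is taken here as one minus cosine similarity. The comprehensive distance is left out because its formula cannot be recovered from the text.

```python
import math
from typing import Sequence


def _check_lengths(u: Sequence[float], v: Sequence[float]) -> None:
    if len(u) != len(v):
        raise ValueError("comprehensive vectors must have the same dimension")


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v); 0 means same direction, 1 means orthogonal."""
    _check_lengths(u, v)
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the straight-line distance between u and v."""
    _check_lengths(u, v)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Two toy comprehensive vectors, invented for this example.
invoice_a = [3.0, 2.0, 5.0, 4.0]
invoice_b = [3.0, 2.0, 5.0, 1.0]
cos_d = cosine_distance(invoice_a, invoice_b)
euc_d = euclidean_distance(invoice_a, invoice_b)
```

Cosine distance ignores vector length and compares direction only, while Euclidean distance is sensitive to magnitude, which may be why the claim allows either or a combination of both.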
9. A storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method of any one of claims 1-8.
10. A computer device comprising a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the method according to any of claims 1-8.
CN202310330626.8A 2023-03-31 2023-03-31 Invoice information approximation degree detection method, storage medium and computer equipment Pending CN116049358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310330626.8A CN116049358A (en) 2023-03-31 2023-03-31 Invoice information approximation degree detection method, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310330626.8A CN116049358A (en) 2023-03-31 2023-03-31 Invoice information approximation degree detection method, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN116049358A true CN116049358A (en) 2023-05-02

Family

ID=86129820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310330626.8A Pending CN116049358A (en) 2023-03-31 2023-03-31 Invoice information approximation degree detection method, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN116049358A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424613A (en) * 2013-09-04 2015-03-18 航天信息股份有限公司 Value added tax invoice monitoring method and system thereof
CN104636971A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of detecting one number for multiple names of value added tax invoice and system thereof
CN108595634A (en) * 2018-04-25 2018-09-28 腾讯科技(深圳)有限公司 Message management method, device and electronic equipment
CN109740642A (en) * 2018-12-19 2019-05-10 北京邮电大学 Invoice category recognition methods, device, electronic equipment and readable storage medium storing program for executing
CN111753060A (en) * 2020-07-29 2020-10-09 腾讯科技(深圳)有限公司 Information retrieval method, device, equipment and computer readable storage medium
CN112232149A (en) * 2020-09-28 2021-01-15 北京易道博识科技有限公司 Document multi-mode information and relation extraction method and system
CN114022699A (en) * 2021-10-15 2022-02-08 众安在线财产保险股份有限公司 Image classification method and device, computer equipment and storage medium
CN115809887A (en) * 2022-12-09 2023-03-17 蔷薇大树科技有限公司 Method and device for determining main business range of enterprise based on invoice data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination