CN116049358A - Invoice information approximation degree detection method, storage medium and computer equipment - Google Patents
- Publication number
- CN116049358A (application CN202310330626.8A)
- Authority
- CN
- China
- Prior art keywords
- vector
- information
- name
- invoice
- comprehensive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/04—Billing or invoicing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Accounting & Taxation (AREA)
- Economics (AREA)
- Finance (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an invoice information approximation degree detection method, a storage medium and computer equipment, and belongs to the technical field of electronic information processing. In the method, the item vectors of at least two core items in the invoice information are combined so that a comprehensive vector can be generated for each invoice; because the comprehensive vector carries semantic information, semantically similar content produces similar vectors. The information approximation degree between invoices is then determined by comparing the distances between the comprehensive vectors of different invoices, providing information support for invoice similarity queries and for detecting illegal invoicing.
Description
Technical Field
The invention belongs to the technical field of electronic information processing, and particularly relates to an invoice information approximation degree detection method, a storage medium and computer equipment.
Background
The existing invoice information processing mainly comprises the steps of identifying and extracting certain core information in an invoice, and then simply storing the core information in a database in fields for inquiry, statistics and the like. There is no relevant technology in terms of proximity monitoring of invoice information. Specifically, regarding the extraction of invoice information, in the prior art, information extraction is performed on each item based on image recognition or semantic recognition, for example, the information of each item such as a purchaser, a seller, a service, an amount, and an invoicing date on one invoice is extracted and stored. The prior art does not carry out comprehensive processing on the information of each core item of the invoice, and does not utilize the processing result to detect the invoice approximation degree.
Disclosure of Invention
The invention aims to provide an invoice information approximation degree detection method, a storage medium and computer equipment, which can detect invoice approximation degree. The invention is realized by the following technical scheme:
an invoice information approximation degree detection method comprises the following steps:
(1) Pre-designating at least two core items in invoice information;
(2) Acquiring corresponding information of each core item in the detected invoice, and respectively converting the corresponding information of each core item into an item vector;
(3) Splicing the item vectors corresponding to at least two core items to form a comprehensive vector;
(4) Determining the information approximation degree between invoices by comparing the relative distances between the comprehensive vectors of different invoices.
In this technical scheme, the item vectors of at least two core items in the invoice information are combined to generate a comprehensive vector, and the information approximation degree between invoices is then determined by comparing the distances between the comprehensive vectors of different invoices, providing information support for invoice similarity queries and for detecting illegal invoicing.
As a specific scheme, the core items in step (1) comprise the buyer name, the seller name and the business content. In step (2), the buyer name is divided into a words, the seller name into b words, and the business content into c words; by means of a word vector algorithm, the buyer name information is converted into a N-dimensional buyer name vectors V_{1-1}, the seller name information into b N-dimensional seller name vectors V_{1-2}, and the business content information into c N-dimensional business content vectors V_2.
As a specific scheme, step (3) specifically includes: S3-1, averaging the buyer name vectors and the seller name vectors by dimension to obtain one N-dimensional average value vector; S3-2, taking the maximum of the buyer name vectors and the seller name vectors by dimension to obtain one N-dimensional maximum value vector; S3-3, splicing the average value vector and the maximum value vector together to obtain a 2N-dimensional purchase-sale combination vector V_1; S3-4, splicing V_1 and V_2 to form the comprehensive vector V.
In this scheme, the buyer name vectors and the seller name vectors are combined in a direction-independent manner, so the purchase-sale combination vector V_1 is the same when the buyer and seller are exchanged; this makes it possible to detect whether an enterprise is falsely inflating its business revenue.
As a specific scheme, the core items in step (1) further comprise the invoicing amount and the invoicing date; step (2) further comprises: converting the invoicing amount information into a 1-dimensional invoicing amount vector V_3 through a first preset algorithm, and converting the invoicing date information into a 1-dimensional invoicing date vector V_4 through a second preset algorithm.
As a specific scheme, step (3) specifically includes: S3-1, averaging the buyer name vectors and the seller name vectors by dimension to obtain one N-dimensional average value vector; S3-2, taking the maximum of the buyer name vectors and the seller name vectors by dimension to obtain one N-dimensional maximum value vector; S3-3, splicing the average value vector and the maximum value vector together to obtain a 2N-dimensional purchase-sale combination vector V_1; S3-4, splicing V_1, V_2, V_3 and V_4 to form the comprehensive vector V.
In addition to combining the buyer name vectors and the seller name vectors in a direction-independent manner, so that V_1 is the same when the buyer and seller are exchanged and falsely inflated revenue can be detected, this scheme splices in the newly introduced invoicing amount vector and invoicing date vector, giving a comprehensive vector with richer information content.
As a specific scheme, the first preset algorithm adopts the following formula:
where β is the invoicing amount of the invoice.
This scheme takes into account that the total including tax (the invoicing amount) can differ greatly between invoices. Processing the invoicing amount with the above formula weakens its influence within the invoicing amount vector V_3, and therefore on the comprehensive vector V, so that a single item (the invoicing amount) does not dominate the final similarity judgment.
As a specific scheme, the second preset algorithm adopts the following formula:
where d is the number of days between the invoicing date and a preset standard start date S.
In this scheme, the invoicing date vector V_4 is calculated from the invoicing date relative to the preset standard start date S. Because V_4 encodes this relative date, it makes invoice information within a given relative time period easier to compare.
As a specific scheme, in the step (4), a specific method for determining the information similarity of the invoices comprises the following steps:
determining the information approximation degree between invoices by at least one of the cosine distance, the Euclidean distance and the comprehensive distance.
In the step (2), before the purchaser name is divided into a words and the seller name is divided into b words, the place name description and the general description information in the purchaser name and the seller name are removed; before dividing the service content into c words, the general description information and the specification and model description information in the service content are removed.
According to the specific scheme, through eliminating some general description words (such as eliminating 'company', 'limited responsibility', 'Beijing', and the like) and some special specification type description words (such as eliminating 'X-2S', 'diameter 2 cm', and the like), the obtained vector can more highlight main characteristics of the vector, and has more definite pertinence and representativeness.
The invention also provides a computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps described above.
The invention also provides a computer device, which is characterized by comprising a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
Drawings
Fig. 1 is a diagram of the system architecture on which the invoice information approximation degree detection method provided by an embodiment of the present invention is based.
Fig. 2 is a flowchart of an invoice information proximity detection method according to an embodiment of the present invention.
Fig. 3 is a specific flowchart of step (3) in the invoice information proximity detection method according to the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings. It should be noted that, the invoice information proximity detection method provided in the present application is generally executed by a computer device.
FIG. 1 illustrates an exemplary system architecture to which the invoice information proximity detection method of the present application may be applied. As shown in FIG. 1, the system architecture may include a computer device 101 and a server 102. The computer device 101 and the server 102 may communicate through a network, which may include various types of wired or wireless communication links: for example, a wired communication link may be an optical fiber, a twisted pair or a coaxial cable, and a wireless communication link may be a Bluetooth link, a Wireless Fidelity (Wi-Fi) link or a microwave link.
The electronic invoice information, for example an electronic invoice, or an electronic document obtained by scanning a paper invoice and recognizing its information, is stored in the computer device 101 and/or the server 102, and the computer device 101 may obtain the invoice information to be detected from the server 102 or from local storage.
The computer device 101 and the server 102 may be hardware or software. When the computer device 101 and the server 102 are hardware, they may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the computer device 101 and the server 102 are software, they may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, which is not specifically limited herein.
The method for detecting the invoice information approximation degree provided in the embodiment of the present application will be described in detail with reference to fig. 2 and 3. Referring to fig. 2, the invoice information proximity detection method provided in this embodiment includes the following steps:
(1) Pre-designating at least two core items in invoice information;
(2) Acquiring corresponding information of each core item in the detected invoice, and respectively converting the corresponding information of each core item into an item vector;
(3) Splicing the item vectors corresponding to at least two core items to form a comprehensive vector;
(4) Determining the information approximation degree between invoices by comparing the relative distances between the comprehensive vectors of different invoices.
In step (1), the core items pre-designated in the invoice information comprise the buyer name, the seller name, the business content, the invoicing amount and the invoicing date. The business content corresponds to the "name of goods or taxable services" field on the invoice, and the invoicing amount corresponds to the "total of price and tax" field. It should also be noted that, in this embodiment, suitable core items may be selected according to the emphasis of the invoice information comparison; generally, the buyer name, seller name and business content are the primary items of interest, and the invoicing amount and invoicing date may be primary or secondary items of interest. More core items may of course also be pre-designated.
In step (2), the buyer name is first divided into a words, the seller name into b words, and the business content into c words. A word vector algorithm then converts the buyer name information into a N-dimensional buyer name vectors V_{1-1}, the seller name information into b N-dimensional seller name vectors V_{1-2}, and the business content information into c N-dimensional business content vectors V_2. In addition, the invoicing amount information is converted into a 1-dimensional invoicing amount vector V_3 through a first preset algorithm, and the invoicing date information into a 1-dimensional invoicing date vector V_4 through a second preset algorithm.
Typically, the unified social credit code is the unique identifier of an enterprise or organization, but since this code cannot reflect potential relationships between companies, the present embodiment uses the Chinese name of the buyer when processing buyer information. The generation of the buyer name vector V_{1-1} is illustrated below by way of an example:
S210, removing punctuation marks, place-name descriptions and general description information from the buyer and seller names, as follows:
S211, removing all Chinese punctuation marks. For example, "AB online network technology (Beijing) limited company" is converted into "AB online network technology Beijing limited company". To avoid introducing real company names, the actual Chinese characters are replaced by "AB"; the same applies to "CD" below.
S212, removing general description words such as "share", "limited", "responsibility", "group" and "company". For example, "AB online network technology Beijing limited company" is converted into "AB online network technology Beijing".
S213, removing place-name descriptions (province, city). For example, "AB online network technology Beijing" is converted into "AB online network technology".
S220, dividing the buyer name into a words using a Chinese word segmentation algorithm. For example, "AB online network technology" is divided into 3 words: "AB / online / network technology".
S230, processing the segmented buyer name "AB / online / network technology" with a word vector algorithm to generate 3 64-dimensional vectors V_{1-1} (in this embodiment N is an integer multiple of 16, for example 32, 64 or 128, preferably 64), as follows:
The present embodiment likewise uses the Chinese name of the seller when processing seller information. The seller name vector V_{1-2} is generated in the same way as the buyer name vector V_{1-1} above, as the following example illustrates:
For example, the seller name is "Zhuhai CD electric appliance share limited company". Removing the place-name description and the general description information gives "CD electric appliance", which is divided into 2 words, "CD / electric appliance", and finally converted into 2 64-dimensional vectors V_{1-2}, as follows:
In this scheme, removing general description words (such as "company", "limited responsibility" and "Beijing") makes the resulting vectors highlight the main features of the buyer or seller name, so that they are more targeted and representative.
When processing the business content information, the present embodiment follows a process similar to the generation of the seller name vector V_{1-2} and the buyer name vector V_{1-1}, described below with examples:
First, the general description information and the specification and model description information in the business content are removed, along with common words that carry no meaning, and the business content is then divided into c words. For example, "technology development fee" is converted into "technology / development"; likewise, "aviation grade screw 12cm x 2cm" is converted into "aviation / grade / screw".
Then, a word vector algorithm is applied to generate several 32-dimensional vectors, one per segmented word. In practice the names of goods or services are often filled in with fairly generic words such as service fees, meal fees or office supplies, so 32 dimensions are used here to reduce the amount of computation, the complexity and the amount of stored data.
Finally, the word vectors of the business content are averaged by dimension and maximized by dimension, and the two results are spliced to generate a 64-dimensional vector, denoted V_2. The averaging, maximizing and splicing are performed in the same way as for the buyer name vectors V_{1-1} and the seller name vectors V_{1-2} (steps S3-1 to S3-3 below).
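The pooling that turns the per-word vectors into V_2 can be sketched as below. The function name `mean_max_pool` and the toy 4-dimensional vectors are assumptions chosen so the arithmetic is easy to follow; the embodiment uses 32-dimensional word vectors, which the same function maps to 64 dimensions.

```python
def mean_max_pool(word_vectors):
    """Dimension-wise mean followed by dimension-wise max, concatenated."""
    columns = list(zip(*word_vectors))            # one tuple per dimension
    mean = [sum(col) / len(col) for col in columns]
    peak = [max(col) for col in columns]
    return mean + peak                            # length is 2 * dimension


# Toy vectors standing in for the words "technology" and "development".
words = [
    [0.25, -0.5, 0.0, 1.0],
    [0.75, 0.0, -1.0, 0.5],
]
v2 = mean_max_pool(words)
print(v2)  # [0.5, -0.25, -0.5, 0.75, 0.75, 0.0, 0.0, 1.0]
```

Because the output length depends only on the vector dimension, business content with any number of words c yields a V_2 of the same size.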
In addition, since the invoicing amount β (the total including tax) may differ greatly between invoices, the invoicing amount information is converted into a 1-dimensional invoicing amount vector V_3 through a first preset algorithm, which adopts the following formula:
Processing the invoicing amount with this formula weakens its influence within the invoicing amount vector V_3, and therefore on the comprehensive vector V, so that a single item (the invoicing amount) does not dominate the final similarity judgment.
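To make the motivation concrete, the sketch below compresses amounts with a logarithm. This is explicitly an assumed stand-in: the patent's own first preset algorithm is not reproduced in this text, and `amount_feature` with its `log10(1 + β)` form is a hypothetical choice used only to show how a dampening transform keeps a large amount from swamping the other components.

```python
import math


def amount_feature(beta):
    """Assumed stand-in for the first preset algorithm: log10(1 + beta).

    NOT the patent's formula, only a common way to compress a wide range.
    """
    return math.log10(1.0 + beta)


small, large = 100.0, 1_000_000.0
print(large / small)                                   # raw amounts differ by a factor of 10000
print(amount_feature(large) / amount_feature(small))   # the features differ by a factor of about 3
```

Any monotonic compressing function would serve the same illustrative purpose.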
Finally, the invoicing date information is converted into a 1-dimensional invoicing date vector V_4 through a second preset algorithm, which adopts the following formula:
where d is the number of days between the invoicing date and a preset standard start date S. For example, if S = 2020/01/01 and the invoicing date is 2021/01/01, then d = 366.
As can be seen, the invoicing date vector V_4 is calculated from the invoicing date relative to the preset standard start date S. Because V_4 encodes this relative date, it makes invoice information within a given relative time period easier to compare.
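The day offset d can be computed directly with the standard library. The sketch assumes the invoicing date in the example is 1 January 2021, the date consistent with d = 366 for S = 1 January 2020; the name `days_since_start` is hypothetical, and the further step from d to V_4 is not shown because that formula is not reproduced in this text.

```python
from datetime import date

STANDARD_START = date(2020, 1, 1)  # the preset standard start date S from the example


def days_since_start(invoice_date, start=STANDARD_START):
    """Number of days d between the invoicing date and S."""
    return (invoice_date - start).days


print(days_since_start(date(2021, 1, 1)))  # 366, because 2020 is a leap year
```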
Referring to fig. 3, step (3) specifically includes:
S3-1, averaging the buyer name vectors and the seller name vectors by dimension to obtain one N-dimensional average value vector, where N = 64. In the example above, the buyer and seller together have 5 64-dimensional vectors:
After averaging, a 64-dimensional average value vector is obtained as follows:
S3-2, taking the maximum of the buyer name vectors and the seller name vectors by dimension to obtain one N-dimensional maximum value vector, where N = 64.
For dimension 1, the maximum of the first components of the 5 vectors is calculated; the maxima of all remaining dimensions are then calculated in turn, giving a 64-dimensional maximum value vector as follows:
S3-3, splicing the average value vector and the maximum value vector together to obtain a 128-dimensional purchase-sale combination vector V_1, as follows:
S3-4, splicing V_1, V_2, V_3 and V_4 to obtain a 194-dimensional final result vector (128 + 64 + 1 + 1 dimensions):
The vector is stored in a database for later use.
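Steps S3-1 to S3-4 can be sketched end to end as follows, mainly to confirm the dimension count. The function names and the constant-valued placeholder vectors are assumptions; real word vectors would come from the word vector algorithm, and the V_3 and V_4 values here are dummies because their formulas are not reproduced in this text.

```python
def mean_max_pool(vectors):
    """S3-1 to S3-3: dimension-wise mean and max, spliced together."""
    columns = list(zip(*vectors))
    return [sum(c) / len(c) for c in columns] + [max(c) for c in columns]


def build_comprehensive_vector(buyer_vecs, seller_vecs, content_vecs, v3, v4):
    """S3-4: splice V_1, V_2, V_3 and V_4 into the comprehensive vector V."""
    v1 = mean_max_pool(buyer_vecs + seller_vecs)  # 2 * 64 = 128 dimensions
    v2 = mean_max_pool(content_vecs)              # 2 * 32 = 64 dimensions
    return v1 + v2 + [v3, v4]


# Placeholder inputs matching the example: 3 buyer words, 2 seller words,
# 2 business content words.
buyer = [[0.1] * 64 for _ in range(3)]
seller = [[0.2] * 64 for _ in range(2)]
content = [[0.3] * 32 for _ in range(2)]

v = build_comprehensive_vector(buyer, seller, content, v3=0.0, v4=0.0)
print(len(v))  # 194
```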
It should be noted that, as described above, suitable core items may be selected according to the focus of the invoice information comparison; in the present embodiment, V_1 and V_2 alone may be spliced to form a reasonably representative comprehensive vector V. Of course, splicing in the invoicing amount vector and the invoicing date vector makes the information content of the comprehensive vector richer.
In the above scheme, the buyer name vectors and the seller name vectors are combined in a direction-independent manner (that is, whether AB sells to CD or CD sells to AB, the combined vector is the same), so that the purchase-sale combination vector V_1 is identical when the buyer and seller are exchanged; this can be used to identify enterprises that falsely inflate revenue.
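The direction independence follows from the fact that the mean and the maximum do not depend on the order of their inputs, as the short check below illustrates. The toy 2-dimensional vectors and the name `purchase_sale_vector` are assumptions; `math.fsum` is used so that the floating-point sum is also independent of ordering.

```python
import math


def purchase_sale_vector(buyer_vecs, seller_vecs):
    """Pool buyer and seller word vectors together (mean, then max)."""
    columns = list(zip(*(buyer_vecs + seller_vecs)))
    mean = [math.fsum(c) / len(c) for c in columns]  # exactly rounded, order-independent
    peak = [max(c) for c in columns]
    return mean + peak


ab = [[1.0, 4.0], [3.0, 0.0]]  # toy word vectors for company AB
cd = [[2.0, 2.0]]              # toy word vector for company CD

forward = purchase_sale_vector(ab, cd)  # AB buys from CD
reverse = purchase_sale_vector(cd, ab)  # CD buys from AB
print(forward)             # [2.0, 2.0, 3.0, 4.0]
print(forward == reverse)  # True

# By contrast, plain concatenation of the word vectors is order-sensitive.
concat_forward = [x for vec in ab + cd for x in vec]
concat_reverse = [x for vec in cd + ab for x in vec]
print(concat_forward == concat_reverse)  # False
```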
In step (4) of the invoice information approximation degree detection method of the embodiment, a specific method for determining the information approximation degree of the invoices comprises the following steps:
determining the information approximation degree between invoices by at least one of the cosine distance, the Euclidean distance and the comprehensive distance, where a smaller distance indicates a higher approximation degree.
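Two of the named distances can be sketched with the standard library as follows. The function names are hypothetical, the cosine distance is taken here as one minus the cosine similarity and assumes non-zero vectors, and the "comprehensive distance" is omitted because the text does not define it.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.dist(u, v)


u, v = [3.0, 4.0], [6.0, 8.0]    # same direction, different length
print(cosine_distance(u, v))     # 0.0
print(euclidean_distance(u, v))  # 5.0
```

The toy pair shows how the two measures can disagree: vectors pointing the same way have zero cosine distance even when their Euclidean distance is large, which is one reason a system might look at more than one measure.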
The present embodiment also provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps. The embodiment also provides a computer device, comprising a processor and a memory; the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
In the invoice information approximation degree detection method of this embodiment, the item vectors of at least two core items in the invoice information are combined so that a comprehensive vector can be generated for each invoice; because the comprehensive vector carries semantic information, semantically similar content produces similar vectors. The information approximation degree between invoices is then determined by comparing the distances between the comprehensive vectors of different invoices, providing information support for invoice similarity queries and for detecting illegal invoicing. For example, if the buyer of invoice 1 is "AB online network technology (Beijing) limited company" and the buyer of invoice 2 is "AB times network technology (Beijing) limited company", the vectors of invoice 1 and invoice 2 are very similar after vector encoding. Based on the invention, similar invoices can be found quickly in massive invoice data according to their comprehensive vectors, which can greatly improve the efficiency of anti-fraud and anti-money-laundering work. Moreover, the similarity calculation requires little computation, so analysis at the scale of hundreds of millions of invoices can be supported.
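Finding the invoices closest to a given one can be sketched as a brute-force search over stored comprehensive vectors. The invoice identifiers, the 3-dimensional toy vectors and the function `most_similar` are assumptions for illustration; the text does not say how the database lookup is implemented, and at very large scale an indexed or approximate search would likely replace this linear scan.

```python
import heapq
import math


def most_similar(query, stored, k=3):
    """Return the ids of the k stored vectors nearest to query (Euclidean)."""
    return heapq.nsmallest(k, stored, key=lambda inv_id: math.dist(query, stored[inv_id]))


# Toy stand-ins for 194-dimensional comprehensive vectors.
stored = {
    "inv-1": [0.0, 0.0, 1.0],
    "inv-2": [0.1, 0.0, 1.0],
    "inv-3": [5.0, 5.0, 5.0],
}
print(most_similar([0.0, 0.0, 1.0], stored, k=2))  # ['inv-1', 'inv-2']
```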
The above embodiments merely serve to fully disclose the present invention and do not limit it; any substitution of equivalent technical features based on the gist of the present invention that can be obtained without inventive effort should be considered within the scope of the present disclosure.
Claims (10)
1. The invoice information approximation degree detection method is characterized by comprising the following steps of:
(1) Pre-designating at least two core items in invoice information;
(2) Acquiring corresponding information of each core item in the detected invoice, and respectively converting the corresponding information of each core item into an item vector;
(3) Splicing the item vectors corresponding to at least two core items to form a comprehensive vector;
(4) And determining the information approximation degree of the invoices by comparing the relative distances of the comprehensive vectors of different invoices.
2. The invoice information proximity detection method according to claim 1, wherein the core item in step (1) includes a purchaser name, a seller name, a business content; step (2)) Dividing the name of the purchaser into a words, dividing the name of the seller into b words, and dividing the business content into c words; converting the buyer name information into a N-dimensional buyer name vectors V by means of a word vector algorithm 1-1 Converting vendor name information into b N-dimensional vendor name vectors V 1-2 Converting service content information into c N-dimensional service content vectors V 2 。
3. The invoice information proximity detection method according to claim 2, wherein step (3) specifically includes: s3-1, carrying out dimension averaging on the buyer name vector and the seller name vector to obtain 1N-dimensional average value vector; s3-2, solving the maximum value of the purchaser name vector and the seller name vector according to the dimension, and solving 1N-dimensional maximum value vector; s3-3, splicing the average value vector and the maximum value vector together to obtain a purchase-sale combination vector V with 2N dimensions 1 The method comprises the steps of carrying out a first treatment on the surface of the S3-4, will V 1 and V2 And splicing to form the comprehensive vector V.
4. The invoice information proximity detection method according to claim 2, wherein the core item in step (1) further includes an invoicing amount and an invoicing date; step (2) further comprises: converting the billing amount information into 1-dimensional billing amount vector V by a first preset algorithm 3 Converting the billing date information into 1-dimensional billing date vector V by a second preset algorithm 4 。
5. The invoice information proximity detection method according to claim 4, wherein step (3) specifically comprises: S3-1, averaging the purchaser name vectors and the seller name vectors dimension by dimension to obtain one $N$-dimensional average vector; S3-2, taking the maximum of the purchaser name vectors and the seller name vectors dimension by dimension to obtain one $N$-dimensional maximum vector; S3-3, splicing the average vector and the maximum vector together to obtain a $2N$-dimensional purchase-sale combination vector $V_1$; S3-4, splicing $V_1$, $V_2$, $V_3$, and $V_4$ to form the comprehensive vector $V$.
8. The invoice information proximity detection method according to claim 3 or 5, wherein in step (4), the information proximity of the invoices to each other is determined by at least one of the cosine distance, the Euclidean distance, and the comprehensive distance.
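Two of the distances named in claim 8 are standard and can be sketched directly. The example below computes the cosine distance and the Euclidean distance between comprehensive vectors; the sample vectors are hypothetical, and the comprehensive distance is not shown because this claim does not define it.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle) between u and v; 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical comprehensive vectors of three invoices.
invoice_a = [1.0, 0.0, 2.0]
invoice_b = [1.0, 0.0, 2.0]  # identical to invoice_a
invoice_c = [0.0, 3.0, 0.0]  # orthogonal to invoice_a

near = cosine_distance(invoice_a, invoice_b)  # approximately 0
far = cosine_distance(invoice_a, invoice_c)   # 1.0, since the dot product is 0
gap = euclidean_distance(invoice_a, invoice_c)
```

A smaller distance indicates more similar invoice information. Cosine distance ignores vector magnitude, while Euclidean distance is sensitive to it, which matters once unscaled amount and date components are spliced in.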
9. A storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method of any one of claims 1-8.
10. A computer device comprising a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the method according to any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310330626.8A CN116049358A (en) | 2023-03-31 | 2023-03-31 | Invoice information approximation degree detection method, storage medium and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116049358A (en) | 2023-05-02 |
Family ID: 86129820
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310330626.8A Pending CN116049358A (en) | 2023-03-31 | 2023-03-31 | Invoice information approximation degree detection method, storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116049358A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104424613A (en) * | 2013-09-04 | 2015-03-18 | 航天信息股份有限公司 | Value added tax invoice monitoring method and system thereof |
CN104636971A (en) * | 2013-11-06 | 2015-05-20 | 航天信息股份有限公司 | Method of detecting one number for multiple names of value added tax invoice and system thereof |
CN108595634A (en) * | 2018-04-25 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Message management method, device and electronic equipment |
CN109740642A (en) * | 2018-12-19 | 2019-05-10 | 北京邮电大学 | Invoice category recognition method, device, electronic equipment and readable storage medium |
CN111753060A (en) * | 2020-07-29 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Information retrieval method, device, equipment and computer readable storage medium |
CN112232149A (en) * | 2020-09-28 | 2021-01-15 | 北京易道博识科技有限公司 | Document multi-mode information and relation extraction method and system |
CN114022699A (en) * | 2021-10-15 | 2022-02-08 | 众安在线财产保险股份有限公司 | Image classification method and device, computer equipment and storage medium |
CN115809887A (en) * | 2022-12-09 | 2023-03-17 | 蔷薇大树科技有限公司 | Method and device for determining main business range of enterprise based on invoice data |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2019109918A1 (en) | Abstract text generation method, computer readable storage medium and computer device | |
US11062132B2 (en) | System and method for identification of missing data elements in electronic documents | |
EP3380958A1 (en) | System and method for automatic validation | |
TWI534735B (en) | Information identification methods and equipment | |
US8793201B1 (en) | System and method for seeding rule-based machine learning models | |
US20190080352A1 (en) | Segment Extension Based on Lookalike Selection | |
CN111966886A (en) | Object recommendation method, object recommendation device, electronic equipment and storage medium | |
CN112883990A (en) | Data classification method and device, computer storage medium and electronic equipment | |
CN115249007A (en) | Method and device for detecting enclosing and bidding behavior based on electronic bidding document comparison | |
US20180018312A1 (en) | System and method for monitoring electronic documents | |
CN110362702B (en) | Picture management method and equipment | |
CN108470065B (en) | Method and device for determining abnormal comment text | |
US8577814B1 (en) | System and method for genetic creation of a rule set for duplicate detection | |
JP7170689B2 (en) | Output device, output method and output program | |
CN113327132A (en) | Multimedia recommendation method, device, equipment and storage medium | |
WO2021055868A1 (en) | Associating user-provided content items to interest nodes | |
EP3430540A1 (en) | System and method for automatically generating reporting data based on electronic documents | |
CN116049358A (en) | Invoice information approximation degree detection method, storage medium and computer equipment | |
CN114996579A (en) | Information pushing method and device, electronic equipment and computer readable medium | |
CN114169928A (en) | Novel store sales management method, system, equipment and readable storage medium | |
CN113127597A (en) | Processing method and device for search information and electronic equipment | |
US10387561B2 (en) | System and method for obtaining reissues of electronic documents lacking required data | |
CN113342969A (en) | Data processing method and device | |
CN110895564A (en) | Potential customer data processing method and device | |
CN112199578B (en) | Information processing method and apparatus, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||