CN116844171A - Large data identification model method and device for virtual open value-added tax invoice - Google Patents

Info

Publication number
CN116844171A
Authority
CN
China
Prior art keywords
feature
behavior
data
characteristic
dimension model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310733840.8A
Other languages
Chinese (zh)
Inventor
王骏涛
徐斌
周雨
董建军
郭莉
石薇
杨琰
潘爱平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Big Data Industry Development Co ltd
Original Assignee
Wuhan Big Data Industry Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Big Data Industry Development Co ltd
Priority to CN202310733840.8A
Publication of CN116844171A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a big data identification model method and device for virtual-open (falsely issued) value-added tax invoices. The method comprises the following steps: acquiring historical data of value-added tax cases and information data of an enterprise currently issuing value-added tax invoices; creating a feature dimension model, and determining a feature weight set of the behavior features in the feature dimension model based on the historical data; performing feature analysis on the information data according to the feature weight set, based on the feature dimension model, to obtain an abnormal score set of the behavior features; and determining abnormal events in the behavior features of the enterprise according to the abnormal score set. In the embodiments of the invention, creating the feature dimension model and processing it with the historical data yields behavior features and a feature weight set that better match real conditions, which addresses the difficulty in the prior art of effectively preventing and cracking down on virtual-open value-added tax invoices through manual, clue-based investigation.

Description

Large data identification model method and device for virtual open value-added tax invoice
Technical Field
The invention relates to the technical field of virtual-open value-added tax invoices, in particular to a method and a device for identifying a model of big data of the virtual-open value-added tax invoice.
Background
Value-added tax invoices have long received special attention because of their special status. Falsely issuing ordinary value-added tax invoices is highly profitable, and cases of virtual-open large-amount ordinary invoices have increased sharply. At present, the relevant personnel find it difficult to effectively prevent and crack down on abnormal events such as virtual-open value-added tax invoices.
Therefore, a big data identification model method for virtual-open value-added tax invoices is urgently needed, one that applies an advance early-warning push strategy to such abnormal events by means of feature analysis and personnel prediction. In this way, abnormal events can be prevented, discovered in time and cracked down on in time, reducing tax losses.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a big data identification model method and device for virtual-open value-added tax invoices, to solve the technical problem in the prior art that virtual-open value-added tax invoices are difficult to effectively prevent and crack down on through manual, clue-based investigation.
In one aspect, the invention provides a method for identifying big data of a virtual open value-added tax invoice, which comprises the following steps:
acquiring historical data of a value-added tax case and information data of an enterprise currently issuing a value-added tax invoice;
creating a feature dimension model, and determining a feature weight set of the behavior features in the feature dimension model based on the historical data;
performing feature analysis on the information data according to the feature weight set based on the feature dimension model to obtain an abnormal score set of the behavior feature;
and determining abnormal events of the enterprise in the behavior characteristics according to the abnormal score set.
In some possible implementations, the performing feature analysis on the information data based on the feature dimension model according to the feature weight set to obtain an abnormal score set of the behavior feature includes:
performing feature analysis on the information data based on the feature dimension model to obtain a feature result set of the enterprise;
and calculating the characteristic result set and the characteristic weight set according to a calculation formula corresponding to the behavior characteristic to obtain an abnormal score set of the behavior characteristic.
In some possible implementations, the behavioral characteristics include at least one of: regional behavior features, repetitive behavior features, abnormal behavior features, keyword features, and enterprise behavior features.
In some possible implementations, the creating a feature dimension model, determining a feature weight set of a feature in the feature dimension model based on the historical data, includes:
creating a feature dimension model comprising the behavioral features;
analyzing the behavior characteristics of the characteristic dimension model according to the historical data to obtain first behavior characteristics;
and calculating the historical data and the first behavior feature according to a machine learning algorithm to obtain a feature weight set of the first behavior feature.
In some possible implementations, the information data includes at least one of: abnormal event high-risk area data, telecom card opening data, key personnel data, call ticket data, electronic fence data, telephone inquiry data, posting data and hotel check-in information.
In some possible implementations, the calculation formula corresponding to the behavior feature is:
[formula not reproduced in the source text]
wherein T_n is the feature judgment result of the nth feature in the feature result set, and P_n is the feature weight of the nth feature in the feature weight set.
In some possible implementations, after determining the abnormal event of the enterprise in the behavior feature according to the abnormal score set, the method further includes:
when the enterprise is abnormal, analyzing the behavior characteristics of the characteristic dimension model according to the information data of the enterprise to obtain second behavior characteristics;
calculating the information data and the second behavior characteristics according to a machine learning algorithm to obtain a characteristic weight set of the second behavior characteristics;
and optimizing the feature dimension model according to the second behavior feature and the feature weight set of the second behavior feature.
In another aspect, the invention also provides a big data identification model device for virtual-open value-added tax invoices, which comprises:
the data acquisition module is used for acquiring historical data of the value-added tax case and information data of an enterprise currently issuing the value-added tax invoice;
the model creation module is used for creating a characteristic dimension model and determining a characteristic weight set of the behavior characteristics in the characteristic dimension model based on the historical data;
the feature analysis module is used for carrying out feature analysis on the information data according to the feature weight set based on the feature dimension model to obtain an abnormal score set of the behavior feature;
and the abnormality determining module is used for determining abnormal events of the enterprise in the behavior characteristics according to the abnormal score set.
Correspondingly, an embodiment of the invention discloses an electronic device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the embodiments of the big data identification model method for virtual-open value-added tax invoices.
Correspondingly, an embodiment of the invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the embodiments of the big data identification model method for virtual-open value-added tax invoices.
The beneficial effects of the above embodiments are as follows: in the big data identification model method for virtual-open value-added tax invoices provided by the invention, a feature dimension model is created to perform feature analysis on the information data, which helps staff obtain feature analysis data more quickly and accurately; abnormal events are then determined from the feature analysis data, which improves staff efficiency. Further, processing the feature dimension model with the historical data yields behavior features and a feature weight set that better match real conditions, which improves the accuracy with which the feature dimension model identifies virtual-open value-added tax invoices.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an embodiment of the big data identification model method for virtual-open value-added tax invoices provided by the invention;
FIG. 2 is a schematic structural diagram of an embodiment of the feature dimension model provided by the invention;
FIG. 3 is a schematic structural diagram of an embodiment of the big data identification model device for virtual-open value-added tax invoices provided by the invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor systems and/or microcontroller systems.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The embodiment of the invention provides a method and a device for identifying big data of a virtual-open value-added tax invoice, which are respectively described below.
FIG. 1 is a schematic flow chart of an embodiment of the big data identification model method for virtual-open value-added tax invoices provided by the invention. As shown in FIG. 1, the method includes:
S101, acquiring historical data of value-added tax cases and information data of an enterprise currently issuing value-added tax invoices;
S102, creating a feature dimension model, and determining a feature weight set of the behavior features in the feature dimension model based on the historical data;
S103, performing feature analysis on the information data according to the feature weight set, based on the feature dimension model, to obtain an abnormal score set of the behavior features;
S104, determining abnormal events in the behavior features of the enterprise according to the abnormal score set.
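Steps S103 and S104 can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature names, the field names of the enterprise record, the weights, the alert threshold, and the use of a simple product of result and weight as the score are all hypothetical and are not specified by the source.

```python
from typing import Callable, Dict, List

# Hypothetical type: a feature function maps enterprise data to a 0-100 result.
FeatureFn = Callable[[dict], float]

def score_enterprise(info: dict,
                     features: Dict[str, FeatureFn],
                     weights: Dict[str, float]) -> Dict[str, float]:
    """S103 (sketch): evaluate every behavior feature and weight its result.

    Assumption: the abnormal score is the product result * weight.
    """
    return {name: fn(info) * weights[name] for name, fn in features.items()}

def abnormal_events(scores: Dict[str, float], threshold: float) -> List[str]:
    """S104 (sketch): report features whose score reaches a hypothetical threshold."""
    return sorted(name for name, s in scores.items() if s >= threshold)

# Hypothetical features, weights and enterprise record.
features = {
    "non_local_founder": lambda e: 100.0 if e["founder_is_non_local"] else 0.0,
    "same_address_multi_company": lambda e: 100.0 if e["companies_at_address"] >= 3 else 0.0,
}
weights = {"non_local_founder": 0.25, "same_address_multi_company": 0.5}
info = {"founder_is_non_local": True, "companies_at_address": 1}

scores = score_enterprise(info, features, weights)
flagged = abnormal_events(scores, threshold=20.0)
print(scores, flagged)
```

Keeping each behavior feature as a separate function mirrors the description of one algorithm per feature, and lets the weight set be replaced without touching the feature logic.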
Compared with the prior art, the big data identification model method for virtual-open value-added tax invoices provided by the invention creates a feature dimension model to perform feature analysis on the information data, which helps staff obtain feature analysis data more quickly and accurately; abnormal events are then determined from the feature analysis data, which improves staff efficiency. Further, processing the feature dimension model with the historical data yields behavior features and a feature weight set that better match real conditions, which improves the accuracy with which the feature dimension model identifies virtual-open value-added tax invoices.
It should be understood that: the history data of the value added tax case acquired in step S101 may be information data of an enterprise in the value added tax case that has occurred.
It should be noted that, in order to assess the enterprise more accurately, the information data of the enterprise acquired in step S101 may include at least one of the following: abnormal event high-risk area data, telecom card opening data, key personnel data, call ticket data, electronic fence data, telephone inquiry data, posting data and hotel check-in information.
In particular embodiments of the present invention, the abnormal event high-risk area data may include time, case type, high-risk area administrative code, high-risk area, description, etc.
The telecom card opening data may include customer name, customer address, account opening time, mailbox, account name, account number, cell phone number, billing address, IMSI, product status, downtime, product installation address, etc.
The key personnel data may include time, name, key personnel type, historical abnormal event information, and the like.
The call ticket data may include time, number of times of communication, latest time of communication, earliest time of communication, type of communication, calling number home location, called number, calling number, etc.
The electronic fence data can comprise track types, mobile phone IMSI identification, GPS longitude and latitude information, acquisition time and the like.
The telephone query data may include time, number of communications, latest communications time, earliest communications time, communications type, calling number home, called number, calling number, etc.
The posting (delivery) data may include order number, package identifier, total volume of goods, total weight of goods, price of goods, number of goods, type of goods, name of goods, amount of goods to be received, type of order service, detailed address of receiver, telephone of receiver, postal code of receiver, contact of receiver, name of receiver, warehouse-in time, batch number, description of goods, expected arrival time, time of receipt, and so on.
Hotel check-in information may include check-in time, check-in person, check-in person phone number, check-in room, check-in hotel, address, contact phone number, check-in status, check-out time, remarks, etc.
By collecting enterprise data from all of these aspects, the embodiments of the invention allow enterprises to be examined rigorously and help to effectively prevent virtual-open value-added tax invoices.
In some embodiments of the present invention, step S102 includes:
creating a feature dimension model comprising behavioral features;
analyzing the behavior characteristics of the characteristic dimension model according to the historical data to obtain first behavior characteristics;
and calculating the historical data and the first behavior feature according to a machine learning algorithm to obtain a feature weight set of the first behavior feature.
In the embodiments of the invention, analyzing the historical data with the feature dimension model yields the first behavior features contained in the historical data, and calculating the historical data and the first behavior features with a machine learning algorithm yields the feature weight set of the first behavior features. The feature dimension model thereby quantifies the first behavior features as a feature weight set, and the information data is then processed according to that feature weight set, which improves the accuracy of identifying virtual-open value-added tax invoices.
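The source does not name the machine learning algorithm used to derive the feature weight set. As a deliberately simple stand-in, and not the method of the invention, the sketch below weights each feature by how often it appears in historical cases and normalizes the weights to sum to 1. The function name, feature names and case records are hypothetical.

```python
def frequency_weights(cases, feature_names):
    """Stand-in weighting: share of historical cases exhibiting each feature,
    normalized so that the weights sum to 1 (an assumption, not from the source)."""
    counts = {f: sum(1 for c in cases if c.get(f)) for f in feature_names}
    total = sum(counts.values())
    if total == 0:
        # No feature was ever observed; fall back to zero weights.
        return {f: 0.0 for f in feature_names}
    return {f: counts[f] / total for f in feature_names}

# Hypothetical historical cases, each marking which features were present.
history = [
    {"non_local_founder": True, "high_risk_area_founder": False},
    {"non_local_founder": True, "high_risk_area_founder": True},
    {"non_local_founder": False, "high_risk_area_founder": False},
    {"non_local_founder": True, "high_risk_area_founder": False},
]
weights = frequency_weights(history, ["non_local_founder", "high_risk_area_founder"])
print(weights)
```

A trained classifier could replace this function as long as it still returns one weight per behavior feature.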
In some embodiments of the present invention, the behavioral characteristics in step S102 may include at least one of: regional behavior features, repetitive behavior features, abnormal behavior features, keyword features, and enterprise behavior features.
In a specific embodiment of the invention, the regional behavior features may include a feature of a non-local person opening a company, a feature of a high-risk area person opening a company, and the like. The repetitive behavior features may include a feature of one person opening companies many times within a short time, a feature of multiple companies registered at the same location, a feature of the enterprise legal person coinciding with the financial staff, a feature of the enterprise legal person coinciding with the tax staff, and the like. The abnormal behavior features may include a key personnel library coincidence feature, a call ticket dialing behavior abnormality feature, a delivery communication behavior abnormality feature, and the like. The keyword features may include a feature of the enterprise registration name coinciding with business keywords, a delivery keyword abnormality feature, a short message content keyword extraction feature, and the like. The enterprise behavior features may include a feature of registering a company with a newly opened phone number and bank card, a feature of arriving in the locality within the five days before registering a company, a feature of the recent stopping place coinciding with the office location, a feature of multiple company founders gathering at the same place within a short time, a feature of abnormal hotel check-in at the same place, and the like.
In a specific embodiment of the present invention, as shown in FIG. 2, the feature dimension model may include a virtual-open value-added tax feature analysis unit and a sample library machine learning unit. The virtual-open value-added tax feature analysis unit may include the algorithms corresponding to all behavior features, such as: a non-local person opening a company feature algorithm, a high-risk area person opening a company feature algorithm, a one person opening companies many times within a short time feature algorithm, a multiple companies registered at the same location feature algorithm, an enterprise legal person and financial staff coincidence feature algorithm, an enterprise legal person and tax staff coincidence feature algorithm, a newly opened phone number and bank card company registration feature algorithm, a key personnel library coincidence feature algorithm, a call ticket dialing behavior abnormality feature algorithm, a delivery communication behavior abnormality feature algorithm, an enterprise registration name and business keyword coincidence feature algorithm, a delivery keyword abnormality feature algorithm, a short message content keyword extraction feature algorithm, an arrival within the five days before company registration feature algorithm, a recent stopping place and office location coincidence feature algorithm, and an abnormal hotel check-in feature algorithm for multiple company founders gathering at the same location within a short time.
The sample library machine learning unit may include sample data input, sample data analysis and feature parameter optimization, and the historical data may serve as the sample data. After the feature dimension model is created, the historical data is input into it in the sample data input step; the sample data analysis step analyzes the historical data to obtain the first behavior features; and the feature parameter optimization step produces the feature weight set. Finally, the first behavior features and the feature weight set are sent to the virtual-open value-added tax feature analysis unit to optimize the behavior features and their corresponding feature weights in that unit, so that in subsequent steps the information data can be processed by the optimized feature analysis unit to obtain more accurate feature data.
In some embodiments of the present invention, step S103 includes:
performing feature analysis on the information data based on the feature dimension model to obtain a feature result set of the enterprise;
and calculating the feature result set and the feature weight set according to a calculation formula corresponding to the behavior feature to obtain an abnormal score set of the behavior feature.
In a specific embodiment of the present invention, the feature result set may be the set of results obtained by comparing the information data with each behavior feature in the feature dimension model. For example, the feature dimension model determines the place of origin of the person registering the company from that person's information and screens for persons whose household registration is non-local: if the person registering the company is local, the non-local person feature is not met; if the person is non-local, the feature is met. In this way a result set for the behavior features in the feature dimension model is obtained, and the feature result set and the feature weight set can be calculated according to the calculation formula to obtain an abnormal score for each behavior feature, thereby obtaining the abnormal score set.
In some embodiments of the present invention, the calculation formula corresponding to the behavior feature is shown in formula (1):
[formula (1) not reproduced in the source text]
wherein T_n is the feature judgment result of the nth feature in the feature result set, and P_n is the feature weight of the nth feature in the feature weight set.
In a specific embodiment of the present invention, the feature judgment results and the feature weights may be calculated according to formula (1) to obtain each abnormal score, thereby obtaining the abnormal score set of the behavior features.
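Formula (1) itself is not reproduced in this text, so the following is only a guess at its shape: it assumes that the abnormal score of the nth feature is the product of T_n and P_n. The function name and the sample values are hypothetical.

```python
def abnormal_score_set(results, weights):
    """Return one abnormal score per feature.

    Assumption (formula (1) is not available): score_n = T_n * P_n.
    """
    if len(results) != len(weights):
        raise ValueError("feature result set and feature weight set differ in length")
    return [t * p for t, p in zip(results, weights)]

T = [100, 0, 60]        # hypothetical feature judgment results T_1..T_3 (0-100)
P = [0.5, 0.25, 0.25]   # hypothetical feature weights P_1..P_3
scores = abnormal_score_set(T, P)
print(scores)
```

If the actual formula also aggregates the per-feature scores, for example by summing them, that step could be added on top of the returned list.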
Sample analysis of historical cases shows that about 23% of cases involve non-local persons. Because virtual-open value-added tax invoicing is regionally concentrated, a non-local person opening a company is taken as one of the analyzed behavior features, and a weight P1 can be set for the non-local person opening a company feature T1. T1 can be set to 0 for a local person and to 100 for a non-local person, so that the calculation can be performed according to the weight P1 and T1. The feature T1 and the weight P1 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
Sample analysis of historical cases also shows that about 15% of cases involve persons from domestic high-risk areas. Because virtual-open value-added tax invoicing is regionally concentrated, a high-risk area person opening a company can be taken as one of the analyzed behavior features, and a weight P2 can be set for the high-risk area person opening a company feature T2. T2 can be set to 100 for a person whose identity information belongs to a high-risk area and to 0 otherwise, so that the calculation can be performed according to the weight P2 and T2. The feature T2 and the weight P2 can be set according to actual conditions, and the embodiment of the invention is not limited in this regard.
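The two regional features can be sketched as simple lookups. The sketch assumes that "non-local" means the registrant's household registration region differs from the company's region; the region codes are made-up placeholders and the function names are hypothetical.

```python
def t1_non_local(household_region: str, company_region: str) -> int:
    """T1: 0 for a local registrant, 100 for a non-local one.

    Assumption: locality is judged by comparing household and company regions.
    """
    return 0 if household_region == company_region else 100

def t2_high_risk(household_region: str, high_risk_regions: set) -> int:
    """T2: 100 if the registrant's identity belongs to a high-risk area, else 0."""
    return 100 if household_region in high_risk_regions else 0

# Placeholder region codes, not real administrative codes.
HIGH_RISK = {"R-07", "R-19"}
t1 = t1_non_local("R-07", "R-01")
t2 = t2_high_risk("R-07", HIGH_RISK)
print(t1, t2)
```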
The model can count the companies opened by each registrant within a certain time range, and a list of persons who opened many companies in a short time can be screened out by sorting in descending order of the number of companies opened. A weight P3 can be set for the feature T3 of one person opening companies many times within a short time, and T3 can be determined from the number N1 of companies opened by a single person in a short time. For example, with N1 counted over 30 days, T3 is 100 when N1 >= 5, and T3 takes the value 100 x N1/5 when 3 <= N1 < 5, so that the calculation can be performed according to the weight P3 and T3. The feature T3, the weight P3 and the thresholds on N1 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
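The piecewise rule for T3 can be written directly. The source does not state a value for fewer than three companies, so the sketch assumes 0 in that case; the function name is hypothetical.

```python
def t3_repeat_registration(n1: int) -> float:
    """T3 from the number n1 of companies one person opened within the window."""
    if n1 >= 5:
        return 100.0
    if n1 >= 3:
        return 100.0 * n1 / 5
    return 0.0  # assumption: the source gives no value below 3 companies

for n in (1, 3, 4, 5, 8):
    print(n, t3_repeat_registration(n))
```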
The model can analyze the registered addresses of companies. If multiple companies are registered at one address, the registrant is suspect, so a weight P4 can be set for the feature T4 of multiple companies registered at the same location. The number N2 of companies registered at the same office location is determined: T4 is 100 when N2 >= 3 and 0 when N2 < 3, so that the calculation can be performed according to the weight P4 and T4. The feature T4, the weight P4 and the threshold on N2 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
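Counting companies per registered address is a straightforward grouping, sketched below. The record layout (company name, address) and the sample data are hypothetical; the default threshold of 3 follows the example in the text.

```python
from collections import Counter

def t4_by_address(registrations, threshold=3):
    """Map each address to T4: 100 if at least `threshold` companies share it, else 0."""
    counts = Counter(address for _company, address in registrations)
    return {address: (100 if n >= threshold else 0) for address, n in counts.items()}

# Hypothetical (company, registered address) pairs.
registrations = [
    ("Company A", "Room 101, Building 1"),
    ("Company B", "Room 101, Building 1"),
    ("Company C", "Room 101, Building 1"),
    ("Company D", "Room 202, Building 9"),
]
t4 = t4_by_address(registrations)
print(t4)
```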
The model can analyze tax information. If the registered legal person of the enterprise coincides with its financial staff, the registrant is suspect, so a weight P5 can be set for the enterprise legal person and financial staff coincidence feature T5. Whether they coincide can be judged by comparing the identity cards of the legal person and the financial staff: T5 is 100 when they coincide and 0 otherwise, so that the calculation can be performed according to the weight P5 and T5. The feature T5 and the weight P5 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
The model can likewise use tax information to check the tax staff. If the registered legal person of the enterprise coincides with its tax staff, the registrant is suspect, so a weight P6 can be set for the enterprise legal person and tax staff coincidence feature T6. T6 is 100 when the registered legal person and the tax staff coincide and 0 otherwise, so that the calculation can be performed according to the weight P6 and T6. The feature T6 and the weight P6 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
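Both coincidence features, T5 and T6, reduce to the same identity comparison, so one helper can serve both. The identifiers below are made-up placeholders and the function name is hypothetical.

```python
def coincidence_score(legal_person_id: str, staff_ids) -> int:
    """Return 100 if the legal person's ID appears among the staff IDs, else 0."""
    return 100 if legal_person_id in set(staff_ids) else 0

# Placeholder identifiers, not real identity card numbers.
t5 = coincidence_score("ID-001", ["ID-001", "ID-014"])  # financial staff
t6 = coincidence_score("ID-001", ["ID-027"])            # tax staff
print(t5, t6)
```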
The model can analyze and count the call ticket data of key personnel related to invoices, and persons with many call records are screened into a key personnel library. The list of registrants is then compared with the persons in the key personnel library to screen for matches. A weight P7 can be set for the key personnel library coincidence feature T7, and it can be judged whether close call records exist between the registrant and key personnel in the last N3 days (for example, 30 days): if so, T7 is 100, otherwise T7 is 0, so that the calculation can be performed according to the weight P7 and T7. The feature T7, the weight P7 and N3 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
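A minimal sketch of the T7 check follows. The source does not define what counts as a "close" call record, so the sketch assumes a hypothetical minimum number of calls (`min_calls`) within the window; the phone numbers and record layout are placeholders.

```python
from datetime import date, timedelta

def t7_key_personnel_contact(calls, key_numbers, today, n3_days=30, min_calls=1):
    """T7: 100 if the registrant called key personnel at least `min_calls` times
    within the last `n3_days` days, else 0. `min_calls` is an assumption."""
    cutoff = today - timedelta(days=n3_days)
    hits = sum(1 for day, number in calls if day >= cutoff and number in key_numbers)
    return 100 if hits >= min_calls else 0

# Hypothetical call records of one registrant: (date, other party's number).
calls = [(date(2023, 5, 20), "N-1"), (date(2023, 1, 3), "N-2")]
key_numbers = {"N-1", "N-2"}
t7 = t7_key_personnel_contact(calls, key_numbers, today=date(2023, 6, 1))
print(t7)
```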
The model can analyze the call ticket data. If non-local numbers are dialed frequently, the registrant is suspect, so a weight P8 can be set for the call ticket dialing behavior abnormality feature T8. The target call ticket data is analyzed and the calls dialed to non-local numbers are screened out; the number C1 of calls dialed to non-local numbers in the last N4 days (for example, 30 days) and the total number C of outgoing calls in the last N4 days are counted, and the proportion of non-local calls in the last N4 days is determined. When C1/C is greater than 50%, T8 is 100, so that the calculation can be performed according to the weight P8 and T8. The feature T8, the weight P8 and N4 can be set according to actual conditions, and the embodiment of the invention is not limited herein.
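The ratio test for T8 can be sketched as below. The text only states the value when C1/C exceeds 50%; returning 0 otherwise, and when there are no outgoing calls at all, is an assumption. The function name is hypothetical.

```python
def t8_non_local_dialing(c1: int, c: int, ratio: float = 0.5) -> int:
    """T8: 100 if non-local calls c1 exceed `ratio` of all outgoing calls c.

    Assumption: 0 in every other case, including c == 0.
    """
    if c == 0:
        return 0
    return 100 if c1 / c > ratio else 0

print(t8_non_local_dialing(6, 10), t8_non_local_dialing(5, 10))
```

Note that the comparison is strict, so exactly 50% does not trigger the feature.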
Most persons who falsely issue invoices send express parcels rather than receive them, so abnormal posting data can be identified from sending behavior. The model analyzes the posting data of the legal person, the company contact and the tax personnel taken together as one set; if the set sends far more than it receives, the registrant may be problematic. A weight P9 can be set for the feature T9, abnormal posting behavior. Taking the legal person, the company contact and the tax personnel as one set, the number C1 of parcels sent by the set in the last N5 days (for example, 100 days) and the total parcel count C of the set over the same N5 days are obtained from the posting data. T9 takes the value 100 when C1/C is greater than 95%, so that the calculation can be performed from the weight P9 and T9. The feature T9, the weight P9 and N5 can be set according to the actual situation, and the embodiment of the invention is not limited herein.
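A minimal sketch of the T9 check, under stated assumptions: C is read here as the total of parcels sent and received by the set, the records are assumed to be already limited to the last N5 days, and the record layout, direction labels and function name are hypothetical.

```python
def t9_posting_imbalance(records, person_ids, ratio=0.95):
    """Return T9: 100 when sent parcels exceed `ratio` of all parcels.

    records: iterable of (person_id, direction) tuples, where direction is
    "sent" or "received" (hypothetical layout, already limited to N5 days).
    person_ids: the legal person, company contact and tax personnel.
    """
    group = set(person_ids)
    # Aggregate over the whole set rather than per person.
    in_group = [direction for pid, direction in records if pid in group]
    total = len(in_group)  # C: assumed to be sent plus received
    if total == 0:
        return 0
    sent = sum(1 for direction in in_group if direction == "sent")  # C1
    return 100 if sent / total > ratio else 0
```

Records belonging to persons outside the set are ignored, so unrelated posting activity does not dilute the ratio.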
The registered name and registered business scope of a company can be analyzed by the model. If the registered name contains special keywords such as textile or clothing, the registrant may be problematic. A weight P10 can be set for the feature T10, the overlap between the enterprise's registered name and the business keywords. At data initialization, the top n1 registration keywords (TOPn1) are counted by sample learning on the registration information of historical tax-related cases; later, keyword precision is improved by continued sample learning on concluded cases, giving the keyword set [K1, K2...Kn1]. The enterprise's information is then checked for hits against the keyword set K: when a keyword is hit, T10 takes the value 100, otherwise 0, so that the calculation can be performed from the weight P10 and T10. The feature T10, the weight P10 and n1 can be set according to the actual situation, and the embodiment of the invention is not limited herein.
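The TOPn1 keyword statistics and the T10 hit test can be sketched as below. The whitespace tokenization, the lower-casing and both function names are assumptions made for illustration; the source does not specify how keywords are extracted from registration information.

```python
from collections import Counter


def top_keywords(case_names, n):
    """Return the n most frequent words in historical case company names."""
    counts = Counter(
        word for name in case_names for word in name.lower().split()
    )
    return [word for word, _ in counts.most_common(n)]


def t10_name_keyword(company_text, keywords):
    """Return T10: 100 when any keyword occurs in the company's text."""
    text = company_text.lower()
    return 100 if any(keyword in text for keyword in keywords) else 0


history = [
    "Hubei Textile Trading",
    "Wuhan Textile Clothing",
    "Clothing Wholesale Textile",
]
keywords = top_keywords(history, 2)
```

Re-running `top_keywords` on an enlarged case list corresponds to the later refinement of the keyword set from concluded cases.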
The posting data can be analyzed by the model, filtering for keywords such as documents and invoices. A weight P11 can be set for the feature T11, abnormal posting keywords. At data initialization, the top n2 keywords (TOPn2) are counted by analyzing the express item names in historical tax-related cases; later, keyword precision is improved by continued sample learning on concluded cases, giving the keyword set [K1, K2...Kn2]. The enterprise's mailing keywords are analyzed and the number N6 of hits on the keywords K is determined: when N6 is greater than 3, T11 is 100, otherwise 0, so that the calculation can be performed from the weight P11 and T11. The feature T11, the weight P11, n2 and N6 can be set according to the actual situation, and the embodiment of the invention is not limited herein.
A keyword matching database can be established. Short messages whose content contains suspicious words are marked as suspicious by regular-expression matching, and the database is updated continuously and automatically. A weight P12 can be set for the feature T12, keywords extracted from short message content. At data initialization, the top n3 keywords (TOPn3) are counted by analyzing the short messages of the mobile phones registered in historical tax-related cases; later, keyword precision is improved by continued sample learning on concluded cases, giving the keyword set [K1, K2...Kn3]. The enterprise's short messages are analyzed and the number N7 of hits on the keywords K is determined: when N7 is greater than 10, T12 is 100, otherwise T12 is 0, so that the calculation can be performed from the weight P12 and T12. The feature T12, the weight P12, n3 and N7 can be set according to the actual situation.
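The regular-expression matching and the N7 threshold for T12 might look like the following sketch. The function name, the sample messages and the choice to count every keyword occurrence (rather than every suspicious message) are assumptions.

```python
import re


def t12_sms_keywords(messages, keywords, min_hits=10):
    """Return T12: 100 when keyword occurrences N7 exceed `min_hits`."""
    if not keywords:
        return 0  # an empty alternation would match everywhere
    # Escape each keyword so it is matched literally.
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    hits = sum(len(pattern.findall(message)) for message in messages)  # N7
    return 100 if hits > min_hits else 0
```

The same function covers the posting keyword feature T11 if it is called with express item names and `min_hits=3`.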
The mobile phone number and bank card number left by the registrant when registering the company can be examined by the model. The interval between the opening time of the mobile phone card and the registration time, and the interval between the opening time of the bank card and the registration time, can be compared, and a list of persons whose interval is less than N8 days is screened out. A weight P13 can be set for the feature T13, registering a company with a new mobile phone number and bank card. The card opening time is compared with the company registration time: T13 is 100 when the interval is less than N8 days (for example, 5 days), otherwise 0, so that the calculation can be performed from the weight P13 and T13. The feature T13, the weight P13 and N8 can be set according to the actual situation.
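A minimal sketch of the T13 date comparison. Treating the feature as hit when either card was opened within N8 days of registration, and using the absolute interval, are assumptions; the function and parameter names are hypothetical.

```python
from datetime import date


def t13_new_card(card_open_dates, registration_date, n8_days=5):
    """Return T13: 100 when any card was opened within N8 days of registration.

    card_open_dates: opening dates of the phone card and the bank card.
    """
    for opened in card_open_dates:
        # Absolute interval in days between card opening and registration.
        gap = abs((registration_date - opened).days)
        if gap < n8_days:
            return 100
    return 0
```

For example, a phone card opened on 2023-06-16 for a company registered on 2023-06-19 gives an interval of 3 days, which is below the example N8 of 5 days.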
The registrant's mobile phone number can be compared by the model with the mobile phones newly appearing within the local city over the preceding five days, and persons who first appeared in the local city within N9 days (for example, 5 days) are screened out. A weight P14 can be set for the feature T14, arriving in the local city within N9 days before registering the company. Whether the person arrived in the local city within the N9 days before the company was registered is judged: if so, T14 is 100, otherwise 0, so that the calculation can be performed from the weight P14 and T14. The feature T14, the weight P14 and N9 can be set according to the actual situation, and the embodiment of the invention is not limited herein.
The recent trajectory can be analyzed by the model to determine the foothold of a suspicious person. If the foothold is not at the office location that was filled in, the registrant may be problematic. A weight P15 can be set for the feature T15, the coincidence of the recent foothold with the office location. The system can work from the office location information and the recent trajectory, and can collect the person's mobile phone signal to obtain the locations where the target stayed for more than one hour during the last N10 days (for example, 30 days). These data are compared day by day with the target's location records; T15 takes the value 100 when the comparison condition is met and 0 otherwise, so that the calculation can be performed from the weight P15 and T15. The feature T15, the weight P15, N10 and the percentage used in the comparison can be set according to the actual situation, and the embodiment of the invention is not limited herein.
The model can analyze the activity trajectories of the personnel of companies registered within a certain time range and screen out persons whose trajectories coincide, where the time range of company registration and the time range of the activity trajectories are each adjustable parameters. A weight P16 can be set for the feature T16, multiple company legal persons appearing at the same place within a short time. The data of companies registered in the last N11 days is analyzed and a library set of enterprise legal persons is constructed. The electronic fence and telephone inquiry data of the last N12 days are then analyzed to find persons who met the legal persons, and the cases in which persons gathered at the same location as a legal person are computed for each day. Persons with more than N13 such occurrences, and the enterprise legal persons with an accompanying relationship, are each flagged: when the number of occurrences is greater than N13, T16 is 100, otherwise 0, so that the calculation can be performed from the weight P16 and T16. The feature T16, the weight P16, N11, N12 and N13 can be set according to the actual situation, and the embodiment of the invention is not limited herein.
Since the persons behind tax-related companies are typically from outside the local area, hotel check-in data can be analyzed by the model. If the legal person and employees show a pattern of checking in to and out of hotels together, the registrant may be problematic. A weight P17 can be set for the feature T17, abnormal hotel check-in. Companies registered in the last N14 days (for example, 100 days) are counted and a company legal person data set is established. For each legal person, the number of hotel check-ins is counted from N15 days (for example, 5 days) before the company registration up to the present. When the number of check-ins is greater than N15, T17 is 100, otherwise 0 (local long-term residents generally seldom check in to hotels), so that the calculation can be performed from the weight P17 and T17. The feature T17, the weight P17, N14 and N15 can be set according to the actual situation, and the embodiment of the invention is not limited herein.
According to the embodiment of the invention, the abnormal score of each behavior feature can be obtained by calculating the feature results and the feature weights of different behavior features, so that the abnormal score set of the behavior feature is obtained, and a worker can determine the abnormal event of an enterprise according to the abnormal scores in the abnormal score set.
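How feature results and weights combine into an abnormal score set can be illustrated with a short sketch. The combining rule used here, score = T_n multiplied by P_n, is an assumption, since this text does not reproduce the calculation formula; the threshold, the weights and the function names are likewise hypothetical.

```python
def anomaly_score_set(results, weights):
    """Combine feature results T_n (0 or 100) with feature weights P_n.

    Assumption: each score is the product T_n * P_n.
    """
    return {name: results[name] * weights[name] for name in results}


def abnormal_features(scores, threshold):
    """List the features whose score exceeds a hypothetical cut-off."""
    return sorted(name for name, score in scores.items() if score > threshold)


results = {"T6": 100, "T8": 0, "T13": 100}      # example feature results
weights = {"T6": 0.25, "T8": 0.5, "T13": 0.125}  # example weights
scores = anomaly_score_set(results, weights)
```

With these example values only T6 exceeds a cut-off of 20, which is the kind of per-feature signal a worker could review.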
In some embodiments of the present invention, after step S104, the method further comprises:
when the enterprise is abnormal, analyzing the behavior characteristics of the characteristic dimension model according to the information data of the enterprise to obtain second behavior characteristics;
calculating the information data and the second behavior characteristics according to a machine learning algorithm to obtain a characteristic weight set of the second behavior characteristics;
and optimizing the feature dimension model according to the second behavior feature and the feature weight set of the second behavior feature.
After the behavior features of the enterprise's information data have been analyzed, if the information data is abnormal, a worker may import the information data as sample data into the sample library machine learning unit of the feature dimension model. The sample library machine learning unit analyzes the information data to obtain the second behavior features, and a feature weight set of the second behavior features is obtained by a machine learning algorithm. The falsely issued value-added tax invoice feature analysis unit of the feature dimension model is then optimized with the second behavior features and their feature weight set. The second behavior features may include behavior features not included in the first behavior features.
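The weight update from confirmed samples can be sketched as below. The source does not name the machine learning algorithm, so this example substitutes a simple frequency estimate as a stand-in: each weight is the share of feature hits observed in confirmed abnormal samples. The function name and the sample layout are hypothetical.

```python
def estimate_weights(samples):
    """Estimate feature weights from confirmed abnormal samples.

    samples: list of dicts mapping feature name to its result (0 or 100).
    Stand-in rule (an assumption): weight = hits of the feature divided by
    hits of all features, so the weights sum to 1.
    """
    hits = {}
    for sample in samples:
        for name, value in sample.items():
            hits[name] = hits.get(name, 0) + (1 if value == 100 else 0)
    total = sum(hits.values())
    if total == 0:
        return {name: 0.0 for name in hits}
    return {name: count / total for name, count in hits.items()}
```

A feature that appears only in newly imported samples receives a weight of its own, which mirrors the statement that the second behavior features may include features absent from the first.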
According to the embodiment of the invention, by optimizing and updating the feature dimension model in real time, the feature dimension model can accurately identify falsely issued value-added tax invoices, which improves the accuracy of the feature dimension model.
In order to better implement the big data identification model method for falsely issued ("virtual open") value-added tax invoices in the embodiment of the invention, the embodiment of the invention correspondingly also provides, on the basis of that method, a big data identification model device for falsely issued value-added tax invoices. As shown in fig. 3, the device comprises:
the data acquisition module 201 is configured to acquire historical data of value-added tax cases and information data of an enterprise that currently issues value-added tax invoices;
the model creation module 202 is configured to create a feature dimension model and to determine a feature weight set of the behavior features in the feature dimension model based on the historical data;
the feature analysis module 203 is configured to perform feature analysis on the information data according to the feature weight set based on the feature dimension model, so as to obtain an abnormal score set of the behavior features;
the abnormality determination module 204 is configured to determine abnormal events of the enterprise in the behavior features according to the abnormal score set.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
As shown in fig. 4, the invention also provides a big data identification model device 1000 for falsely issued value-added tax invoices. The device 1000 comprises a processor 1001, a memory 1002 and a display 1003. Fig. 4 shows only some of the components of the device 1000; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In some embodiments, the memory 1002 may be an internal storage unit of the big data identification model device 1000, for example a hard disk or internal memory of the device 1000. In other embodiments, the memory 1002 may be an external storage device of the device 1000, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the device 1000.
Further, the memory 1002 may include both an internal storage unit and an external storage device of the big data identification model device 1000. The memory 1002 is used for storing the application software installed on the device 1000 and various data.
In some embodiments, the processor 1001 may be a central processing unit (CPU), a microprocessor or another data processing chip, used for executing the program code or processing the data stored in the memory 1002, for example for executing the big data identification model method for falsely issued value-added tax invoices of the present invention.
In some embodiments, the display 1003 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display 1003 is used to display information of the big data identification model device 1000 and to display a visual user interface. The components 1001-1003 of the device 1000 communicate with each other via a system bus.
In some embodiments of the present invention, when the processor 1001 executes the big data identification model program for falsely issued value-added tax invoices in the memory 1002, the following steps may be implemented:
acquiring historical data of a value-added tax case and information data of an enterprise currently issuing a value-added tax invoice;
creating a feature dimension model, and determining a feature weight set of the behavior features in the feature dimension model based on the historical data;
performing feature analysis on the information data according to the feature weight set based on the feature dimension model to obtain an abnormal score set of the behavior feature;
and determining abnormal events of the enterprise in the behavior characteristics according to the abnormal score set.
It should be understood that, when executing the big data identification model program for falsely issued value-added tax invoices in the memory 1002, the processor 1001 may perform other functions in addition to those above; for details, reference may be made to the description of the corresponding method embodiments above.
Further, the type of the big data identification model device 1000 is not particularly limited. The device 1000 may be a portable device such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a wearable device or a laptop. Exemplary embodiments of the portable device include, but are not limited to, portable devices running iOS, Android, Microsoft or other operating systems. The portable device may also be another portable device, such as a laptop with a touch-sensitive surface (for example, a touch panel). It should also be understood that in other embodiments of the invention the device 1000 may be, instead of a portable device, a desktop computer with a touch-sensitive surface (for example, a touch panel).
The embodiment of the invention also provides a computer readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above embodiment of the big data identification model method for falsely issued value-added tax invoices and achieves the same technical effect; to avoid repetition, the details are not described again here.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by way of a computer program stored in a computer readable storage medium to instruct related hardware (e.g., a processor, a controller, etc.). The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The big data identification model method and device for falsely issued value-added tax invoices provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, those skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the invention.

Claims (10)

1. A big data identification model method for falsely issued value-added tax invoices, characterized by comprising the following steps:
acquiring historical data of a value-added tax case and information data of an enterprise currently issuing a value-added tax invoice;
creating a feature dimension model, and determining a feature weight set of the behavior features in the feature dimension model based on the historical data;
performing feature analysis on the information data according to the feature weight set based on the feature dimension model to obtain an abnormal score set of the behavior feature;
and determining abnormal events of the enterprise in the behavior characteristics according to the abnormal score set.
2. The big data identification model method for falsely issued value-added tax invoices according to claim 1, wherein performing feature analysis on the information data according to the feature weight set based on the feature dimension model to obtain an abnormal score set of the behavior features comprises:
performing feature analysis on the information data based on the feature dimension model to obtain a feature result set of the enterprise;
and calculating the characteristic result set and the characteristic weight set according to a calculation formula corresponding to the behavior characteristic to obtain an abnormal score set of the behavior characteristic.
3. The method of claim 1, wherein the behavioral characteristics include at least one of: regional behavior features, repetitive behavior features, abnormal behavior features, keyword features, and enterprise behavior features.
4. The method of claim 1, wherein creating a feature dimension model and determining a feature weight set of the behavior features in the feature dimension model based on the historical data comprises:
creating a feature dimension model comprising the behavioral features;
analyzing the behavior characteristics of the characteristic dimension model according to the historical data to obtain first behavior characteristics;
and calculating the historical data and the first behavior feature according to a machine learning algorithm to obtain a feature weight set of the first behavior feature.
5. The method of claim 1, wherein the information data comprises at least one of: abnormal event high-risk area data, telecom card opening data, key personnel data, ticket data, electronic fence data, telephone inquiry data, posting data and hotel check-in information.
6. The big data identification model method for falsely issued value-added tax invoices according to claim 2, wherein the calculation formula corresponding to the behavior feature is:
wherein $T_n$ is the feature determination result of the nth feature in the feature result set, and $P_n$ is the feature weight of the nth feature in the feature weight set.
7. The method of claim 4, wherein, after determining the abnormal events of the enterprise in the behavior features according to the abnormal score set, the method further comprises:
when the enterprise is abnormal, analyzing the behavior characteristics of the characteristic dimension model according to the information data of the enterprise to obtain second behavior characteristics;
calculating the information data and the second behavior characteristics according to a machine learning algorithm to obtain a characteristic weight set of the second behavior characteristics;
and optimizing the feature dimension model according to the second behavior feature and the feature weight set of the second behavior feature.
8. A big data identification model device for falsely issued value-added tax invoices, characterized by comprising:
the data acquisition module is used for acquiring historical data of the value-added tax case and information data of an enterprise currently issuing the value-added tax invoice;
the model creation module is used for creating a characteristic dimension model and determining a characteristic weight set of the behavior characteristics in the characteristic dimension model based on the historical data;
the feature analysis module is used for carrying out feature analysis on the information data according to the feature weight set based on the feature dimension model to obtain an abnormal score set of the behavior feature;
and the abnormality determining module is used for determining abnormal events of the enterprise in the behavior characteristics according to the abnormal score set.
9. An electronic device, comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the big data identification model method for falsely issued value-added tax invoices according to any one of claims 1 to 7.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program, when executed by a processor, implements the steps of the big data identification model method for falsely issued value-added tax invoices according to any one of claims 1 to 7.
CN202310733840.8A 2023-06-19 2023-06-19 Large data identification model method and device for virtual open value-added tax invoice Pending CN116844171A (en)


Publications (1)

Publication Number Publication Date
CN116844171A true CN116844171A (en) 2023-10-03

Family

ID=88160978


Country Status (1)

Country Link
CN (1) CN116844171A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination