CN117237130B

CN117237130B - Tax risk data acquisition and monitoring method and system

Info

Publication number: CN117237130B
Application number: CN202311293563.XA
Authority: CN
Inventors: 程爱珺
Original assignee: Guangdong Yuanheng Software Technology Co ltd
Current assignee: Guangdong Yuanheng Software Technology Co ltd
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2024-06-14
Anticipated expiration: 2043-10-09
Also published as: CN117237130A

Abstract

The invention relates to the technical field of data processing, and in particular to a tax risk data acquisition and monitoring method and system. The method comprises: obtaining a standard tax data sequence and performing neighborhood analysis on it to obtain a standard risk mean for each piece of standard tax data; obtaining the correlation between items from the differences between the standard risk means of different items; screening the items against a threshold on the correlation to obtain the correlation combinations of all items; obtaining the risk degree of each item from the correlation combinations and the correlations; clustering all items by risk degree to obtain a plurality of initial data clusters; obtaining a promotion index from the initial data clusters; adjusting the items according to the promotion index to obtain distributed data clusters; and constructing distributed storage from the distributed data clusters to realize tax risk data acquisition and monitoring. The invention reduces wasted storage space and improves the efficiency with which a tax system processes tax risk data.

Description

Tax risk data acquisition and monitoring method and system

Technical Field

The invention relates to the technical field of data processing, in particular to a tax risk data acquisition and monitoring method and system.

Background

For tax authorities, risk-oriented enterprise tax auditing is a new approach to examining major enterprise tax risks with limited tax collection and administration resources, and it gives enterprises a reference for improving their internal controls and preventing tax risks. It requires monitoring and analyzing tax risk data in order to build comprehensive risk management models covering all tax types and all industries. During monitoring and analysis, the collected tax risk data must be stored. Because tax risk data is complex, migrating it, upgrading the system and storing it all impose a heavy processing load, so tax risk data is generally kept in distributed storage.

In the prior art, tax risk data is partitioned and stored mainly according to its distribution characteristics, without considering access concurrency requirements or the risk degree of the tax risk data. This wastes storage space and reduces the efficiency with which a tax system processes tax risk data. The invention therefore provides a tax risk data acquisition and monitoring method and system that analyzes the tax data of each item to obtain its risk degree, performs initial clustering by risk degree to obtain initial data clusters graded by risk level, constructs a promotion model for the tax data in lower-level initial data clusters based on analysis of access concurrency requirements, corrects the initial data clusters to obtain distributed data clusters, and constructs distributed storage from the distributed data clusters and stores the data in it, thereby reducing wasted storage space and improving the efficiency with which a tax system processes tax risk data.

Disclosure of Invention

The invention provides a tax risk data acquisition and monitoring method and system, which aim to solve the existing problems.

The tax risk data acquisition and monitoring method and system provided by the invention adopt the following technical scheme:

The embodiment of the invention provides a tax risk data acquisition and monitoring method, which comprises the following steps:

Collecting tax data sequences of items and corresponding concurrency frequencies; the tax data sequence comprises a plurality of tax data;

Standardizing the tax data sequence to obtain a standard tax data sequence, and performing neighborhood analysis on the standard tax data sequence to obtain a standard risk mean for each piece of standard tax data; obtaining the correlation between items from the differences between the standard risk means of different items; screening the items against a threshold on the correlation to obtain the correlation combinations of all items;

Obtaining the risk degree of each item from the correlation combinations and the correlations; clustering all items by risk degree to obtain a high-risk initial data cluster, a medium-risk initial data cluster, a low-risk initial data cluster and a risk-free initial data cluster;

Obtaining, for each item, a promotion index for each promotion data cluster to which the item can be promoted, from the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster, the risk-free initial data cluster and the concurrency frequencies; adjusting all items according to the promotion indexes to obtain a plurality of distributed data clusters; and constructing distributed storage from the distributed data clusters to realize tax risk data acquisition and monitoring.

Preferably, performing neighborhood analysis on the standard tax data sequence to obtain a standard risk mean for each piece of standard tax data comprises the following specific method:

A neighborhood range is preset; any piece of standard tax data of any item is recorded as reference tax data, and the mean of the standard tax data within the neighborhood range of the reference tax data is recorded as the standard risk mean of the reference tax data.

Preferably, obtaining the correlation between items from the differences between the standard risk means of different items comprises the following specific method:

$$R_{a,b}=\frac{\sum_{i=1}^{n}\left(\mu_i^{a}-\bar{\mu}^{a}\right)\left(\mu_i^{b}-\bar{\mu}^{b}\right)}{n\,\sigma_a\,\sigma_b}$$

Any two items are recorded as item $a$ and item $b$ respectively. In the formula, $R_{a,b}$ denotes the correlation between item $a$ and item $b$; $n$ denotes the number of moments; $\mu_i^{a}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of item $a$; $\bar{\mu}^{a}$ denotes the mean of the standard risk means of item $a$; $\mu_i^{b}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of item $b$; $\bar{\mu}^{b}$ denotes the mean of the standard risk means of item $b$; $\sigma_a$ denotes the standard deviation of the standard tax data of item $a$; and $\sigma_b$ denotes the standard deviation of the standard tax data of item $b$.

Preferably, screening the items against a threshold on the correlation to obtain the correlation combinations of all items comprises the following specific method:

Any item is recorded as a marked item, a correlation threshold is preset, and all initial item combinations of the marked item are obtained;

For any initial item combination of the marked item, if the correlation of the initial item combination is greater than or equal to the correlation threshold, the initial item combination is recorded as a correlation combination of the marked item; if the correlation of the initial item combination is smaller than the correlation threshold, the initial item combination is left unprocessed.

Preferably, obtaining all initial item combinations of the marked item comprises the following specific method:

All items are combined pairwise to obtain a plurality of item combinations; all item combinations that contain the marked item are obtained and recorded as the initial item combinations of the marked item.

Preferably, obtaining the risk degree of each item from the correlation combinations and the correlations comprises the following specific method:

Any one item is recorded as item $a$. For the $j$-th correlation combination of item $a$, the item in that correlation combination other than item $a$ is recorded as the $j$-th associated item of item $a$;

；

In the formula, $F_a$ denotes the risk degree of item $a$; $m$ denotes the number of correlation combinations of item $a$; $n$ denotes the number of moments; $R_j$ denotes the correlation of the $j$-th correlation combination of item $a$; $\mu_i^{a}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of item $a$; $\bar{\mu}^{a}$ denotes the mean of the standard risk means of item $a$; $\mu_i^{a,j}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of the $j$-th associated item of item $a$; $\bar{\mu}^{a,j}$ denotes the mean of the standard risk means of the $j$-th associated item of item $a$; and $\varepsilon$ denotes a hyperparameter.

Preferably, clustering all items by risk degree to obtain a high-risk initial data cluster, a medium-risk initial data cluster, a low-risk initial data cluster and a risk-free initial data cluster comprises the following specific method:

K-Means clustering is performed on all items by risk degree to obtain a plurality of clusters; the clusters are sorted in descending order of the mean risk degree of each cluster, and the sorted clusters are recorded in turn as the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster and the risk-free initial data cluster.

Preferably, obtaining, for each item, a promotion index for each promotion data cluster from the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster, the risk-free initial data cluster and the concurrency frequencies comprises the following specific method:

Any item is recorded as a target item; the initial data cluster to which the target item belongs is recorded as the first data cluster of the target item, and an initial data cluster to which the target item can be promoted is recorded as a promotion data cluster of the target item. The promotion data clusters of the target item are obtained; for any promotion data cluster of the target item, the inter-class distance between the first data cluster of the target item and that promotion data cluster is recorded as the promotion requirement of that promotion data cluster of the target item;

The promotion index of the promotion data cluster of the target item is calculated as follows:

；

In the formula, $P$ denotes the promotion index of the promotion data cluster of the target item; $D$ denotes the promotion requirement of the promotion data cluster of the target item; $Q$ denotes the number of items contained in the promotion data cluster; $f_q$ denotes the concurrency frequency of the $q$-th item in the promotion data cluster and the target item; $N$ denotes the number of all items; and $\exp(\cdot)$ denotes the exponential function with the natural constant as its base.

Preferably, adjusting all items according to the promotion indexes to obtain a plurality of distributed data clusters comprises the following specific method:

For any item other than the items contained in the high-risk initial data cluster, the initial data clusters are adjusted according to the promotion index of the item: if the promotion index of the promotion data cluster to which the item would be promoted is smaller than a clustering threshold, the item is placed into that promotion data cluster, the initial data cluster one level higher than that promotion data cluster is recorded as the new promotion data cluster of the item, and the promotion index is then evaluated for the new promotion data cluster. This is repeated until the promotion index of the item's promotion data cluster is greater than or equal to the clustering threshold, or the item has no promotion data cluster, at which point adjustment of the item is complete. All items other than those in the high-risk initial data cluster are adjusted in this way, and once all items have been adjusted the resulting initial data clusters are recorded as the distributed data clusters.
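The adjustment loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the level names in `LEVELS`, the function `adjust_clusters`, and the callable `promotion_index(item, target_level)` are hypothetical, and the clustering threshold value is left to the caller because the text does not fix one.

```python
# Risk levels in ascending order; an item can only move one level up at a time.
LEVELS = ["risk-free", "low", "medium", "high"]


def adjust_clusters(assignment, promotion_index, threshold):
    """Promote each item level by level while its promotion index stays below the threshold.

    assignment:      dict mapping item -> initial level name (hypothetical representation)
    promotion_index: callable (item, target_level) -> float, supplied by the caller
    threshold:       clustering threshold (value is an assumption of the caller)
    """
    adjusted = {}
    for item, level in assignment.items():
        pos = LEVELS.index(level)
        # Stop when no higher cluster exists or the index reaches the threshold.
        while pos + 1 < len(LEVELS) and promotion_index(item, LEVELS[pos + 1]) < threshold:
            pos += 1
        adjusted[item] = LEVELS[pos]
    return adjusted


# Hypothetical promotion indexes, looked up from a table for illustration only.
table = {("x", "low"): 0.2, ("x", "medium"): 0.3, ("x", "high"): 0.9, ("y", "high"): 0.1}
result = adjust_clusters(
    {"x": "risk-free", "y": "medium", "z": "high"},
    lambda item, level: table[(item, level)],
    threshold=0.5,
)
```

Items already in the high-risk cluster never enter the loop, which mirrors the statement that the high-risk initial data cluster has no promotion data cluster.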

An embodiment of the invention provides a tax risk data acquisition and monitoring system, which comprises a tax data acquisition module, a correlation combination acquisition module, a risk-level-based initial data cluster acquisition module and a tax risk data acquisition and monitoring module, wherein:

The tax data acquisition module collects the tax data sequences of the items and the corresponding concurrency frequencies; each tax data sequence comprises a plurality of pieces of tax data;

The correlation combination acquisition module standardizes the tax data sequence to obtain a standard tax data sequence, and performs neighborhood analysis on the standard tax data sequence to obtain a standard risk mean for each piece of standard tax data; obtains the correlation between items from the differences between the standard risk means of different items; and screens the items against a threshold on the correlation to obtain the correlation combinations of all items;

The risk-level-based initial data cluster acquisition module obtains the risk degree of each item from the correlation combinations and the correlations, and clusters all items by risk degree to obtain a high-risk initial data cluster, a medium-risk initial data cluster, a low-risk initial data cluster and a risk-free initial data cluster;

The tax risk data acquisition and monitoring module obtains, for each item, a promotion index for each promotion data cluster from the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster, the risk-free initial data cluster and the concurrency frequencies; adjusts all items according to the promotion indexes to obtain a plurality of distributed data clusters; and constructs distributed storage from the distributed data clusters to realize tax risk data acquisition and monitoring.

The beneficial effects of the technical solution of the invention are as follows. Existing methods for acquiring tax data use distributed storage to improve storage efficiency but do not consider the access concurrency requirements or the risk degree of the collected tax items, so the risk analysis system wastes performance when analyzing tax risk. In contrast, the invention analyzes the tax data of each item to obtain its risk degree, performs initial clustering by risk degree to obtain initial data clusters graded by risk level, constructs a promotion model for the tax data in lower-level initial data clusters based on analysis of access concurrency requirements, corrects the initial data clusters to obtain distributed data clusters, and constructs distributed storage from the distributed data clusters and stores the data in it, thereby reducing wasted storage space and improving the efficiency with which a tax system processes tax risk data.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of steps of a tax risk data collection monitoring method of the present invention;

FIG. 2 is a block diagram of a tax risk data collection and monitoring system according to the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description refers to specific implementation, structure, characteristics and effects of the tax risk data acquisition monitoring method and system according to the invention in combination with the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the tax risk data acquisition and monitoring method and system provided by the invention with reference to the accompanying drawings.

Referring to FIG. 1, which shows a flowchart of the steps of a tax risk data collection and monitoring method according to an embodiment of the present invention, the method includes the following steps:

Step S001: tax data of enterprise projects and corresponding concurrency frequencies are collected.

It should be noted that, in the prior art, tax data is stored mainly according to its distribution characteristics, without considering access concurrency requirements or the risk degree of the tax data, which wastes storage space and reduces the efficiency with which a tax system processes tax data. This embodiment therefore provides a tax risk data acquisition and monitoring method that analyzes the tax data of each item to obtain its risk degree, performs initial clustering by risk degree to obtain initial data clusters graded by risk level, constructs a promotion model for the tax data in lower-level initial data clusters based on analysis of access concurrency requirements, corrects the initial data clusters to obtain distributed data clusters, and constructs distributed storage from the distributed data clusters and stores the data in it, thereby reducing wasted storage space and improving the efficiency with which a tax system processes tax risk data.

Specifically, to implement the tax risk data acquisition and monitoring method provided in this embodiment, tax data must first be collected. This embodiment is not limited to a particular enterprise and takes any enterprise as an example. The tax data includes different types of data, such as historical financial statement data, historical balance sheet data and income statement data.

Specifically, for a plurality of items of the enterprise, tax data is collected once per minute for 60 minutes in total, with one piece of tax data collected for each item each time. For each item, the pieces of tax data arranged in order of collection time are recorded as the tax data sequence of that item, so the length of each item's tax data sequence equals the number of moments. Each item corresponds to one type of tax data, for example: the tax data corresponding to the first item is historical financial statement data, and the tax data corresponding to the second item is historical balance sheet data.

Further, in the relational database of the enterprise, the number of times any two items appear together is obtained and recorded as the concurrency frequency of those two items, for example: item a appears 3 times, item b appears 2 times, and item c appears 1 time; the concurrency frequency of item a and item b is 2, the concurrency frequency of item a and item c is 1, and the concurrency frequency of item b and item c is 1.
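As a minimal sketch of this counting step, the following assumes that each database record can be represented as the set of items it references; that representation, the function name `concurrency_frequencies` and the sample records are assumptions made for illustration, chosen so that the counts match the example above.

```python
from collections import Counter
from itertools import combinations


def concurrency_frequencies(records):
    """Count, for every unordered pair of items, how many records contain both."""
    pair_counts = Counter()
    for record in records:
        # sorted() gives each pair a canonical (x, y) order with x < y
        for pair in combinations(sorted(set(record)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical records: a appears 3 times, b appears 2 times, c appears 1 time.
records = [{"a", "b", "c"}, {"a", "b"}, {"a"}]
freq = concurrency_frequencies(records)
```

With these records, `freq[("a", "b")]` is 2 while `freq[("a", "c")]` and `freq[("b", "c")]` are both 1.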

Thus, the tax data sequences of a plurality of items and the corresponding concurrency frequencies are obtained by the above method.

Step S002: obtaining a standard tax data sequence from the tax data sequence, and performing neighborhood analysis on the standard tax data sequence to obtain a plurality of standard risk means; obtaining the correlation between items from the differences between the standard risk means of different items; and screening the items against a threshold on the correlation to obtain the correlation combinations of all items.

It should be noted that this embodiment obtains the risk degree of the tax data by analyzing the correlation between tax data; weights the risk degree by the call frequency in the database to obtain weighted tax data; clusters the weighted tax data to obtain clusters of different risk degrees; and performs distributed storage according to the clusters of different risk degrees.

It should further be noted that the tax data of each item is derived from existing actual tax data through operations of varying degree. Because the actual tax data are correlated with one another, the tax data of the items are also correlated to some extent, so the risk degree of the tax data can be obtained by analyzing the correlation between tax data: the larger the correlation, the greater the degree of interference between the tax data and the larger the corresponding risk degree.

Specifically, taking any item as an example, min-max normalization is applied to the tax data sequence of the item; each piece of normalized tax data is recorded as standard tax data of the item, and the sequence of the item's standard tax data arranged in order of collection time is recorded as the standard tax data sequence of the item. The standard tax data sequences of all items are obtained.
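The normalization step can be illustrated with a short sketch. The function name `min_max_normalize` is hypothetical, and the handling of a constant sequence (mapped to all zeros) is an assumption, since the text does not address that case.

```python
def min_max_normalize(sequence):
    """Scale a tax data sequence to [0, 1] with min-max normalization."""
    lo, hi = min(sequence), max(sequence)
    if hi == lo:
        # Assumption: a constant sequence carries no variation, so map it to zeros.
        return [0.0 for _ in sequence]
    return [(x - lo) / (hi - lo) for x in sequence]


standard_sequence = min_max_normalize([2, 4, 6])
```

The order of the values is preserved, so the result can be used directly as the standard tax data sequence arranged by collection time.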

Further, taking any two items as an example, the two items are recorded as item $a$ and item $b$ respectively, and a neighborhood range $k$ is preset. The value of $k$ used in this embodiment is only an example and is not limiting; $k$ may be determined according to the specific implementation. Taking any piece of standard tax data of item $a$ as an example, the mean of the standard tax data within its neighborhood range $k$ is recorded as the standard risk mean of that piece of standard tax data, where the neighborhood range $k$ of a piece of standard tax data is the interval formed by the $k$ pieces of standard tax data before it and the $k$ pieces of standard tax data after it. All standard risk means of item $a$ are obtained, and all standard risk means of item $b$ are obtained. It should further be noted that, when calculating the standard risk mean of a piece of standard tax data, if its actual neighborhood does not cover the full neighborhood range $k$, the standard risk mean is calculated from the actual neighborhood of that piece of standard tax data.
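A minimal sketch of the neighborhood analysis follows. The function name `standard_risk_means` and the parameter name `k` are chosen for illustration, and the sketch assumes that the window includes the reference datum itself; the text leaves that detail open. At the ends of the sequence the window is truncated to the data actually available, as described above.

```python
def standard_risk_means(standard_sequence, k):
    """Mean of each value's neighborhood: up to k values before and k values after it."""
    n = len(standard_sequence)
    means = []
    for i in range(n):
        start = max(0, i - k)        # truncate at the start of the sequence
        end = min(n, i + k + 1)      # truncate at the end of the sequence
        window = standard_sequence[start:end]  # assumption: includes the value itself
        means.append(sum(window) / len(window))
    return means


risk_means = standard_risk_means([0.0, 0.5, 1.0, 0.5], k=1)
```

Here the first and last values are averaged over two data points only, because one side of their neighborhood does not exist.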

Further, the correlation between item $a$ and item $b$ is obtained from the standard risk means of item $a$ and item $b$, and is calculated as follows:

$$R_{a,b}=\frac{\sum_{i=1}^{n}\left(\mu_i^{a}-\bar{\mu}^{a}\right)\left(\mu_i^{b}-\bar{\mu}^{b}\right)}{n\,\sigma_a\,\sigma_b}$$

In the formula, $R_{a,b}$ denotes the correlation between item $a$ and item $b$; $n$ denotes the number of moments; $\mu_i^{a}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of item $a$; $\bar{\mu}^{a}$ denotes the mean of the standard risk means of item $a$; $\mu_i^{b}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of item $b$; $\bar{\mu}^{b}$ denotes the mean of the standard risk means of item $b$; $\sigma_a$ denotes the standard deviation of the standard tax data of item $a$; and $\sigma_b$ denotes the standard deviation of the standard tax data of item $b$. If item $a$ and item $b$ are strongly and positively correlated, the correlation between them is large and approaches 1; if they are strongly and negatively correlated, the correlation is small and approaches -1; if they are only weakly correlated, the correlation approaches 0. The correlations between all items are obtained.
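The correlation can be sketched in code under stated assumptions. The sketch assumes a Pearson-style form: the summed product of the deviations of the two items' standard risk means from their respective means, divided by the number of moments and by the two standard deviations of the standard tax data. It also assumes population standard deviations and returns 0 when either standard deviation is zero. The function name `correlation` is hypothetical.

```python
from statistics import fmean, pstdev


def correlation(risk_means_a, risk_means_b, standard_a, standard_b):
    """Pearson-style correlation between two items (assumed form, see lead-in)."""
    n = len(risk_means_a)
    mean_a, mean_b = fmean(risk_means_a), fmean(risk_means_b)
    # Assumption: population standard deviation of the standard tax data.
    sigma_a, sigma_b = pstdev(standard_a), pstdev(standard_b)
    if sigma_a == 0 or sigma_b == 0:
        return 0.0  # assumption: no variation means no measurable correlation
    covariance = sum(
        (x - mean_a) * (y - mean_b) for x, y in zip(risk_means_a, risk_means_b)
    ) / n
    return covariance / (sigma_a * sigma_b)


rising = [0.0, 0.5, 1.0]
falling = [1.0, 0.5, 0.0]
same_direction = correlation(rising, rising, rising, rising)
opposite_direction = correlation(rising, falling, rising, falling)
```

In this toy case the risk means are taken equal to the standard data, so the two results sit at the extremes of 1 and -1; with a real neighborhood window the standard risk means are smoothed and the value generally falls strictly inside that range.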

Further, taking item $a$ as an example, a correlation threshold T1 is preset; this embodiment uses T1 = 0.7 as an example, which is not limiting, and T1 may be determined according to the specific implementation. All items are combined pairwise to obtain a plurality of item combinations; all item combinations that contain item $a$ are obtained and recorded as the initial item combinations of item $a$. For any initial item combination of item $a$, if the correlation of the initial item combination is greater than or equal to the correlation threshold T1, the initial item combination is recorded as a correlation combination of item $a$; if the correlation of the initial item combination is smaller than the correlation threshold T1, the initial item combination is left unprocessed. All correlation combinations of item $a$ are obtained, and in the same way all correlation combinations of all items are obtained.
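The screening step can be sketched as follows, assuming the pairwise correlations have already been computed and are held in a dictionary keyed by item pairs. The function name `correlation_combinations` and the sample values are hypothetical; the default threshold of 0.7 follows the example value of T1 given above.

```python
def correlation_combinations(correlations, threshold=0.7):
    """Map each item to the partners whose correlation with it reaches the threshold.

    correlations: dict mapping (item_x, item_y) -> correlation of that pair
    """
    partners = {}
    for (x, y), value in correlations.items():
        partners.setdefault(x, [])
        partners.setdefault(y, [])
        if value >= threshold:
            # The pair is a correlation combination of both of its items.
            partners[x].append(y)
            partners[y].append(x)
        # Pairs below the threshold are left unprocessed.
    return partners


# Hypothetical correlations for three items.
pairs = {("a", "b"): 0.9, ("a", "c"): 0.3, ("b", "c"): 0.7}
combos = correlation_combinations(pairs)
```

Each entry of `combos` lists the associated items of one item, which is the form needed when the risk degree of that item is evaluated.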

So far, all correlation combinations of all items have been obtained by the above method.

Step S003: obtaining the risk degree of each item from the correlation combinations and the correlations; and clustering all items by risk degree to obtain a high-risk initial data cluster, a medium-risk initial data cluster, a low-risk initial data cluster and a risk-free initial data cluster.

It should be noted that the risk data present in the correlation combinations of all items obtained in step S002 is now analyzed. Risk data is standard tax data whose variation at a given moment is large compared with the data it is strongly correlated with. Because the correlation is computed from neighborhood means, it cannot characterize individual pieces of standard tax data precisely; the risk degree of an item is therefore obtained by analyzing the difference between the strongly correlated standard tax data and the standard tax data of the item itself.

Specifically, taking the $j$-th correlation combination of item $a$ as an example, the item in that correlation combination other than item $a$ is recorded as the $j$-th associated item of item $a$. All associated items of item $a$ are obtained, and the risk degree of item $a$ is obtained from all of its associated items, where the risk degree of item $a$ is calculated as follows:

；

In the formula, $F_a$ denotes the risk degree of item $a$; $m$ denotes the number of correlation combinations of item $a$; $n$ denotes the number of moments; $R_j$ denotes the correlation of the $j$-th correlation combination of item $a$; $\mu_i^{a}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of item $a$; $\bar{\mu}^{a}$ denotes the mean of the standard risk means of item $a$; $\mu_i^{a,j}$ denotes the standard risk mean of the $i$-th piece of standard tax data in the standard tax data sequence of the $j$-th associated item of item $a$; $\bar{\mu}^{a,j}$ denotes the mean of the standard risk means of the $j$-th associated item of item $a$; and $\varepsilon$ denotes a hyperparameter, for which this embodiment presets a value. In addition, it is noted that $\mu_i^{a}-\bar{\mu}^{a}$ and $\mu_i^{a,j}-\bar{\mu}^{a,j}$ denote, for item $a$ and its $j$-th associated item respectively, the deviation of the $i$-th value from the respective mean; if item $a$ and its $j$-th associated item vary in a similar way, the corresponding term of the formula is closer to 1. The risk degree of item $a$ is obtained by analyzing all correlation combinations of item $a$: the more closely the variation of the tax data agrees with the correlation, the more normal the standard tax data of item $a$ and the smaller the risk degree of item $a$; conversely, the larger the risk degree of item $a$. The risk degrees of all items are obtained.

Further, K-Means clustering is performed on all items by risk degree to obtain a plurality of clusters; the clusters are sorted in descending order of the mean risk degree of each cluster, and the sorted clusters are recorded in turn as the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster and the risk-free initial data cluster, where each initial data cluster contains the risk degrees of a plurality of items. K-Means clustering is a known technique; this embodiment presets the number of clusters $K$, which is not specifically limited and may be determined according to the specific implementation.
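The clustering and labeling step can be sketched with a small one-dimensional K-Means written in plain Python. Several choices here are assumptions made for illustration: four clusters are used so that they match the four named risk levels, the initial centers are spread over the sorted risk degrees to keep the run deterministic, and the names `kmeans_1d`, `initial_clusters` and `LEVELS` are hypothetical. A library implementation of K-Means could be substituted.

```python
LEVELS = ["high-risk", "medium-risk", "low-risk", "risk-free"]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar values with deterministic, evenly spread initial centers."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def initial_clusters(risk_by_item):
    """Cluster items by risk degree and label clusters from highest to lowest mean risk."""
    k = len(LEVELS)
    centers = kmeans_1d(list(risk_by_item.values()), k)
    # After convergence each center equals its cluster's mean risk degree.
    ranked = sorted(range(k), key=lambda j: centers[j], reverse=True)
    level_of_center = {j: LEVELS[pos] for pos, j in enumerate(ranked)}
    clusters = {level: [] for level in LEVELS}
    for item, risk in risk_by_item.items():
        nearest = min(range(k), key=lambda j: abs(risk - centers[j]))
        clusters[level_of_center[nearest]].append(item)
    return clusters


# Hypothetical risk degrees for eight items.
risk = {"p1": 0.9, "p2": 0.85, "p3": 0.5, "p4": 0.55,
        "p5": 0.2, "p6": 0.25, "p7": 0.0, "p8": 0.05}
clusters = initial_clusters(risk)
```

Because the risk degree is a single number per item, the inter-class distance used later as the promotion requirement could be taken as the distance between two cluster centers, although the text does not state which definition it uses.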

So far, the method is used for obtaining the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster and the risk-free initial data cluster.

Step S004: obtaining promotion indexes of all projects according to the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster and the non-risk initial data cluster; adjusting all projects according to the promotion indexes to obtain a plurality of distributed data clusters; and constructing a distributed memory according to the distributed data clusters to realize tax risk data acquisition and monitoring.

It should be noted that, in this embodiment, tax data is stored in a distributed manner according to the risk level, so as to optimize call analysis, but there is a concurrent requirement for database access in the analysis process, that is, an item in one database is calledAt the same time, the item/>, in the database is calledIf item/>If the associated item does not belong to the same memory, the access efficiency is reduced.

Therefore, the embodiment is based on the items of low and medium risk levelsConcurrent frequency of associated items with high risk levels simultaneously appearing in a relational database, and building item/>Risk promotion models of (a) to obtain a revised cluster.

It should be further noted that, the initial data cluster with high risk does not have an initial data cluster that can promote; only the initial data cluster where the project is located is allowed to advance to the initial data cluster with one level higher than the project, the initial data cluster with one level higher than the initial data cluster without risk is the initial data cluster with low risk, the initial data cluster with one level higher than the initial data cluster with low risk is the initial data cluster with medium risk, the initial data cluster with one level higher than the initial data cluster with medium risk is the initial data cluster with high risk, and the initial data cluster with one level higher than the initial data cluster with high risk does not exist.

Specifically, taking any item as an example, the initial data cluster to which the item belongs is recorded as the first data cluster of the item, and the initial data cluster to which the item can be promoted is recorded as the promotion data cluster of the item. The inter-class distance between the first data cluster and the promotion data cluster of the item is recorded as the promotion requirement of that promotion data cluster. Obtaining the inter-class distance is well-known in K-Means clustering and is not described further in this embodiment.

Further, the promotion index of the promotion data cluster of the item is calculated as follows:

；

In the formula, the quantities involved are: the promotion index of the promotion data cluster of the item; the promotion requirement of the promotion data cluster of the item; the number of items contained in the promotion data cluster; the concurrence frequency between the item and the $q$-th item in the promotion data cluster; the total number of items; and an exponential function with the natural constant as its base. This embodiment uses the exponential function to express the inverse-proportional relationship and to perform normalization; an implementer may select other inverse-proportional and normalization functions according to actual conditions. If the item has a stronger concurrency relationship with the items in the high-risk initial data cluster, the concurrence frequency is larger, the item is more likely to be promoted to the high-risk initial data cluster, and the promotion index is smaller. In this way, the promotion index of every item in the low-risk initial data cluster for promotion toward the high-risk initial data cluster is obtained; and, for every item, the promotion index of each initial data cluster whose risk level is higher than that of the cluster to which the item belongs is obtained.
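The behaviour described above can be illustrated with a minimal sketch. The exact expression is not reproduced in this text, so the combination used below is an assumption chosen only to match the stated properties: the index scales with the promotion requirement and shrinks, through an exponential, as the concurrence frequencies grow. The function name `promotion_index` and its parameters are hypothetical.

```python
import math

def promotion_index(promotion_requirement, concurrence_freqs, total_items):
    """Illustrative promotion index (assumed form, not the patented formula).

    promotion_requirement: inter-class distance between the item's first
        data cluster and the promotion data cluster.
    concurrence_freqs: concurrence frequency of the item with each item
        in the promotion data cluster.
    total_items: total number of items, used here to scale the sum.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    scaled_concurrence = sum(concurrence_freqs) / total_items
    # exp(-x) decreases as x grows, giving the inverse relationship.
    return promotion_requirement * math.exp(-scaled_concurrence)
```

Under this assumed form, an item with no concurrence keeps an index equal to the promotion requirement, and any positive concurrence lowers it.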

Further, taking any item other than those in the high-risk initial data cluster as an example, the item is adjusted according to its promotion index. If the promotion index of the promotion data cluster of the item is smaller than the clustering threshold, the item is put into the promotion data cluster, and the initial data cluster one level higher than that promotion data cluster is recorded as the new promotion data cluster of the item; if the promotion index is greater than or equal to the clustering threshold, the item remains in its original initial data cluster. The promotion index of the new promotion data cluster of the item is then compared with the clustering threshold in the same way, and so on, until the promotion index of the promotion data cluster of the item is greater than or equal to the clustering threshold or the item has no promotion data cluster, at which point the adjustment of the item is complete. All items other than those in the high-risk initial data cluster are adjusted by the same process; once every item has been adjusted, the resulting initial data clusters are recorded as distributed data clusters. Obtaining the clustering threshold is well-known in K-Means clustering and is not described further in this embodiment.
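The adjustment of a single item can be sketched as a loop that climbs one level at a time while the promotion index stays below the clustering threshold. This is a minimal sketch under stated assumptions: the promotion indexes are taken as already computed and passed in as a dictionary keyed by candidate cluster, and the level names, the function `adjust_item` and the threshold value used in the usage note are hypothetical.

```python
# Risk levels ordered from lowest to highest (names are illustrative).
LEVELS = ["risk-free", "low-risk", "medium-risk", "high-risk"]

def adjust_item(start_level, index_for_level, threshold):
    """Return the cluster an item ends up in after adjustment.

    start_level: the item's first data cluster.
    index_for_level: maps each higher-level cluster name to the item's
        precomputed promotion index for that cluster.
    threshold: the clustering threshold.
    """
    current = LEVELS.index(start_level)
    while current + 1 < len(LEVELS):       # a promotion data cluster exists
        target = LEVELS[current + 1]
        if index_for_level[target] >= threshold:
            break                           # not promoted; adjustment ends
        current += 1                        # promoted one level; try the next
    return LEVELS[current]
```

With a threshold of 0.5, `adjust_item("low-risk", {"medium-risk": 0.2, "high-risk": 0.9}, 0.5)` stops at `"medium-risk"`, because the index for the high-risk cluster is not below the threshold.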

Furthermore, each distributed data cluster is used as a unit for constructing the distributed memory, and the data are stored unit by unit, thereby realizing tax risk data acquisition and monitoring.
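The mapping from distributed data clusters to storage units can be sketched as a simple grouping. The in-memory dictionary below merely stands in for an actual distributed memory, and the function name `build_storage_units` and the item identifiers are hypothetical.

```python
from collections import defaultdict

def build_storage_units(final_cluster_of):
    """Group items by their distributed data cluster.

    final_cluster_of: maps each item identifier to the name of the
        distributed data cluster it belongs to after adjustment.
    Returns a dict mapping each cluster name to the list of its items,
    so that each cluster can be stored as one unit.
    """
    units = defaultdict(list)
    for item, cluster in final_cluster_of.items():
        units[cluster].append(item)
    return dict(units)

# Hypothetical items and their clusters after adjustment.
units = build_storage_units({
    "item_a": "high-risk",
    "item_b": "low-risk",
    "item_c": "high-risk",
})
```

Keeping items that are frequently accessed together in one unit is what allows concurrent calls to be served from the same memory.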

Through the steps, tax risk data acquisition and monitoring are completed.

Referring to fig. 2, a block diagram of a tax risk data collection and monitoring system according to an embodiment of the present invention is shown, where the system includes the following modules:

Existing methods, when using distributed storage to improve storage efficiency during tax data collection, consider neither the access concurrency requirements nor the risk degree of the collected tax items, which wastes the performance of the risk analysis system when tax risks are analyzed. By contrast, this method analyzes the tax data of each item to obtain its risk degree and performs initial clustering according to the risk degree to obtain initial data clusters graded by risk level. A promotion model is then constructed for the tax data in the lower-level initial data clusters according to an analysis of the access concurrency requirements, and the initial data clusters are corrected to obtain distributed data clusters. The distributed data clusters are used to construct and populate a distributed memory, which reduces the waste of storage space and improves the efficiency with which the tax system processes tax risk data.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A tax risk data acquisition and monitoring method, characterized by comprising the following steps:

The risk degree of each item is obtained according to the relevance combination and the relevance, and the specific method comprises the following steps:

；

In the formula, for an item $i$, the quantities involved are: the risk degree of item $i$; the number of relevance combinations of item $i$; the number of moments; the relevance of the $j$-th relevance combination of item $i$; the standard risk mean of the $t$-th standard tax data in the standard tax data sequence of item $i$; the mean of the standard risk means of item $i$; the standard risk mean of the $t$-th standard tax data in the standard tax data sequence of the $j$-th associated item of item $i$; the mean of the standard risk means of the $j$-th associated item of item $i$; and a hyper-parameter;

Obtaining a promotion index of each promotion data cluster of each project according to the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster, the risk-free initial data cluster and the concurrence frequency; adjusting all projects according to the promotion indexes to obtain a plurality of distributed data clusters; constructing a distributed memory according to the distributed data clusters to realize tax risk data acquisition and monitoring;

All items are clustered according to the risk degree to obtain a high-risk initial data cluster, a medium-risk initial data cluster, a low-risk initial data cluster and a risk-free initial data cluster, and the specific method comprises the following steps:

The K value is preset to 4; K-Means clustering is performed on all items according to their risk degrees to obtain 4 clusters; the clusters are sorted in descending order of their mean risk degree; and the sorted clusters are marked in turn as the high-risk initial data cluster, the medium-risk initial data cluster, the low-risk initial data cluster and the risk-free initial data cluster.

2. The tax risk data collection and monitoring method according to claim 1, wherein the neighborhood analysis is performed on the standard tax data sequence to obtain a standard risk average value of each standard tax data, and the specific method comprises the following steps:

3. The tax risk data collection and monitoring method according to claim 1, wherein the obtaining the correlation between the items according to the difference of the standard risk mean values between the different items comprises the following specific steps:

；

Any two items are denoted as item $a$ and item $b$ respectively. In the formula, the quantities involved are: the correlation between item $a$ and item $b$; the number of moments; the standard risk mean of the $t$-th standard tax data in the standard tax data sequence of item $a$; the mean of the standard risk means of item $a$; the standard risk mean of the $t$-th standard tax data in the standard tax data sequence of item $b$; the mean of the standard risk means of item $b$; the standard deviation of the standard tax data of item $a$; and the standard deviation of the standard tax data of item $b$.

4. The tax risk data collection and monitoring method according to claim 1, wherein the threshold screening is performed on the items according to the correlation to obtain the correlation combination of all the items, and the specific method comprises the following steps:

5. The tax risk data collection and monitoring method according to claim 4, wherein the method for obtaining all initial item combinations of marked items comprises the following specific steps:

6. The method for collecting and monitoring tax risk data according to claim 1, wherein the method for obtaining the promotion index of each promotion data cluster of each project according to the high risk initial data cluster, the medium risk initial data cluster, the low risk initial data cluster, the risk-free initial data cluster and the concurrence frequency comprises the following specific steps:

Any item is marked as a target item; the initial data cluster to which the target item belongs is marked as the first data cluster of the target item, and each initial data cluster to which the target item can be promoted is marked as a promotion data cluster of the target item; a plurality of promotion data clusters of the target item are thereby acquired, and any one of them is selected; the inter-class distance between the first data cluster of the target item and the selected promotion data cluster is recorded as the promotion requirement of that promotion data cluster of the target item;

；

7. The method for collecting and monitoring tax risk data according to claim 1, wherein the adjusting all items according to promotion indexes to obtain a plurality of distributed data clusters comprises the following specific steps:

8. A tax risk data collection and monitoring system, which is a system for implementing the tax risk data collection and monitoring method according to claims 1-7, and is characterized in that the system comprises: