CN117236900A - Individual tax data processing method and system based on flow automation - Google Patents


Info

Publication number: CN117236900A
Application number: CN202311381321.6A
Authority: CN (China)
Prior art keywords: layer, output, data, tax, model
Legal status: Granted; currently Active
Other languages: Chinese (zh)
Other versions: CN117236900B
Inventors: 杨东晓, 高翔, 伍斯龙
Current Assignee: Guangdong Power Grid Co Ltd
Original Assignee: Guangdong Power Grid Co Ltd
Application filed by Guangdong Power Grid Co Ltd
Priority to CN202311381321.6A
Publication of CN117236900A
Application granted
Publication of CN117236900B


Abstract

The application relates to the technical field of computers and provides an individual tax data processing method and system based on flow automation. The method comprises the following steps: binding each user's pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data to obtain target individual tax data; and inputting the target individual tax data into a multi-layer perceptron neural network model composed of hardware computing units to obtain the recognition result output by the multi-layer perceptron neural network model. According to the application, the target individual tax data are processed by the multi-layer perceptron neural network model, which outputs each user's tax deduction mode. Because the nodes of the hidden layers and the output layer output the two states 0 and 1 probabilistically, the model consumes fewer data bits, which facilitates hardware deployment and acceleration, improves the model's ability to solve uncertain classification problems, and thereby improves the accuracy with which users' tax deduction modes are recognized.

Description

Individual tax data processing method and system based on flow automation
Technical Field
The application relates to the technical field of computers, in particular to a tax data processing method and system based on flow automation.
Background
Against the background of big-data applications, the artificial neural network (ANN) model is one of the important research directions in the field of machine learning. Because a large number of data sets are available for training networks, neural network technology has developed rapidly in recent years, and networks with various model structures have been designed and built. Because neural network models can analyze all kinds of data effectively and stably, they have received great attention and development in the field of pattern recognition.
Existing methods that identify a user's tax deduction mode from individual tax data mainly use traditional, online-processed ANN models; the data processing models they build all run on existing computer platforms and are not independently deployed on a dedicated hardware system. Therefore, compared with the analysis and processing approach of an ANN model composed of hardware computing units, the analysis and processing flow based on a traditional ANN model cannot, in either analysis mechanism or speed, meet the data processing demands that keep growing in practical applications, so the user's tax deduction mode cannot be identified accurately.
Disclosure of Invention
In view of the problems in the prior art, the application provides an individual tax data processing method and system based on flow automation, aiming to improve the accuracy with which users' tax deduction modes are recognized.
In a first aspect, an embodiment of the present application provides a method for processing tax data based on flow automation, including:
binding each user's pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data to obtain target individual tax data;
inputting the target individual tax data into a multi-layer perceptron neural network model composed of hardware computing units to obtain the recognition result output by the multi-layer perceptron neural network model, the recognition result being the tax deduction mode of each user;
the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; wherein the input layer comprises at least one common neuron node; the hidden layer comprises at least one neuron node based on the probabilistic node p-bit model, and the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layers and the output layer is 0 when the corresponding neuron node is silent and 1 when the input data of the corresponding neuron node reaches the threshold value, and the output result has probabilistic characteristics.
In one embodiment, the calculation rules adopted by each neuron node in the hidden layer and the output layer include:
I_out = sign(sigmoid(ω^T·I_in + b) − rand)
where I_out is the output result of the neuron node after calculation and I_in is the input of the neuron node; the sign function and the sigmoid function are both activation functions, sign being the sign function and sigmoid the squeeze function; ω and b are respectively the weight and bias of the neuron node, and rand is the random number output by the random number generator.
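As a hedged illustration only (the patent itself provides no code), the p-bit rule above can be sketched in NumPy as follows; the function name and the 0/1 mapping of sign(·) are assumptions based on the description of silent and firing states.

```python
import numpy as np

def p_bit_node(x, w, b, rng=np.random.default_rng()):
    """Sketch of one probabilistic (p-bit) neuron node.

    x: input vector I_in; w: node weight vector ω; b: node bias.
    Returns I_out, which is 1 with probability sigmoid(ω·x + b) and 0 otherwise.
    """
    activation = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # sigmoid(ω^T·I_in + b)
    rand = rng.random()                                      # random number generator output
    # sign(...) mapped to the two node states: 0 (silent) or 1 (firing)
    return 1 if activation - rand > 0 else 0
```

Because rand is redrawn on every call, repeated evaluations of the same input can return different 0/1 results, which is exactly the probabilistic behaviour described above.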
In one embodiment, the step of training the multi-layer perceptron neural network model comprises:
carrying out data binding on the pre-tax annual income data, the three-insurance and two-fund data, the special additional deduction data and the post-tax annual income data of sample users to obtain a sample individual tax data set;
setting initial values of at least one model structure parameter of the multi-layer perceptron neural network model composed of hardware computing units; the model structure parameters include at least one of: hidden layer weights, hidden layer biases, output layer weights and output layer biases;
using the sample individual tax data set, performing model training by alternately iterating forward propagation and error back-propagation, and continuously updating each model structure parameter, to obtain the multi-layer perceptron neural network model;
wherein the sign function sign is fitted using the squeeze function sigmoid during error back-propagation; the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; the input layer includes at least one common neuron node; the hidden layer comprises at least one neuron node based on the probabilistic node p-bit model; the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layers and the output layer is 0 when the corresponding neuron node is silent and 1 when the input data of the corresponding neuron node reaches the threshold value, and the output result has probabilistic characteristics.
In one embodiment, using the sample individual tax data set, performing model training by alternately iterating forward propagation and error back-propagation, and continuously updating each model structure parameter to obtain the multi-layer perceptron neural network model, includes:
setting an initial value of at least one model training parameter; the model training parameters include at least one of: a gradient descent step size α for controlling the learning rate; a gradient descent momentum m for adjusting the learning rate; a fitting parameter β used to assist the sigmoid function in fitting the sign function; a parameter n for randomly dividing the sample individual tax data set into batches; and a total iteration number T of model training;
for each iteration, randomly dividing the sample individual tax data set into batches according to the parameter n, to generate a sequence of sample individual tax data subsets;
for each subset in the sequence of sample individual tax data subsets, performing the following:
inputting the sample individual tax data set subset to the input layer, determining the actual output and fitting output of each hidden layer, and determining the actual output and fitting output of the output layer; determining an actual error of the model according to the actual output and the fitting output of the output layer;
back-propagating the actual error of the model to the output layer, and determining fitting increment, weight gradient and bias gradient of the output layer, and fitting increment, weight gradient and bias gradient of each hidden layer;
updating the hidden layer weight, the hidden layer bias, the output layer weight and the output layer bias according to the weight gradient and the bias gradient of the output layer and the weight gradient and the bias gradient of the hidden layer;
and ending iteration when the number of iterations reaches the total iteration number T of model training, to obtain the multi-layer perceptron neural network model (a high-level sketch of this training loop is given below).
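The sketch below illustrates only the loop structure of the steps above; it is an assumption, not the patent's code, and the per-subset operations of steps 31-33 are indicated as comments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholders: 200 sample users, 4 bound features, binary labels (assumed shapes).
X = rng.random((200, 4))
y = rng.integers(0, 2, size=(200, 1))
n, T = 4, 100                     # batches per iteration and total iteration number (assumed values)

def split_batches(X, y, n, rng):
    """Randomly divide the sample individual tax data set into n batches (per-iteration step)."""
    order = rng.permutation(X.shape[0])
    return [(X[idx], y[idx]) for idx in np.array_split(order, n)]

for t in range(T):                                  # stop after T iterations
    for X_b, y_b in split_batches(X, y, n, rng):    # process each subset in turn
        # steps 31-33 would go here: forward pass (formulas (1)-(2)), model actual error (3),
        # error back-propagation (4)-(5), and the parameter update (6)
        pass
```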
In an embodiment, determining the actual output and the fit output of each of the hidden layers comprises:
calculating the actual output and fitting output of each hidden layer by adopting a formula (1);
where k is the hidden layer index, h_k is the actual output of the k-th hidden layer, h'_k is the fitting output of the k-th hidden layer, and h_0 is the data in the i-th subset of the sample individual tax data set; ω_h^k and b_h^k are respectively the connection weights and biases of the k-th hidden layer; the sign function and the sigmoid function are both activation functions, sign being the sign function and sigmoid the squeeze function; β is a fitting parameter used to assist the sigmoid function in fitting the sign function; rand is the random number output by the random number generator; or,
calculating the actual output and fitting output of the output layer by adopting a formula (2);
where y is the actual output of the output layer, and y' is the fitting output of the output layer; ω_o and b_o are respectively the connection weight and bias of the output layer, and h_n is the actual output of the last hidden layer.
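Formulas (1) and (2) themselves are not reproduced in this text. For readability only, a plausible reconstruction from the surrounding definitions and the p-bit rule above (assuming the fitting output simply replaces sign(·) with sigmoid(β·)) would read:

h_k = sign(sigmoid(ω_h^kT·h_{k-1} + b_h^k) − rand),  h'_k = sigmoid(β·(sigmoid(ω_h^kT·h_{k-1} + b_h^k) − rand))   (assumed form of formula (1))

y = sign(sigmoid(ω_o^T·h_n + b_o) − rand),  y' = sigmoid(β·(sigmoid(ω_o^T·h_n + b_o) − rand))   (assumed form of formula (2))

These are reconstructions, not the patent's original formulas.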
In one embodiment, determining the model actual error based on the actual output and the fitted output of the output layer includes:
calculating the actual error of the model by adopting a formula (3);
where e is the actual error of the model, N is the amount of data contained in the i-th subset of the sample individual tax data set, y_j is the actual output of the output layer, and y'_j is the fitting output of the output layer.
In one embodiment, determining the fitting delta, weight gradient, and bias gradient for the output layer, and the fitting delta, weight gradient, and bias gradient for each of the hidden layers, comprises:
calculating fitting increment, weight gradient and bias gradient of the output layer by adopting a formula (4);
where δ_o is the fitting increment of the output layer, dω_o is the weight gradient of the output layer, and db_o is the bias gradient of the output layer;
calculating fitting increment, weight gradient and bias gradient of each hidden layer by adopting a formula (5);
where δ_h^k is the fitting increment of the k-th hidden layer, ω_h^{k+1} is the weight of the (k+1)-th hidden layer, ω_h^{k-1} is the weight of the (k-1)-th hidden layer, dω_h^k is the weight gradient of the k-th hidden layer, and db_h^k is the bias gradient of the k-th hidden layer;
correspondingly, updating the hidden layer weight, the hidden layer bias, the output layer weight and the output layer bias according to the weight gradient and bias gradient of the output layer and the weight gradient and bias gradient of the hidden layers comprises the following steps:
calculating the updated value of the hidden layer weight, the updated value of the hidden layer bias, the updated value of the output layer weight and the updated value of the output layer bias by adopting formula (6);
where ω_h is the current value of the hidden layer weight, b_h is the current value of the hidden layer bias, ω_o is the current value of the output layer weight, and b_o is the current value of the output layer bias; ω_h*, b_h*, ω_o* and b_o* denote the corresponding updated values of the hidden layer weight, hidden layer bias, output layer weight and output layer bias.
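Formula (6) is not reproduced in this text. The following sketch shows one common reading of a gradient-descent update that uses both the step size α and the momentum m; it is an assumption for illustration, not the patent's exact update rule.

```python
import numpy as np

def momentum_update(param, grad, velocity, alpha, m):
    """One plausible update rule combining step size alpha and momentum m (assumed form)."""
    velocity = m * velocity - alpha * grad   # momentum-smoothed step
    return param + velocity, velocity

# Example: updating an assumed hidden layer weight matrix from its gradient.
rng = np.random.default_rng(0)
w_h = rng.normal(size=(8, 4))          # current hidden layer weight
dw_h = rng.normal(size=(8, 4))         # its weight gradient
v_h = np.zeros_like(w_h)               # momentum buffer, initially zero
w_h_new, v_h = momentum_update(w_h, dw_h, v_h, alpha=0.01, m=0.9)
```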
In a second aspect, an embodiment of the present application provides a tax data processing system based on process automation, including:
the data binding module is used for performing data binding on each user's pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data to obtain target individual tax data;
the data processing module is used for inputting the target tax data into a multi-layer perceptron neural network model formed by a hardware computing unit to obtain an identification result output by the multi-layer perceptron neural network model; the recognition result is the tax deduction mode of each user;
The multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; wherein the input layer comprises at least one common neuron node; the hidden layer comprises at least one neuron node based on a probabilistic node p-bit model, and the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layer and the output layer is 0 when the corresponding neuron node is silent, the output result is 1 when the input data of the corresponding neuron node reaches a threshold value, and the output result has probability characteristics.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the flow automation-based tax data processing method of the first aspect when the processor executes the program.
In a fourth aspect, embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the flow automation based tax data processing method of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the flow-automation-based individual tax data processing method of the first aspect.
According to the individual tax data processing method based on flow automation provided by the embodiment of the application, data binding is performed on each user's pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data to obtain target individual tax data; the target individual tax data are then input into a multi-layer perceptron neural network model composed of hardware computing units to obtain the recognition result output by the multi-layer perceptron neural network model.
The target individual tax data are processed by the multi-layer perceptron neural network model, which outputs each user's tax deduction mode. Because the nodes of the hidden layers and the output layer output the two states 0 and 1 probabilistically, the model consumes fewer data bits, which facilitates hardware deployment and acceleration, improves the model's ability to solve uncertain classification problems, and thereby improves the accuracy with which users' tax deduction modes are recognized.
Drawings
In order to more clearly illustrate the technical solutions of the application or of the prior art, the drawings that need to be used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the application, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a neural network model according to an embodiment of the present application;
FIG. 2 is a system block diagram of a p-bit provided by an embodiment of the present application;
FIG. 3 is a flow diagram of a method for processing tax data based on flow automation according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model training process provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart of forward propagation of a multi-layer perceptron neural network model composed of probabilistic nodes provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a flow chart of error back propagation of a multi-layer perceptron neural network model formed by probabilistic nodes according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a process automation based tax data processing system according to an embodiment of the present application;
fig. 8 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a method and a system for processing personal tax data based on flow automation, which are described below with reference to the accompanying drawings.
Aiming at problems such as hardware acceleration and brain-like computation of ANN models in the field of pattern recognition, the embodiment of the application provides, for the process of flow-automation-based individual tax data processing, a multi-layer perceptron neural network model that can be deployed on-chip in hardware and has brain-like characteristics, together with a corresponding training method; this facilitates computational acceleration of the ANN model while improving the model's ability to handle uncertainty problems.
The embodiment of the application provides a multi-layer perceptron neural network model formed by hardware computing units. Fig. 1 is a schematic structural diagram of a neural network model provided by an embodiment of the present application, and as shown in fig. 1, the neural network model of a multi-layer perceptron formed by Probability nodes (p-bit) provided by the embodiment of the present application includes: an input layer x, k hidden layers and an output layer y; wherein:
an input layer x comprising n common neuron nodes, i.e., x_1, …, x_n; the input layer x is used for receiving data and serves as the model's interface for receiving data;
k hidden layers, e.g., hidden layer h_1, …, hidden layer h_k; the hidden layers are used for pattern recognition analysis; each hidden layer comprises a plurality of neuron nodes based on the p-bit model; for example, hidden layer h_1 includes l_1 neuron nodes based on the p-bit model, and hidden layer h_k includes l_k neuron nodes based on the p-bit model. The hidden layers act as the first stage in which the model analyzes data and recombines effective information; a node's state is 0 when it is silent, it outputs 1 when its input reaches the threshold value, and its output has probabilistic characteristics. The hidden layers use the p-bit model in place of a common neuron model to complete forward propagation and error back-propagation, so the calculation rule of each hidden layer node conforms to the p-bit model.
The output layer y comprises m neuron nodes based on the p-bit model, i.e., y_1, …, y_m; the output layer y is used for outputting the recognition result; the output layer comprehensively analyzes and processes the effective information recombined by the previous layer and outputs the result. Each node of the output layer uses the p-bit model in place of a common neuron model to complete training and analysis; the final output still follows the criterion that the state is 0 when the node is silent and 1 is output when the input of the corresponding neuron node reaches the threshold value, with probabilistic characteristics, and serves as the final output of the multi-layer perceptron neural network model composed of probabilistic nodes.
The multi-layer perceptron neural network model provided by the embodiment of the application is a multi-layer neural network structure which, in the order of data flow, consists of an input layer, hidden layers and an output layer. There may be multiple hidden layers, while there is one input layer and one output layer. The input layer may be composed of common neuron nodes and is used only for receiving data. The hidden layers and the output layer are composed of p-bit model nodes.
The calculation rules adopted by each neuron node in the hidden layer and the output layer comprise:
I_out = sign(sigmoid(ω^T·I_in + b) − rand)   (a)
where I_out is the output result of the neuron node after calculation and I_in is the input of the neuron node; the sign function and the sigmoid function are both activation functions, sign being the sign function and sigmoid the squeeze function; ω and b are respectively the weight and bias of the neuron node, and rand is the random number output by the random number generator.
FIG. 2 is a system block diagram of a p-bit provided by an embodiment of the present application. As shown in fig. 2, for the data x of an input node, vector multiplication with the node connection weight ω is performed first, and then the node bias b is added; after the sigmoid function is applied, the random number is subtracted from the result, which is then fed into the sign function, and the final result is output.
The function of the multi-layer perceptron neural network model composed of hardware computing units provided by the embodiment of the application can be described as follows: when data to be analyzed are input, the input layer composed of general nodes passes them, layer by layer through weight connections, into the hidden layers and the output layer composed of probabilistic units; through the calculation characteristics described by formula (a), the model finally outputs its probabilistic analysis result for the current prediction data, and after multiple analyses a result more accurate than that of a common perceptron model is finally obtained.
Even for a single prediction, the multi-layer perceptron neural network model composed of hardware computing units provided by the embodiment of the application can still give a preliminary prediction result of relatively high accuracy, by virtue of the computing characteristics of the model's internal structure (the weights and biases obtained by optimization and the built-in function of the p-bit).
Optionally, referring to fig. 3, fig. 3 is a flow chart of a tax data processing method based on flow automation according to an embodiment of the present application.
Step 101, binding each user's pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data to obtain target individual tax data;
Specifically, since the uncertainty of each user's tax deduction mode is influenced by the pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data, these four kinds of data are acquired for each user and data-bound to obtain the target individual tax data. The users may include enterprise users and individual users; the "three insurances" comprise endowment insurance, medical insurance and unemployment insurance, the "two funds" refer to the housing provident fund and work-related injury insurance, and the special additional deduction data differ from user to user and are determined per user.
And 102, inputting the target tax data into a multi-layer perceptron neural network model formed by a hardware computing unit, and obtaining an identification result output by the multi-layer perceptron neural network model.
Optionally, inputting target individual tax data into a multi-layer perceptron neural network model formed by a hardware computing unit to obtain an identification result output by the multi-layer perceptron neural network model, wherein the identification result is a tax deduction mode of each user, and the tax deduction mode can be a payroll tax deduction mode, a comprehensive income tax deduction mode, a prepayment tax deduction mode, a staged prepayment tax deduction mode and a full payment tax deduction mode.
Wherein, the neural network model of the multi-layer perceptron comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results. The input layer comprises at least one common neuron node; the hidden layer comprises at least one neuron node based on a probabilistic node p-bit model, and the output layer comprises at least one neuron node based on the p-bit model; the output result of each neuron node in the hidden layer and the output layer is 0 when the corresponding neuron node is silent, the output result is 1 when the input data of the corresponding neuron node reaches a threshold value, and the output result has probability characteristics.
According to the embodiment of the application, the target individual tax data is processed through the multi-layer perceptron neural network model, the tax deduction mode of the user is output, and as the nodes of the hidden layer and the output layer output two states of 0 and 1 according to the probability, the model can consume fewer data bits, so that the hardware deployment and acceleration are facilitated, the solving capability of the model for the uncertainty classification problem is improved, and the recognition accuracy of the tax deduction mode of the user is improved.
Optionally, training the multi-layer perceptron neural network model involves: a sample individual tax data set, training parameters, model initialization and a specific algorithm flow.
Optionally, the sample individual tax data set is analyzed during the training process to build the corresponding model. Optionally, model initialization gives the training algorithm the initial weights ω and initial biases b of each layer of the model. Optionally, the training parameters control the important variables of the algorithm process, specifically, for example, the gradient descent step size α for controlling the learning rate; the gradient descent momentum m for adjusting the learning rate; the fitting parameter β used to assist the sigmoid function in fitting the sign function; the parameter n for randomly dividing the sample individual tax data set into batches; and the total iteration number T of model training. The algorithm flow controls the whole model training process to obtain the final model.
According to the multi-layer perceptron neural network model pattern recognition method and hardware deployment composed of probabilistic nodes provided by the embodiment of the application, after the sample individual tax data set undergoes the selected standardized preprocessing, it can be used as model input data for training; the model weights and biases are continuously updated by alternately iterating forward propagation and error back-propagation, and a final model usable for a specific task is finally obtained. In offline or stand-alone platform scenarios this reduces the pressure on data-bit storage space, quickly provides the model recognition result for the data to be tested, improves the computing platform's ability to handle uncertain computation, and improves the accuracy of pattern recognition.
Optionally, referring to fig. 4, fig. 4 is a schematic diagram of a model training flow provided in an embodiment of the present application, and a specific process of training the multi-layer perceptron neural network model includes steps 301 to 303, where:
Step 301, performing data binding on the pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data of sample users to obtain a sample individual tax data set.
Pre-tax annual income data, three-insurance and two-fund data, special additional deduction data and post-tax annual income data are acquired for each sample user and data-bound to obtain the sample individual tax data set.
Step 302, setting initial values of at least one model structure parameter of the multi-layer perceptron neural network model composed of hardware computing units; the model structure parameters include at least one of: hidden layer weights, hidden layer biases, output layer weights and output layer biases.
Optionally, at the time of model initialization, model structure parameters and corresponding initial values are set.
Step 303, using the sample individual tax data set, performing model training by alternately iterating forward propagation and error back-propagation, and continuously updating each model structure parameter to obtain the multi-layer perceptron neural network model;
wherein the sign function sign is fitted using the squeeze function sigmoid during error back-propagation; the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; the input layer includes at least one common neuron node; the hidden layer comprises at least one neuron node based on the p-bit model; the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layers and the output layer is 0 when the corresponding neuron node is silent and 1 when the input data of the corresponding neuron node reaches the threshold value, and the output result has probabilistic characteristics.
According to the embodiment of the application, model training is performed by alternately iterating forward propagation and error back-propagation using the sample individual tax data set, each model structure parameter is continuously updated, and the multi-layer perceptron neural network model composed of hardware computing units is obtained. Because the sign function sign is fitted by the squeeze function sigmoid during error back-propagation and the sigmoid function is differentiable, the problem that the sign function cannot be differentiated at its singular point is solved. Meanwhile, the probabilistic node p-bit model is introduced into the multi-layer perceptron neural network model: the hidden layers and the output layer are composed of neuron nodes based on the p-bit model, the output result of each such node is 0 when the node is silent and 1 when its input data reaches the threshold value, and the output has probabilistic characteristics. This facilitates computational acceleration of the neural network model while improving its ability to handle uncertainty problems, thereby improving the accuracy of pattern recognition.
Optionally, in step 303, the implementation of using the sample individual tax data set, performing model training by alternately iterating forward propagation and error back-propagation, and continuously updating each model structure parameter to obtain the multi-layer perceptron neural network model may include the following steps 1-4:
Step 1, setting an initial value of at least one model training parameter; the model training parameters include at least one of: a gradient descent step size α for controlling the learning rate; a gradient descent momentum m for adjusting the learning rate; a fitting parameter β used to assist the sigmoid function in fitting the sign function; a parameter n for randomly dividing the sample individual tax data set into batches; and a total iteration number T of model training;
Step 2, for each iteration, randomly dividing the sample individual tax data set into batches according to the parameter n, to generate a sequence of sample individual tax data subsets;
Step 3, for each subset in the sequence of sample individual tax data subsets, executing steps 31-33:
step 31, inputting the sample individual tax data set subset to the input layer, determining the actual output and fitting output of each hidden layer, and determining the actual output and fitting output of the output layer; determining an actual error of the model according to the actual output and the fitting output of the output layer;
step 32, back-propagating the actual error of the model to the output layer, and determining the fitting increment, the weight gradient and the bias gradient of the output layer, and the fitting increment, the weight gradient and the bias gradient of each hidden layer;
Step 33, updating the hidden layer weight, the hidden layer bias, the output layer weight and the output layer bias according to the weight gradient and the bias gradient of the output layer and the weight gradient and the bias gradient of the hidden layer;
and step 4, ending iteration when the iteration times reach the total iteration times T of the model training, and obtaining the multi-layer perceptron neural network model.
Optionally, an implementation manner of determining the actual output and the fitting output of each hidden layer may include:
calculating the actual output and fitting output of each hidden layer by adopting a formula (1);
where k is the hidden layer index, h_k is the actual output of the k-th hidden layer, h'_k is the fitting output of the k-th hidden layer, and h_0 is the data in the i-th subset of the sample individual tax data set; ω_h^k and b_h^k are respectively the connection weights and biases of the k-th hidden layer; the sign function and the sigmoid function are both activation functions, sign being the sign function and sigmoid the squeeze function; β is a fitting parameter used to assist the sigmoid function in fitting the sign function; rand is the random number output by the random number generator.
Optionally, an implementation manner of determining the actual output and the fitting output of the output layer may include:
Calculating the actual output and fitting output of the output layer by adopting a formula (2);
where y is the actual output of the output layer, and y' is the fitting output of the output layer; ω_o and b_o are respectively the connection weight and bias of the output layer, and h_n is the actual output of the last hidden layer.
Optionally, the implementation manner of determining the actual error of the model according to the actual output and the fitting output of the output layer may include:
calculating the actual error of the model by adopting a formula (3);
where e is the actual error of the model, N is the amount of data contained in the i-th subset of the sample individual tax data set, y_j is the actual output of the output layer, and y'_j is the fitting output of the output layer.
Optionally, the implementation of determining the fitting increment, the weight gradient and the bias gradient of the output layer and the fitting increment, the weight gradient and the bias gradient of each hidden layer may include:
calculating fitting increment, weight gradient and bias gradient of the output layer by adopting a formula (4);
where δ_o is the fitting increment of the output layer, dω_o is the weight gradient of the output layer, and db_o is the bias gradient of the output layer;
calculating fitting increment, weight gradient and bias gradient of each hidden layer by adopting a formula (5);
where δ_h^k is the fitting increment of the k-th hidden layer, ω_h^{k+1} is the weight of the (k+1)-th hidden layer, ω_h^{k-1} is the weight of the (k-1)-th hidden layer, dω_h^k is the weight gradient of the k-th hidden layer, and db_h^k is the bias gradient of the k-th hidden layer.
Optionally, the implementation of updating the hidden layer weight, the hidden layer bias, the output layer weight and the output layer bias according to the weight gradient and bias gradient of the output layer and the weight gradient and bias gradient of the hidden layers may include:
calculating the updated value of the hidden layer weight, the updated value of the hidden layer bias, the updated value of the output layer weight and the updated value of the output layer bias by adopting formula (6);
where ω_h is the current value of the hidden layer weight, b_h is the current value of the hidden layer bias, ω_o is the current value of the output layer weight, and b_o is the current value of the output layer bias; ω_h*, b_h*, ω_o* and b_o* denote the corresponding updated values of the hidden layer weight, hidden layer bias, output layer weight and output layer bias.
By fusing the p-bit computation method into a multi-layer perceptron neural network model, the application provides a flow-automation-based individual tax data processing method and a training method for a network composed of probabilistic nodes. Introducing probabilistic node units into the neural network model makes the method better suited to training an ANN model composed of hardware computing units and helps it handle uncertainty computation. Meanwhile, by using a fitting function, the gradient descent method can be used to train a multi-layer perceptron whose output values are discretized, so the model uses fewer data bits, which facilitates hardware deployment and computational acceleration of the neural network model.
Based on the structural framework of the neural network model shown in fig. 1 and the numerical calculation characteristics of the p-bit shown in fig. 2, the method can be further understood as steps S1-S11:
step S1, data set standardization. And carrying out specified data standardization pretreatment on the original sample tax data to obtain a sample tax data set.
It should be understood that the quality and standards of the raw sample individual tax data vary, so data normalization preprocessing is needed to meet the model's requirement for high accuracy. After the data normalization preprocessing method for the raw sample individual tax data is selected, the test set data must undergo the same data normalization preprocessing to ensure that the model can be used well for the pattern recognition task. Standardizing the data set aims to improve the quality of the raw data and guarantee the analysis capability of the final model.
Illustratively, alternative data normalization preprocessing methods include, but are not limited to, the z-score method, the min-max method, the auto-scaling method, the normalization method, etc., and data processed by the data normalization preprocessing method is then input into the model. For example, if the algorithm chooses to pre-process the sample individual tax data set using the min-max method during the model training phase, then the test set data should also use the min-max method during the later model use phase.
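For illustration, a minimal min-max preprocessing sketch consistent with the requirement that the test set reuse the training set's preprocessing (the function and the example values are assumptions):

```python
import numpy as np

def min_max_normalize(train_X, test_X):
    """Min-max scaling fitted on the training set and reused on the test set."""
    lo, hi = train_X.min(axis=0), train_X.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)     # guard against constant columns
    return (train_X - lo) / span, (test_X - lo) / span

train = np.array([[1.0, 100.0], [3.0, 300.0]])
test = np.array([[2.0, 200.0]])
train_n, test_n = min_max_normalize(train, test)    # test maps to [[0.5, 0.5]]
```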
S2, model initialization. Set the model structure parameters and their initial values. The model parameters to be initialized are specifically: the hidden layer initial weights ω_h; the hidden layer initial biases b_h; the output layer initial weights ω_o; and the output layer initial biases b_o.
S3, initializing the model training parameters. The model training parameters to be initialized include at least one of the following: the gradient descent step size α for controlling the learning rate; the gradient descent momentum m for adjusting the learning rate; the fitting parameter β used to assist the sigmoid function in fitting the sign function; the parameter n for randomly dividing the sample individual tax data set into batches; and the total iteration number T of model training. Step S3 is used to set the control parameters of the algorithm flow.
It should be appreciated that the gradient descent step alpha is used to control the learning rate to achieve a suitable iterative effect; the gradient descent momentum m is used for adjusting the learning rate; the fitting parameter beta is used to assist the sigmoid function in fitting the sign function, as shown in equation (8),
sigmoid(β·x) ≈ sign(x)    (8)
the larger the parameter beta is, the closer the numerical output of the sigmoid function is to the sign function. The parameter n is used to randomly divide sample tax data set batches, each data subset being referred to as a batch, i.e., a set of sample tax data. The number of data in each batch is Ts/n, and Ts is the total amount of the sample individual tax data set.
S4, starting iteration. The iteration control flag bit t=1 is set. And controlling iteration start and iteration process through the step S4.
S5, generating a data subset sequence. Sample individual tax data set batches are randomly partitioned by a parameter n and a subset extraction flag bit i=1 is set. A sequence of sample individual tax data subsets is generated by step S5 to ensure the analytical capabilities of the trained model.
S6, sequentially extracting the ith group of sample individual tax data subsets. And extracting an i-th sample tax data subset for calculating each parameter gradient of the model in batches.
S7, forward propagation. Fig. 5 is a schematic flow chart of forward propagation of a multi-layer perceptron neural network model composed of probabilistic nodes according to an embodiment of the present application. In the forward propagation process shown in fig. 5, after the sample individual tax data x are imported into the multi-layer perceptron neural network model, they are cached in the input layer and then passed into the hidden layers through weight connections; the actual output h_k and the fitting output h'_k of each hidden layer are calculated as in formula (1),
where k is the hidden layer index, counted from 1 up to the index of the last hidden layer; h_k is the actual output of the k-th hidden layer, h'_k is the fitting output of the k-th hidden layer, and h_0 is the data in the i-th subset of the sample individual tax data set; ω_h^k and b_h^k are respectively the connection weights and biases of the k-th hidden layer; the sign function and the sigmoid function are both activation functions, sign being the sign function and sigmoid the squeeze function; β is a fitting parameter used to assist the sigmoid function in fitting the sign function; rand is the random number output by the random number generator.
After the sample tax data x is propagated in the hidden layer, the sample tax data x enters the output layer through weight connection, the actual output y and the fitting output y' of the output layer are calculated as in a formula (2),
where y is the actual output of the output layer, and y' is the fitting output of the output layer; ω_o and b_o are respectively the connection weight and bias of the output layer, and h_n is the actual output of the last hidden layer.
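A hedged sketch of this forward pass follows. Because formulas (1) and (2) are not reproduced here, the sketch assumes the actual output applies the p-bit rule and the fitting output replaces sign(·) with sigmoid(β·); shapes and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, hidden, w_o, b_o, beta, rng):
    """Forward pass for hidden layers and an output layer made of p-bit nodes (assumed forms)."""
    h = x
    acts, fits = [], []
    for w_k, b_k in hidden:                        # layer-by-layer propagation through the hidden layers
        pre = sigmoid(w_k @ h + b_k) - rng.random(b_k.shape)
        h = (pre > 0).astype(float)                # actual output h_k: 0 when silent, 1 when firing
        acts.append(h)
        fits.append(sigmoid(beta * pre))           # fitting output h'_k, kept for the gradient step
    pre_o = sigmoid(w_o @ h + b_o) - rng.random(b_o.shape)
    y = (pre_o > 0).astype(float)                  # actual output y of the output layer
    y_fit = sigmoid(beta * pre_o)                  # fitting output y'
    return acts, fits, y, y_fit

# Example with 4 input features, two hidden layers (8 and 6 p-bits) and 3 output p-bits.
rng = np.random.default_rng(0)
hidden = [(rng.normal(size=(8, 4)), np.zeros(8)), (rng.normal(size=(6, 8)), np.zeros(6))]
w_o, b_o = rng.normal(size=(3, 6)), np.zeros(3)
acts, fits, y, y_fit = forward(np.array([0.3, 0.1, 0.7, 0.2]), hidden, w_o, b_o, beta=5.0, rng=rng)
```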
It will be appreciated that, because of the singular point of the sign function, it cannot be differentiated in the error back-propagation flow that follows the forward propagation phase. From equation (8), the sign function can be fitted by the sigmoid function and the fitting parameter β; moreover, the larger the parameter β, the closer the numerical output of the sigmoid function is to the sign function, and the sigmoid function is differentiable. The application therefore proposes this training method. In the forward propagation stage of the training process, the actual output and the fitting output of each layer's nodes must both be calculated and recorded: the actual output is used to calculate the current model's actual error, and the fitting output is used to calculate the gradients of the model parameters during error back-propagation.
S8, error back propagation. Fig. 6 is a schematic flow chart of error back propagation of a multi-layer perceptron neural network model formed by probabilistic nodes according to an embodiment of the present application. After the output layer calculation is completed, the actual model errors are calculated by using the formula (3), and the actual model errors are used for calculating the increment and gradient of each model structural parameter, as shown in the error back propagation flow shown in fig. 6.
where e is the actual error of the model, N is the amount of data contained in the i-th subset of the sample individual tax data set, y_j is the actual output of the output layer, and y'_j is the fitting output of the output layer.
Then the actual error of the model is back-propagated to the output layer; formula (4) is derived by the chain rule of differentiation, and the fitting increment δ_o, weight gradient dω_o and bias gradient db_o of the output layer are calculated using formula (4).
where δ_o is the fitting increment of the output layer, dω_o is the weight gradient of the output layer, and db_o is the bias gradient of the output layer.
After the fitting increment δ_o, weight gradient dω_o and bias gradient db_o of the output layer are obtained, δ_o is propagated back into the hidden layers; counting from the last hidden layer forward to the first hidden layer, the fitting increment δ_h^k, weight gradient dω_h^k and bias gradient db_h^k of each hidden layer are calculated in turn using formula (5).
where δ_h^k is the fitting increment of the k-th hidden layer, ω_h^{k+1} is the weight of the (k+1)-th hidden layer, ω_h^{k-1} is the weight of the (k-1)-th hidden layer, dω_h^k is the weight gradient of the k-th hidden layer, and db_h^k is the bias gradient of the k-th hidden layer.
This process continues until the hidden layer back propagation is complete.
It will be appreciated that, during the error back-propagation phase, because the sigmoid function is used to fit the sign function, formula (4), and formula (5) for calculating the fitting increments δ_o and δ_h^k, can be derived from the chain rule of differentiation.
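Since formulas (4) and (5) are likewise not reproduced in this text, the sketch below stands in for them with ordinary chain-rule back-propagation through the fitting outputs sigmoid(β·); treat every detail here as an assumption rather than the patent's derivation. Here hidden_w is the list of hidden layer weight matrices from the forward sketch, acts and fits are the recorded actual and fitting outputs, and target is the training label for the current sample.

```python
import numpy as np

def backward(x, acts, fits, y_fit, target, hidden_w, w_o, beta):
    """Back-propagation sketch for the p-bit multi-layer perceptron (assumed formulas)."""
    # Output layer: fitting increment, weight gradient and bias gradient (cf. formula (4)).
    delta_o = (y_fit - target) * beta * y_fit * (1.0 - y_fit)
    dw_o, db_o = np.outer(delta_o, acts[-1]), delta_o
    # Hidden layers, from the last back to the first (cf. formula (5)).
    dws, dbs = [], []
    delta_next, w_next = delta_o, w_o
    for k in range(len(hidden_w) - 1, -1, -1):
        fit_k = fits[k]
        delta_k = (w_next.T @ delta_next) * beta * fit_k * (1.0 - fit_k)
        prev = acts[k - 1] if k > 0 else x
        dws.append(np.outer(delta_k, prev))
        dbs.append(delta_k)
        delta_next, w_next = delta_k, hidden_w[k]
    return dw_o, db_o, list(reversed(dws)), list(reversed(dbs))
```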
S9, weight update. The connection weights and biases of each layer are updated according to the calculated weight gradient dω_h^k and bias gradient db_h^k of each hidden layer and the weight gradient dω_o and bias gradient db_o of the output layer. Specifically,
calculating the updated value of the hidden layer weight, the updated value of the hidden layer bias, the updated value of the output layer weight and the updated value of the output layer bias by adopting formula (6);
where ω_h is the current value of the hidden layer weight, b_h is the current value of the hidden layer bias, ω_o is the current value of the output layer weight, and b_o is the current value of the output layer bias; ω_h*, b_h*, ω_o* and b_o* denote the corresponding updated values of the hidden layer weight, hidden layer bias, output layer weight and output layer bias.
It will be appreciated that the gradient descent step size α controls the learning rate to achieve a suitable iterative effect, and similarly the gradient descent momentum m is used to adjust the learning rate. If α and m are too large, the parameter step of each iteration of the model is too large, so the parameters oscillate back and forth near the optimal solution and the optimal solution cannot be reached. If α and m are too small, the parameter step of each iteration is tiny, the parameters barely change between updates, and the optimal solution again cannot be reached. During training, the optimal combination of α and m needs to be tuned to obtain the optimal model parameters.
S10, traversing all subsets of the sample individual tax data set. After training on the i-th subset of the sample individual tax data set, if i is less than or equal to n, set i = i+1 and execute S6; otherwise, execute S11. Traversing all subsets in step S10 ensures that the model completes its update over the entire data set.
S11, judging whether the total iteration number T of model training has been reached. After the t-th round of training is finished, if t is less than or equal to T, set t = t+1 and execute S5; otherwise, the procedure ends and the trained multi-layer perceptron neural network model composed of probabilistic nodes is output. Iteration is ended through step S11, and enough iteration rounds are run to guarantee the analysis performance of the finally obtained model.
In the training process, the embodiment of the application uses the squeeze function sigmoid to fit the sign function sign during error back-propagation, and meanwhile converts the output into a probabilistic signal through the random number generator, which solves the problems that the sign function cannot be differentiated and that the gradient vanishes. Therefore, the flow-automation-based individual tax data processing method provided by the embodiment of the application can solve the problem that the sign function cannot be differentiated at its singular point, and provides a flow-automation-based individual tax data processing algorithm applicable to a multi-layer perceptron composed of probabilistic nodes.
The embodiment of the application provides a flow-automation-based individual tax data processing method based on a multi-layer perceptron composed of hardware computing units: an ANN model that can be deployed on-chip in hardware and has brain-like characteristics, which serves future computing acceleration requirements and avoids complex computing pressure. Meanwhile, introducing the probabilistic unit module into the model improves its ability to solve uncertain classification problems and improves the accuracy of pattern recognition.
The flow-automation-based individual tax data processing method, using the multi-layer perceptron neural network model composed of hardware computing units provided by the embodiment of the application, may include the following steps a to d, wherein:
Step a, normalizing the data to be predicted. The data to be predicted, referred to as test set data, undergo the specified standardized preprocessing method.
Illustratively, alternative data normalization methods include, but are not limited to, the z-score method, the min-max method, the auto-scaling method, the normalization method, etc., with data being subsequently entered into the model.
It should be understood that the raw data in the test set lack unified standards, or their attribute values may be missing or uncertain, which would eventually lead to a poor model recognition effect; standardized preprocessing is therefore required to meet the input requirements of the model that follows. The standardized preprocessing method for the test set data is consistent with the preprocessing method specified for the training set. The test set data undergo standardized preprocessing so that the data can be accurately extracted and their format adjusted, giving high-quality data and ensuring that the model can be used well for pattern recognition tasks.
Step b, importing the target individual tax data. Specifically, the target individual tax data that have undergone standardized preprocessing are imported via the model's input layer.
Illustratively, in order to fulfill the requirement of receiving target tax data, the input layer of the multi-layer perceptron neural network model formed by the probabilistic nodes is compatible with the traditional input layer of the multi-layer perceptron, and common neuron nodes can be used, each input node independently receives data, and the number of the input nodes is consistent with the dimension of the data.
Preferably, to improve the efficiency of data transmission from step a to step b, a data buffer pool may be provided in the multi-layer perceptron neural network model composed of probabilistic nodes.
Step c, model recognition analysis. Recognition analysis is performed by the multi-layer perceptron neural network model composed of probabilistic nodes, using the model network structure and the optimal parameters obtained after training.
For example, a 4-layer perceptron model composed of probabilistic nodes may be provided, comprising an input layer x consisting of n common nodes, a hidden layer h_1 composed of l_1 p-bits, a hidden layer h_2 composed of l_2 p-bits, and an output layer y composed of m p-bits, where n, l_1, l_2 and m are positive integers. After the data undergo standardized preprocessing, they are input to the model and, following the forward propagation order, pass from the input layer through the weight links of each layer into the hidden layers and the output layer, where pattern recognition analysis is performed and the result is output. The behaviour of the hidden layers and output layer consisting of p-bit neuron nodes can be described by formula (9),
where x, h_1, h_2 and y are vectors composed of n, l_1, l_2 and m components respectively, each component being the input or output of one node; likewise, sign and sigmoid are still the sign function and the squeeze function, and rand is the random number output by the random number generator embedded in each node; ω_h^1 and b_h^1 are respectively the connection weights and biases of the first hidden layer, ω_h^2 and b_h^2 are respectively the connection weights and biases of the second hidden layer, and ω_o and b_o are respectively the connection weight and bias of the output layer; ω_h^1, ω_h^2 and ω_o are all matrices, of sizes l_1×n, l_2×l_1 and m×l_2, respectively.
It should be understood that the computation characteristic of the p-bit ensures that the node output is binarized (the value "1" corresponds to the "excitation" state and the value "0" to the "suppression" state), while the random number generation module in the p-bit is synchronously updated, introducing real-time uncertainty into the computation at the node where it is located, so that the neuron model simulated by the probabilistic node more closely resembles a biological neuron. Furthermore, the p-bit nodes of the hidden layers and the output layer can, to the greatest extent, imitate the operation of biological brain neurons during pattern recognition, which improves the model's capacity to process and analyze uncertainty problems, probabilistically improves the accuracy of pattern recognition, and can to a certain extent reduce or even eliminate the model error.
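A minimal NumPy sketch of the forward pass described by equation (9), assuming the sign output is mapped to the {0, 1} states used by the p-bit nodes; the layer sizes, random (untrained) parameters and function names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_bit_layer(w, b, x):
    # Each p-bit fires (1) with probability sigmoid(w·x + b) and stays silent (0)
    # otherwise, i.e. sign(sigmoid(w^T x + b) - rand) with sign mapped to {0, 1}.
    prob = sigmoid(w @ x + b)
    rand = rng.random(prob.shape)      # fresh random number per node per call
    return (prob > rand).astype(float)

def forward(x, params):
    # Forward pass of equation (9): input x -> hidden h1 -> hidden h2 -> output y.
    w_h1, b_h1, w_h2, b_h2, w_o, b_o = params
    h1 = p_bit_layer(w_h1, b_h1, x)
    h2 = p_bit_layer(w_h2, b_h2, h1)
    return p_bit_layer(w_o, b_o, h2)

# Illustrative sizes n=4, l1=8, l2=8, m=6 and random (untrained) parameters;
# weight shapes are l1×n, l2×l1 and m×l2 as described above.
n, l1, l2, m = 4, 8, 8, 6
params = (rng.normal(size=(l1, n)), rng.normal(size=l1),
          rng.normal(size=(l2, l1)), rng.normal(size=l2),
          rng.normal(size=(m, l2)), rng.normal(size=m))
y = forward(rng.normal(size=n), params)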
Step d: outputting the identification result. Specifically, the output layer of the model is used to output the calculation result obtained after the data has been analyzed, and the corresponding recognition analysis result of the model is finally output. The identification is repeated N times, where N defines how many times a data signal is repeatedly identified in the multi-layer perceptron neural network model formed by probabilistic nodes, so as to increase the accuracy of the data analysis and reduce the error rate. N is a positive integer.
For example, N=5 may be set, that is, each data signal to be predicted is repeatedly identified 5 times in the multi-layer perceptron neural network model formed by probabilistic nodes, producing 5 corresponding results, and the final identification result is determined by a voting method. For the classification and identification problem, a one-hot coding method may be used to set the model output layer structure, that is, the number m of output layer nodes equals the total number of categories; after the probabilistic model finishes its calculation, each unit in the output layer node array (m×1) generates a signal, and after normalization only the node corresponding to the identification result of a given data item is in the activated state while all other nodes are suppressed (for example, if the total number of categories is 6, the final output layer result for data of the 3rd category should be "001000").
It should be understood that, as calculated by equation (9), the output layer of the model ultimately generates a binary array of m bits in total, where the i-th node represents the i-th category (i ≤ m), and the position of the output layer node that is in the activated state gives the category result analyzed by the model.
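A minimal sketch of the repeated identification and voting in step d, reusing the forward() sketch above; the argmax-based tie-breaking and the one-hot construction are illustrative assumptions.

import numpy as np

def identify_with_voting(x, params, n_repeats=5):
    # Repeat the probabilistic identification N times (here N=5) and take a
    # majority vote over the predicted classes; reuses forward() from above.
    m = params[-1].shape[0]                       # number of output classes
    votes = np.zeros(m, dtype=int)
    for _ in range(n_repeats):
        y = forward(x, params)                    # binary m-bit output array
        votes[int(np.argmax(y))] += 1             # index of the activated node
    winner = int(np.argmax(votes))                # most frequently activated class
    one_hot = np.zeros(m, dtype=int)
    one_hot[winner] = 1                           # e.g. class 3 of 6 -> "001000"
    return winner, one_hot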
The process automation-based tax data processing system provided by the embodiment of the application is described below, and the process automation-based tax data processing system described below and the process automation-based tax data processing method described above can be correspondingly referred to each other.
FIG. 7 is a schematic structural diagram of a flow automation-based tax data processing system according to an embodiment of the present application, as shown in FIG. 7, the flow automation-based tax data processing system includes: a data binding module 701 and a data processing module 702;
a data binding module 701, configured to bind pre-tax annual revenue data, three-risk two-fee data, special additional deduction data, and post-tax annual revenue data of each user to obtain target tax data;
the data processing module 702 is configured to input the target tax data into a multi-layer perceptron neural network model formed by a hardware computing unit, so as to obtain a recognition result output by the multi-layer perceptron neural network model; the recognition result is the tax deduction mode of each user;
the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; wherein the input layer comprises at least one common neuron node; the hidden layer comprises at least one neuron node based on a probabilistic node p-bit model, and the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layer and the output layer is 0 when the corresponding neuron node is silent, the output result is 1 when the input data of the corresponding neuron node reaches a threshold value, and the output result has probability characteristics.
According to the application, the target individual tax data is processed through the multi-layer perceptron neural network model, the tax deduction mode of the user is output, and as the hidden layer and the output layer output the two states of 0 and 1 according to the probability, the model can consume fewer data bits, so that the hardware deployment and acceleration are facilitated, the problem solving capability of the model on uncertainty classification is improved, and the recognition accuracy of the tax deduction mode of the user is improved.
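A minimal end-to-end sketch of the two modules described above, assuming the normalization statistics come from the training set and reusing the earlier sketches; the record layout and function names are illustrative assumptions, not the patented implementation.

import numpy as np

def bind_tax_record(pre_tax_income, three_risk_two_fee, special_deduction, post_tax_income):
    # Data binding module 701: pack the four per-user figures into one
    # target individual tax record (feature vector).
    return np.array([pre_tax_income, three_risk_two_fee,
                     special_deduction, post_tax_income], dtype=float)

def process_user(raw_record, params, mean, std):
    # Data processing module 702: normalize the bound record with the
    # training-set statistics, then run the repeated p-bit identification
    # to obtain the user's tax deduction mode (class index).
    x = (bind_tax_record(*raw_record) - mean) / (std + 1e-12)
    deduction_mode, _ = identify_with_voting(x, params, n_repeats=5)
    return deduction_mode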
Fig. 8 is a schematic physical structure of an electronic device according to an embodiment of the present application, as shown in fig. 8, where the electronic device may include: processor 810, communication interface (Communications Interface) 820, memory 830, and communication bus 840, wherein processor 810, communication interface 820, memory 830 accomplish communication with each other through communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the above-described flow automation-based tax data processing method.
Further, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present application also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can implement the foregoing flow automation-based tax data processing method.
In yet another aspect, the present application further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described flow automation-based tax data processing method.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, and they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A tax data processing method based on process automation, comprising:
binding the annual income data before tax, the three-risk two-fee data, the special additional deduction data and the annual income data after tax of each user to obtain target individual tax data;
inputting the target tax data into a multi-layer perceptron neural network model formed by a hardware computing unit to obtain an identification result output by the multi-layer perceptron neural network model; the recognition result is the tax deduction mode of each user;
the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; wherein the input layer comprises at least one common neuron node; the hidden layer comprises at least one neuron node based on a probabilistic node p-bit model, and the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layer and the output layer is 0 when the corresponding neuron node is silent, the output result is 1 when the input data of the corresponding neuron node reaches a threshold value, and the output result has probability characteristics.
2. The process automation based tax data processing method of claim 1, wherein the calculation rules adopted by each neuron node in the hidden layer and the output layer comprise:
I_out = sign(sigmoid(ω^T·I_in + b) − rand)
wherein I_out is the output result after calculation by the neuron node, and I_in is the input of the neuron node; the sign function and the sigmoid function are both activation functions, the sign function being a symbol function and the sigmoid function being an extrusion (squeeze) function; ω and b are respectively the weight and bias of the neuron node, and rand is the random number output by the random number generator.
3. The process automation based tax data processing method of claim 1, wherein training the multi-layer perceptron neural network model comprises:
carrying out data binding on the annual income data before tax, the three-risk two-fee data, the special additional deduction data and the annual income data after tax of the sample user to obtain a sample individual tax data set;
setting initial values of at least one model structure parameter of the multi-layer perceptron neural network model formed by the hardware calculation units; the model structure parameters include at least one of: implicit layer weights, implicit layer biases, output layer weights and output layer biases;
using the sample individual tax data set, alternately performing model training through forward propagation and error back propagation iterations, and continuously updating each model structure parameter to obtain the multi-layer perceptron neural network model;
wherein the sign function sign is fitted using an extrusion function sigmoid during the error back propagation; the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; the input layer includes at least one common neuron node; the hidden layer comprises at least one neuron node based on a probabilistic node p-bit model; the output layer comprises at least one neuron node based on a p-bit model; and the output result of each neuron node in the hidden layer and the output layer is 0 when the corresponding neuron node is silent, the output result is 1 when the input data of the corresponding neuron node reaches a threshold value, and the output result has probability characteristics.
4. The process automation based tax data processing method of claim 3, wherein using the sample tax data set, model training is performed by iterative alternation of forward propagation and error reverse propagation, and each model structure parameter is updated continuously, to obtain the multi-layer perceptron neural network model, comprising:
setting an initial value of at least one model training parameter; the model training parameters include at least one of: a gradient descent step α for controlling the learning rate; a gradient descent momentum m for controlling the learning rate; a fitting parameter β used to assist the sigmoid function in fitting the sign function; a parameter n used to randomly divide the sample individual tax data set into batches; and the total number of iterations T of model training;
for each iteration, randomly dividing sample tax data set batches according to the parameter n to generate a plurality of sample tax data subset sequences;
for each set of sample tax data set subsets in the plurality of sample tax data subset sequences, performing the following:
inputting the sample individual tax data set subset to the input layer, determining the actual output and fitting output of each hidden layer, and determining the actual output and fitting output of the output layer; determining an actual error of the model according to the actual output and the fitting output of the output layer;
back-propagating the actual error of the model to the output layer, and determining fitting increment, weight gradient and bias gradient of the output layer, and fitting increment, weight gradient and bias gradient of each hidden layer;
Updating the hidden layer weight, the hidden layer bias, the output layer weight and the output layer bias according to the weight gradient and the bias gradient of the output layer and the weight gradient and the bias gradient of the hidden layer;
and ending the iteration when the iteration times reach the total iteration times T of the model training, and obtaining the multi-layer perceptron neural network model.
5. The process automation based tax data processing method of claim 4, wherein determining the actual output and the fit output for each of the hidden layers comprises:
calculating the actual output and fitting output of each hidden layer by adopting a formula (1);
where k is the hidden layer index, h_k is the actual output of the k-th hidden layer, h'_k is the fitting output of the k-th hidden layer, and h_0 is the data in the i-th subset of the sample individual tax data set; ω^(hk) and b^(hk) are respectively the connection weights and bias of the k-th hidden layer; both the sign function and the sigmoid function are activation functions, the sign function being a sign function and the sigmoid function an extrusion function; β is a fitting parameter used to assist the sigmoid function in fitting the sign function; rand is the random number output by the random number generator; or alternatively,
Calculating the actual output and fitting output of the output layer by adopting a formula (2);
wherein y is the actual output of the output layer and y' is the fitting output of the output layer; ω^(o) and b^(o) are respectively the connection weight and bias of the output layer; h_n is the actual output of the last hidden layer.
6. The process automation based tax data processing method of claim 4, wherein the determining the model actual error based on the actual output and the fitted output of the output layer comprises:
calculating the actual error of the model by adopting a formula (3);
wherein e is the actual error of the model, N is the amount of data contained in the i-th subset of the sample individual tax data set, y_j is the actual output of the output layer, and y'_j is the fitting output of the output layer.
7. The process automation based tax data processing method of claim 4, wherein the determining the fitting delta, weight gradient, and bias gradient for the output layer, and the fitting delta, weight gradient, and bias gradient for each hidden layer, comprises:
calculating fitting increment, weight gradient and bias gradient of the output layer by adopting a formula (4);
wherein δ^(o) is the fitting increment of the output layer, dω^(o) is the weight gradient of the output layer, and db^(o) is the bias gradient of the output layer;
calculating fitting increment, weight gradient and bias gradient of each hidden layer by adopting a formula (5);
wherein δ^(hk) is the fitting increment of the k-th hidden layer, ω^(hk+1) is the weight of the (k+1)-th hidden layer, ω^(hk−1) is the weight of the (k−1)-th hidden layer, dω^(hk) is the weight gradient of the k-th hidden layer, and db^(hk) is the bias gradient of the k-th hidden layer;
correspondingly, updating the implicit layer weight, the implicit layer bias, the output layer weight and the output layer bias according to the weight gradient and the bias gradient of the output layer and the weight gradient and the bias gradient of the implicit layer comprises the following steps:
calculating an update value of the implicit layer weight, an update value of the implicit layer bias, an update value of the output layer weight and an update value of the output layer bias by adopting a formula (6);
wherein ω^(h) is the current value of the hidden layer weight, b^(h) is the current value of the hidden layer bias, ω^(o) is the current value of the output layer weight, and b^(o) is the current value of the output layer bias; the corresponding updated values of the hidden layer weight, the hidden layer bias, the output layer weight and the output layer bias are those given by formula (6).
8. A tax data processing system based on process automation, comprising:
the data binding module is used for carrying out data binding on the annual income data before tax, the three-risk two-fee data, the special additional deduction data and the annual income data after tax of each user to obtain target individual tax data;
the data processing module is used for inputting the target tax data into a multi-layer perceptron neural network model formed by a hardware computing unit to obtain an identification result output by the multi-layer perceptron neural network model; the recognition result is the tax deduction mode of each user;
the multi-layer perceptron neural network model comprises: an input layer for receiving data, at least one hidden layer for performing pattern recognition analysis, and an output layer for outputting recognition results; wherein the input layer comprises at least one common neuron node; the hidden layer comprises at least one neuron node based on a probabilistic node p-bit model, and the output layer comprises at least one neuron node based on the p-bit model; and the output result of each neuron node in the hidden layer and the output layer is 0 when the corresponding neuron node is silent, the output result is 1 when the input data of the corresponding neuron node reaches a threshold value, and the output result has probability characteristics.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the flow automation based tax data processing method of any one of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the flow automation based tax data processing method of any one of claims 1 to 7.
CN202311381321.6A 2023-10-23 2023-10-23 Individual tax data processing method and system based on flow automation Active CN117236900B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311381321.6A CN117236900B (en) 2023-10-23 2023-10-23 Individual tax data processing method and system based on flow automation


Publications (2)

Publication Number Publication Date
CN117236900A true CN117236900A (en) 2023-12-15
CN117236900B CN117236900B (en) 2024-03-29

Family

ID=89088067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311381321.6A Active CN117236900B (en) 2023-10-23 2023-10-23 Individual tax data processing method and system based on flow automation

Country Status (1)

Country Link
CN (1) CN117236900B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766167A (en) * 2015-03-31 2015-07-08 浪潮集团有限公司 Tax administration big data analysis method using restricted Boltzmann machine
CN109636561A (en) * 2018-12-14 2019-04-16 浙江诺诺网络科技有限公司 The additional method, device and equipment deducted of Individual Income Tax special project is declared on a kind of line
CN109902201A (en) * 2019-03-08 2019-06-18 天津理工大学 A kind of recommended method based on CNN and BP neural network
US20210117776A1 (en) * 2019-10-22 2021-04-22 Baidu Usa Llc Method, electronic device and computer readable medium for information processing for accelerating neural network training
CN112232547A (en) * 2020-09-09 2021-01-15 国网浙江省电力有限公司营销服务中心 Special transformer user short-term load prediction method based on deep belief neural network
CN112164462A (en) * 2020-09-27 2021-01-01 华南理工大学 Breast cancer risk assessment method, system, medium and equipment
CN114650199A (en) * 2021-12-30 2022-06-21 南京戎智信息创新研究院有限公司 Deep neural network channel estimation method and system based on data driving
CN115204590A (en) * 2022-06-08 2022-10-18 杭州新中大科技股份有限公司 Neural network-based automatic identification method and device for enterprise tax-related risks
CN116563862A (en) * 2023-05-31 2023-08-08 淮阴工学院 Digital identification method based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG WENZHEN: "Tax revenue prediction model based on BP neural network", Chinese and Foreign Entrepreneurs (中外企业家), no. 22, 20 November 2009 (2009-11-20) *

Also Published As

Publication number Publication date
CN117236900B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
Moreno-Barea et al. Improving classification accuracy using data augmentation on small data sets
Ruehle Data science applications to string theory
CN110956260A (en) System and method for neural architecture search
EP3545472A1 (en) Multi-task neural networks with task-specific paths
CN109461001B (en) Method and device for obtaining training sample of first model based on second model
EP3602419B1 (en) Neural network optimizer search
CN109325516B (en) Image classification-oriented ensemble learning method and device
JP2017525038A (en) Decomposition of convolution operations in neural networks
WO2020260656A1 (en) Pruning and/or quantizing machine learning predictors
Tirumala Evolving deep neural networks using coevolutionary algorithms with multi-population strategy
CN114722182A (en) Knowledge graph-based online class recommendation method and system
CN116402352A (en) Enterprise risk prediction method and device, electronic equipment and medium
CN114332565A (en) Method for generating image by generating confrontation network text based on distribution estimation condition
CN113111190A (en) Knowledge-driven dialog generation method and device
CN117236900B (en) Individual tax data processing method and system based on flow automation
US20230042327A1 (en) Self-supervised learning with model augmentation
CN111259673A (en) Feedback sequence multi-task learning-based law decision prediction method and system
Reddy et al. Effect of image colourspace on performance of convolution neural networks
CN115101122A (en) Protein processing method, apparatus, storage medium, and computer program product
van Heeswijk Advances in extreme learning machines
CN117634580A (en) Data processing method, training method and related equipment of neural network model
CN116527411B (en) Data security intelligent protection model construction method and device and collaboration platform
Kavarakuntla Performance modelling for scalable deep learning
CN112685558B (en) Training method and device for emotion classification model
US11983240B2 (en) Meta few-shot class incremental learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant