CN111325247B

CN111325247B - Intelligent auditing realization method based on least square support vector machine

Info

Publication number: CN111325247B
Application number: CN202010084387.9A
Authority: CN
Inventors: 唐昌明; 马士中; 赵玉海
Original assignee: Inspur General Software Co Ltd
Current assignee: Inspur General Software Co Ltd
Priority date: 2020-02-10
Filing date: 2020-02-10
Publication date: 2022-08-02
Anticipated expiration: 2040-02-10
Also published as: CN111325247A

Abstract

The invention provides a method for implementing intelligent auditing based on a least squares support vector machine (LSSVM). It belongs to the technical field of computer applications and can be used at every process node of an ERP intelligent expense-reporting system. Documents from different scenarios are first classified, and different limits are applied when identifying documents of different categories, in order to confirm whether each document is reasonable and whether its amount relationships are correct. At the same time, a graded prompt on the risk that a document will be returned is issued according to the influence factors of the limiting factors, and a document is assigned to the corresponding error category once the risk value is reached. The invention helps document submitters, approvers and auditors locate documents quickly, prevents erroneous documents from being created, improves expense-reporting and auditing efficiency, standardizes auditing criteria, frees up the time and energy of financial staff, improves the accuracy of the documents submitted by claimants and shortens the overall expense-reporting cycle.

Description

Intelligent auditing realization method based on least square support vector machine

Technical Field

The invention relates to the technical field of computer applications, in particular to an implementation scheme for intelligent auditing in ERP financial software. It is a method for automatically identifying and classifying erroneous documents, and more specifically a method for implementing intelligent auditing with a least squares support vector machine.

Background

An ERP financial system is a system for managing purchasing, supply, production, expense reporting and other management work. As an enterprise grows, its headcount increases and the documents to be processed in the system become more complex and varied, so more auditors have to be added to ensure that the documents are accurate and valid.

Daily expense reporting, auditing, accounting and settlement take up most of the working time of financial staff, leaving them little opportunity to learn and advance, and expense reporting is tedious for employees as well. Accounting and auditing are performed manually. On the one hand, the documents are numerous and varied and there is a great deal to audit, so omissions occur easily; on the other hand, auditing rules are applied inconsistently and the work is time-consuming, so both the auditing speed and the quality of submitted documents are low.

In view of the above, there is an urgent need for a method that quickly and intelligently identifies the error points of documents and performs an initial classification of them.

Disclosure of Invention

The technical task of the invention is to provide a method for implementing intelligent auditing based on a least squares support vector machine, addressing the drawback that documents are numerous and manual auditing is error-prone, so as to improve auditing efficiency, standardize auditing criteria, and improve both the auditing speed and the quality of submitted documents.

The technical solution adopted by the invention to solve this technical problem is as follows:

A method for implementing intelligent auditing based on a least squares support vector machine comprises the following steps:

Step A, document classification: the documents are classified with a classification method based on a least squares support vector machine, and further identification is performed once the document category has been determined;

Step B, erroneous-document identification: this covers identification of the rationality, the amount relationships and the normalization of the document; according to the rule requirements of the corresponding document category, influence factors for the different error categories and overrun standards for those influence factors are set and used for identification, and a swarm (population-based) algorithm is used to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories;

Step C, further classification of erroneous documents: after erroneous documents have been identified, the common characteristics of documents with large errors are identified and classified, and the specific direct cause of each error is fed back as far as possible.

Preferably, the classification in step A comprises the following steps:

1) classifying the document categories: the documents are first classified manually by major document category, which improves classification efficiency;

2) within each major category, a least squares support vector machine is used, with the document type, document preparer, summary, enabled detail tables and detail-table information (sharing information) extracted as support vectors;

3) extracting and eliminating the influence of the vectors under different categories on the recognition result, which improves recognition efficiency;

4) data normalization: all extracted vector data are calibrated and normalized, in particular by summarizing the text information.
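Sub-step 4) can be illustrated with a minimal Python sketch. The patent does not state which normalization formula is used, so min-max scaling to [0, 1] and an integer encoding of text fields are assumptions, and the function names `encode_categories`, `min_max_normalize` and `normalize_rows` are hypothetical.

```python
def encode_categories(values):
    """Map each distinct text value to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def min_max_normalize(column):
    """Scale one numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def normalize_rows(rows):
    """Normalize a table column by column and return it in row layout again."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

A text field such as the document type would first pass through `encode_categories`, after which every column of the feature table is scaled by `normalize_rows` so that no single field dominates the kernel distance.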

Preferably, the classification in step A is implemented by the following steps:

1) setting the relevant preconditions

a) To classify, the selected support vectors must first be defined: the source major-category identifier is defined as BillSource, the unique identifier of the document type as BillType, the unique identifier of the document preparer as peoples id, the unique identifier of the summary as abstract, and the unique identifier of the enabled sub-table as GetInfosForchild; the remaining support vectors are defined in turn;

b) the acceptable value of the cross-validation test on the classification result is defined as LowerConfidencelimit;

c) the mapping class TypeMapper between the document internal code and the classification result is defined, with the classification result as ResultType and the internal code as BillId;

2) establishing the operating framework of the least squares support vector machine

a) all extracted data are standardized to ensure that the data are usable;

b) the Gaussian RBF kernel is selected as the training kernel of the LSSVM algorithm;

c) the algorithm type is set to classification;

d) historical data are input for modeling, and cross-validation is used to confirm that the credibility of the model exceeds LowerConfidencelimit;

e) the information of the current source document is input and matched against the support vector information to obtain the classification result;

3) the internal code of the document data and its classification result are written into BillId and ResultType of the TypeMapper class, respectively.
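The framework in steps 2) and 3) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it handles only two classes (labels +1 and -1), whereas document types would need a multi-class scheme; the kernel scaling exp(-||u-v||^2 / (2*sig2)), the round-robin fold assignment and the helper names `rbf`, `solve`, `train_lssvm`, `predict` and `cv_accuracy` are assumptions. Only `TypeMapper`, `BillId` and `ResultType` come from the text above.

```python
import math
from dataclasses import dataclass


def rbf(u, v, sig2):
    """Gaussian (RBF) kernel; the 2*sig2 scaling is an assumption of this sketch."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sig2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def train_lssvm(X, y, gam, sig2):
    """Binary LS-SVM classifier: solve the linear KKT system for the bias b and alpha."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            reg = 1.0 / gam if i == j else 0.0
            row.append(y[i] * y[j] * rbf(X[i], X[j], sig2) + reg)
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return {"b": sol[0], "alpha": sol[1:], "X": X, "y": y, "sig2": sig2}


def predict(model, x):
    """Return +1 or -1 from the sign of the LS-SVM decision function."""
    s = sum(a * yi * rbf(x, xi, model["sig2"])
            for a, yi, xi in zip(model["alpha"], model["y"], model["X"]))
    return 1 if s + model["b"] >= 0 else -1


def cv_accuracy(X, y, gam, sig2, folds):
    """k-fold cross-validation accuracy; samples are assigned to folds round-robin."""
    hits = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        m = train_lssvm([X[i] for i in train], [y[i] for i in train], gam, sig2)
        hits += sum(predict(m, X[i]) == y[i] for i in test)
    return hits / len(X)


@dataclass
class TypeMapper:
    """Pairs a document internal code with its classification result."""
    BillId: str
    ResultType: int
```

In use, the model would be accepted only when `cv_accuracy(...)` exceeds LowerConfidencelimit, after which each incoming document is passed to `predict` and the outcome stored as `TypeMapper(BillId=..., ResultType=...)`.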

Preferably, in step A, the documents are classified according to the different document initiators, different document header information and different imported documents.

Preferably, the erroneous-document identification in step B comprises the following steps:

1) using principal component analysis to extract the principal information as support vectors for identifying errors;

2) using the fruit fly optimization algorithm to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories, and analyzing the classification results by cross-validation, thereby improving the generalization ability of the optimized parameters.
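The principal component extraction in step 1) can be illustrated with a short Python sketch. It finds only the first principal component, using power iteration on the sample covariance matrix; the patent does not say how the analysis is computed or how many components are kept, so both choices and the function name `first_principal_component` are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-row scores) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration from a uniform start vector; it converges to the dominant
    # eigenvector unless the start happens to be orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project every centered row onto the component to get its score.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

The scores, rather than the raw fields, would then be fed to the classifier, which reduces the number of inputs the LSSVM has to handle.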

Preferably, the erroneous-document identification in step B is implemented by the following steps:

1) setting the relevant preconditions

a) defining all components;

b) defining the principal component array SupportArray as the support vectors;

c) defining the mapping class MissTypeMapper between the error type and the document internal code, with the parameters BillId and ResultMissType;

d) defining the target error for the cross-validation test of the classification result, at which optimization stops, as CanErrorRate;

2) establishing the operating framework of the least squares support vector machine optimized with the fruit fly algorithm

b) setting the number of iterations and the population size of the fruit fly algorithm, confirming the fitness function, setting the flight step length, and extending the algorithm to three-dimensional space to keep it from falling into a local optimum;

c) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;

d) setting the algorithm type to Classification;

e) determining the initial penalty factor and kernel function parameter, and obtaining ErrorRate by cross-validation after modeling;

f) optimizing these two parameters with the fruit fly algorithm, using the ErrorRate obtained by cross-validation as the criterion and stopping as soon as CanErrorRate is met; excessive optimization is unnecessary and would harm generalization ability;

g) inputting the source data into the trained model to obtain the recognition result;

3) the internal code of the document data and its classification result are written into BillId and ResultMissType of the MissTypeMapper class, respectively.
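The optimization loop of steps b) to f) can be sketched in Python as below. It is a simplified illustration, not the patent's program: each tuned parameter gets its own swarm coordinate in three-dimensional space, the candidate value is the reciprocal of the distance to the origin, the step shrinks linearly with the generation, and the search stops once the error reaches the target. The callable `error_rate` is hypothetical and stands for the cross-validated ErrorRate of an LSSVM trained with the candidate penalty factor and kernel parameter; `foa_tune` and its argument names are likewise assumptions.

```python
import math
import random


def foa_tune(error_rate, n_params, can_error_rate,
             maxgen=100, sizepop=30, step=2.0, seed=0):
    """Fruit fly optimization in 3-D: minimize error_rate(params), stop at the target."""
    rng = random.Random(seed)
    # Swarm location: one (x, y, z) triple per parameter being tuned.
    axis = [[2.0 * rng.random() for _ in range(3)] for _ in range(n_params)]
    best_err, best_params = float("inf"), None
    for g in range(maxgen):
        tt = 1.0 - g / maxgen  # flight step shrinks as the generations advance
        flies = []
        for _ in range(sizepop):
            # Each fly takes a random step away from the current swarm location.
            pos = [[c + step * tt * (2.0 * rng.random() - 1.0) for c in p]
                   for p in axis]
            # Smell concentration judgment value: reciprocal of the distance to the origin.
            params = [1.0 / max(math.sqrt(sum(c * c for c in p)), 1e-12)
                      for p in pos]
            flies.append((error_rate(params), params, pos))
        err, params, pos = min(flies, key=lambda f: f[0])
        if err < best_err:
            # The swarm moves to the best fly found so far.
            best_err, best_params, axis = err, params, pos
        if best_err <= can_error_rate:
            break  # target reached: further tuning risks overfitting
    return best_params, best_err, g + 1
```

Stopping at `can_error_rate` instead of running all generations mirrors step f): the search ends as soon as the cross-validated error is acceptable.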

Preferably, the further classification of erroneous documents in step C means that, after erroneous documents have been identified, they are further classified with the K-Means clustering method.

Preferably, the further classification of erroneous documents in step C is implemented by the following steps:

1) clustering framework

a) determining the number of types into which the errors are to be further subdivided (by analyzing the possible error sub-categories);

b) clustering the results with the standard K-Means clustering method;

c) when the amount of data belonging to a given sub-category reaches a set value, that type is considered to occur frequently and is promoted to a major category of erroneous-document identification.
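Steps a) to c) can be sketched in Python as follows. The sketch assumes that each erroneous document has already been turned into a numeric feature vector; `k_means`, `frequent_subcategories` and the `min_count` threshold are hypothetical names for the clustering step and for the "certain value" in step c).

```python
import math
import random
from collections import Counter


def k_means(points, k, iterations=100, seed=0):
    """Standard K-Means (Lloyd's algorithm) with centers initialized from the data."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def frequent_subcategories(labels, min_count):
    """Return the cluster ids whose size reaches min_count, in ascending order."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n >= min_count)
```

Clusters returned by `frequent_subcategories` are the candidates for promotion to their own error category in step B, so that the identification targets can be refined over time.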

Compared with the prior art, the method for implementing intelligent auditing based on a least squares support vector machine has the following beneficial effects:

1. The method addresses the problems of traditional manual auditing, namely the large amount of content to audit, the ease of omission, inconsistent auditing rules and long processing times. It processes documents intelligently: the specific category of each document is determined, erroneous documents within each category are identified, and after identification either a warning is issued according to the configuration or the document is directly confirmed as returned. In addition, specific classes of errors are further clustered so that the targets of erroneous-document identification can be iterated. This improves auditing efficiency, standardizes auditing criteria and improves the quality of submitted documents;

2. The method can be used at every process node of an ERP intelligent expense-reporting system. It introduces the idea of a swarm algorithm on top of the traditional least squares support vector machine to improve convergence speed and mitigate the local-optimum problem;

3. The invention helps document submitters, approvers and auditors locate documents quickly, prevents erroneous documents from being created, improves expense-reporting and auditing efficiency, standardizes auditing criteria, frees up the time and energy of financial staff, improves the accuracy of the documents submitted by claimants and shortens the overall expense-reporting cycle.

Detailed Description

The technical solutions in the present invention will be described clearly and completely with reference to specific embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention discloses a method for implementing intelligent auditing based on a least squares support vector machine, which comprises the following steps:

Step A: different document initiators, different document header information and different imported documents lead to different specific document categories. The classification method based on the least squares support vector machine therefore classifies the documents first and determines their category before further identification; this improves recognition efficiency while safeguarding recognition accuracy and model validity.

The classification in step A comprises sub-steps 1) to 4) as set out in the Disclosure of Invention above.

Step B, erroneous-document identification: this covers identification of the rationality, the amount relationships and the normalization of the document; according to the rule requirements of the corresponding document category, influence factors for the different error categories and overrun standards for those influence factors are set and used for identification, and the fruit fly optimization algorithm is used to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories.

The erroneous-document identification in step B comprises sub-steps 1) and 2) as set out in the Disclosure of Invention above.

Step C, erroneous-document classification: after erroneous documents have been identified, the common characteristics of documents with large errors are identified and classified, and the specific direct causes of the errors are fed back as far as possible.

The further classification of erroneous documents in step C means that, after erroneous documents have been identified, they are further classified with the K-Means clustering method.

The specific implementation steps of the classification in step A, of the erroneous-document identification in step B and of the further classification of erroneous documents in step C are the same as those set out in the Disclosure of Invention above. In each LSSVM operating framework the Gaussian RBF kernel is selected as the training kernel and the algorithm type is set to Classification; in step B all components are defined, the principal component array SupportArray serves as the support vectors, and the source data are input into the trained model to obtain the recognition result; in step C the results are clustered with the standard K-Means clustering method.

The main program components involved in the invention are as follows:

% Involves 7 sets of parameters in total (three uncontrolled variables, 5 controlled variables); the current value of each variable is taken as the zero point of the initial fruit fly and is added at the end.
% 1-row, 5-column random matrix representing 5 fruit fly populations: the initial flight positions.
% Initial fruit fly population position
X_axis = 2*rand(1,5);   % initial fruit fly position from a scaled random number; the search is for a maximum
Y_axis = 2*rand(1,5);
Z_axis = 2*rand(1,5);
maxgen = 100;           % number of iterations
sizepop = 30;           % population size
B = 0;
mm0 = 2;                % initialization coefficient

% Start of fruit fly optimization
for i = sizepop:-1:1
    X(i,:) = X_axis + 2*mm0*rand() - mm0;
    Y(i,:) = Y_axis + 2*mm0*rand() - mm0;
    Z(i,:) = Z_axis + 2*mm0*rand() - mm0;
    % Distance of each population from the origin in three-dimensional space
    D(i,1) = (X(i,1)^2 + Y(i,1)^2 + Z(i,1)^2)^0.5;
    D(i,2) = (X(i,2)^2 + Y(i,2)^2 + Z(i,2)^2)^0.5;
    D(i,3) = (X(i,3)^2 + Y(i,3)^2 + Z(i,3)^2)^0.5;
    D(i,4) = (X(i,4)^2 + Y(i,4)^2 + Z(i,4)^2)^0.5;
    D(i,5) = (X(i,5)^2 + Y(i,5)^2 + Z(i,5)^2)^0.5;
    % Smell concentration judgment value: reciprocal of the distance plus offset B
    S(i,1) = 1/D(i,1) + B;
    S(i,2) = 1/D(i,2) + B;
    S(i,3) = 1/D(i,3) + B;
    S(i,4) = 1/D(i,4) + B;
    S(i,5) = 1/D(i,5) + B;
    % Parameters
    a1 = indata(1); a2 = indata(2); a3 = indata(3); a4 = indata(4); a5 = indata(5); a6 = indata(6); a7 = indata(7);
    % Inputs to the fitness function
    x1 = a1;
    x2 = a2;
    x3 = S(i,5) + a3;
    x4 = S(i,1) + a4;
    x5 = S(i,2) + a5;
    x6 = S(i,3) + a6;
    x7 = S(i,4) + a7;
    Data(i,:) = [x1,x2,x3,x4,x5,x6,x7];
    Smell(i) = TCM_smellfun(Data(i,:));
end

% Extract the optimal population
% Fitting effect of the 5 fruit flies: select the one with the best fit, i.e. the smallest root mean square error, meaning that the observed and actual values are closest
[bestSmell,bestIndex] = max(Smell);
% Move the population to the position of the best smell concentration; the rows of X and Y give the positions of the 5 populations
X_axis = X(bestIndex,:);
Y_axis = Y(bestIndex,:);
Z_axis = Z(bestIndex,:);
Data_best = Data(bestIndex,:);
Smellbest = bestSmell;
Dist = max(D(:,1));

% Iterative optimization
yy = zeros(1,maxgen);
Xbest = zeros(5,maxgen);
Ybest = zeros(5,maxgen);
Zbest = zeros(5,maxgen);
for g = 1:maxgen
    B = Dist*(0.5-rand());
    tt = 1-(1*(g-1))/maxgen;   % step coefficient that changes with the generation
    for i = sizepop:-1:1
        X(i,:) = X_axis + 2*mm0*tt*rand() - mm0*tt;
        Y(i,:) = Y_axis + 2*mm0*tt*rand() - mm0*tt;
        Z(i,:) = Z_axis + 2*mm0*tt*rand() - mm0*tt;
        D(i,1) = (X(i,1)^2 + Y(i,1)^2 + Z(i,1)^2)^0.5;
        D(i,2) = (X(i,2)^2 + Y(i,2)^2 + Z(i,2)^2)^0.5;
        D(i,3) = (X(i,3)^2 + Y(i,3)^2 + Z(i,3)^2)^0.5;
        D(i,4) = (X(i,4)^2 + Y(i,4)^2 + Z(i,4)^2)^0.5;
        D(i,5) = (X(i,5)^2 + Y(i,5)^2 + Z(i,5)^2)^0.5;
        S(i,1) = 1/D(i,1) + B;
        S(i,2) = 1/D(i,2) + B;
        S(i,3) = 1/D(i,3) + B;
        S(i,4) = 1/D(i,4) + B;
        S(i,5) = 1/D(i,5) + B;
        % Parameters
        % Inputs to the fitness function
        x1 = a1;
        x2 = a2;
        x3 = S(i,5) + a3;
        x4 = S(i,1) + a4;
        x5 = S(i,2) + a5;
        x6 = S(i,3) + a6;
        x7 = S(i,4) + a7;
        Data(i,:) = [x1,x2,x3,x4,x5,x6,x7];
        Smell(i) = TCM_smellfun(Data(i,:));
    end
    % Find the extremum of the smell concentration values
    [bestSmell,bestIndex] = max(Smell);
    % Keep the position of the optimum value
    if bestSmell > Smellbest
        X_axis = X(bestIndex,:);
        Y_axis = Y(bestIndex,:);
        Z_axis = Z(bestIndex,:);
        Data_best = Data(bestIndex,:);
        Smellbest = bestSmell;
        gen = g;
    end
    % Record the optimal value of each generation in the yy array, together with the optimal position of each generation
    yy(g) = Smellbest;
    Xbest(:,g) = X_axis';
    Ybest(:,g) = Y_axis';
    Zbest(:,g) = Z_axis';
end

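% Main body of the support vector classification
type = 'Classification';
kernel = 'RBF_kernel';
preprocess = 'preprocess';
L_fold = 10;   % number of cross-validation folds
gam = [];      % penalty factor, left empty so that it is tuned
sig2 = [];     % kernel parameter, left empty so that it is tuned
model = initlssvm(train_data,train_result,type,gam,sig2,kernel,preprocess);
model = tunelssvm(model,'gridsearch','crossvalidatelssvm',{L_fold,'mse'});
model = trainlssvm(model);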

In summary, the method classifies documents from different scenarios and applies different limits when identifying documents of different categories, in order to confirm whether each document is reasonable and whether its amount relationships are correct. At the same time, it issues a graded prompt on the risk that a document will be returned according to the influence factors of the limiting factors, and assigns a document to the corresponding error category once the risk value is reached. The invention helps document submitters, approvers and auditors locate documents quickly, prevents erroneous documents from being created, improves expense-reporting and auditing efficiency, standardizes auditing criteria, frees up the time and energy of financial staff, improves the accuracy of the documents submitted by claimants and shortens the overall expense-reporting cycle.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Apart from the technical features described in the specification, the techniques used are known to those skilled in the art.

Claims

1. A method for implementing intelligent auditing based on a least squares support vector machine, characterized by comprising the following specific steps:

step C, further classification of erroneous documents: after erroneous documents have been identified, the common characteristics of documents with large errors are identified and classified, and the specific direct cause of the errors is fed back;

the classification in step A comprises the following steps:

1) classifying the document categories, namely manually classifying the documents by major document category;

2) within each major category, using a least squares support vector machine, with the document type, document preparer, summary, enabled detail tables and detail-table information extracted as support vectors;

3) extracting and eliminating the influence of the vectors under different categories on the recognition result;

4) data normalization, namely calibrating and normalizing all extracted vector data;

the classification in step A is implemented by the following steps:

1) setting the relevant preconditions

a) to classify, the selected support vectors must first be defined: the source major-category identifier is defined as BillSource, the unique identifier of the document type as BillType, the unique identifier of the document preparer as peoples id, the unique identifier of the summary as abstract, and the unique identifier of the enabled sub-table as GetInfosForchild; the remaining support vectors are defined in turn;

2) establishing the operating framework of the least squares support vector machine

b) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;

c) setting the algorithm type to classification;

3) writing the internal code of the document data and its classification result into BillId and ResultType of the TypeMapper class, respectively;

4) in step A, classifying the documents according to the different document initiators, different document header information and different imported documents;

the erroneous-document identification in step B comprises the following steps:

2) using the fruit fly optimization algorithm to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories, and analyzing the classification results by cross-validation, thereby improving the generalization ability of the optimized parameters;

1) setting the relevant preconditions

a) defining all components;

b) defining the principal component array SupportArray as the support vectors;

c) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;

d) setting the algorithm type to Classification;

f) optimizing the two parameters, namely the penalty factor and the kernel function parameter, with the fruit fly algorithm, using the ErrorRate obtained by one round of cross-validation as the criterion and stopping as soon as CanErrorRate is met, so as to avoid excessive optimization, which would harm generalization ability;

g) inputting the source data into the trained model to obtain the recognition result;

2. The method as claimed in claim 1, characterized in that the further classification of erroneous documents in step C means that, after erroneous documents have been identified, they are further classified with the K-Means clustering method.

3. The method as claimed in claim 2, characterized in that the further classification of erroneous documents in step C is implemented by the following steps:

1) clustering framework

a) determining the number of types into which the errors are to be further subdivided by analyzing the possible error sub-categories;

b) clustering the results with the standard K-Means clustering method;