CN111325247B - Intelligent auditing realization method based on least square support vector machine

Publication number
CN111325247B
Authority
CN
China
Prior art keywords
support vector
document
documents
classification
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010084387.9A
Other languages
Chinese (zh)
Other versions
CN111325247A (en)
Inventor
唐昌明
马士中
赵玉海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur General Software Co Ltd filed Critical Inspur General Software Co Ltd
Priority to CN202010084387.9A priority Critical patent/CN111325247B/en
Publication of CN111325247A publication Critical patent/CN111325247A/en
Application granted granted Critical
Publication of CN111325247B publication Critical patent/CN111325247B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll

Abstract

The invention provides a method for implementing intelligent auditing based on a least squares support vector machine (LS-SVM). It belongs to the technical field of computer applications and can be used at each process node of an ERP intelligent account-reporting system. Documents from different scenarios are first classified, and each document category is checked against its own limits to confirm whether the document is reasonable and whether the relationships between its amounts are correct. At the same time, a graded prompt on the risk of the document being returned is issued according to the influence factors of the limiting conditions, and a document enters the corresponding error category once the risk value is reached. The invention helps document submitters, approvers and auditors locate problem documents quickly, prevents erroneous documents from being created, improves account-reporting and auditing efficiency, standardizes auditing criteria, frees up the time and energy of financial staff, improves the accuracy of the documents that submitters fill in, and shortens the overall account-reporting cycle.

Description

Intelligent auditing realization method based on least square support vector machine
Technical Field
The invention relates to the technical field of computer applications, in particular to an implementation scheme for intelligent auditing in ERP financial software. It is a method for automatically identifying and classifying erroneous documents, and more particularly a method for implementing intelligent auditing with a least squares support vector machine.
Background
An ERP financial system manages purchasing, supply, production, account reporting and other management work. As an enterprise grows and its headcount increases, the documents to be processed in the system become more complex and diverse, so more auditors have to be added to ensure that the documents are accurate and valid.
Daily account reporting, auditing, bookkeeping and settlement take up most of the working time of financial staff, leaving them little room to learn and develop, while employees find account reporting tedious. When bookkeeping and auditing are done manually, the documents are numerous and varied and there is much to check, so omissions occur easily. In addition, auditing rules are applied inconsistently and the work is time-consuming, so both the auditing speed and the quality of submitted documents are low.
There is therefore an urgent need for a method that quickly and intelligently identifies the error points in documents and performs an initial classification of them.
Disclosure of Invention
The technical task of the invention is to provide a method for implementing intelligent auditing based on a least squares support vector machine. It addresses the shortcoming of traditional manual auditing, in which documents are numerous and mistakes are easy to make, and thereby improves auditing efficiency, standardizes auditing criteria, and improves the auditing speed and the quality of submitted documents.
The technical solution adopted by the invention to solve this technical problem is as follows:
A method for implementing intelligent auditing based on a least squares support vector machine comprises the following steps:
Step A, document classification: the documents are classified by a classification method based on a least squares support vector machine, and further identification is performed once the document category is determined;
Step B, erroneous document identification: the reasonableness, the amount relationships and the compliance of each document are checked; according to the rule requirements of the corresponding document category, influence factors for the different error categories and overrun thresholds for those influence factors are set for identification, and a swarm algorithm is used to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories;
Step C, further classification of erroneous documents: after identification, the features shared by documents with large errors are identified and classified, and the specific, direct cause of each error is fed back as far as possible.
Preferably, the classification in step A comprises the following steps:
1) classifying by document category, namely manually dividing the documents into major categories, which improves classification efficiency;
2) within each major category, using the least squares support vector machine and extracting the document type, the document creator, the abstract, the enabled detail tables and the detail table information (sharing information) as support vectors;
3) extracting and eliminating the influence that vectors under different categories have on the recognition result, which improves recognition efficiency;
4) data normalization, namely calibrating and normalizing all extracted vector data, in particular summarizing the text information.
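The normalization step can be illustrated with a minimal Python sketch. The patent does not state which normalization is used, so min-max scaling is an assumption here, and the feature columns (amount, number of detail lines) are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical numeric features per document: [amount, number of detail lines]
features = [[1200.0, 3], [300.0, 1], [750.0, 2]]
normalized = min_max_normalize(features)
```

Scaling every feature to a common range keeps a large-valued column such as the amount from dominating the distance computed inside an RBF kernel.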
Preferably, the classification in step A is implemented as follows:
1) setting the relevant preconditions
a) to classify, the selected support vectors are defined first: the major source category identifier BillSource, the unique identifier of the document type BillType, the unique identifier of the document creator peoples id, the unique identifier of the abstract abstract, the unique identifier of the enabled sub-table GetInfosForchild, and then the remaining support vectors in turn;
b) defining the acceptable value for the cross-validation test of the classification result as LowerConfidencelimit;
c) defining the class TypeMapper, which maps a document's internal code to its classified type, with the classification result as ResultType and the internal code as BillId;
2) establishing the operating framework of the least squares support vector machine
a) standardizing all extracted data so that the data are usable;
b) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;
c) setting the algorithm type to classification;
d) inputting historical data for modeling, and confirming by cross-validation that the credibility of the model is greater than LowerConfidencelimit;
e) inputting the current source document information and matching the support vector information to obtain the classification result;
3) writing the internal code and the classification result of each document into BillId and ResultType of the TypeMapper class, respectively.
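As a rough illustration of the classification framework above, the following Python sketch trains a binary LS-SVM with a Gaussian RBF kernel by solving the standard LS-SVM linear system, then writes BillId and ResultType pairs. It is a sketch under stated assumptions, not the patented implementation: the kernel bandwidth convention, the toy feature vectors, the values of gam and sig2, the type names and the bill identifiers are all hypothetical.

```python
import math


def rbf(a, b, sig2):
    """Gaussian RBF kernel; this bandwidth convention is an assumption."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sig2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_lssvm(X, y, gam, sig2):
    """Solve [[0, y^T], [y, Omega + I/gam]] [b; alpha] = [0; 1]."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            k = y[i] * y[j] * rbf(X[i], X[j], sig2)
            row.append(k + (1.0 / gam if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def predict(X, y, b, alpha, sig2, x):
    """Return +1 or -1 from the sign of the LS-SVM decision function."""
    s = b + sum(a * yi * rbf(xi, x, sig2) for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1


# Hypothetical, already-normalized feature vectors for two document types.
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y = [-1, -1, 1, 1]
type_names = {-1: "TravelExpense", 1: "PurchaseInvoice"}  # hypothetical labels

b, alpha = train_lssvm(X, y, gam=10.0, sig2=0.5)

# TypeMapper-style records: each BillId paired with its ResultType.
new_bills = {"B001": [0.05, 0.05], "B002": [0.95, 0.95]}
type_mapper = [
    {"BillId": bill_id, "ResultType": type_names[predict(X, y, b, alpha, 0.5, x)]}
    for bill_id, x in new_bills.items()
]
```

Unlike a standard SVM, which needs a quadratic programming solver, an LS-SVM is trained by solving a single linear system, which is why a plain elimination routine suffices for this toy example.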
Preferably, in step A the documents are classified according to differences in document initiator, document header information and introduced documents.
Preferably, the erroneous document identification in step B comprises the following steps:
1) using principal component analysis to extract the principal information as the support vectors for error identification;
2) using the fruit fly optimization algorithm to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories, and analyzing the classification result by cross-validation, which improves the generalization ability of the optimized parameters.
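Principal component extraction can be illustrated with a small Python sketch. The patent does not say how the principal components are computed; power iteration on the covariance matrix is an assumption of this illustration, and the two-column data set is hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and the projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical features in which the second column is twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(data)
```

Here the two columns are perfectly correlated, so a single component captures all of the variance and the scores could stand in for both original features.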
Preferably, the erroneous document identification in step B is implemented as follows:
1) setting the relevant preconditions
a) defining all components;
b) defining the principal component array SupportArray as the support vectors;
c) defining the class MissTypeMapper, which maps a document's internal code to its error type, with the parameters BillId and ResultMissType;
d) defining the acceptable error for the cross-validation test of the classification result as CanErrorRate;
2) establishing the operating framework of the least squares support vector machine optimized by the fruit fly algorithm
a) standardizing all extracted data so that the data are usable;
b) specifying the number of iterations and the population size of the fruit fly algorithm, determining the fitness function, setting the flight step length, and extending the algorithm to three-dimensional space to keep it from being trapped in a local optimum;
c) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;
d) setting the algorithm type to Classification;
e) determining the initial penalty factor and kernel function, and obtaining ErrorRate by cross-validation after modeling;
f) optimizing these two parameters with the fruit fly algorithm, using the ErrorRate obtained by cross-validation as the criterion and stopping once CanErrorRate is met; further optimization is unnecessary and would harm generalization ability;
g) inputting the source data into the trained model to obtain the identification result;
3) writing the internal code and the classification result of each document into BillId and ResultMissType of the MissTypeMapper class, respectively.
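The parameter search with early stopping can be sketched in Python as follows. This is a simplified fruit fly search written for illustration only: it minimizes an error directly instead of maximizing a smell value, keeps one three-dimensional swarm location per tuned parameter, and replaces the cross-validated ErrorRate of a real LS-SVM with a hypothetical toy_cv_error function. The names fruit_fly_search and can_error_rate, and all numeric settings, are assumptions.

```python
import math
import random


def toy_cv_error(gam, sig2):
    """Hypothetical stand-in for the cross-validated ErrorRate of an LS-SVM."""
    return min(1.0, 0.05 + 0.1 * ((gam - 2.0) ** 2 + (sig2 - 0.5) ** 2))


def fruit_fly_search(error_fn, can_error_rate, maxgen=100, sizepop=30,
                     step=1.0, seed=0):
    rng = random.Random(seed)
    # One 3-D swarm location per tuned parameter (penalty factor, kernel width).
    axes = [[rng.uniform(0.5, 1.5) for _ in range(3)] for _ in range(2)]
    best_params, best_err, history = None, float("inf"), []
    for _ in range(maxgen):
        gen_best = None
        for _ in range(sizepop):
            # Each fly takes a random flight step around the swarm location.
            flies = [[c + step * (2 * rng.random() - 1) for c in axis]
                     for axis in axes]
            # Candidate value: reciprocal of the 3-D distance to the origin.
            params = [1.0 / max(math.sqrt(sum(c * c for c in fly)), 1e-9)
                      for fly in flies]
            err = error_fn(*params)
            if gen_best is None or err < gen_best[0]:
                gen_best = (err, params, flies)
        if gen_best[0] < best_err:
            # The swarm moves to the best fly found so far.
            best_err, best_params, axes = gen_best
        history.append(best_err)
        if best_err <= can_error_rate:
            break  # good enough; further tuning risks overfitting
    return best_params, best_err, history


best_params, best_err, history = fruit_fly_search(toy_cv_error, can_error_rate=0.08)
```

Stopping as soon as the acceptable error is reached mirrors item f) above: the search does not chase the last fraction of cross-validation error at the expense of generalization.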
Preferably, the further classification of erroneous documents in step C means that, after the erroneous documents have been identified, they are further classified by the K-Means clustering method.
Preferably, the further classification of erroneous documents in step C is implemented as follows:
1) clustering framework
a) determining the number of types into which the erroneous documents are to be subdivided (by analyzing the possible error sub-categories);
b) clustering the results with the standard K-Means clustering method;
c) when the amount of data belonging to a given sub-category reaches a certain value, that type is considered to occur frequently and is promoted to a major category of erroneous document identification.
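The clustering step can be sketched with a minimal K-Means in Python. The deterministic initialization from the first k points and the two-dimensional error feature vectors are assumptions made so that the example is reproducible; they are not taken from the patent.

```python
def kmeans(points, k, iterations=50):
    """Standard K-Means: alternate assignment and center update steps."""
    # Deterministic initialization: use the first k points as starting centers.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical feature vectors of documents already flagged as erroneous.
error_features = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.9], [0.1, 0.3]]
labels, centers = kmeans(error_features, k=2)
```

The number of clusters k corresponds to the number of sub-types chosen in item a).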
Compared with the prior art, the method for implementing intelligent auditing based on a least squares support vector machine has the following beneficial effects:
1. it addresses the problems of traditional manual auditing, such as the large amount of content to audit, the ease of omission, inconsistent auditing rules and long processing time, by intelligent processing: the specific category of each document is determined intelligently, the erroneous documents within each category are identified, and after identification a warning is issued according to the settings or the document is directly confirmed for return; meanwhile, specific classes of errors are further clustered so that the error identification targets can be iterated. This improves auditing efficiency, standardizes auditing criteria and improves the quality of submitted documents;
2. the method can be used at each process node of an ERP intelligent account-reporting system, and it introduces the idea of a swarm algorithm on top of the traditional least squares support vector machine to improve the convergence speed and mitigate the local optimum problem;
3. the invention helps document submitters, approvers and auditors locate problem documents quickly, prevents erroneous documents from being created, improves account-reporting and auditing efficiency, standardizes auditing criteria, frees up the time and energy of financial staff, improves the accuracy of the documents that submitters fill in, and shortens the overall account-reporting cycle.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to specific embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention discloses a method for realizing intelligent audit based on a least square support vector machine, which comprises the following steps:
Step A: differences in document initiator, document header information and introduced documents lead to different specific document categories. The documents are therefore first classified by a classification method based on a least squares support vector machine, and further identification is performed once the document category is determined. This improves recognition efficiency while ensuring recognition accuracy and model validity.
The classification in step A comprises the following steps:
1) classifying by document category, namely manually dividing the documents into major categories, which improves classification efficiency;
2) within each major category, using the least squares support vector machine and extracting the document type, the document creator, the abstract, the enabled detail tables and the detail table information (sharing information) as support vectors;
3) extracting and eliminating the influence that vectors under different categories have on the recognition result, which improves recognition efficiency;
4) data normalization, namely calibrating and normalizing all extracted vector data, in particular summarizing the text information.
Step B, erroneous document identification: the reasonableness, the amount relationships and the compliance of each document are checked; according to the rule requirements of the corresponding document category, influence factors for the different error categories and overrun thresholds for those influence factors are set for identification, and the fruit fly optimization algorithm is used to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories.
The erroneous document identification in step B comprises the following steps:
1) using principal component analysis to extract the principal information as the support vectors for error identification;
2) using the fruit fly optimization algorithm to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories, and analyzing the classification result by cross-validation, which improves the generalization ability of the optimized parameters.
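How influence factors and an overrun threshold might combine into a graded risk prompt can be sketched in Python. The patent does not give a formula, so the weighted sum, the half-threshold warning level, the grade names and the indicator names below are all assumptions for illustration.

```python
def risk_grade(indicators, influence_factors, overrun_limit):
    """Return a weighted risk value and a prompt grade for one error category."""
    risk = sum(influence_factors[name] * value for name, value in indicators.items())
    if risk >= overrun_limit:
        grade = "error"    # risk value reached: document enters the error category
    elif risk >= 0.5 * overrun_limit:
        grade = "warning"  # elevated risk of the document being returned
    else:
        grade = "ok"
    return risk, grade


# Hypothetical influence factors for one error category.
influence_factors = {
    "amount_mismatch": 0.5,
    "missing_attachment": 0.25,
    "late_submission": 0.25,
}
# Hypothetical indicator values for one document (1.0 = rule violated).
indicators = {"amount_mismatch": 1.0, "missing_attachment": 0.0, "late_submission": 1.0}

risk, grade = risk_grade(indicators, influence_factors, overrun_limit=0.7)
```

In the method described here the influence factors would not be fixed by hand as in this sketch; they are among the quantities tuned by the optimization algorithm.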
Step C, further classification of erroneous documents: after identification, the features shared by documents with large errors are identified and classified, and the specific, direct cause of each error is fed back as far as possible.
The further classification of erroneous documents in step C means that, after the erroneous documents have been identified, they are further classified by the K-Means clustering method.
The classification in step A is implemented as follows:
1) setting the relevant preconditions
a) to classify, the selected support vectors are defined first: the major source category identifier BillSource, the unique identifier of the document type BillType, the unique identifier of the document creator peoples id, the unique identifier of the abstract abstract, the unique identifier of the enabled sub-table GetInfosForchild, and then the remaining support vectors in turn;
b) defining the acceptable value for the cross-validation test of the classification result as LowerConfidencelimit;
c) defining the class TypeMapper, which maps a document's internal code to its classified type, with the classification result as ResultType and the internal code as BillId;
2) establishing the operating framework of the least squares support vector machine
a) standardizing all extracted data so that the data are usable;
b) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;
c) setting the algorithm type to classification;
d) inputting historical data for modeling, and confirming by cross-validation that the credibility of the model is greater than LowerConfidencelimit;
e) inputting the current source document information and matching the support vector information to obtain the classification result;
3) writing the internal code and the classification result of each document into BillId and ResultType of the TypeMapper class, respectively.
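The cross-validation check against LowerConfidencelimit can be sketched in Python. To keep the example short, a nearest-centroid classifier stands in for the LS-SVM; that substitution, the fold assignment by index, the one-dimensional toy data and the threshold value are all assumptions.

```python
def k_fold_accuracy(X, y, k, fit, predict):
    """Average accuracy over k folds; sample i goes to fold i % k."""
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


# Hypothetical historical documents with one normalized feature and two types.
X_hist = [[0.0], [1.0], [0.1], [0.9], [0.2], [1.1], [0.05], [0.95]]
y_hist = ["A", "B", "A", "B", "A", "B", "A", "B"]

lower_confidence_limit = 0.9  # plays the role of LowerConfidencelimit
accuracy = k_fold_accuracy(X_hist, y_hist, 4, fit_centroids, predict_centroid)
model_accepted = accuracy > lower_confidence_limit
```

Only a model that passes this gate would go on to classify current source documents.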
The erroneous document identification in step B is implemented as follows:
1) setting the relevant preconditions
a) defining all components;
b) defining the principal component array SupportArray as the support vectors;
c) defining the class MissTypeMapper, which maps a document's internal code to its error type, with the parameters BillId and ResultMissType;
d) defining the acceptable error for the cross-validation test of the classification result as CanErrorRate;
2) establishing the operating framework of the least squares support vector machine optimized by the fruit fly algorithm
a) standardizing all extracted data so that the data are usable;
b) specifying the number of iterations and the population size of the fruit fly algorithm, determining the fitness function, setting the flight step length, and extending the algorithm to three-dimensional space to keep it from being trapped in a local optimum;
c) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;
d) setting the algorithm type to Classification;
e) determining the initial penalty factor and kernel function, and obtaining ErrorRate by cross-validation after modeling;
f) optimizing these two parameters with the fruit fly algorithm, using the ErrorRate obtained by cross-validation as the criterion and stopping once CanErrorRate is met; further optimization is unnecessary and would harm generalization ability;
g) inputting the source data into the trained model to obtain the identification result;
3) writing the internal code and the classification result of each document into BillId and ResultMissType of the MissTypeMapper class, respectively.
The further classification of erroneous documents in step C is implemented as follows:
1) clustering framework
a) determining the number of types into which the erroneous documents are to be subdivided (by analyzing the possible error sub-categories);
b) clustering the results with the standard K-Means clustering method;
c) when the amount of data belonging to a given sub-category reaches a certain value, that type is considered to occur frequently and is promoted to a major category of erroneous document identification.
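The promotion rule in item c) can be sketched as a simple count over cluster labels. The function name, the label values and the threshold of three documents are hypothetical; the patent only says that a sub-category is promoted once its data reaches "a certain value".

```python
from collections import Counter


def frequent_subcategories(cluster_labels, min_count):
    """Return the sub-category labels that occur at least min_count times."""
    counts = Counter(cluster_labels)
    return sorted(label for label, n in counts.items() if n >= min_count)


# Hypothetical cluster labels assigned to erroneous documents by K-Means.
cluster_labels = [0, 1, 0, 2, 0, 1, 0]
promoted = frequent_subcategories(cluster_labels, min_count=3)
```

A promoted sub-category would then become one of the error categories identified directly in step B.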
The main program components involved in the invention are as follows:
% A total of 7 parameters are involved: two uncontrolled variables and 5 controlled variables. The current value of each variable is taken as the zero point of the initial fruit fly and is added at the end.
% A 1-row, 5-column random matrix represents the 5 fruit fly populations.
% Initial fruit fly population position
X_axis=2*rand(1,5); % random initial fruit fly positions; the search looks for a maximum
Y_axis=2*rand(1,5);
Z_axis=2*rand(1,5);
maxgen=100; % number of iterations
sizepop=30;
B=0;
mm0=2; % initialization coefficient
% Start of fruit fly optimization
for i=sizepop:-1:1
X(i,:)=X_axis+2*mm0*rand()-mm0;
Y(i,:)=Y_axis+2*mm0*rand()-mm0;
Z(i,:)=Z_axis+2*mm0*rand()-mm0;
D(i,1)=(X(i,1)^2+Y(i,1)^2+Z(i,1)^2)^0.5;
D(i,2)=(X(i,2)^2+Y(i,2)^2+Z(i,2)^2)^0.5;
D(i,3)=(X(i,3)^2+Y(i,3)^2+Z(i,3)^2)^0.5;
D(i,4)=(X(i,4)^2+Y(i,4)^2+Z(i,4)^2)^0.5;
D(i,5)=(X(i,5)^2+Y(i,5)^2+Z(i,5)^2)^0.5;
S(i,1)=1/D(i,1)+B;
S(i,2)=1/D(i,2)+B;
S(i,3)=1/D(i,3)+B;
S(i,4)=1/D(i,4)+B;
S(i,5)=1/D(i,5)+B;
% Parameters
a1=indata(1);a2=indata(2);a3=indata(3);a4=indata(4);a5=indata(5);a6=indata(6);a7=indata(7);
% Fitness function
x1=a1;
x2=a2;
x3=S(i,5)+a3;
x4=S(i,1)+a4;
x5=S(i,2)+a5;
x6=S(i,3)+a6;
x7=S(i,4)+a7;
Data(i,:)=[x1,x2,x3,x4,x5,x6,x7];
Smell(i)=TCM_smellfun(Data(i,:));
end
% extract optimal population
% Fit quality of the 5 fruit fly populations: select the one with the best fit, i.e. the smallest root mean square error, meaning the observed and actual values are closest
[bestSmell,bestIndex]=max(Smell);
% The populations move to the position of optimal smell concentration; the rows of X, Y and Z give the positions of the 5 populations
X_axis=X(bestIndex,:);
Y_axis=Y(bestIndex,:);
Z_axis=Z(bestIndex,:);
Data_best=Data(bestIndex,:);
Smellbest=bestSmell;
Dist=max(D(:,1));
% iteration optimization
yy=zeros(1,maxgen);
Xbest=zeros(5,maxgen);
Ybest=zeros(5,maxgen);
Zbest=zeros(5,maxgen);
for g=1:maxgen
B=Dist*(0.5-rand());
tt=1-(1*g-1)/maxgen; % variation coefficient
for i=sizepop:-1:1
X(i,:)=X_axis+2*mm0*tt*rand()-mm0*tt;
Y(i,:)=Y_axis+2*mm0*tt*rand()-mm0*tt;
Z(i,:)=Z_axis+2*mm0*tt*rand()-mm0*tt;
D(i,1)=(X(i,1)^2+Y(i,1)^2+Z(i,1)^2)^0.5;
D(i,2)=(X(i,2)^2+Y(i,2)^2+Z(i,2)^2)^0.5;
D(i,3)=(X(i,3)^2+Y(i,3)^2+Z(i,3)^2)^0.5;
D(i,4)=(X(i,4)^2+Y(i,4)^2+Z(i,4)^2)^0.5;
D(i,5)=(X(i,5)^2+Y(i,5)^2+Z(i,5)^2)^0.5;
S(i,1)=1/D(i,1)+B;
S(i,2)=1/D(i,2)+B;
S(i,3)=1/D(i,3)+B;
S(i,4)=1/D(i,4)+B;
S(i,5)=1/D(i,5)+B;
% Parameters
a1=indata(1);a2=indata(2);a3=indata(3);a4=indata(4);a5=indata(5);a6=indata(6);a7=indata(7);
% Fitness function
x1=a1;
x2=a2;
x3=S(i,5)+a3;
x4=S(i,1)+a4;
x5=S(i,2)+a5;
x6=S(i,3)+a6;
x7=S(i,4)+a7;
Data(i,:)=[x1,x2,x3,x4,x5,x6,x7];
Smell(i)=TCM_smellfun(Data(i,:));
end
% Find the extremum among the smell concentration values
[bestSmell,bestIndex]=max(Smell);
% Keep the position of the optimum
if bestSmell>Smellbest
X_axis=X(bestIndex,:);
Y_axis=Y(bestIndex,:);
Z_axis=Z(bestIndex,:);
Data_best=Data(bestIndex,:);
Smellbest=bestSmell;
gen=g;
end
% Record the optimal value of each generation in the yy array, together with the optimal position of each generation
yy(g)=Smellbest;
Xbest(:,g)=X_axis';
Ybest(:,g)=Y_axis';
Zbest(:,g)=Z_axis';
end
% Support vector classification body
type='Classification';
kernel='RBF_kernel';
proprecess='preprocess';
L_fold=10;
gam=[];
sig2=[];
model=initlssvm(train_data,train_result,type,gam,sig2,kernel,proprecess);
model=tunelssvm(model,'gridsearch','crossvalidatelssvm',{L_fold,'mse'});
model=trainlssvm(model);
In summary, the method classifies documents from different scenarios and checks each document category against its own limits to confirm whether the document is reasonable and whether the relationships between its amounts are correct. At the same time, it issues a graded prompt on the risk of the document being returned according to the influence factors of the limiting conditions, and a document enters the corresponding error category once the risk value is reached. The invention helps document submitters, approvers and auditors locate problem documents quickly, prevents erroneous documents from being created, improves account-reporting and auditing efficiency, standardizes auditing criteria, frees up the time and energy of financial staff, improves the accuracy of the documents that submitters fill in, and shortens the overall account-reporting cycle.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Apart from the technical features described in the specification, everything else is technology known to those skilled in the art.

Claims (3)

1. A method for implementing intelligent auditing based on a least squares support vector machine, characterized by comprising the following specific steps:
step A, document classification: classifying the documents by a classification method based on a least squares support vector machine, and performing further identification once the document category is determined;
step B, erroneous document identification: checking the reasonableness, the amount relationships and the compliance of each document; according to the rule requirements of the corresponding document category, setting influence factors for the different error categories and overrun thresholds for those influence factors for identification, and using a swarm algorithm to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories;
step C, further classification of erroneous documents: after identification, identifying and classifying the features shared by documents with large errors, and feeding back the specific, direct cause of each error;
the classification in step A comprises the following steps:
1) classifying by document category, namely manually dividing the documents into major categories;
2) within each major category, using the least squares support vector machine and extracting the document type, the document creator, the abstract, the enabled detail tables and the detail table information as support vectors;
3) extracting and eliminating the influence that vectors under different categories have on the recognition result;
4) data normalization, namely calibrating and normalizing all extracted vector data;
the classification in step A is implemented as follows:
1) setting the relevant preconditions
a) to classify, the selected support vectors are defined first: the major source category identifier BillSource, the unique identifier of the document type BillType, the unique identifier of the document creator peoples id, the unique identifier of the abstract abstract, the unique identifier of the enabled sub-table GetInfosForchild, and then the remaining support vectors in turn;
b) defining the acceptable value for the cross-validation test of the classification result as LowerConfidencelimit;
c) defining the class TypeMapper, which maps a document's internal code to its classified type, with the classification result as ResultType and the internal code as BillId;
2) establishing the operating framework of the least squares support vector machine
a) standardizing all extracted data so that the data are usable;
b) selecting the Gaussian RBF kernel as the training kernel of the LSSVM algorithm;
c) setting the algorithm type to classification;
d) inputting historical data for modeling, and confirming by cross-validation that the credibility of the model is greater than LowerConfidencelimit;
e) inputting the current source document information and matching the support vector information to obtain the classification result;
3) writing the internal code and the classification result of each document into BillId and ResultType of the TypeMapper class, respectively;
4) in step A, classifying the documents according to differences in document initiator, document header information and introduced documents;
the erroneous document identification in step B comprises the following steps:
1) using principal component analysis to extract the principal information as the support vectors for error identification;
2) using the fruit fly optimization algorithm to quickly optimize the parameters of the least squares support vector machine and the influence factors of the error categories, and analyzing the classification result by cross-validation, thereby improving the generalization ability of the optimized parameters;
the erroneous-document identification in step B is specifically implemented by the following steps:
1) setting the relevant preconditions
a) defining all components;
b) defining a principal component array SupportArray to serve as the support vectors;
c) defining a class MissTypeMapper that maps a document internal code to an error type, with the parameters BillId and ResultMissType;
d) defining the acceptable optimization error of the cross-validation test on the classification result as CanErrorRate;
2) establishing the operating framework of the least squares support vector machine optimized by the fruit fly algorithm
a) standardizing all extracted data so that it is in a consistent, usable form;
b) specifying the number of iterations and the population size of the fruit fly algorithm, determining the fitness function, setting the flight step length, and extending the algorithm to three-dimensional space to prevent it from falling into a local optimum;
c) selecting the Gaussian radial basis function (RBF) kernel as the training kernel of the LSSVM algorithm;
d) setting the algorithm type to Classification;
e) determining an initial penalty factor and an initial kernel function parameter, and obtaining ErrorRate by cross validation after modeling;
f) optimizing the two parameters, namely the penalty factor and the kernel function parameter, with the fruit fly algorithm, taking the ErrorRate obtained from one round of cross validation as the criterion, and stopping as soon as CanErrorRate is met, since excessive optimization would damage the generalization capability;
g) inputting the source data into the trained model to obtain a recognition result;
3) writing the internal code and the classification result of each data record into the BillId and the ResultMissType of the MissTypeMapper class, respectively.
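The parameter search in steps b), e) and f) can be sketched as below. This is one possible reading rather than the patented procedure: each of the two parameters gets its own three-dimensional swarm position, the candidate value is the reciprocal of a fly's distance to the origin (the usual smell concentration judgement value of the fruit fly algorithm), and the search stops as soon as CanErrorRate is met. The function demo_error_rate is a synthetic stand-in for the cross-validated ErrorRate of an LSSVM; all numeric settings are hypothetical.

```python
import math
import random

def foa_optimize(error_rate, can_error_rate, pop_size=20, max_iter=100,
                 step=1.0, seed=0):
    """Fruit fly search over (penalty factor, kernel parameter) in 3-D space."""
    rng = random.Random(seed)
    # One 3-D swarm position per parameter.
    swarm = [[rng.uniform(0.5, 2.0) for _ in range(3)] for _ in range(2)]
    best_err, best_params = float("inf"), None
    for it in range(max_iter):
        gen_best = None
        for _fly in range(pop_size):
            # Random flight around the current swarm position.
            fly = [[c + rng.uniform(-step, step) for c in pos] for pos in swarm]
            # Candidate value = reciprocal of the distance to the origin.
            gamma, sigma = (1.0 / max(math.sqrt(sum(c * c for c in p)), 1e-9)
                            for p in fly)
            err = error_rate(gamma, sigma)
            if gen_best is None or err < gen_best[0]:
                gen_best = (err, (gamma, sigma), fly)
        # The swarm moves only if this generation improved on the best so far.
        if gen_best[0] < best_err:
            best_err, best_params, swarm = gen_best
        # Stop once the acceptable error is reached, to avoid over-optimizing.
        if best_err <= can_error_rate:
            break
    return best_params, best_err, it + 1

def demo_error_rate(gamma, sigma):
    """Synthetic error surface with its minimum (0.02) at gamma=10, sigma=1."""
    return min(1.0, 0.02 + 0.1 * ((math.log10(gamma) - 1) ** 2
                                  + math.log10(sigma) ** 2))

CanErrorRate = 0.05
best_params, best_err, iterations = foa_optimize(demo_error_rate, CanErrorRate)
```

In a real pipeline, error_rate would train the LSSVM with the candidate penalty factor and kernel parameter and return the ErrorRate from one round of cross validation.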
2. The method as claimed in claim 1, wherein the further classification of erroneous documents in step C is as follows: after the erroneous documents are identified, they are further classified using a K-Means clustering method.
3. The method as claimed in claim 2, wherein the further classification of erroneous documents in step C is implemented by the following steps:
1) clustering framework
a) determining the number of types into which further subdivision is desired by analyzing the possible error subcategories;
b) clustering the results using the standard K-Means clustering method;
c) when the amount of data belonging to a given subcategory reaches a certain value, that type is considered to occur frequently and is promoted to a major category of erroneous-document identification.
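A minimal sketch of the clustering framework in claim 3 follows. It runs standard Lloyd-style K-Means iterations and then reports which subcategories are large enough to be promoted. The farthest-first initialization, the feature values, k = 3 and the promotion threshold of 3 are assumptions made for this illustration and are not taken from the patent.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=100):
    """K-Means with deterministic farthest-first initial centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def frequent_subcategories(labels, min_count):
    """Clusters whose size reaches min_count are treated as frequent types."""
    counts = Counter(labels)
    return sorted(j for j, n in counts.items() if n >= min_count)

# Hypothetical feature vectors of documents already identified as erroneous.
error_features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
                  [11, 11], [20, 0]]
labels, centers = kmeans(error_features, k=3)
promoted = frequent_subcategories(labels, min_count=3)
```

With this data the single outlying document forms its own small cluster and is not promoted, while the two larger clusters would become major error categories.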
CN202010084387.9A 2020-02-10 2020-02-10 Intelligent auditing realization method based on least square support vector machine Active CN111325247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010084387.9A CN111325247B (en) 2020-02-10 2020-02-10 Intelligent auditing realization method based on least square support vector machine

Publications (2)

Publication Number Publication Date
CN111325247A CN111325247A (en) 2020-06-23
CN111325247B true CN111325247B (en) 2022-08-02

Family

ID=71168780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010084387.9A Active CN111325247B (en) 2020-02-10 2020-02-10 Intelligent auditing realization method based on least square support vector machine

Country Status (1)

Country Link
CN (1) CN111325247B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669134A (en) * 2020-12-31 2021-04-16 山东浪潮通软信息科技有限公司 Method, equipment and medium for realizing auditing intellectualization through auditing rule machine learning
CN112766391B (en) * 2021-01-26 2023-02-21 浪潮通用软件有限公司 Method, system, equipment and medium for making document

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067044A (en) * 2017-05-31 2017-08-18 北京空间飞行器总体设计部 A kind of finance reimbursement unanimous vote is according to intelligent checks system
WO2018019176A1 (en) * 2016-07-26 2018-02-01 四川长虹电器股份有限公司 Xbrl-based intelligent financial cloud platform system, construction method, and service implementation method
US9886945B1 (en) * 2011-07-03 2018-02-06 Reality Analytics, Inc. System and method for taxonomically distinguishing sample data captured from biota sources
CN107895168A (en) * 2017-10-13 2018-04-10 平安科技(深圳)有限公司 The method of data processing, the device of data processing and computer-readable recording medium
CN108229536A (en) * 2017-12-01 2018-06-29 温州大学 Optimization method, device and the terminal device of classification prediction model
CN108876166A (en) * 2018-06-27 2018-11-23 平安科技(深圳)有限公司 Financial risk authentication processing method, device, computer equipment and storage medium
CN109064304A (en) * 2018-08-03 2018-12-21 四川长虹电器股份有限公司 Finance reimbursement bill automated processing system and method
CN110334640A (en) * 2019-06-28 2019-10-15 苏宁云计算有限公司 A kind of ticket processing method and system
CN110544161A (en) * 2019-08-09 2019-12-06 北京市天元网络技术股份有限公司 financial expense auditing method and device based on automatic extraction of bill data
CN110610346A (en) * 2019-08-07 2019-12-24 北京航空航天大学 Intelligent office automation system workflow instance time prediction analysis

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7007001B2 (en) * 2002-06-26 2006-02-28 Microsoft Corporation Maximizing mutual information between observations and hidden states to minimize classification errors
US20140081652A1 (en) * 2012-09-14 2014-03-20 Risk Management Solutions Llc Automated Healthcare Risk Management System Utilizing Real-time Predictive Models, Risk Adjusted Provider Cost Index, Edit Analytics, Strategy Management, Managed Learning Environment, Contact Management, Forensic GUI, Case Management And Reporting System For Preventing And Detecting Healthcare Fraud, Abuse, Waste And Errors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of the classification extraction method in product quality management; 沈克佳 et al.; 《一重技术》; 20070615 (No. 03); full text *

Also Published As

Publication number Publication date
CN111325247A (en) 2020-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220713

Address after: No. 1036 Langchao Road, High-tech Zone, Jinan, Shandong

Applicant after: Inspur Genersoft Co.,Ltd.

Address before: 250100 No. 2877 Kehang Road, Sun Village Town, Jinan High-tech District, Shandong Province

Applicant before: SHANDONG INSPUR GENESOFT INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant