CN108876166A - Financial risk authentication processing method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN108876166A (application number CN201810675971.4A)
- Authority
- CN
- China
- Prior art keywords
- training
- data
- feature
- risk
- agency agreement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
Abstract
The invention discloses a financial risk verification processing method and apparatus, a computer device, and a storage medium. The financial risk verification processing method includes: obtaining a financial approval request, the financial approval request including financial document data; performing feature extraction on the financial document data to obtain feature item data; inputting the feature item data into a target risk identification model based on a decision tree algorithm to perform risk identification and obtain a target risk grade; and classifying the financial document data according to the target risk grade, assigning the financial document data to the subordinate processing flow corresponding to the target risk grade, so as to perform the corresponding risk verification on the financial document data. The method performs risk identification with the target risk identification model and classifies documents by risk grade, which improves the approval efficiency of financial document data, reduces the approval workload, and lowers the cost of manual review.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a financial risk verification processing method and apparatus, a computer device, and a storage medium.
Background art
Risk identification refers to predicting, before an operation, the risks that may arise in the course of the work and in its results, so that countermeasures can be worked out in advance to prevent incidents. Financial document data refers to data related to financial documents, for example data related to expense claim sheets, loan application forms, invoices, outbound orders, warehousing receipts, and the like. During expense reimbursement, financial risks such as fictitious invoices, duplicate claims, and incorrectly filled fields may occur. At present, risk verification is performed mainly by manually checking the financial document data, a process that is time-consuming and yields low risk-identification efficiency.
Summary of the invention
Embodiments of the present invention provide a financial risk verification processing method and apparatus, a computer device, and a storage medium, to solve the problems of long processing time and low efficiency that arise when risk identification is performed by manually checking financial document data.
A financial risk verification processing method, including:
obtaining a financial approval request, the financial approval request including financial document data;
performing feature extraction on the financial document data to obtain feature item data;
inputting the feature item data into a target risk identification model based on a decision tree algorithm to perform risk identification and obtain a target risk grade; and
classifying the financial document data according to the target risk grade, and assigning the financial document data to a subordinate processing flow corresponding to the target risk grade, so as to perform the corresponding risk verification on the financial document data.
A financial risk verification processing apparatus, including:
a financial approval request module, configured to obtain a financial approval request, the financial approval request including financial document data;
a feature item data extraction module, configured to perform feature extraction on the financial document data to obtain feature item data;
a target risk grade obtaining module, configured to input the feature item data into a target risk identification model based on a decision tree algorithm to perform risk identification and obtain a target risk grade; and
a financial document data allocation module, configured to classify the financial document data according to the target risk grade and assign the financial document data to a subordinate processing flow corresponding to the target risk grade, so as to perform the corresponding risk verification on the financial document data.
A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of the above financial risk verification processing method.
A computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the above financial risk verification processing method.
With the above financial risk verification processing method and apparatus, computer device, and storage medium, a financial approval request is obtained, financial document data is acquired and feature item data is extracted from it, and the feature item data is identified using a target risk identification model based on a decision tree algorithm, so that a target risk grade of high accuracy can be obtained quickly. According to the target risk grade, the financial document data is then assigned to the corresponding subordinate processing flow, thereby classifying the financial document data and performing the corresponding risk verification. This effectively improves the approval efficiency of financial document data, reduces the approval workload, and helps lower the cost of manual review.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the accompanying drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the application environment of the financial risk verification processing method in an embodiment of the invention;
Fig. 2 is a flowchart of the financial risk verification processing method in an embodiment of the invention;
Fig. 3 is another flowchart of the financial risk verification processing method in an embodiment of the invention;
Fig. 4 is another flowchart of the financial risk verification processing method in an embodiment of the invention;
Fig. 5 is another flowchart of the financial risk verification processing method in an embodiment of the invention;
Fig. 6 is another flowchart of the financial risk verification processing method in an embodiment of the invention;
Fig. 7 is another flowchart of the financial risk verification processing method in an embodiment of the invention;
Fig. 8 is a schematic diagram of the financial risk verification processing apparatus in an embodiment of the invention;
Fig. 9 is a schematic diagram of the computer device in an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The financial risk verification processing method provided by the embodiments of the present invention can be applied in the application environment shown in Fig. 1, in which a client communicates with a server over a network, and the server performs risk identification on the financial document data uploaded by the client. The client, also called a user terminal, is the program corresponding to the server that provides local services to the user. The client can be installed on, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a financial risk verification processing method is provided. The method is described here as applied to the server in Fig. 1, which is specifically the server of a financial system. The financial risk verification processing method includes the following steps:
S10: Obtain a financial approval request, the financial approval request including financial document data.
Here, financial document data refers to data related to financial documents, for example data related to expense claim sheets, loan application forms, invoices, outbound orders, warehousing receipts, and the like. In this embodiment, the financial document data is the data that needs to be imported into the financial system for risk verification. The financial document data can be document data pushed directly to the financial system by an external business system, or document data that the financial system reads from a shared database to which the external business system uploaded it. The external business system can be a reimbursement system. The financial approval request is a request for triggering the financial system to perform risk approval on the financial document data.
In this embodiment, the financial document data may include a financial document image, a document type corresponding to the financial document image, and target declaration data. The financial document image is an image obtained by photographing or scanning the financial document. The document type corresponding to the financial document image can be a type such as expense claim sheet, loan application form, invoice, outbound order, or warehousing receipt. The target declaration data is the expenditure data declared in this financial approval request, and may specifically include the financial expenditure cause, the financial expenditure amount, and the financial expenditure declarer. For example, in the target declaration data "user A applies to claim a 500-yuan travel expense", the travel expense is the financial expenditure cause, 500 yuan is the financial expenditure amount, and user A is the financial expenditure declarer.
Specifically, the user can fill in the target declaration data in the client of the financial system (for example, user A applies to claim a 500-yuan travel expense), click the "add attachment" button to upload the financial document image, and select the document type of the financial document image, thereby completing the upload of the financial document data. A financial approval request is then triggered based on the financial document data and sent to the server of the financial system, so that the server performs risk verification on the financial document data in the financial approval request, that is, verifies whether the financial document data carries risk. For example, on the data import interface of the client of the financial system, after uploading the financial document data, the user can click the "approve" button on the data import interface to trigger the financial approval request corresponding to the financial document data, so that the server obtains the financial approval request.
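As a concrete illustration, the request payload described above can be modeled as plain records. This is a minimal sketch under assumed names (`DeclarationData`, `FinancialDocumentData`, `FinancialApprovalRequest`); the patent does not specify any concrete schema.

```python
from dataclasses import dataclass

@dataclass
class DeclarationData:
    cause: str       # financial expenditure cause, e.g. "travel expense"
    amount: float    # financial expenditure amount
    declarer: str    # financial expenditure declarer

@dataclass
class FinancialDocumentData:
    image_path: str      # photographed or scanned financial document image
    document_type: str   # e.g. "expense claim sheet", "invoice"
    declaration: DeclarationData

@dataclass
class FinancialApprovalRequest:
    document: FinancialDocumentData

# The worked example from the text: user A claims a 500-yuan travel expense.
request = FinancialApprovalRequest(
    document=FinancialDocumentData(
        image_path="receipt_001.png",
        document_type="expense claim sheet",
        declaration=DeclarationData(
            cause="travel expense", amount=500.0, declarer="user A"),
    )
)
```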
S20: Perform feature extraction on the financial document data to obtain feature item data.
The feature item data is the data of specific feature items extracted from the financial document data. The feature item data includes a document authenticity feature, a data authenticity feature, a filling-defect feature, and a credit grade feature. The document authenticity feature is a feature used to assess whether the financial document image is genuine. The data authenticity feature is a feature, obtained by comparing the target declaration data with the image content extracted from the financial document image, that is used to assess whether the data is true. The filling-defect feature is a feature used to assess whether the target declaration data in this financial approval request is filled in according to the standards; it may include whether the financial expenditure cause matches a preset allowable cause, whether the financial expenditure amount in words and in figures is filled in correctly, and so on. The credit grade feature is a feature used to assess the credit grade of the user who raised this financial approval request.
In this embodiment, the credit grade feature can be determined by the post of the user who raised this financial approval request. For example, when a user A registers with the financial system, the credit grade feature is determined by the post of user A: if user A is a financial staff member of the group company, the credit grade feature is high; if user A is a financial staff member of a region of the group company, the credit grade feature is medium; and if user A is a financial staff member of a subsidiary under any region of the group company, the credit grade feature is low. Alternatively, the credit grade feature can be determined from the historical declaration records of the user who raised this financial approval request, for example by determining the credit grade of a user B from the document authenticity features, data authenticity features, and filling-defect features in user B's historical declaration records.
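The four feature items of step S20 can be sketched as a single extraction function. The concrete rules below (the allowed-cause list, the post-to-grade mapping, the OCR amount comparison) are illustrative placeholders, not the patent's actual checks.

```python
def extract_feature_items(doc: dict) -> dict:
    """Sketch of step S20: derive the four feature items named in the text."""
    features = {}
    # Document authenticity feature: whether the document image looks genuine
    # (placeholder: a precomputed verification flag).
    features["document_authenticity"] = bool(doc.get("image_verified", False))
    # Data authenticity feature: declared amount matches the amount read
    # from the document image (placeholder for the image-content comparison).
    features["data_authenticity"] = doc.get("declared_amount") == doc.get("ocr_amount")
    # Filling-defect feature: cause must be among the preset allowable causes
    # (placeholder rule; a real system would also check the amount in words).
    allowed_causes = {"travel expense", "office supplies"}
    features["filling_defect"] = doc.get("cause") not in allowed_causes
    # Credit grade feature: derived from the declarer's post, as in the text.
    post_to_grade = {"group": "high", "region": "medium", "subsidiary": "low"}
    features["credit_grade"] = post_to_grade.get(doc.get("post", ""), "low")
    return features
```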
S30: Input the feature item data into the target risk identification model based on the decision tree algorithm to perform risk identification and obtain the target risk grade.
A decision tree algorithm is a method of approximating the values of a discrete function; essentially, a decision tree is a process of classifying data through a series of rules. The target risk identification model based on the decision tree algorithm is a model for identifying the risk grade of financial document data. The model includes a tree model framework, which contains a root node, leaf nodes, and intermediate nodes connected between the root node and the leaf nodes, where each intermediate node is connected to at least two lower-level nodes. In the tree model framework, a target classification feature is configured on the root node and on each intermediate node, so that the input feature item data is divided into at least two classes based on that target classification feature. A leaf node is a node of the tree model framework that has no branches, and each leaf node is configured with a corresponding risk grade. In this embodiment, the target risk grade is the risk grade of the final leaf node reached after the feature item data is input into the target risk identification model and matched against the target classification features in order along the tree model framework.
In step S30, starting from the root node of the tree model framework, the target classification feature of the root node is called to classify the feature item data, and the intermediate node or leaf node it belongs to is determined. If it is a leaf node, the risk grade corresponding to that leaf node is directly taken as the target risk grade; if it is an intermediate node, the target classification feature of that intermediate node is called to further classify the feature item data and determine the next-level intermediate node or leaf node it belongs to. This continues, calling the corresponding target classification feature at each intermediate node, until the leaf node corresponding to the feature item data is determined, and the risk grade of that leaf node is taken as the target risk grade.
In this embodiment, performing risk identification on the feature item data using the target risk identification model based on the decision tree algorithm has the advantages of the high classification accuracy and fast classification speed that decision tree algorithms possess.
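The root-to-leaf matching process described above can be sketched as a loop over a nested-dict tree model. The node layout (`feature`/`children`/`risk_grade` keys) and the two-level example tree are assumptions for illustration, not the patent's storage format.

```python
def classify(node, feature_items):
    """Walk from the root, branching on each node's target classification
    feature, until a leaf's configured risk grade is reached."""
    while "risk_grade" not in node:              # root or intermediate node
        value = feature_items[node["feature"]]
        node = node["children"][value]           # descend the matching branch
    return node["risk_grade"]                    # leaf node

# Example tree: root classifies on document authenticity; one intermediate
# node further classifies on the filling-defect feature.
tree = {
    "feature": "document_authenticity",
    "children": {
        False: {"risk_grade": "high"},           # leaf
        True: {
            "feature": "filling_defect",         # intermediate node
            "children": {
                True:  {"risk_grade": "high"},
                False: {"risk_grade": "low"},
            },
        },
    },
}

print(classify(tree, {"document_authenticity": True, "filling_defect": False}))  # low
```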
S40: Classify the financial document data according to the target risk grade, and assign the financial document data to the subordinate processing flow corresponding to the target risk grade, so as to perform the corresponding risk verification on the financial document data.
The target risk grade can be divided into two kinds: a high risk grade (hereinafter "high risk") and a low risk grade (hereinafter "low risk"). The subordinate processing flow is the next flow in which the financial document data is handled after risk approval has been performed according to the financial approval request.
Specifically, classifying the financial document data according to the target risk grade and assigning it to the corresponding subordinate processing flow means that, depending on whether the target risk grade of the financial document data is high risk or low risk, the server assigns the financial document data to different subordinate processing flows so as to perform the corresponding risk verification. That is, the financial document data corresponding to the high risk grade and to the low risk grade is processed separately, achieving intelligent auditing of financial document data with fast review and little time consumed.
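The grade-based dispatch of step S40 can be sketched as a small routing function. The flow names ("manual review", "automatic approval") are assumed examples, since the patent only requires that high-risk and low-risk documents be assigned to different subordinate flows.

```python
def dispatch(document_id: str, risk_grade: str, flows: dict) -> str:
    """Assign a document to the subordinate flow matching its risk grade
    and return the (assumed) name of that flow."""
    flows.setdefault(risk_grade, []).append(document_id)
    return "manual review" if risk_grade == "high" else "automatic approval"

flows: dict = {}
print(dispatch("doc-1", "high", flows))  # manual review
print(dispatch("doc-2", "low", flows))   # automatic approval
```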
In the financial risk verification processing method provided by this embodiment, the feature item data extracted from the financial document data is identified using the target risk identification model based on the decision tree algorithm, so that a target risk grade of high accuracy can be obtained quickly. Then, according to the target risk grade, the financial document data is assigned to the corresponding subordinate processing flow for handling, thereby classifying the financial document data and performing risk verification on it. This effectively improves the approval efficiency of financial document data, reduces the approval workload, and helps lower the cost of manual review.
In one embodiment, as shown in Fig. 3, the financial risk verification processing method further includes:
S51: Obtain a training data set, the training data set including at least two pieces of training document data, each piece of training document data including a class label feature and at least two training attribute features.
The training data set is the set of training document data. Training document data is the document data used to train the target risk identification model based on the decision tree algorithm. The class label feature is the risk grade labeled in advance on each piece of training document data; in this embodiment, the class label feature takes one of two values, high risk and low risk. A training attribute feature is a feature used to distinguish the different attributes of the training document data. The training attribute features include the document authenticity feature, the data authenticity feature, the credit grade feature, and the filling-defect feature; that is, the training attribute features correspond to the feature item data extracted in step S20. For ease of understanding, several pieces of training document data shown in Table 1 below are used as an illustration.
Table 1: Training document data (table contents not reproduced)
S52: Perform information gain ratio calculation on the class label feature and the training attribute features in the training data set, to obtain the information gain ratio corresponding to each training attribute feature.
Information gain ratio calculation is the process of calculating the information gain ratio of each training attribute feature. The information gain ratio is the ratio of the information gain to the split information. The information gain is the degree to which the uncertainty of the information is reduced; on its own, information gain is biased toward features with many values. The split information characterizes the number of branches produced when splitting. The information gain ratio, being the ratio of the information gain to the split information, normalizes the information gain by the split information, so that classification based on the information gain ratio is more accurate. In this embodiment, the server performs information gain ratio calculation on the class label feature and the training attribute features in the training data set to obtain the information gain ratio corresponding to each training attribute feature, so that the information gain ratio can serve as the classification basis for model training, making the classification process more accurate and helping to improve the recognition accuracy of the trained model.
S53: Select the training attribute feature with the largest information gain ratio as the target classification feature of the training data set, and divide the training data set into at least two training subsets using the target classification feature.
The target classification feature is the feature by which the training data set is divided. In this embodiment, the training attribute feature with the largest information gain ratio is selected as the target classification feature of the training data set. A training subset is a set obtained by dividing the training data set based on the target classification feature. In Table 1, if the training attribute feature with the largest calculated information gain ratio is the document authenticity feature, then the document authenticity feature is the target classification feature, and the training data set is divided according to it into the training subset corresponding to genuine documents, the training subset corresponding to fictitious documents, and the training subset corresponding to documents whose authenticity is undetermined.
S54: Determine whether each training subset is a single-element data set.
A single-element data set is a data set containing only one element. In this embodiment, it is specifically a data set containing only one piece of training document data, that is, a data set that does not contain two or more different pieces of training document data. Whether each training subset is a single-element data set is determined in order to decide whether the training subset can be divided further: if a training subset is a single-element data set, it cannot be divided further; if a training subset is not a single-element data set, it can be divided further.
S55: If all the training subsets are single-element data sets, obtain the target risk identification model based on the decision tree algorithm.
In this embodiment, after the training data set is divided into at least two training subsets using the target classification feature, if all the training subsets are single-element data sets, then none of them can be divided further, and the training process of the target risk identification model based on the decision tree algorithm is completed. At this point, each training subset corresponds to a leaf node of the tree model framework of the target risk identification model, and each target classification feature is the classification feature of the root node or of the intermediate node one level above the corresponding leaf nodes. It should be understood that, in the target risk identification model based on the decision tree algorithm, the risk grade of each leaf node can be determined by the class label feature of the training document data in its training subset. For a training subset {A}, where A is the training document data in the subset, if the class label feature carried by A is high risk, then in the resulting target risk identification model based on the decision tree algorithm, the risk grade of the leaf node corresponding to the training subset containing A is high risk.
S56: If any training subset is not a single-element data set, update that training subset to be the training data set, and repeat the step of performing information gain ratio calculation on the class label feature and the training attribute features in the training data set to obtain the information gain ratio corresponding to each training attribute feature.
In this embodiment, after the training data set is divided into at least two training subsets using the target classification feature, if any one of the training subsets is not a single-element data set, that training subset can be divided further. In that case, each training subset that is not a single-element data set is updated to be a new training data set, and step S52 is repeated until all the training subsets are single-element data sets, completing the training process of the target risk identification model based on the decision tree algorithm. For example, if the training subsets divided in step S53 are {B, C}, {D, E, F}, and {G}, then because the two training subsets {B, C} and {D, E, F} are not single-element data sets, they are updated to new training data sets, and step S52 is repeated until all the training subsets are single-element data sets.
In steps S51 to S56, the training attribute features in the training data set are used as the classification basis by calculating the information gain ratio of each training attribute feature; the training attribute feature with the largest information gain ratio is selected as the target classification feature of the training data set, and the training data set is divided into at least two training subsets using the target classification feature; whether further classification is needed is then determined according to whether each training subset is a single-element data set, and when all the training subsets are single-element data sets, the training process of the target risk identification model based on the decision tree algorithm is completed. The target risk identification model obtained in this embodiment has high classification accuracy and clear classification rules, so that subsequent identification with the risk identification model is fast.
In one embodiment, as shown in Figure 4, step S52, calculating the information gain ratio for the class label feature and the training attribute features in the training data set to obtain the information gain ratio corresponding to each training attribute feature, specifically includes the following steps:
S521: Calculate the class label feature of the training data set with the classification entropy formula to obtain the classification entropy of the training data set. The classification entropy formula is Entropy(S) = -Σ_{i=1}^{c} p_i * log2(p_i), where S is the training data set, Entropy(S) is the classification entropy of training data set S, c is the number of values of the class label feature, and p_i is the probability of the i-th class label value.
The classification entropy is the total uncertainty of the classes (high risk and low risk) across all samples; the larger the classification entropy, the greater the uncertainty. As Table 1 shows, the class label feature takes the two values high risk and low risk, so the number of values c of the class label feature is 2. S is the training data set, i.e. the set formed by the training bill data. The probability p_i of the i-th class label value in training data set S is the ratio of the number of samples carrying that value to the total number of samples. In this embodiment the probabilities are p_1 = 9/14 for the high-risk class label and p_2 = 5/14 for the low-risk class label, so the classification entropy of the training data set in Table 1, computed with the classification entropy formula, is Entropy(S) = -9/14*log2(9/14) - 5/14*log2(5/14) = 0.940.
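As a minimal sketch, the classification entropy of the Table 1 example can be reproduced in a few lines of Python; the label counts (9 high-risk, 5 low-risk) come from the worked example above, and the function name is illustrative:

```python
import math
from collections import Counter

def class_entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())

# 14 training bill records: 9 high-risk, 5 low-risk, as in Table 1
labels = ["high risk"] * 9 + ["low risk"] * 5
print(f"{class_entropy(labels):.3f}")  # 0.940
```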
S522: Calculate each training attribute feature with the attribute entropy formula to obtain the attribute entropy of each training attribute feature. The attribute entropy formula is Entropy(S, A) = Σ_v (|S_v|/|S|) * Entropy(S_v), where Entropy(S, A) is the attribute entropy of training attribute feature A in training data set S, v ranges over the branches (values) of training attribute feature A, Entropy(S_v) is the entropy of branch v, |S_v| is the number of samples in branch v, and |S| is the number of samples in training data set S.
The attribute entropy of a training attribute feature is a conditional entropy: the total uncertainty of the classes given that training attribute feature. As Table 1 shows, training attribute feature A is the bill authenticity feature, which has the 3 branches v true bill, fake bill and bill of undetermined authenticity, containing 5, 5 and 4 samples respectively, so the probabilities of true bill, fake bill and undetermined bill are 5/14, 5/14 and 4/14. In this embodiment, the 5 training bill records of true bills belong to high risk and low risk with probabilities 2/5 and 3/5; the 5 training bill records of fake bills belong to high risk and low risk with probabilities 1 and 0; and the 4 training bill records of undetermined bills belong to high risk and low risk with probabilities 2/4 and 2/4. The attribute entropy of the bill authenticity feature is therefore Entropy(S, A) = 5/14*Entropy(S_true) + 5/14*0 + 4/14*1 = 0.6325. The attribute entropies of the data authenticity feature, the credit grade feature and the filling-defect feature are calculated in the same way.
S523: Calculate the information gain of each training attribute feature from the classification entropy of the training data set and the attribute entropy of each training attribute feature, using the information gain formula Gain(S, A) = Entropy(S) - Entropy(S, A), where Gain(S, A) is the information gain of training attribute feature A in training data set S.
Information gain measures how much uncertainty a division removes: the larger the information gain of a training attribute feature, the more a split on that attribute reduces the uncertainty of the samples after division, so choosing that training attribute feature completes the classification goal more quickly. Specifically, from the classification entropy Entropy(S) and the attribute entropy Entropy(S, A) of each training attribute feature A, the information gain Gain(S, A) of that training attribute feature is calculated as Gain(S, A) = Entropy(S) - Entropy(S, A). For example, the information gain of the bill authenticity feature is Gain(S, A) = 0.940 - 0.6325 = 0.3075. The information gains of the data authenticity feature, the credit grade feature and the filling-defect feature are calculated in the same way.
S524: Calculate each training attribute feature with the split information formula to obtain the split information of each training attribute feature. The split information formula is SplitInfo(S, A) = -Σ_{j=1}^{m} (|S_j|/|S|) * log2(|S_j|/|S|), where m is the number of values of training attribute feature A, S_j is the j-th training subset formed by dividing training data set S on the m-valued training attribute feature A, |S_j| is the number of samples in training subset S_j, and |S| is the number of samples in training data set S.
The split information of a training attribute feature captures the number and size of the branches produced when splitting on that feature; it is the intrinsic information (split information) of the training attribute feature. As Table 1 shows, training attribute feature A is the bill authenticity feature with the 3 values true bill, fake bill and bill of undetermined authenticity, so m is 3. Dividing the training data set on the bill authenticity feature yields one training subset per value, with 5, 5 and 4 samples respectively, so the probabilities of true bill, fake bill and undetermined bill are 5/14, 5/14 and 4/14. From the probability of each value of the bill authenticity feature, its split information is SplitInfo(S, A) = -5/14*log2(5/14) - 5/14*log2(5/14) - 4/14*log2(4/14) = 1.577. The split information of the data authenticity feature, the credit grade feature and the filling-defect feature is calculated in the same way.
S525: Calculate the information gain ratio of each training attribute feature from its information gain and its split information, using the gain ratio formula GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A).
Because the information gain ratio of a training attribute feature is the ratio of its information gain to its split information, the gain ratio decreases as the split information grows; intuitively, a training attribute feature whose own uncertainty (its split information) is large is less likely to be chosen as the classification criterion. For example, the information gain ratio of the bill authenticity feature is GainRatio(S, A) = 0.3075 / 1.577 ≈ 0.195. The information gain ratios of the data authenticity feature, the credit grade feature and the filling-defect feature are calculated in the same way.
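The whole S521-S525 chain for the bill authenticity feature can be sketched end to end; the branch sizes and per-branch class counts below are taken from the worked example (5 true bills split 2/3, 5 fake bills all high-risk, 4 undetermined bills split 2/2), and all names are illustrative:

```python
import math

def entropy(counts):
    """Entropy of a distribution given raw counts, skipping empty classes."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)

# Branches of the bill authenticity feature: (branch size, class counts)
branches = [(5, (2, 3)),   # true bills: 2 high-risk, 3 low-risk
            (5, (5, 0)),   # fake bills: all high-risk
            (4, (2, 2))]   # bills of undetermined authenticity
total = sum(size for size, _ in branches)                # 14 records

class_entropy = entropy((9, 5))                          # Entropy(S)
attr_entropy = sum(size / total * entropy(c) for size, c in branches)
gain = class_entropy - attr_entropy                      # Gain(S, A)
split_info = entropy([size for size, _ in branches])     # SplitInfo(S, A)
gain_ratio = gain / split_info                           # GainRatio(S, A)
# ≈ 0.940, 0.6325, 0.308, 1.577, 0.195
print(class_entropy, attr_entropy, gain, split_info, gain_ratio)
```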
In steps S521-S525, the classification entropy formula, the attribute entropy formula, the information gain formula, the split information formula and the gain ratio formula are applied in turn to the class label feature and the training attribute features in the training data set, yielding the information gain ratio of each training attribute feature, so that the training data set can be classified quickly and accurately on the basis of the information gain ratio, thereby building the target risk identification model based on the decision tree algorithm.
In one embodiment, the financial document data includes a financial document image and the document type corresponding to that image. A financial document image is the image obtained by photographing or scanning a financial document. The document type corresponding to the image may be a financial document such as an expense claim sheet, a loan application form, an invoice, an outbound order or a warehousing receipt. The feature item data includes a bill authenticity feature, which is the feature used to assess whether the financial document image is genuine. In this embodiment, the bill authenticity feature takes the values true bill, fake bill and bill of undetermined authenticity. A true bill is a bill determined to be genuine after authenticity verification; a fake bill is a bill determined to be false after authenticity verification; a bill of undetermined authenticity is a bill whose genuineness cannot be determined after authenticity verification.
As shown in Figure 5, step S20, performing feature extraction on the financial document data to obtain the feature item data, specifically includes the following steps:
S21: If the document type is the invoice type, perform invoice authenticity identification on the financial document image with the invoice OCR identification tool, and determine whether the bill authenticity feature of the financial document image is true bill or fake bill.
The invoice OCR identification tool is a tool dedicated to identifying whether an invoice is genuine; it is built into the server of the financial system, so that the server can call it to perform authenticity identification on invoice-type documents. Specifically, if the document type corresponding to the financial document image is the invoice type, the built-in invoice OCR identification tool is called on the server to perform authenticity identification on the financial document image, determining whether its bill authenticity feature is true bill or fake bill.
Specifically, the invoice OCR identification tool built into the server uses OCR (Optical Character Recognition) technology to recognize financial document images of the invoice type, extracting invoice information such as the invoice code, invoice number, issue date and amount, and imports the extracted invoice information into an invoice verification platform for checking. From the verification result fed back by the platform, it determines whether the financial document image is a true bill or a fake bill. The invoice verification platform may specifically be the nationwide VAT invoice verification platform of the State Taxation Administration, or any other platform capable of invoice verification. In this embodiment, the server's built-in invoice OCR identification tool performs fast authenticity identification on financial document images of the invoice type to determine whether the bill authenticity feature of the financial document image is true bill or fake bill, so that the risk of the financial approval request can subsequently be determined from that feature, achieving the goal of risk prediction.
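The field-extraction step can be hinted at with a regular-expression sketch over raw OCR output; the field labels, patterns and sample string are illustrative assumptions, and a production pipeline would run real OCR first and then submit the extracted fields to the verification platform:

```python
import re

def extract_invoice_fields(ocr_text):
    """Pull the invoice code, invoice number and amount out of OCR text.
    The labels and digit widths are illustrative, not a real layout spec."""
    patterns = {
        "invoice_code": r"发票代码[:：]?\s*(\d{10,12})",
        "invoice_number": r"发票号码[:：]?\s*(\d{8})",
        "amount": r"金额[:：]?\s*([\d.]+)",
    }
    fields = {}
    for name, pat in patterns.items():
        m = re.search(pat, ocr_text)
        if m:
            fields[name] = m.group(1)
    return fields

sample = "发票代码: 1100182130 发票号码: 12345678 金额: 500.00"
print(extract_invoice_fields(sample))  # all three fields found
```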
S22: If the document type is a non-invoice type, determine the bill authenticity feature of the financial document image to be bill of undetermined authenticity.
A non-invoice type is any document type other than the invoice type, including financial documents such as expense claim sheets, loan application forms, outbound orders and warehousing receipts. Since non-invoice types are numerous and no dedicated authenticity identification tool currently exists for them, the bill authenticity feature of a financial document image of a non-invoice type is set to bill of undetermined authenticity.
In steps S21-S22, when the financial document data includes a financial document image and its corresponding document type, an invoice-type document can quickly be assigned a bill authenticity feature of true bill or fake bill with the invoice OCR identification tool, while a non-invoice-type document is directly assigned the feature bill of undetermined authenticity, so that the risk of the financial document data can be identified from the determined bill authenticity feature, achieving the goal of intelligent examination of financial risk.
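The S21-S22 branching reduces to a small dispatch; here `verify_invoice` stands in for the OCR-plus-verification-platform call and is an assumed interface:

```python
def bill_authenticity_feature(document_type, verify_invoice):
    """S21-S22: only invoice-type documents get a real authenticity check;
    every other document type is marked as undetermined."""
    if document_type == "invoice":
        return "true bill" if verify_invoice() else "fake bill"
    return "bill of undetermined authenticity"

print(bill_authenticity_feature("invoice", lambda: True))             # true bill
print(bill_authenticity_feature("expense claim sheet", lambda: True))
```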
In one embodiment, the financial document data further includes target declaration data, which is the declared expenditure cause, expenditure amount and expenditure declarer of the current financial approval request. The feature item data includes a data authenticity feature, which is the feature obtained by comparing the target declaration data with the content extracted from the financial document image, and is used to assess whether the declared data is genuine.
As shown in Figure 5, step S20, performing feature extraction on the financial document data to obtain the feature item data, further includes the following steps:
S23: Perform text recognition on the financial document image with the target text recognition model to obtain text feature data.
The target text recognition model is a pre-trained model for performing text recognition on images. Text feature data is the characteristic data obtained by performing text recognition on the financial document image. For example, the text feature data recognized from an invoice-type financial document image includes expense-related data such as the expense type, unit, quantity, unit price, amount, tax rate and tax amount; party-related data such as the purchaser and the seller; and bill-related data such as the bill number, bill check code and issuing time.
In this embodiment, the target text recognition model is a model trained on images with a long short-term memory (LSTM) network for recognizing text content. An LSTM is a recurrent neural network suited to processing and predicting time series, including important events separated by relatively long and variable intervals. Specifically, the server feeds the financial document image into the LSTM-based target text recognition model, so that the model can recognize the text in the financial document image from its context and output the text feature data.
S24: Perform keyword extraction on the text feature data with a keyword extraction algorithm to obtain text feature keywords.
A keyword extraction algorithm is an algorithm that extracts keywords from text; the text feature keywords are the keywords obtained by applying the keyword extraction algorithm to the text feature data. In this embodiment, since the target declaration data includes expense-related data such as the expenditure cause and the expenditure amount, as well as party-related data such as the expenditure declarer, the keyword extraction step must extract both the expense-related data and the party-related data from the text feature data as the text feature keywords.
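A toy keyword extractor over the recognized text might look as follows; the vocabulary, patterns and sample string are invented for illustration, and a real system would use a proper keyword extraction algorithm:

```python
import re

EXPENSE_TERMS = ("差旅费", "办公费", "报销")   # illustrative expense vocabulary

def extract_feature_keywords(text):
    """Split recognized text into expense-related and party-related keywords."""
    keywords = {"expense": [], "party": []}
    keywords["expense"] += [t for t in EXPENSE_TERMS if t in text]
    keywords["expense"] += re.findall(r"\d+(?:\.\d+)?元", text)  # amounts
    keywords["party"] += re.findall(r"用户\w", text)  # crude declarer pattern
    return keywords

sample = "用户A申请报销差旅费500元"
print(extract_feature_keywords(sample))
```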
S25: Calculate the matching degrees between the target declaration data and the text feature keywords, and determine the data authenticity feature from those matching degrees.
In this embodiment, a similarity algorithm is used to calculate the matching degrees between the target declaration data and the text feature keywords; the similarity algorithm may be cosine similarity, edit distance, or any other algorithm that can measure text similarity. Since both the target declaration data and the text feature keywords contain expense-related data and party-related data, a first matching degree must be computed on the expense-related data and a second matching degree on the party-related data, and the data authenticity feature of the financial document data is determined from the first and second matching degrees together.
For example, in the target declaration data "user A applies to reimburse 500 yuan of travel expenses", travel expenses is the expenditure cause, 500 yuan is the expenditure amount, and user A is the expenditure declarer. If the text feature keywords contain the expense-related data "travel expenses" and "500 yuan", the first matching degree between the target declaration data and the text feature keywords is an exact match; conversely, if the text feature keywords do not contain "travel expenses" and "500 yuan", the first matching degree is an incomplete match. Correspondingly, if the purchaser in the text feature keywords is "user A", the second matching degree between the target declaration data and the text feature keywords is an exact match; otherwise the second matching degree is an incomplete match.
In this embodiment, determining the data authenticity feature from the matching degrees specifically means: if both the first and the second matching degree are exact matches, the data authenticity feature of the financial document data is true data; if at least one of the first and second matching degrees is an incomplete match, the data authenticity feature of the financial document data is false data.
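Under the exact-match rule just described, the data authenticity decision can be sketched as follows; the containment test stands in for a similarity algorithm (the patent also allows cosine similarity or edit distance), and all names are illustrative:

```python
def data_authenticity_feature(declared, keywords):
    """declared: expense items and declarer from the target declaration data.
    keywords: keyword lists extracted from the recognized document text.
    Returns 'true data' only if both matching degrees are exact matches."""
    expense_match = all(item in keywords["expense"] for item in declared["expense"])
    party_match = declared["party"] in keywords["party"]
    return "true data" if (expense_match and party_match) else "false data"

declared = {"expense": ["差旅费", "500元"], "party": "用户A"}
keywords = {"expense": ["差旅费", "报销", "500元"], "party": ["用户A"]}
print(data_authenticity_feature(declared, keywords))  # true data
```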
In steps S23-S25, the financial document image is recognized with the pre-trained target text recognition model, so the text feature data is obtained quickly and intelligently without manually proofreading the text in the financial document image; text feature keywords are then extracted from the text feature data with a keyword extraction algorithm, the matching degrees between the text feature keywords and the target declaration data are computed, and the data authenticity feature is determined from those matching degrees. This realizes the function of intelligent document examination without manually checking whether the financial document image matches the target declaration data, guarding against falsified data in the target declaration data.
In one embodiment, before the step of performing feature extraction on the financial document data to obtain the feature item data, the financial risk authentication processing method further includes a step of pre-training the target text recognition model based on a long short-term memory network. As shown in Figure 6, pre-training the target text recognition model based on the long short-term memory network specifically includes the following steps:
S61: Obtain training text images, and divide the training text images into a training set and a test set by a preset ratio.
A training text image is a text image obtained in advance from an open-source library for model training; it contains text content composed of Chinese characters, handwritten or printed. Specifically, a training text image may contain text composed of handwritten characters, text composed of printed characters, or a mixture of both. The training text images may be images of financial documents such as expense claim sheets, loan application forms, invoices, outbound orders and warehousing receipts, so that the trained target text recognition model is domain-specific and recognizes more accurately.
The training set is the learning sample data set used to fit the classifier: the training text images in the training set train the machine learning model and determine the parameters of the machine learning model. The test set is used to evaluate the discriminative ability, such as the accuracy, of the trained machine learning model. The preset ratio is the pre-configured ratio by which the training text images are divided. In this embodiment the training text images may be divided by a 9:1 ratio, with 90% of the training text images as the training set and the remaining 10% as the test set.
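The 9:1 division can be sketched with a seeded shuffle; the ratio and seed are illustrative choices:

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=42):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(100)))
print(len(train), len(test))  # 90 10
```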
S62: Perform single-glyph segmentation on the training text images with the vertical projection method to obtain training single-glyph images, and label the training single-glyph images in sequence to obtain training single-glyph samples.
The vertical projection method projects a training text image onto the vertical direction to obtain a vertical projection histogram, which reflects the number of ink pixels of the training text image in each column.
Specifically, single-glyph segmentation with the vertical projection method includes the following steps: the server scans each text line of the training text image line by line, obtains the pixel values of each character, builds the vertical projection histogram of pixel counts per column, and cuts the training text image repeatedly at the minima of the vertical projection histogram to obtain the training single-glyph images. Intuitively, the columns covered by a character are dense in ink pixels while the gaps between characters are sparse; this density pattern appears directly in the vertical projection histogram, where columns inside a character have relatively high pixel counts and columns between characters have relatively low pixel counts. The vertical projection method can therefore segment training text images into single glyphs effectively with a simple procedure, which helps improve training efficiency.
Labelling the training single-glyph images in sequence means annotating the segmented single-glyph images of each text line from left to right, so that the training single-glyph samples obtained in segmentation order carry timing information.
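A minimal numpy sketch of the valley-cutting step, assuming a binarized text line where 1 marks ink; a real implementation would threshold small nonzero counts rather than require exact zeros:

```python
import numpy as np

def segment_glyphs(binary_row):
    """binary_row: 2-D array of one text line, 1 = ink, 0 = background.
    Sum each column (the vertical projection) and cut at zero-valleys."""
    projection = binary_row.sum(axis=0)
    glyphs, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                       # a glyph begins
        elif count == 0 and start is not None:
            glyphs.append((start, x))       # a glyph ends at a valley
            start = None
    if start is not None:
        glyphs.append((start, len(projection)))
    return glyphs

# toy line: two "characters" separated by one blank column
line = np.array([[1, 1, 0, 1, 1],
                 [1, 0, 0, 0, 1]])
print(segment_glyphs(line))  # [(0, 2), (3, 5)]
```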
S63: Feed the training single-glyph samples of the training set into the long short-term memory network in their labelled order for training, update the network parameters of the long short-term memory network with the batch gradient descent algorithm, and obtain the original text recognition model.
The original text recognition model is the model obtained after many iterations of the long short-term memory network. A long short-term memory (LSTM) network is a recurrent neural network suited to processing and predicting time series, including important events separated by relatively long and variable intervals.
The batch gradient descent algorithm updates the network parameters using all training single-glyph samples in the training set at every step; it can reach a globally optimal solution and improves model accuracy. The network parameters are the weights and biases between the layers of the long short-term memory network. Because the long short-term memory network has a temporal memory, it is used to process the training single-glyph samples, which carry timing information.
A long short-term memory network has a network structure of one input layer, at least one hidden layer and one output layer. The input layer is the first layer of the network; it receives external signals, i.e. it takes in the training single-glyph samples. The output layer is the last layer; it emits signals to the outside, i.e. it outputs the computation result of the long short-term memory network. The hidden layers are all layers other than the input and output layers; they process the training single-glyph samples to produce the computation result of the network. Because the training single-glyph samples carry timing information, the long short-term memory network can train on them with context, which improves the recognition accuracy of the resulting original text recognition model. In this embodiment, the output layer of the long short-term memory network applies Softmax regression to output the classification weight matrix. Softmax is a classification function commonly used in neural networks: it maps the outputs of multiple neurons into the interval [0, 1], which can be interpreted as probabilities; it is simple and convenient to compute and makes multi-class outputs more accurate.
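One time step of an LSTM cell can be written out in numpy to make the gating concrete; the gate ordering and weight shapes below are conventional choices for illustration, not the patent's specification:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4n, m), U: (4n, n), b: (4n,) hold the
    stacked pre-activations of the input/forget/output gates and candidate."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c = f * c_prev + i * g      # new cell state carries the long-term memory
    h = o * np.tanh(c)          # new hidden state is this step's output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```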
S64: Test the original text recognition model with the training single-glyph samples of the test set to obtain the recognition accuracy of the original text recognition model, and when the recognition accuracy exceeds the preset accuracy, take the model as the target text recognition model.
The preset accuracy is a pre-configured accuracy threshold used to assess whether a text recognition model meets requirements. Specifically, all training single-glyph samples of the test set are fed into the original text recognition model for testing, and its recognition accuracy is the number of correctly predicted training single-glyph samples divided by the total number of training single-glyph samples in the test set, where a prediction is correct when the prediction result of a training single-glyph sample equals its label. It is then judged whether the recognition accuracy of the original text recognition model exceeds the preset accuracy: if it does, the original text recognition model is deemed sufficiently accurate and is determined to be the target text recognition model; if it does not, the original text recognition model is deemed insufficiently accurate and must be retrained and retested until the recognition accuracy reaches the preset accuracy, ensuring the recognition accuracy of the target text recognition model.
In this embodiment, financial document images serve as the training text images for training the target text recognition model, so that the resulting target text recognition model is domain-specific and its recognition accuracy is higher. The training text images are then segmented with the vertical projection method, and the resulting training single-glyph images are labelled in sequence to obtain training single-glyph samples that carry timing information. The training single-glyph samples with timing information are fed into the long short-term memory network, so that the network can train on them with context; the network parameters of the long short-term memory network are updated with the batch gradient descent algorithm to obtain the original text recognition model, improving model accuracy. The original text recognition model is tested with the training single-glyph samples of the test set, and when the test accuracy exceeds the preset accuracy the target text recognition model is obtained, further guaranteeing the recognition accuracy of the target text recognition model.
In one embodiment, as shown in Figure 7, step S40, classifying the financial document data by the target risk grade and assigning the financial document data to the processing flow corresponding to the target risk grade, specifically includes the following steps:
S41: Classify the financial document data by the target risk grade, and determine whether the financial document data is data to be circulated or data pending approval.
Specifically, in the financial system the target risk grade can be divided into a high-risk grade and a low-risk grade, and it can be configured in advance that financial document data of the high-risk grade requires further manual approval; the financial document data corresponding to the high-risk grade can therefore be determined to be data pending approval. Correspondingly, financial document data of the low-risk grade needs no manual examination and can proceed directly to the next circulation processing link; the financial document data corresponding to the low-risk grade can therefore be determined to be data to be circulated.
S42: Assign the data pending approval to the approval processing link corresponding to the target risk grade.
An approval allocation table is pre-configured in the financial system to store the approver information for the financial document data of each risk grade. When the server determines that financial document data is data pending approval, it looks up the approval allocation table by the target risk grade, determines the approver information corresponding to that target risk grade, and sends the financial document data to the approval processing link of that approver, so that the corresponding approver can inspect and approve the financial document data.
S43: Assign the to-be-circulated data to the circulation processing link.
In the financial system, the to-be-circulated data are the fiscal agency agreement data corresponding to the low-risk grade and need not be manually reviewed; the server can therefore assign the to-be-circulated data to the circulation processing link, so as to further process the fiscal agency agreement data. The circulation processing link may be a link such as financial payment or financial declaration; the corresponding circulation processing link differs according to the type of the fiscal agency agreement data.
In this embodiment, the fiscal agency agreement data corresponding to the high-risk grade are assigned, as pending-approval data, to the approval processing link corresponding to the target risk grade, so that the fiscal agency agreement data are further reviewed. This helps reduce financial risk, improves the efficiency of the approval processing link, and saves approval cost. The fiscal agency agreement data corresponding to the low-risk grade are assigned directly, as to-be-circulated data, to the next-stage circulation processing link, which helps improve the circulation efficiency of the fiscal agency agreement data.
It should be understood that the serial numbers of the steps in the above embodiment do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, a financial risk verification processing device is provided; the financial risk verification processing device corresponds to the financial risk authentication processing method in the above embodiments. As shown in Fig. 8, the financial risk verification processing device includes a financial approval request acquisition module 10, a feature item data extraction module 20, a target risk grade acquisition module 30, and a fiscal agency agreement data allocation module 40. The functional modules are described in detail as follows:
Financial approval request acquisition module 10, configured to obtain a financial approval request, the financial approval request including fiscal agency agreement data.
Feature item data extraction module 20, configured to perform feature extraction on the fiscal agency agreement data to obtain feature item data.
Target risk grade acquisition module 30, configured to input the feature item data into a target risk identification model based on a decision tree algorithm for risk identification, so as to obtain a target risk grade.
Fiscal agency agreement data allocation module 40, configured to classify the fiscal agency agreement data according to the target risk grade and assign the fiscal agency agreement data to the junior process flow corresponding to the target risk grade, so as to perform corresponding risk verification on the fiscal agency agreement data.
Specifically, the financial risk verification processing device further includes a training dataset acquiring unit 51, an information gain ratio acquiring unit 52, a target classification feature acquiring unit 53, a single-element dataset judging unit 54, a risk identification model acquiring unit 55, and a training dataset updating unit 56.
Training dataset acquiring unit 51, configured to obtain a training dataset, the training dataset including at least two training bills data, each training bills data including a class label feature and at least two training attribute features.
Information gain ratio acquiring unit 52, configured to perform information gain ratio calculation on the class label feature and the training attribute features in the training dataset, so as to obtain the information gain ratio corresponding to each training attribute feature.
Target classification feature acquiring unit 53, configured to select the training attribute feature with the largest information gain ratio as the target classification feature of the training dataset, and to divide the training dataset into at least two training subsets using the target classification feature.
Single-element dataset judging unit 54, configured to judge whether each training subset is a single-element dataset.
Risk identification model acquiring unit 55, configured to obtain the target risk identification model based on the decision tree algorithm if all training subsets are single-element datasets.
Training dataset updating unit 56, configured to, if any training subset is not a single-element dataset, update that training subset to be the training dataset and repeat the step of performing information gain ratio calculation on the class label feature and the training attribute features in the training dataset to obtain the information gain ratio corresponding to each training attribute feature.
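The recursive training loop performed by units 53 to 56 (split on the attribute with the largest information gain ratio, repeat until every subset carries a single class label) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the record format and the `best_attribute` callback, which stands in for the gain ratio computation of subunits 521 to 525, are assumptions.

```python
# Minimal sketch of the recursive decision-tree training loop.

def is_single_class(rows):
    # A "single-element dataset" in the sense used above: all remaining
    # rows share one class label, so no further split is needed.
    return len({r["label"] for r in rows}) <= 1

def build_tree(rows, attributes, best_attribute):
    """Recursively split `rows` until every subset is single-class."""
    if is_single_class(rows) or not attributes:
        return {"leaf": rows[0]["label"] if rows else None}
    # Target classification feature: the attribute with the largest
    # information gain ratio (computed by the supplied callback).
    attr = best_attribute(rows, attributes)
    node = {"attr": attr, "children": {}}
    remaining = [a for a in attributes if a != attr]
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        # Each non-single subset becomes the new "training dataset" and
        # the gain-ratio step is repeated on it.
        node["children"][v] = build_tree(subset, remaining, best_attribute)
    return node

demo = [
    {"amount": "big", "label": "high"},
    {"amount": "small", "label": "low"},
]
tree = build_tree(demo, ["amount"], lambda rows, attrs: attrs[0])
print(tree["children"]["big"])  # {'leaf': 'high'}
```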
Specifically, the information gain ratio acquiring unit 52 includes a classification information entropy computation subunit 521, an attribute information entropy computation subunit 522, an information gain computation subunit 523, a split information metric computation subunit 524, and an information gain ratio computation subunit 525.
Classification information entropy computation subunit 521, configured to calculate the class label feature in the training dataset using the classification information entropy formula to obtain the classification information entropy of the training dataset, the classification information entropy formula being Entropy(S) = -Σ_{i=1..c} p_i · log2(p_i), wherein S is the training dataset, Entropy(S) is the classification information entropy of the training dataset S, c is the number of values of the class label feature, and p_i is the probability of the i-th class label value.
Attribute information entropy computation subunit 522, configured to calculate each training attribute feature using the attribute information entropy formula to obtain the attribute information entropy of each training attribute feature, the attribute information entropy formula being Entropy(S, A) = Σ_v (|S_v| / |S|) · Entropy(S_v), wherein Entropy(S, A) is the attribute information entropy of the training attribute feature A in the training dataset S, v is a branch of the training attribute feature A, Entropy(S_v) is the branch information entropy of branch v, |S_v| is the quantity of branch v, and |S| is the quantity of the training dataset S.
Information gain computation subunit 523, configured to calculate the classification information entropy of the training dataset and the attribute information entropy of each training attribute feature using the information gain formula to obtain the information gain of each training attribute feature, the information gain formula being Gain(S, A) = Entropy(S) - Entropy(S, A), wherein Gain(S, A) is the information gain of the training attribute feature A in the training dataset S.
Split information metric computation subunit 524, configured to calculate each training attribute feature using the split information metric formula to obtain the split information metric of each training attribute feature, the split information metric formula being SplitInfo(S, A) = -Σ_{j=1..m} (|S_j| / |S|) · log2(|S_j| / |S|), wherein m is the number of values of the training attribute feature A, S_j is the j-th training subset formed by splitting the training dataset S by the training attribute feature A with m values, |S_j| is the quantity of the training subset S_j, and |S| is the quantity of the training dataset S.
Information gain ratio computation subunit 525, configured to calculate the information gain and the split information metric of each training attribute feature using the information gain ratio formula to obtain the information gain ratio of each training attribute feature, the information gain ratio formula being GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A).
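The entropy, information gain, split information, and gain ratio formulas above (a C4.5-style splitting criterion) can be worked through on a toy dataset. The field names `amount` and `label` and the four-row dataset are illustrative assumptions, not from the patent.

```python
# Worked sketch of subunits 521-525: entropy, attribute entropy,
# information gain, split information metric, and gain ratio.
from math import log2
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum over the c class values of p_i * log2(p_i).
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def gain_ratio(rows, attr):
    labels = [r["label"] for r in rows]
    n = len(rows)
    # Branch the dataset on attribute A's values.
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r["label"])
    # Entropy(S, A) = sum of |S_v|/|S| * Entropy(S_v) over branches v.
    attr_entropy = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - attr_entropy          # Gain(S, A)
    # SplitInfo(S, A) = -sum of |S_j|/|S| * log2(|S_j|/|S|).
    split_info = -sum((len(b) / n) * log2(len(b) / n)
                      for b in branches.values())
    return gain / split_info if split_info else 0.0

rows = [
    {"amount": "big", "label": "high"},
    {"amount": "big", "label": "high"},
    {"amount": "small", "label": "low"},
    {"amount": "small", "label": "low"},
]
print(gain_ratio(rows, "amount"))  # 1.0: the split separates the classes perfectly
```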
Specifically, the feature item data extraction module 20 includes a first papers determination unit 21 and a second papers determination unit 22.
First papers determination unit 21, configured to, if the document type is an invoice type, perform true/false bill identification on the fiscal agency agreement image using an invoice OCR identification facility, and determine that the papers feature of the fiscal agency agreement image is a true bill or a false bill.
Second papers determination unit 22, configured to, if the document type is a non-invoice type, determine that the papers feature of the fiscal agency agreement image is a bill whose authenticity is undetermined.
Specifically, the feature item data extraction module 20 further includes a text feature data acquisition subunit 23, a text feature keyword acquisition subunit 24, and a data authenticity feature acquisition subunit 25.
Text feature data acquisition subunit 23, configured to perform text identification on the fiscal agency agreement image using the target text identification model, so as to obtain text feature data.
Text feature keyword acquisition subunit 24, configured to perform keyword extraction on the text feature data using a keyword extraction algorithm, so as to obtain text feature keywords.
Data authenticity feature acquisition subunit 25, configured to calculate the target matching degree between the target declaration data and the text feature keywords, and determine the data authenticity feature according to the target matching degree.
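The matching-degree check above can be sketched as follows. The patent does not fix a particular matching formula or threshold, so the simple overlap ratio and the 0.5 cutoff here are assumptions, as are all field values.

```python
# Sketch of subunit 25: compare the declared data against keywords
# extracted from the recognized document text.

def matching_degree(declared_fields, text_keywords):
    """Fraction of declared field values found among the text keywords."""
    if not declared_fields:
        return 0.0
    hits = sum(1 for f in declared_fields if f in text_keywords)
    return hits / len(declared_fields)

def authenticity_feature(declared_fields, text_keywords, threshold=0.5):
    # Data authenticity feature: "true" when the declaration matches the
    # document text closely enough, otherwise "false".
    degree = matching_degree(declared_fields, text_keywords)
    return "true" if degree >= threshold else "false"

keywords = {"taxi", "2018-06-01", "35.00"}
print(authenticity_feature(["taxi", "35.00"], keywords))    # true
print(authenticity_feature(["hotel", "880.00"], keywords))  # false
```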
Specifically, the financial risk verification processing device further includes a training text image acquisition unit 61, a training single-font sample acquisition unit 62, an original text identification model acquiring unit 63, and a target text identification model acquiring unit 64.
Training text image acquisition unit 61, configured to obtain training text images and divide the training text images into a training set and a test set according to a preset ratio.
Training single-font sample acquisition unit 62, configured to perform single-font cutting on the training text images using a vertical projection method to obtain training single-font images, and to perform sequence marking on the training single-font images to obtain training single-font samples.
Original text identification model acquiring unit 63, configured to input the training single-font samples in the training set into a long short-term memory neural network in the marked order for training, and to update the network parameters of the long short-term memory neural network using a batch gradient descent algorithm, so as to obtain an original text identification model.
Target text identification model acquiring unit 64, configured to test the original text identification model using the training single-font samples in the test set to obtain the recognition accuracy of the original text identification model, and to obtain the target text identification model when the recognition accuracy is greater than a preset accuracy rate.
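The vertical projection method used by unit 62 for single-font cutting can be sketched on a binary image represented as rows of 0/1 pixels: columns whose ink count is zero separate characters. The toy image and the zero-count separator threshold are illustrative assumptions; a real pipeline would first binarize the scanned document image.

```python
# Sketch of vertical-projection single-character segmentation.

def vertical_projection(image):
    """Sum of ink pixels in each column of a row-major binary image."""
    return [sum(row[c] for row in image) for c in range(len(image[0]))]

def cut_characters(image):
    """Return (start, end) column spans of consecutive non-empty columns,
    i.e. one span per candidate character."""
    proj = vertical_projection(image)
    spans, start = [], None
    for c, ink in enumerate(proj):
        if ink > 0 and start is None:
            start = c                     # character begins
        elif ink == 0 and start is not None:
            spans.append((start, c))      # empty column ends it
            start = None
    if start is not None:
        spans.append((start, len(proj)))
    return spans

# Two "characters" separated by an empty column:
img = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(cut_characters(img))  # [(0, 2), (3, 4)]
```

Each span can then be cropped out, sequence-marked, and fed to the recognition model in order.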
Specifically, the fiscal agency agreement data allocation module 40 includes a data classification determination unit 41, a first allocation processing unit 42, and a second allocation processing unit 43.
Data classification determination unit 41, configured to classify the fiscal agency agreement data according to the target risk grade, and determine whether the fiscal agency agreement data are to-be-circulated data or pending-approval data.
First allocation processing unit 42, configured to assign the pending-approval data to the approval processing link corresponding to the target risk grade.
Second allocation processing unit 43, configured to assign the to-be-circulated data to the circulation processing link.
For the specific limitations of the financial risk verification processing device, reference may be made to the limitations of the financial risk authentication processing method above; details are not repeated here. Each module in the above financial risk verification processing device may be implemented wholly or partly by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or independent of the processor in the computer equipment, or may be stored in software form in the memory of the computer equipment, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, computer equipment is provided; the computer equipment may be a server, and its internal structure may be as shown in Fig. 9. The computer equipment includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer equipment provides calculating and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer equipment stores data generated during financial risk verification processing, such as feature item data and target risk grades. The network interface of the computer equipment communicates with an external terminal via a network connection. The computer program, when executed by the processor, implements a financial risk authentication processing method.
In one embodiment, computer equipment is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, the processor implementing the following steps when executing the computer program: obtaining a financial approval request, the financial approval request including fiscal agency agreement data; performing feature extraction on the fiscal agency agreement data to obtain feature item data; inputting the feature item data into a target risk identification model based on a decision tree algorithm for risk identification to obtain a target risk grade; and classifying the fiscal agency agreement data according to the target risk grade and assigning the fiscal agency agreement data to the junior process flow corresponding to the target risk grade, so as to perform corresponding risk verification on the fiscal agency agreement data.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining a training dataset, the training dataset including at least two training bills data, each training bills data including a class label feature and at least two training attribute features; performing information gain ratio calculation on the class label feature and the training attribute features in the training dataset to obtain the information gain ratio corresponding to each training attribute feature; selecting the training attribute feature with the largest information gain ratio as the target classification feature of the training dataset, and dividing the training dataset into at least two training subsets using the target classification feature; judging whether each training subset is a single-element dataset; if all training subsets are single-element datasets, obtaining the target risk identification model based on the decision tree algorithm; and, if any training subset is not a single-element dataset, updating that training subset to be the training dataset and repeating the step of performing information gain ratio calculation on the class label feature and the training attribute features in the training dataset to obtain the information gain ratio corresponding to each training attribute feature.
In one embodiment, the processor further implements the following steps when executing the computer program: calculating the class label feature in the training dataset using the classification information entropy formula to obtain the classification information entropy of the training dataset, the classification information entropy formula being Entropy(S) = -Σ_{i=1..c} p_i · log2(p_i), wherein S is the training dataset, Entropy(S) is the classification information entropy of the training dataset S, c is the number of values of the class label feature, and p_i is the probability of the i-th class label value; calculating each training attribute feature using the attribute information entropy formula to obtain the attribute information entropy of each training attribute feature, the attribute information entropy formula being Entropy(S, A) = Σ_v (|S_v| / |S|) · Entropy(S_v), wherein Entropy(S, A) is the attribute information entropy of the training attribute feature A in the training dataset S, v is a branch of the training attribute feature A, Entropy(S_v) is the branch information entropy of branch v, |S_v| is the quantity of branch v, and |S| is the quantity of the training dataset S; calculating the classification information entropy of the training dataset and the attribute information entropy of each training attribute feature using the information gain formula to obtain the information gain of each training attribute feature, the information gain formula being Gain(S, A) = Entropy(S) - Entropy(S, A), wherein Gain(S, A) is the information gain of the training attribute feature A in the training dataset S; calculating each training attribute feature using the split information metric formula to obtain the split information metric of each training attribute feature, the split information metric formula being SplitInfo(S, A) = -Σ_{j=1..m} (|S_j| / |S|) · log2(|S_j| / |S|), wherein m is the number of values of the training attribute feature A, S_j is the j-th training subset formed by splitting the training dataset S by the training attribute feature A with m values, |S_j| is the quantity of the training subset S_j, and |S| is the quantity of the training dataset S; and calculating the information gain and the split information metric of each training attribute feature using the information gain ratio formula to obtain the information gain ratio of each training attribute feature, the information gain ratio formula being GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A).
In one embodiment, the processor further implements the following steps when executing the computer program: the fiscal agency agreement data include a fiscal agency agreement image and a document type corresponding to the fiscal agency agreement image; the feature item data include a papers feature; performing feature extraction on the fiscal agency agreement data to obtain feature item data includes: if the document type is an invoice type, performing true/false bill identification on the fiscal agency agreement image using an invoice OCR identification facility, and determining that the papers feature of the fiscal agency agreement image is a true bill or a false bill; and, if the document type is a non-invoice type, determining that the papers feature of the fiscal agency agreement image is a bill whose authenticity is undetermined.
In one embodiment, the processor further implements the following steps when executing the computer program: the fiscal agency agreement data include target declaration data; the feature item data include a data authenticity feature; performing feature extraction on the fiscal agency agreement data to obtain feature item data includes: performing text identification on the fiscal agency agreement image using the target text identification model to obtain text feature data; performing keyword extraction on the text feature data using a keyword extraction algorithm to obtain text feature keywords; and calculating the target matching degree between the target declaration data and the text feature keywords, and determining the data authenticity feature according to the target matching degree.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining training text images, and dividing the training text images into a training set and a test set according to a preset ratio; performing single-font cutting on the training text images using a vertical projection method to obtain training single-font images, and performing sequence marking on the training single-font images to obtain training single-font samples; inputting the training single-font samples in the training set into a long short-term memory neural network in the marked order for training, and updating the network parameters of the long short-term memory neural network using a batch gradient descent algorithm to obtain an original text identification model; and testing the original text identification model using the training single-font samples in the test set to obtain the recognition accuracy of the original text identification model, and obtaining the target text identification model when the recognition accuracy is greater than a preset accuracy rate.
In one embodiment, the processor further implements the following steps when executing the computer program: classifying the fiscal agency agreement data according to the target risk grade, and determining whether the fiscal agency agreement data are to-be-circulated data or pending-approval data; assigning the pending-approval data to the approval processing link corresponding to the target risk grade; and assigning the to-be-circulated data to the circulation processing link.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the following steps when executed by a processor: obtaining a financial approval request, the financial approval request including fiscal agency agreement data; performing feature extraction on the fiscal agency agreement data to obtain feature item data; inputting the feature item data into a target risk identification model based on a decision tree algorithm for risk identification to obtain a target risk grade; and classifying the fiscal agency agreement data according to the target risk grade and assigning the fiscal agency agreement data to the junior process flow corresponding to the target risk grade, so as to perform corresponding risk verification on the fiscal agency agreement data.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining a training dataset, the training dataset including at least two training bills data, each training bills data including a class label feature and at least two training attribute features; performing information gain ratio calculation on the class label feature and the training attribute features in the training dataset to obtain the information gain ratio corresponding to each training attribute feature; selecting the training attribute feature with the largest information gain ratio as the target classification feature of the training dataset, and dividing the training dataset into at least two training subsets using the target classification feature; judging whether each training subset is a single-element dataset; if all training subsets are single-element datasets, obtaining the target risk identification model based on the decision tree algorithm; and, if any training subset is not a single-element dataset, updating that training subset to be the training dataset and repeating the step of performing information gain ratio calculation on the class label feature and the training attribute features in the training dataset to obtain the information gain ratio corresponding to each training attribute feature.
In one embodiment, the computer program further implements the following steps when executed by the processor: calculating the class label feature in the training dataset using the classification information entropy formula to obtain the classification information entropy of the training dataset, the classification information entropy formula being Entropy(S) = -Σ_{i=1..c} p_i · log2(p_i), wherein S is the training dataset, Entropy(S) is the classification information entropy of the training dataset S, c is the number of values of the class label feature, and p_i is the probability of the i-th class label value; calculating each training attribute feature using the attribute information entropy formula to obtain the attribute information entropy of each training attribute feature, the attribute information entropy formula being Entropy(S, A) = Σ_v (|S_v| / |S|) · Entropy(S_v), wherein Entropy(S, A) is the attribute information entropy of the training attribute feature A in the training dataset S, v is a branch of the training attribute feature A, Entropy(S_v) is the branch information entropy of branch v, |S_v| is the quantity of branch v, and |S| is the quantity of the training dataset S; calculating the classification information entropy of the training dataset and the attribute information entropy of each training attribute feature using the information gain formula to obtain the information gain of each training attribute feature, the information gain formula being Gain(S, A) = Entropy(S) - Entropy(S, A), wherein Gain(S, A) is the information gain of the training attribute feature A in the training dataset S; calculating each training attribute feature using the split information metric formula to obtain the split information metric of each training attribute feature, the split information metric formula being SplitInfo(S, A) = -Σ_{j=1..m} (|S_j| / |S|) · log2(|S_j| / |S|), wherein m is the number of values of the training attribute feature A, S_j is the j-th training subset formed by splitting the training dataset S by the training attribute feature A with m values, |S_j| is the quantity of the training subset S_j, and |S| is the quantity of the training dataset S; and calculating the information gain and the split information metric of each training attribute feature using the information gain ratio formula to obtain the information gain ratio of each training attribute feature, the information gain ratio formula being GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A).
In one embodiment, the computer program further implements the following steps when executed by the processor: the fiscal agency agreement data include a fiscal agency agreement image and a document type corresponding to the fiscal agency agreement image; the feature item data include a papers feature; performing feature extraction on the fiscal agency agreement data to obtain feature item data includes: if the document type is an invoice type, performing true/false bill identification on the fiscal agency agreement image using an invoice OCR identification facility, and determining that the papers feature of the fiscal agency agreement image is a true bill or a false bill; and, if the document type is a non-invoice type, determining that the papers feature of the fiscal agency agreement image is a bill whose authenticity is undetermined.
In one embodiment, the computer program further implements the following steps when executed by the processor: the fiscal agency agreement data include target declaration data; the feature item data include a data authenticity feature; performing feature extraction on the fiscal agency agreement data to obtain feature item data includes: performing text identification on the fiscal agency agreement image using the target text identification model to obtain text feature data; performing keyword extraction on the text feature data using a keyword extraction algorithm to obtain text feature keywords; and calculating the target matching degree between the target declaration data and the text feature keywords, and determining the data authenticity feature according to the target matching degree.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining training text images, and dividing the training text images into a training set and a test set according to a preset ratio; performing single-font cutting on the training text images using a vertical projection method to obtain training single-font images, and performing sequence marking on the training single-font images to obtain training single-font samples; inputting the training single-font samples in the training set into a long short-term memory neural network in the marked order for training, and updating the network parameters of the long short-term memory neural network using a batch gradient descent algorithm to obtain an original text identification model; and testing the original text identification model using the training single-font samples in the test set to obtain the recognition accuracy of the original text identification model, and obtaining the target text identification model when the recognition accuracy is greater than a preset accuracy rate.
In one embodiment, the computer program further implements the following steps when executed by the processor: classifying the fiscal agency agreement data according to the target risk grade, and determining whether the fiscal agency agreement data are to-be-circulated data or pending-approval data; assigning the pending-approval data to the approval processing link corresponding to the target risk grade; and assigning the to-be-circulated data to the circulation processing link.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be completed by instructing the relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It is apparent to those skilled in the art that, for convenience and conciseness of description, only the division of the above functional units and modules is illustrated by example; in practical applications, the above functions may be allocated, as required, to different functional units and modules, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are merely illustrative of the technical solutions of the present invention and are not limiting. Although the invention has been explained in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions documented in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced; such modifications or replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.
Claims (10)
1. A financial risk authentication processing method, characterized by including:
obtaining a financial approval request, the financial approval request including fiscal agency agreement data;
performing feature extraction on the fiscal agency agreement data to obtain feature item data;
inputting the feature item data into a target risk identification model based on a decision tree algorithm for risk identification to obtain a target risk grade; and
classifying the fiscal agency agreement data according to the target risk grade, and assigning the fiscal agency agreement data to the junior process flow corresponding to the target risk grade, so as to perform corresponding risk verification on the fiscal agency agreement data.
2. The financial risk verification processing method according to claim 1, characterized in that before the step of obtaining the financial approval request, the financial risk verification processing method further comprises:
obtaining a training data set, the training data set comprising at least two pieces of training document data, each piece of training document data comprising a class label feature and at least two training attribute features;
performing information gain ratio calculation on the class label feature and the training attribute features in the training data set to obtain an information gain ratio corresponding to each training attribute feature;
selecting the training attribute feature with the largest information gain ratio as a target classification feature of the training data set, and dividing the training data set into at least two training subsets using the target classification feature;
judging whether each training subset is a single-element data set;
if all training subsets are single-element data sets, obtaining the target risk identification model based on the decision tree algorithm;
if any training subset is not a single-element data set, updating that training subset to be the training data set, and repeating the step of performing information gain ratio calculation on the class label feature and the training attribute features in the training data set to obtain the information gain ratio corresponding to each training attribute feature.
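The construction loop of claim 2 can be sketched as a short Python recursion. This is an illustrative reading, assuming a "single-element data set" means a subset whose samples all carry one class label; the gain-ratio computation described in claim 3 is compressed into a helper, and all names and the tuple-based sample encoding are hypothetical:

```python
import math
from collections import Counter

def _gain_ratio(rows, labels, i):
    """Information gain ratio of attribute i (compact form of the claim-3 formulas)."""
    def ent(ls):
        n = len(ls)
        return -sum(c / n * math.log2(c / n) for c in Counter(ls).values())
    n = len(labels)
    parts = {}
    for r, l in zip(rows, labels):
        parts.setdefault(r[i], []).append(l)
    gain = ent(labels) - sum(len(p) / n * ent(p) for p in parts.values())
    split = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return gain / split if split else 0.0

def build_tree(rows, labels):
    """Recursively grow the tree: a subset whose samples all share one class
    label becomes a leaf; otherwise split on the attribute with the largest
    information gain ratio and treat each subset as a new training set."""
    if len(set(labels)) == 1:
        return labels[0]                          # pure subset -> leaf
    best = max(range(len(rows[0])), key=lambda i: _gain_ratio(rows, labels, i))
    if _gain_ratio(rows, labels, best) <= 0:      # nothing separates -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    groups = {}
    for r, l in zip(rows, labels):
        groups.setdefault(r[best], ([], []))
        groups[r[best]][0].append(r)
        groups[r[best]][1].append(l)
    return {"attr": best,
            "branches": {v: build_tree(rs, ls) for v, (rs, ls) in groups.items()}}

# Toy training set: attribute 0 separates "risk" from "ok" perfectly.
rows = [("invoice", "x"), ("invoice", "x"), ("invoice", "y"),
        ("receipt", "x"), ("receipt", "y"), ("receipt", "y")]
labels = ["risk", "risk", "risk", "ok", "ok", "ok"]
tree = build_tree(rows, labels)
```

The majority-leaf guard is an addition not stated in the claim; without it, a subset that no attribute can separate would recurse forever.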
3. The financial risk verification processing method according to claim 2, characterized in that performing information gain ratio calculation on the class label feature and the training attribute features in the training data set to obtain the information gain ratio corresponding to each training attribute feature comprises:
calculating the class label feature in the training data set using a classification information entropy formula to obtain the classification information entropy of the training data set, the classification information entropy formula being Entropy(S) = -Σ_{i=1}^{c} p_i log₂ p_i, wherein S is the training data set, Entropy(S) is the classification information entropy of the training data set S, c is the number of values of the class label feature, and p_i is the probability of the i-th value of the class label feature;
calculating each training attribute feature using an attribute information entropy formula to obtain the attribute information entropy of each training attribute feature, the attribute information entropy formula being Entropy(S, A) = Σ_{v∈Values(A)} (|S_v| / |S|) Entropy(S_v), wherein Entropy(S, A) is the attribute information entropy of training attribute feature A in the training data set S, v is a branch of training attribute feature A, Entropy(S_v) is the branch information entropy of branch v, |S_v| is the number of samples in branch v, and |S| is the number of samples in the training data set S;
calculating the classification information entropy of the training data set and the attribute information entropy of each training attribute feature using an information gain formula to obtain the information gain of each training attribute feature, the information gain formula being Gain(S, A) = Entropy(S) - Entropy(S, A), wherein Gain(S, A) is the information gain of training attribute feature A in the training data set S;
calculating each training attribute feature using a split information formula to obtain the split information measure of each training attribute feature, the split information formula being SplitInfo(S, A) = -Σ_{j=1}^{m} (|S_j| / |S|) log₂(|S_j| / |S|), wherein m is the number of values of training attribute feature A, S_j is the j-th training subset formed by dividing the training data set S by the m values of training attribute feature A, |S_j| is the number of samples in training subset S_j, and |S| is the number of samples in the training data set S;
calculating the information gain and the split information measure of each training attribute feature using an information gain ratio formula to obtain the information gain ratio of each training attribute feature, the information gain ratio formula being GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A).
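The five formulas of claim 3 map one-to-one onto small Python functions. The toy data set at the end is illustrative only (the names `rows`/`labels` and the tuple-based sample encoding are assumptions, not part of the claim):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_{i=1}^{c} p_i * log2(p_i)."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def partition(rows, labels, attr):
    """Group the label list by one attribute's value (one branch per value)."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr], []).append(label)
    return branches

def attribute_entropy(rows, labels, attr):
    """Entropy(S, A) = sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    return sum(len(b) / n * entropy(b)
               for b in partition(rows, labels, attr).values())

def info_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - Entropy(S, A)."""
    return entropy(labels) - attribute_entropy(rows, labels, attr)

def split_info(rows, labels, attr):
    """SplitInfo(S, A) = -sum_j (|S_j| / |S|) * log2(|S_j| / |S|)."""
    n = len(labels)
    return -sum(len(b) / n * math.log2(len(b) / n)
                for b in partition(rows, labels, attr).values())

def gain_ratio(rows, labels, attr):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    si = split_info(rows, labels, attr)
    return info_gain(rows, labels, attr) / si if si else 0.0

# Toy set: attribute 0 separates the classes perfectly, attribute 1 barely.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("b", "y")]
labels = ["risk", "risk", "risk", "ok", "ok", "ok"]
```

On this data, attribute 0 yields a gain ratio of 1.0 and attribute 1 roughly 0.08, so the selection step of claim 2 would pick attribute 0 as the target classification feature.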
4. The financial risk verification processing method according to claim 1, characterized in that the financial document data comprise a financial document image and a document type corresponding to the financial document image;
the feature item data comprise a document authenticity feature;
performing feature extraction on the financial document data to obtain feature item data comprises:
if the document type is an invoice type, performing invoice authenticity identification on the financial document image using an invoice OCR identification tool, and determining whether the document authenticity feature of the financial document image is a genuine invoice or a fake invoice;
if the document type is a non-invoice type, determining that the document authenticity feature of the financial document image is a document of undetermined authenticity.
5. The financial risk verification processing method according to claim 1, characterized in that the financial document data comprise target declaration data;
the feature item data comprise a data authenticity feature;
performing feature extraction on the financial document data to obtain feature item data comprises:
performing text recognition on the financial document image using a target text recognition model to obtain text feature data;
performing keyword extraction on the text feature data using a keyword extraction algorithm to obtain text feature keywords;
calculating a target matching degree between the target declaration data and the text feature keywords, and determining the data authenticity feature according to the target matching degree.
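The matching-degree step can be sketched minimally in Python, assuming a simple frequency-based stand-in for the claimed keyword extraction algorithm and an illustrative 0.8 genuineness threshold (neither is specified in the claim; all names are hypothetical):

```python
import re
from collections import Counter

def extract_keywords(text, top_k=5):
    """Toy keyword extraction: the top-k most frequent words of four or more
    letters (a stand-in for a real algorithm such as TF-IDF or TextRank)."""
    words = re.findall(r"[A-Za-z]{4,}", text.lower())
    return [w for w, _ in Counter(words).most_common(top_k)]

def matching_degree(declared_fields, keywords):
    """Fraction of OCR-extracted keywords found among the declared field values."""
    declared = " ".join(declared_fields).lower()
    if not keywords:
        return 0.0
    return sum(1 for k in keywords if k in declared) / len(keywords)

# OCR output of the document image vs. the applicant's declared fields.
ocr_text = "office supplies invoice total amount office supplies vendor acme"
declared = ["office supplies", "acme", "invoice 2018"]
kws = extract_keywords(ocr_text, top_k=3)
degree = matching_degree(declared, kws)
label = "genuine" if degree >= 0.8 else "suspect"   # threshold is illustrative
```

A high matching degree indicates the declared data agree with the recognised document text, which is how the data authenticity feature would be derived.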
6. The financial risk verification processing method according to claim 5, characterized in that before the step of performing text recognition on the financial document image using the target text recognition model to obtain text feature data, the financial risk verification processing method further comprises:
obtaining training text images, and dividing the training text images into a training set and a test set at a preset ratio;
performing single-character segmentation on the training text images using a vertical projection method to obtain training single-character images, and labelling the training single-character images in sequence to obtain training single-character samples;
inputting the training single-character samples in the training set into a long short-term memory (LSTM) neural network in their labelled order for training, and updating the network parameters of the LSTM neural network using a batch gradient descent algorithm to obtain an original text recognition model;
testing the original text recognition model using the training single-character samples in the test set to obtain the recognition accuracy of the original text recognition model, and obtaining the target text recognition model when the recognition accuracy is greater than a preset accuracy.
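The vertical-projection segmentation step can be sketched on a binarised text-line image; the LSTM training itself would be done with a deep-learning framework, so only the cutting is shown here (the list-of-rows image encoding and all names are illustrative):

```python
def vertical_projection_cut(image):
    """Split a binarised text-line image (list of rows, 1 = ink) into
    per-character column spans: sum the ink in each column, then cut
    wherever the column sum drops to zero between non-empty runs."""
    n_cols = len(image[0])
    col_sums = [sum(row[c] for row in image) for c in range(n_cols)]
    spans, start = [], None
    for c, s in enumerate(col_sums):
        if s > 0 and start is None:
            start = c                      # a run of ink begins
        elif s == 0 and start is not None:
            spans.append((start, c))       # the run ends at a blank column
            start = None
    if start is not None:
        spans.append((start, n_cols))      # run reaching the right edge
    return spans

# Two "characters" separated by one blank column.
img = [[1, 1, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 0, 1]]
print(vertical_projection_cut(img))  # -> [(0, 2), (3, 4)]
```

Each returned span would then be cropped into a single-character image and labelled in sequence to form the training samples described above.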
7. The financial risk verification processing method according to claim 1, characterized in that performing classification processing on the financial document data according to the target risk level, and assigning the financial document data to a subordinate processing flow corresponding to the target risk level, comprises:
classifying the financial document data according to the target risk level, and determining whether the financial document data are data to be circulated or data pending approval;
assigning the data pending approval to an approval processing link corresponding to the target risk level;
assigning the data to be circulated to a circulation processing link.
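The routing in claim 7 amounts to a two-way dispatch keyed on the predicted risk level. A schematic sketch, in which the queue names and the set of levels treated as pending approval are assumptions:

```python
def route_document(doc, risk_level, approval_levels=("high", "medium")):
    """Send risky documents to the approval link matching their risk level;
    everything else flows straight to circulation. Names are illustrative."""
    if risk_level in approval_levels:
        return ("approval_queue:" + risk_level, doc)   # data pending approval
    return ("circulation_queue", doc)                  # data to be circulated

queue, _ = route_document({"id": 17}, "high")
```

Here `queue` is `"approval_queue:high"`; a low-risk document would instead land in `"circulation_queue"`.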
8. A financial risk verification processing device, characterized by comprising:
a financial approval request module, configured to obtain a financial approval request, the financial approval request comprising financial document data;
a feature item data extraction module, configured to perform feature extraction on the financial document data to obtain feature item data;
a target risk level obtaining module, configured to input the feature item data into a target risk identification model based on a decision tree algorithm to perform risk identification and obtain a target risk level;
a financial document data allocation module, configured to perform classification processing on the financial document data according to the target risk level and assign the financial document data to a subordinate processing flow corresponding to the target risk level, so as to perform corresponding risk verification on the financial document data.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the financial risk verification processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the financial risk verification processing method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810675971.4A CN108876166A (en) | 2018-06-27 | 2018-06-27 | Financial risk authentication processing method, device, computer equipment and storage medium |
PCT/CN2018/106238 WO2020000688A1 (en) | 2018-06-27 | 2018-09-18 | Financial risk verification processing method and apparatus, computer device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810675971.4A CN108876166A (en) | 2018-06-27 | 2018-06-27 | Financial risk authentication processing method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108876166A true CN108876166A (en) | 2018-11-23 |
Family
ID=64295184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810675971.4A Pending CN108876166A (en) | 2018-06-27 | 2018-06-27 | Financial risk authentication processing method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108876166A (en) |
WO (1) | WO2020000688A1 (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543668A (en) * | 2018-11-29 | 2019-03-29 | 税友软件集团股份有限公司 | A kind of salary bill item identification method, device, equipment and readable storage medium storing program for executing |
CN109801491A (en) * | 2019-01-18 | 2019-05-24 | 深圳壹账通智能科技有限公司 | Intelligent navigation method, device, equipment and storage medium based on risk assessment |
CN109801151A (en) * | 2019-01-07 | 2019-05-24 | 平安科技(深圳)有限公司 | Financial fraud risk monitoring and control method, apparatus, computer equipment and storage medium |
CN109919783A (en) * | 2019-01-31 | 2019-06-21 | 德联易控科技(北京)有限公司 | Risk Identification Method, device, equipment and the storage medium of vehicle insurance Claims Resolution case |
CN109977822A (en) * | 2019-03-15 | 2019-07-05 | 广州市网星信息技术有限公司 | Data supply method, model training method, device, system, equipment and medium |
CN110009515A (en) * | 2019-03-12 | 2019-07-12 | 中国平安财产保险股份有限公司 | Document method of calibration, device, server and medium based on recognition of face |
CN110019567A (en) * | 2019-04-10 | 2019-07-16 | 武汉斗鱼鱼乐网络科技有限公司 | It was found that the method, apparatus of control unknown risks mode, electronic equipment and storage medium |
CN110210959A (en) * | 2019-06-10 | 2019-09-06 | 广发证券股份有限公司 | Analysis method, device and the storage medium of financial data |
CN110442774A (en) * | 2019-07-12 | 2019-11-12 | 平安普惠企业管理有限公司 | Information acquisition method and device |
CN110473083A (en) * | 2019-07-08 | 2019-11-19 | 阿里巴巴集团控股有限公司 | Tree-shaped adventure account recognition methods, device, server and storage medium |
CN110503537A (en) * | 2019-08-16 | 2019-11-26 | 南京云帐房网络科技有限公司 | A kind of financial accounting data intelligence matching process and system |
CN110544100A (en) * | 2019-09-10 | 2019-12-06 | 北京三快在线科技有限公司 | Business identification method, device and medium based on machine learning |
CN110598996A (en) * | 2019-08-15 | 2019-12-20 | 平安普惠企业管理有限公司 | Risk processing method and device, electronic equipment and storage medium |
CN111178219A (en) * | 2019-12-24 | 2020-05-19 | 泰康保险集团股份有限公司 | Bill identification management method and device, storage medium and electronic equipment |
CN111325247A (en) * | 2020-02-10 | 2020-06-23 | 山东浪潮通软信息科技有限公司 | Intelligent auditing realization method based on least square support vector machine |
CN111507850A (en) * | 2020-03-25 | 2020-08-07 | 上海商汤智能科技有限公司 | Authority guaranteeing method and related device and equipment |
CN111798102A (en) * | 2020-06-11 | 2020-10-20 | 中国建设银行股份有限公司 | Data supervision method, system, computer equipment and readable storage medium |
CN112529321A (en) * | 2020-12-18 | 2021-03-19 | 平安银行股份有限公司 | Risk prediction method and device based on user data and computer equipment |
CN112950004A (en) * | 2021-02-07 | 2021-06-11 | 成都工业职业技术学院 | Enterprise financial risk early warning method and device and computer readable storage medium |
CN113297850A (en) * | 2021-05-17 | 2021-08-24 | 济南森维网络科技有限公司 | Cross-department financial expense management method based on block chain technology |
CN113297849A (en) * | 2021-05-17 | 2021-08-24 | 济南森维网络科技有限公司 | Financial pre-proposed charge management method based on block chain technology |
CN113468560A (en) * | 2021-06-18 | 2021-10-01 | 宝湾资本管理有限公司 | Data protection method and device and server |
CN114219596A (en) * | 2021-12-14 | 2022-03-22 | 中国平安财产保险股份有限公司 | Data processing method based on decision tree model and related equipment |
CN114358659A (en) * | 2022-03-10 | 2022-04-15 | 广东粤海集团企业服务有限公司 | Document verification information processing method and system |
CN115545872A (en) * | 2022-11-28 | 2022-12-30 | 杭州工猫科技有限公司 | Risk early warning method in application of RPA financial robot based on AI |
CN115544902A (en) * | 2022-11-29 | 2022-12-30 | 四川骏逸富顿科技有限公司 | Pharmacy risk level identification model generation method and pharmacy risk level identification method |
CN117408827A (en) * | 2023-12-15 | 2024-01-16 | 天津慧聪科技有限公司 | Intelligent management method and system for financial reimbursement of enterprises |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150235321A1 (en) * | 2014-02-18 | 2015-08-20 | Mastercard International Incorporated | Insurance risk modeling method and apparatus |
CN106296195A (en) * | 2015-05-29 | 2017-01-04 | 阿里巴巴集团控股有限公司 | A kind of Risk Identification Method and device |
CN106611375A (en) * | 2015-10-22 | 2017-05-03 | 北京大学 | Text analysis-based credit risk assessment method and apparatus |
US9779364B1 (en) * | 2011-06-08 | 2017-10-03 | Accenture Global Solutions Limited | Machine learning based procurement system using risk scores pertaining to bids, suppliers, prices, and items |
CN107797981A (en) * | 2016-08-31 | 2018-03-13 | 科大讯飞股份有限公司 | A kind of target text recognition methods and device |
CN107862785A (en) * | 2017-10-16 | 2018-03-30 | 深圳市中钞信达金融科技有限公司 | Bill authentication method and device |
CN107977665A (en) * | 2017-12-15 | 2018-05-01 | 北京科摩仕捷科技有限公司 | The recognition methods of key message and computing device in a kind of invoice |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107515873B (en) * | 2016-06-16 | 2020-10-16 | 阿里巴巴集团控股有限公司 | Junk information identification method and equipment |
CN107785058A (en) * | 2017-07-24 | 2018-03-09 | 平安科技(深圳)有限公司 | Anti- fraud recognition methods, storage medium and the server for carrying safety brain |
CN108053318B (en) * | 2017-12-20 | 2021-09-03 | 奇安信科技集团股份有限公司 | Method and device for identifying abnormal transactions |
2018
- 2018-06-27 CN CN201810675971.4A patent/CN108876166A/en active Pending
- 2018-09-18 WO PCT/CN2018/106238 patent/WO2020000688A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
TANG, Jianqin: "Research on Default Risk Measurement of P2P Online-Lending Borrowers Based on a Decision Tree Algorithm", China Masters' Theses Full-text Database, Economics and Management Sciences, Series J * |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543668A (en) * | 2018-11-29 | 2019-03-29 | 税友软件集团股份有限公司 | A kind of salary bill item identification method, device, equipment and readable storage medium storing program for executing |
CN109801151A (en) * | 2019-01-07 | 2019-05-24 | 平安科技(深圳)有限公司 | Financial fraud risk monitoring and control method, apparatus, computer equipment and storage medium |
CN109801151B (en) * | 2019-01-07 | 2023-09-05 | 平安科技(深圳)有限公司 | Financial falsification risk monitoring method, device, computer equipment and storage medium |
CN109801491A (en) * | 2019-01-18 | 2019-05-24 | 深圳壹账通智能科技有限公司 | Intelligent navigation method, device, equipment and storage medium based on risk assessment |
CN109919783A (en) * | 2019-01-31 | 2019-06-21 | 德联易控科技(北京)有限公司 | Risk Identification Method, device, equipment and the storage medium of vehicle insurance Claims Resolution case |
CN110009515A (en) * | 2019-03-12 | 2019-07-12 | 中国平安财产保险股份有限公司 | Document method of calibration, device, server and medium based on recognition of face |
CN109977822A (en) * | 2019-03-15 | 2019-07-05 | 广州市网星信息技术有限公司 | Data supply method, model training method, device, system, equipment and medium |
CN110019567A (en) * | 2019-04-10 | 2019-07-16 | 武汉斗鱼鱼乐网络科技有限公司 | It was found that the method, apparatus of control unknown risks mode, electronic equipment and storage medium |
CN110019567B (en) * | 2019-04-10 | 2021-07-23 | 武汉斗鱼鱼乐网络科技有限公司 | Method and device for discovering unknown risk pattern, electronic equipment and storage medium |
CN110210959A (en) * | 2019-06-10 | 2019-09-06 | 广发证券股份有限公司 | Analysis method, device and the storage medium of financial data |
CN110473083A (en) * | 2019-07-08 | 2019-11-19 | 阿里巴巴集团控股有限公司 | Tree-shaped adventure account recognition methods, device, server and storage medium |
CN110473083B (en) * | 2019-07-08 | 2023-07-14 | 创新先进技术有限公司 | Tree risk account identification method, device, server and storage medium |
CN110442774A (en) * | 2019-07-12 | 2019-11-12 | 平安普惠企业管理有限公司 | Information acquisition method and device |
CN110598996A (en) * | 2019-08-15 | 2019-12-20 | 平安普惠企业管理有限公司 | Risk processing method and device, electronic equipment and storage medium |
CN110503537A (en) * | 2019-08-16 | 2019-11-26 | 南京云帐房网络科技有限公司 | A kind of financial accounting data intelligence matching process and system |
CN110503537B (en) * | 2019-08-16 | 2023-05-26 | 云帐房网络科技有限公司 | Intelligent matching method and system for financial accounting data |
CN110544100A (en) * | 2019-09-10 | 2019-12-06 | 北京三快在线科技有限公司 | Business identification method, device and medium based on machine learning |
CN111178219A (en) * | 2019-12-24 | 2020-05-19 | 泰康保险集团股份有限公司 | Bill identification management method and device, storage medium and electronic equipment |
CN111325247A (en) * | 2020-02-10 | 2020-06-23 | 山东浪潮通软信息科技有限公司 | Intelligent auditing realization method based on least square support vector machine |
CN111325247B (en) * | 2020-02-10 | 2022-08-02 | 浪潮通用软件有限公司 | Intelligent auditing realization method based on least square support vector machine |
CN111507850A (en) * | 2020-03-25 | 2020-08-07 | 上海商汤智能科技有限公司 | Authority guaranteeing method and related device and equipment |
CN111798102A (en) * | 2020-06-11 | 2020-10-20 | 中国建设银行股份有限公司 | Data supervision method, system, computer equipment and readable storage medium |
CN112529321B (en) * | 2020-12-18 | 2023-09-05 | 平安银行股份有限公司 | Risk prediction method and device based on user data and computer equipment |
CN112529321A (en) * | 2020-12-18 | 2021-03-19 | 平安银行股份有限公司 | Risk prediction method and device based on user data and computer equipment |
CN112950004A (en) * | 2021-02-07 | 2021-06-11 | 成都工业职业技术学院 | Enterprise financial risk early warning method and device and computer readable storage medium |
CN113297849A (en) * | 2021-05-17 | 2021-08-24 | 济南森维网络科技有限公司 | Financial pre-proposed charge management method based on block chain technology |
CN113297850A (en) * | 2021-05-17 | 2021-08-24 | 济南森维网络科技有限公司 | Cross-department financial expense management method based on block chain technology |
CN113297850B (en) * | 2021-05-17 | 2023-11-07 | 江苏环迅信息科技有限公司 | Cross-department financial expenditure management method based on block chain technology |
CN113468560A (en) * | 2021-06-18 | 2021-10-01 | 宝湾资本管理有限公司 | Data protection method and device and server |
CN114219596A (en) * | 2021-12-14 | 2022-03-22 | 中国平安财产保险股份有限公司 | Data processing method based on decision tree model and related equipment |
CN114219596B (en) * | 2021-12-14 | 2024-04-12 | 中国平安财产保险股份有限公司 | Data processing method and related equipment based on decision tree model |
CN114358659B (en) * | 2022-03-10 | 2022-06-03 | 广东粤海集团企业服务有限公司 | Document verification information processing method and system |
CN114358659A (en) * | 2022-03-10 | 2022-04-15 | 广东粤海集团企业服务有限公司 | Document verification information processing method and system |
CN115545872A (en) * | 2022-11-28 | 2022-12-30 | 杭州工猫科技有限公司 | Risk early warning method in application of RPA financial robot based on AI |
CN115544902A (en) * | 2022-11-29 | 2022-12-30 | 四川骏逸富顿科技有限公司 | Pharmacy risk level identification model generation method and pharmacy risk level identification method |
CN117408827A (en) * | 2023-12-15 | 2024-01-16 | 天津慧聪科技有限公司 | Intelligent management method and system for financial reimbursement of enterprises |
Also Published As
Publication number | Publication date |
---|---|
WO2020000688A1 (en) | 2020-01-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108876166A (en) | Financial risk authentication processing method, device, computer equipment and storage medium | |
US11816165B2 (en) | Identification of fields in documents with neural networks without templates | |
CN112613501A (en) | Information auditing classification model construction method and information auditing method | |
CN112712429A (en) | Remittance service auditing method, remittance service auditing device, computer equipment and storage medium | |
CN109615280A (en) | Employee's data processing method, device, computer equipment and storage medium | |
CN113590823A (en) | Contract approval method and device, storage medium and electronic equipment | |
CN114663002A (en) | Method and equipment for automatically matching performance assessment indexes | |
CN113255496A (en) | Financial expense reimbursement management method based on block chain technology | |
CN111260189B (en) | Risk control method, risk control device, computer system and readable storage medium | |
CN112990989B (en) | Value prediction model input data generation method, device, equipment and medium | |
CN114913923A (en) | Cell type identification method aiming at open sequencing data of single cell chromatin | |
CN114372532A (en) | Method, device, equipment, medium and product for determining label marking quality | |
CN109492095A (en) | Claims Resolution data processing method, device, computer equipment and storage medium | |
CN116777607B (en) | Intelligent auditing method based on NLP technology | |
US20230206676A1 (en) | Systems and Methods for Generating Document Numerical Representations | |
CN113762973A (en) | Data processing method and device, computer readable medium and electronic equipment | |
CN116484025A (en) | Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium | |
CN110310012A (en) | Data analysing method, device, equipment and computer readable storage medium | |
CN114519508A (en) | Credit risk assessment method based on time sequence deep learning and legal document information | |
CN111091354A (en) | Real estate information processing system based on big data | |
CN115545912B (en) | Credit risk prediction method and device based on green identification information | |
CN115713399B (en) | User credit evaluation system combined with third-party data source | |
RU2790038C1 (en) | Cloud smart decision-making platform for smart city control | |
Sugianto et al. | Modeling and Application of Credit Scoring Based on A Multi-Objective Approach to Debtor Data in PT. Bank Riau Kepri | |
CN117251799A (en) | Financial certificate processing method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181123 |