CN111783829A - Financial anomaly detection method and device based on multi-label learning - Google Patents

Financial anomaly detection method and device based on multi-label learning

Info

Publication number
CN111783829A
Authority
CN
China
Prior art keywords
financial
enterprise
label
samples
statement
Legal status
Pending
Application number
CN202010474735.3A
Other languages
Chinese (zh)
Inventor
林康
Current Assignee
Gf Securities Co ltd
Original Assignee
Gf Securities Co ltd
Application filed by Gf Securities Co ltd
Priority to CN202010474735.3A
Publication of CN111783829A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Abstract

The invention discloses a financial anomaly detection method and device based on multi-label learning. The method first generates a feature vector sample for each enterprise from the financial information of existing enterprises, and then balances the samples according to preset sampling parameters to obtain a training sample set. The training sample set is labeled with a plurality of labels obtained from historical accountability data, and a financial anomaly detection model is constructed based on a multi-label learning algorithm. Finally, the financial information of the enterprise to be detected is acquired, a sample input vector is constructed and input into the financial anomaly detection model, and a detection result is obtained. The technical scheme of the invention addresses the low accuracy caused by the small number of data samples and the inability to detect financial whitewashing techniques, and improves detection accuracy.

Description

Financial anomaly detection method and device based on multi-label learning
Technical Field
The invention relates to the technical field of computers, in particular to a financial anomaly detection method and device based on multi-label learning.
Background
Financial information is key data for evaluating an enterprise. Because the requirements for listing are demanding, more and more listed companies whitewash the financial information they are required to disclose periodically. Such activities cause enormous losses to investors in the capital market and hinder the healthy development of the domestic market. In China, some organizations conduct special audits of enterprise financial information, usually manually by audit experts. However, analyzing huge volumes of financial information to find improper financial treatment is very time-consuming and labor-intensive, and different audit experts may reach different opinions on the same problem. It is therefore difficult to analyze the financial status of listed companies at scale, let alone that of the millions of small, medium-sized and micro enterprises.
Financial anomaly detection based on big-data artificial intelligence modeling can assist this work. Many modeling techniques based on financial indexes already exist abroad, mainly involving the following indexes: 1. Whitewashing opportunity indexes, i.e., how easy it is to whitewash, such as the shareholding of major shareholders, the number of supervisors and the number of independent directors. 2. Whitewashing motive indexes: for example, to avoid being designated ST (special treatment), a company that incurs a loss in one year may whitewash its financial data in the following year. 3. Whitewashing characteristic indexes, i.e., the company's operating indexes, covering a series of measures such as accounts receivable turnover, inventory turnover and profit margin.
These modeling techniques are widely used abroad, with models including logistic regression, neural networks, Bayesian networks and decision trees. In China, support vector machines, multivariate discriminant analysis and index positive/negative probability discrimination are also used. Foreign models achieve higher recognition rates than domestic ones for two reasons. First, domestic and foreign markets differ, so the models operate in different institutional contexts with different whitewashing risk factors. Second, few domestic companies are found to have whitewashed their finances, with fewer than 50 companies penalized each year, so the resulting sample imbalance reduces identification accuracy. Moreover, the penalty decisions issued by regulators describe many specific whitewashing problems: a company may whitewash its finances by several means at once, so an audit must establish not only whether the financial reports were whitewashed but also which items were whitewashed and by which means. The motive and business context behind the whitewashing can then be inferred together, which the prior art does not consider.
Disclosure of Invention
The embodiments of the invention provide a financial anomaly detection method and device based on multi-label learning, which address the low accuracy caused by the small number of data samples and the inability to detect financial whitewashing techniques, and improve detection accuracy.
The invention provides a financial anomaly detection method based on multi-label learning, comprising the following steps:
acquiring financial information of a plurality of enterprises, and generating a feature vector sample of each enterprise according to the financial information of each enterprise, wherein the feature vector samples are divided into positive samples and negative samples according to whether the enterprises have financial violations;
balancing the quantity of positive samples and negative samples in all the feature vector samples according to preset sampling parameters to obtain a training sample set;
labeling the training sample set according to a plurality of labels obtained from historical accountability data, and constructing a financial anomaly detection model based on a multi-label learning algorithm according to the training sample set after labeling;
acquiring financial information of an enterprise to be detected, and constructing a sample input vector according to the financial information of the enterprise to be detected;
and inputting the sample input vector into the financial anomaly detection model to obtain a multi-label result vector of the enterprise to be detected, and obtaining a detection result of the enterprise to be detected according to the multi-label result vector.
Further, acquiring financial information of a plurality of enterprises and generating a feature vector sample of each enterprise according to the financial information of each enterprise specifically includes:
acquiring financial information of a plurality of enterprises within a time interval, the financial information including a balance sheet, an income statement and a cash flow statement;
generating a financial index table, a derived financial index table and a derived financial index supplementary table according to the balance sheet, the income statement and the cash flow statement;
and extracting various indexes from the balance sheet, the income statement, the cash flow statement, the financial index table, the derived financial index table and the derived financial index supplementary table, and generating a feature vector sample of each enterprise.
Further, balancing the number of positive samples and negative samples in all the feature vector samples according to preset sampling parameters to obtain a training sample set specifically comprises:
increasing the number of positive samples through random oversampling, reducing the number of negative samples through random undersampling, and balancing the positive and negative samples according to a sampling parameter r to obtain a training sample set;
wherein the sampling parameter r is computed from N_neg and N_pos (formula shown as Figure BDA0002515506390000031), where N_neg is the number of negative samples and N_pos is the number of positive samples.
Further, the plurality of labels obtained from the historical accountability data specifically include:
the historical accountability data comprise announcements and disclosure materials published by regulatory bodies, proprietary data held by financial institutions, and legal data held by third-party data companies;
the plurality of labels are: on-balance-sheet recognition, off-balance-sheet disclosure, revenue, expenses, assets, liabilities, cash flows, concealed guarantees, concealed related-party financing, concealed related-party transactions, concealed major litigation, revised performance disclosures, administrative penalties, and internal control issues.
Further, constructing a financial anomaly detection model based on a multi-label learning algorithm according to the labeled training sample set specifically includes:
configuring a classifier for each label category according to a one-vs-all strategy, and constructing the financial anomaly detection model from the labeled training sample set.
Correspondingly, the invention also provides a financial anomaly detection device based on multi-label learning, comprising a training sample acquisition module, a sample balancing module, a label labeling module, a model construction module, a to-be-detected sample acquisition module and a detection module;
the training sample acquisition module is used for acquiring financial information of a plurality of enterprises and generating a feature vector sample of each enterprise according to the financial information of each enterprise, wherein the feature vector samples are divided into positive samples and negative samples according to whether the enterprises have financial violations;
the sample balancing module is used for balancing the number of positive samples and negative samples in all the feature vector samples according to preset sampling parameters to obtain a training sample set;
the label labeling module is used for labeling the training sample set according to a plurality of labels obtained from historical accountability data;
the model construction module is used for constructing a financial anomaly detection model based on a multi-label learning algorithm according to the labeled training sample set;
the to-be-detected sample acquisition module is used for acquiring financial information of an enterprise to be detected and constructing a sample input vector according to the financial information of the enterprise to be detected;
the detection module is used for inputting the sample input vector into the financial anomaly detection model, obtaining a multi-label result vector of the enterprise to be detected, and obtaining a detection result of the enterprise to be detected according to the multi-label result vector.
Further, the training sample acquisition module comprises a first acquisition unit, a first generation unit and a second generation unit;
the first acquisition unit is used for acquiring financial information of a plurality of enterprises within a time interval, the financial information including a balance sheet, an income statement and a cash flow statement;
the first generation unit is used for generating a financial index table, a derived financial index table and a derived financial index supplementary table according to the balance sheet, the income statement and the cash flow statement;
the second generation unit is used for extracting various indexes from the balance sheet, the income statement, the cash flow statement, the financial index table, the derived financial index table and the derived financial index supplementary table, and generating a feature vector sample of each enterprise.
Further, the sample balancing module balances the number of positive samples and negative samples in all the feature vector samples according to a preset sampling parameter to obtain a training sample set, specifically:
the sample balancing module increases the number of positive samples through random oversampling, reduces the number of negative samples through random undersampling, and balances the positive and negative samples according to a sampling parameter r to obtain a training sample set;
wherein the sampling parameter r is computed from N_neg and N_pos (formula shown as Figure BDA0002515506390000051), where N_neg is the number of negative samples and N_pos is the number of positive samples.
Further, the plurality of labels obtained from the historical accountability data specifically include:
the historical accountability data comprise announcements and disclosure materials published by regulatory bodies, proprietary data held by financial institutions, and legal data held by third-party data companies;
the plurality of labels are: on-balance-sheet recognition, off-balance-sheet disclosure, revenue, expenses, assets, liabilities, cash flows, concealed guarantees, concealed related-party financing, concealed related-party transactions, concealed major litigation, revised performance disclosures, administrative penalties, and internal control issues.
Further, the model construction module constructs the financial anomaly detection model based on a multi-label learning algorithm according to the labeled training sample set, specifically:
the model construction module configures a classifier for each label category according to a one-vs-all strategy, and constructs the financial anomaly detection model from the labeled training sample set.
In summary, in the financial anomaly detection method and device based on multi-label learning provided by the invention, a feature vector sample is first generated for each enterprise from the financial information of existing enterprises, and the samples are then balanced according to preset sampling parameters to obtain a training sample set; the training sample set is labeled with a plurality of labels obtained from historical accountability data, and a financial anomaly detection model is constructed based on a multi-label learning algorithm; finally, the financial information of the enterprise to be detected is acquired, a sample input vector is constructed and input into the financial anomaly detection model, and a detection result is obtained. Compared with the prior art, in which accuracy is low owing to a lack of samples or the whitewashing motive cannot be determined, the technical scheme of the invention balances the number of samples to improve model accuracy, obtains multiple categories of labels from historical accountability data, infers the whitewashing means with a multi-label learning method, and uses the labels jointly to judge the final financial whitewashing event, further improving detection accuracy. In addition, the invention can assist audit experts in analyzing and judging the financial information disclosed by companies, reducing labor costs and judgment errors and improving the efficiency of financial audits.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the financial anomaly detection method based on multi-label learning according to the invention;
FIG. 2 is a schematic flow chart of another embodiment of the financial anomaly detection method based on multi-label learning according to the invention;
FIG. 3 is a schematic structural diagram of an embodiment of the financial anomaly detection device based on multi-label learning according to the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic flowchart of an embodiment of a financial anomaly detection method based on multi-label learning according to the present invention is shown. The method shown in fig. 1 includes steps 101 to 105, and each step is as follows:
step 101: acquiring financial information of a plurality of enterprises, and generating a feature vector sample of each enterprise according to the financial information of each enterprise; the feature vector samples are divided into positive samples and negative samples according to whether the enterprises have financial violations or not.
In this embodiment, step 101 specifically includes: acquiring financial information of a plurality of enterprises within a time interval, the financial information including a balance sheet, an income statement and a cash flow statement; generating a financial index table, a derived financial index table and a derived financial index supplementary table according to the balance sheet, the income statement and the cash flow statement; and extracting various indexes from the balance sheet, the income statement, the cash flow statement, the financial index table, the derived financial index table and the derived financial index supplementary table, and generating a feature vector sample of each enterprise.
In this embodiment, the financial analysis relies primarily on three statements: the balance sheet, the income statement and the cash flow statement. The balance sheet reflects the company's assets, liabilities and equity at a given point in time. The income statement reflects the company's revenue, expenses, profit and so on over a period of time, revealing its operating results. The cash flow statement reflects changes in the company's cash, revealing how cash flows through its operating, investing and financing activities. The index tables and derived tables generated by the invention are also based mainly on these three statements, and may include, but are not limited to, a financial index table, a derived financial index table and a derived financial index supplementary table. A small part of the indexes is taken directly from the raw data of the three statements, while most of the core indexes are derived from them. For example, in addition to the financial index analysis systems commonly used in the industry (solvency, profitability, etc.), self-developed analysis indexes are adopted as necessary supplements, such as an index of the proportion of capital invested relative to shareholders' equity and capitalized equity.
In this embodiment, the data may be obtained by purchasing them from a data vendor; such data are typically delivered to the user as tables stored in a database. The user builds the required tables with database extraction tools such as Hive and SQL, with the date, the enterprise code and the various indexes as the main fields, to form the financial index table, the derived financial index table and the derived financial index supplementary table. Various indexes are then extracted from the enterprise data to generate a feature vector sample for each enterprise. A sample consists of the feature vector of one enterprise over a time interval, which may be a quarter, a half year or a full year.
In this embodiment, the feature vector samples are divided into positive and negative samples before training: if the enterprise has no financial violation within the time interval, the sample is negative; otherwise, it is positive.
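The feature construction and positive/negative split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Statements` fields, the particular derived ratios and the `(enterprise code, period)` violation set are hypothetical stand-ins for the vendor tables and index tables described in the text, and the turnover ratios use period-end balances rather than averages for brevity.

```python
from dataclasses import dataclass


@dataclass
class Statements:
    """Hypothetical subset of items from one enterprise's three statements."""
    total_assets: float          # balance sheet
    total_liabilities: float     # balance sheet
    accounts_receivable: float   # balance sheet
    inventory: float             # balance sheet
    revenue: float               # income statement
    cost_of_sales: float         # income statement
    net_profit: float            # income statement
    operating_cash_flow: float   # cash flow statement


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def build_feature_vector(s: Statements) -> list[float]:
    """Combine a raw item with derived indexes into one feature vector."""
    return [
        s.total_assets,                                 # raw item
        safe_div(s.total_liabilities, s.total_assets),  # debt-to-asset ratio
        safe_div(s.revenue, s.accounts_receivable),     # receivables turnover
        safe_div(s.cost_of_sales, s.inventory),         # inventory turnover
        safe_div(s.net_profit, s.revenue),              # net profit margin
        safe_div(s.operating_cash_flow, s.net_profit),  # cash backing of profit
    ]


def make_sample(code: str, period: str, s: Statements,
                violations: set[tuple[str, str]]) -> tuple[list[float], int]:
    """Return (feature vector, 1 if the enterprise violated in this period else 0)."""
    return build_feature_vector(s), int((code, period) in violations)


sample, label = make_sample(
    "600000", "2019", Statements(1000, 400, 200, 100, 800, 500, 80, 60),
    {("600000", "2019")},
)
print(sample, label)  # [1000, 0.4, 4.0, 5.0, 0.1, 0.75] 1
```

In practice the same function would be applied to every enterprise and period pulled from the database tables, yielding the positive and negative sample pools used in step 102.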
Step 102: balancing the number of positive samples and negative samples in all the feature vector samples according to preset sampling parameters to obtain a training sample set.
In this embodiment, sample balancing is achieved through oversampling and undersampling. The positive and negative samples output in step 101 are unbalanced, with negative samples outnumbering positive samples by orders of magnitude. To reduce the risk of model failure caused by sample imbalance, a sampling algorithm from a mature Python algorithm library may be used. The small number of positive samples is oversampled and the large number of negative samples is undersampled, i.e., only a subset of the negative samples is selected to participate in subsequent model learning. In this way, a training sample set with positive and negative samples of the same order of magnitude is obtained.
In this embodiment, step 102 specifically includes: increasing the number of positive samples through random oversampling, reducing the number of negative samples through random undersampling, and balancing the positive and negative samples according to a sampling parameter r to obtain a training sample set;
wherein the sampling parameter r is computed from N_neg and N_pos (formula shown as Figure BDA0002515506390000081), where N_neg is the number of negative samples and N_pos is the number of positive samples.
To avoid overfitting and underfitting, the degree of sampling on both sides is controlled by the sampling parameter r, and the training sample set is obtained with a sampling algorithm.
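A minimal sketch of random oversampling combined with random undersampling follows. Because the formula for r appears only as a figure, the code uses its own hypothetical parameter `r` in [0, 1] that sets a common target size between the positive and negative counts (r = 0 undersamples the negatives down to the positive count, r = 1 oversamples the positives up to the negative count). It illustrates how one parameter can trade off the two kinds of sampling; it is not the patent's definition of r.

```python
import random


def balance_samples(pos, neg, r, seed=0):
    """Randomly oversample positives and undersample negatives to one size.

    Hypothetical parameterization: with 0 <= r <= 1 and len(pos) <= len(neg),
    both classes are resampled to round(len(pos) + r * (len(neg) - len(pos))).
    r = 0 is pure undersampling; r = 1 is pure oversampling.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if not 0 < len(pos) <= len(neg):
        raise ValueError("expected 0 < len(pos) <= len(neg)")
    rng = random.Random(seed)
    target = round(len(pos) + r * (len(neg) - len(pos)))
    # Random oversampling: keep every positive, then add draws with replacement.
    positives = list(pos) + rng.choices(pos, k=target - len(pos))
    # Random undersampling: draw negatives without replacement.
    negatives = rng.sample(neg, target)
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data


pos = [f"p{i}" for i in range(5)]
neg = [f"n{i}" for i in range(95)]
train = balance_samples(pos, neg, r=0.5)
print(len(train), sum(y for _, y in train))  # 100 50
```

Small values of `r` discard more negatives (risking underfitting), while large values repeat the few positives many times (risking overfitting), which is the trade-off the text assigns to r. Libraries such as imbalanced-learn provide ready-made random over- and undersamplers for the same purpose.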
Step 103: labeling the training sample set according to a plurality of labels obtained from historical accountability data, and constructing a financial anomaly detection model based on a multi-label learning algorithm according to the training sample set after labeling.
In this embodiment, the historical accountability data include announcements and disclosure materials published by regulatory bodies, proprietary data held by financial institutions, and legal data held by third-party data companies. The plurality of labels are: on-balance-sheet recognition, off-balance-sheet disclosure, revenue, expenses, assets, liabilities, cash flows, concealed guarantees, concealed related-party financing, concealed related-party transactions, concealed major litigation, revised performance disclosures, administrative penalties, and internal control issues.
In this embodiment, the historical accountability data typically detail which financial violations the penalized enterprise committed and when.
The method obtains labels of multiple categories by refining the categories of financial problems recorded in the historical accountability data. First, data that cannot be labeled are excluded; they generally account for less than 5% of all violation samples and do not affect the modeling work. Second, an enterprise that committed fraud continuously for 2 years or more is analyzed according to the specific circumstances of each year, i.e., it is split by year into several samples that are analyzed independently, with the stock code and the fraud year together identifying each sample.
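The per-year splitting and the exclusion of unlabelable records can be illustrated as below. The input layout (a `stock_code` plus a hypothetical `labels_by_year` mapping) is an assumption made for this sketch; the point is that `(stock_code, year)` becomes the key identifying each sample.

```python
def expand_by_year(cases):
    """Split accountability cases into one labeled sample per (stock code, year).

    Each case is a hypothetical dict such as
    {"stock_code": "000001", "labels_by_year": {2017: {"revenue"}, 2018: set()}}.
    Years with no assignable label are dropped as unlabelable data.
    """
    samples = {}
    for case in cases:
        for year, labels in case["labels_by_year"].items():
            if not labels:
                continue  # cannot be labeled: excluded from modeling
            key = (case["stock_code"], year)  # identifies a distinct sample
            samples.setdefault(key, set()).update(labels)
    return samples


cases = [
    {"stock_code": "000001",
     "labels_by_year": {2017: {"revenue"}, 2018: {"revenue", "assets"}}},
    {"stock_code": "000002", "labels_by_year": {2019: set()}},
]
print(expand_by_year(cases))
```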
In this embodiment, labeling means analyzing the data in a sample and tagging them: for example, if an item of data in a sample belongs to the "revenue" label, the sample is marked 1 for that label; otherwise it is marked 0. Each sample can thus correspond to several different labels and is represented by a multi-dimensional vector such as [0,1,1,0, … …,0]. This turns the task into a typical multi-label learning problem, i.e., predicting properties of data points that are not mutually exclusive.
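Encoding a sample's labels as the 0/1 vector described above might look like this. The English label names are a translation of the 14 labels listed in step 103, kept in the same order; `encode_labels` is an illustrative name, not one taken from the source.

```python
LABELS = [
    "on-balance-sheet recognition", "off-balance-sheet disclosure", "revenue",
    "expenses", "assets", "liabilities", "cash flows", "concealed guarantees",
    "concealed related-party financing", "concealed related-party transactions",
    "concealed major litigation", "revised performance disclosures",
    "administrative penalties", "internal control issues",
]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def encode_labels(tags):
    """Map a set of label names to a 14-dimensional 0/1 vector."""
    vector = [0] * len(LABELS)
    for tag in tags:
        vector[LABEL_INDEX[tag]] = 1  # KeyError flags an unknown label
    return vector


print(encode_labels({"revenue", "assets"}))
# [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```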
In this embodiment, a one-vs-all strategy is adopted: a classifier is configured for each label category, and the financial anomaly detection model is constructed from the labeled training sample set. Under this strategy, if there are n classes, n binary classifiers are built, each separating one class from all the others. At prediction time, the n binary classifiers each output the probability that the data belong to their class, and the class with the highest probability is selected as the final prediction result. The one-vs-all strategy adopted by the invention is computationally efficient (only n classifiers are needed) and interpretable: since each label is represented by exactly one classifier, knowledge about that class can be obtained by examining its classifier.
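The sketch below shows one-vs-rest training without third-party dependencies, assuming plain logistic regression fitted by batch gradient descent as the per-label base classifier (the text does not name the base learner). One difference from the multi-class description above: because these labels are not mutually exclusive, each label's probability is thresholded independently (at 0.5 by default) instead of keeping only the single most probable class. scikit-learn's `OneVsRestClassifier` likewise returns independent per-label predictions when `y` is a 2-D binary indicator matrix.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_logreg(X, y, lr=0.5, epochs=500):
    """Fit one binary logistic-regression classifier by batch gradient descent."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def train_one_vs_rest(X, Y):
    """Train one binary classifier per column of the 0/1 label matrix Y."""
    return [train_binary_logreg(X, [row[k] for row in Y]) for k in range(len(Y[0]))]


def predict_proba(models, x):
    """Probability that x carries each label, one entry per classifier."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for w, b in models]


def predict_labels(models, x, threshold=0.5):
    """Multi-label output: threshold every label's probability independently."""
    return [int(p >= threshold) for p in predict_proba(models, x)]


# Toy data: label 0 follows feature 0, label 1 follows feature 1.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
models = train_one_vs_rest(X, Y)
print([predict_labels(models, x) for x in X])
```

Each entry of `models` is a readable `(weights, bias)` pair for one label, which reflects the interpretability argument made above.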
The invention can implement the above model construction using, but not limited to, OneVsRestClassifier from the Python scikit-learn library. The financial anomaly detection model can report various performance indexes such as precision and recall.
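Per-label and micro-averaged precision and recall over 0/1 label vectors can be computed as follows. This is a plain-Python sketch with illustrative function names; scikit-learn's `precision_score` and `recall_score` offer the same measures through their `average` argument.

```python
def confusion_counts(y_true, y_pred, k):
    """True positives, false positives and false negatives for label column k."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p[k] == 1 and t[k] == 1:
            tp += 1
        elif p[k] == 1:
            fp += 1
        elif t[k] == 1:
            fn += 1
    return tp, fp, fn


def ratio(num, den):
    return num / den if den else 0.0


def label_precision_recall(y_true, y_pred, k):
    tp, fp, fn = confusion_counts(y_true, y_pred, k)
    return ratio(tp, tp + fp), ratio(tp, tp + fn)


def micro_precision_recall(y_true, y_pred):
    """Pool the counts of every label before dividing (micro-averaging)."""
    counts = [confusion_counts(y_true, y_pred, k) for k in range(len(y_true[0]))]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return ratio(tp, tp + fp), ratio(tp, tp + fn)


y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 1], [0, 1], [0, 1]]
print(label_precision_recall(y_true, y_pred, 0))  # (1.0, 0.5)
print(micro_precision_recall(y_true, y_pred))     # (0.75, 0.75)
```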
As an example of this embodiment, a one-vs-one strategy may also be used when constructing the financial anomaly detection model. If there are n classes, one binary classifier is built for every pair of classes, giving k = n(n-1)/2 classifiers. When new data are classified, the k classifiers are applied in turn, and each classification counts as one vote for the class it outputs. After all k classifiers have voted, the class with the most votes is selected as the final classification result.
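One-vs-one voting can be sketched as below. The pairwise classifiers are passed in as plain callables keyed by class pair, a hypothetical interface chosen for illustration; ties are broken by the order of the class list.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """One-vs-one voting over k = n(n-1)/2 pairwise classifiers.

    `pairwise[(a, b)]` is a hypothetical callable returning a or b for input x.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1  # each classification is one vote
    # Most votes wins; max() keeps the first class in `classes` on a tie.
    return max(classes, key=lambda c: votes[c])


# Toy 1-D example: each pairwise classifier picks the class with the nearer center.
centers = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
pairwise = {
    (a, b): (lambda x, a=a, b=b:
             a if abs(x - centers[a]) <= abs(x - centers[b]) else b)
    for a, b in combinations(centers, 2)
}
print(len(pairwise), ovo_predict(2.1, list(centers), pairwise))  # 6 C
```

With n = 4 classes the sketch builds 4 × 3 / 2 = 6 pairwise classifiers, matching k = n(n-1)/2.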
Step 104: acquiring financial information of the enterprise to be detected, and constructing a sample input vector according to the financial information of the enterprise to be detected.
In this embodiment, to determine overall whether the company has whitewashed its finances, the financial information of the enterprise to be detected is acquired first, and the sample generation method described in step 101 is then applied to it to obtain the enterprise's features over a given time interval, forming the sample input vector.
Step 105: inputting the sample input vector into the financial anomaly detection model to obtain a multi-label result vector of the enterprise to be detected, and obtaining a detection result of the enterprise to be detected according to the multi-label result vector.
In this embodiment, the financial anomaly detection model outputs a multi-label result vector, and the detection result is obtained by overall inference on this vector. The more labels the result vector involves, the more problems the sample has and the more extensive the financial whitewashing, so the enterprise deserves closer attention.
In this embodiment, audit experts are concerned on the one hand with the specific types of financial whitewashing and, on the other hand, want an overall assessment. Suppose the prediction result is L, a 14-dimensional vector whose components are each 0 or 1. The invention sums all components of this result vector to give the overall evaluation result, an integer S in the range 0 to 14. A threshold can generally be set; for example, with a threshold of 2, an enterprise whose S is 2 or more requires attention and investigation.
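The overall inference step reduces to a sum and a threshold. This sketch assumes the 14-dimensional 0/1 result vector described above and uses the example threshold of 2; the function names are illustrative.

```python
def overall_score(result_vector):
    """Sum the components of a 14-dimensional 0/1 multi-label result vector."""
    if len(result_vector) != 14 or any(v not in (0, 1) for v in result_vector):
        raise ValueError("expected a 14-dimensional 0/1 vector")
    return sum(result_vector)  # integer S in [0, 14]


def needs_attention(result_vector, threshold=2):
    """Flag an enterprise for investigation when S reaches the threshold."""
    return overall_score(result_vector) >= threshold


L = [0, 1, 1] + [0] * 11
print(overall_score(L), needs_attention(L))  # 2 True
```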
In addition, the multi-label result vector obtained by the invention can serve as auxiliary data to help audit experts analyze and judge the financial information disclosed by companies, reducing labor costs and judgment errors and improving the efficiency of financial audits. As described above, screening with the invention reduces the number of samples that audit experts need to examine, improving audit efficiency.
To better explain the principle and flow of the invention, refer to FIG. 2, which is a schematic flow chart of another embodiment of the financial anomaly detection method based on multi-label learning provided by the invention. FIG. 2 shows the flows of feature selection, sample balancing, label annotation, the multi-label learning model and overall inference; for the specific principles of each flow, refer to the related description above.
Correspondingly, the invention further provides a financial anomaly detection device based on multi-label learning. Referring to FIG. 3, which is a schematic structural diagram of an embodiment of the financial anomaly detection device based on multi-label learning according to the invention, the device includes: a training sample acquisition module 301, a sample balancing module 302, a label labeling module 303, a model construction module 304, a to-be-detected sample acquisition module 305 and a detection module 306.
The training sample acquisition module 301 is configured to acquire financial information of a plurality of enterprises, and generate a feature vector sample of each enterprise according to the financial information of each enterprise; the feature vector samples are divided into positive samples and negative samples according to whether the enterprises have financial violations or not.
The sample balancing module 302 is configured to balance the number of positive samples and the number of negative samples in all feature vector samples according to a preset sampling parameter, so as to obtain a training sample set.
The label labeling module 303 is configured to label the training sample set according to a plurality of labels obtained from the historical accountability data.
The model construction module 304 is configured to construct a financial anomaly detection model based on a multi-label learning algorithm according to the training sample set labeled with the labels.
The to-be-detected sample acquisition module 305 is configured to acquire financial information of the to-be-detected enterprise, and construct a sample input vector according to the financial information of the to-be-detected enterprise.
The detection module 306 is configured to input the sample input vector into the financial anomaly detection model, obtain a multi-label result vector of the enterprise to be detected, and obtain a detection result for the enterprise to be detected from the multi-label result vector.
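The label labeling module 303 turns historical accountability data into a label vector for each training sample. A minimal Python sketch of that encoding step follows, using the 14 label names exactly as they appear (in translation) in claim 4. The record format, a mapping from a hypothetical enterprise identifier to the issues cited against it, is an assumption; the patent does not specify how accountability records are stored.

```python
from typing import Dict, Iterable, List

# The 14 label names as listed (in translation) in claim 4.
LABELS: List[str] = [
    "in-table validation", "out-of-table disclosure", "revenue", "fees",
    "assets", "liabilities", "cash flows", "withheld security",
    "withheld party financing", "withheld association transactions",
    "withheld major litigation", "performance disclosure modification",
    "administrative penalties", "internal control issues",
]
LABEL_INDEX: Dict[str, int] = {name: i for i, name in enumerate(LABELS)}


def encode_labels(issues: Iterable[str]) -> List[int]:
    """Turn the issues cited for one enterprise into a 14-dim 0/1 label vector."""
    vector = [0] * len(LABELS)
    for issue in issues:
        try:
            vector[LABEL_INDEX[issue]] = 1
        except KeyError:
            raise ValueError(f"unknown label: {issue!r}") from None
    return vector


# Hypothetical accountability records: enterprise id -> issues cited against it.
records = {"ENT-A": ["revenue", "administrative penalties"], "ENT-B": []}
annotated = {ent: encode_labels(issues) for ent, issues in records.items()}
```

An enterprise with no cited issues receives an all-zero vector, so the same encoding covers both positive and negative samples.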
In the present embodiment, the training sample acquisition module 301 includes a first acquisition unit, a first generation unit, and a second generation unit.
The first acquisition unit is configured to acquire financial information of a plurality of enterprises within a time period; the financial information includes a balance sheet, a profit statement and a cash flow statement.
The first generation unit is configured to generate a financial indicator table, a derived financial indicator table and a derived financial indicator supplementary table from the balance sheet, the profit statement and the cash flow statement.
The second generation unit is configured to extract indicators from the balance sheet, the profit statement, the cash flow statement, the financial indicator table, the derived financial indicator table and the derived financial indicator supplementary table, and to generate a feature vector sample for each enterprise.
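A minimal Python sketch of how raw statement items and derived indicators might be merged into one fixed-order feature vector is shown below. The patent does not list its indicators, so every field name and ratio here (total_assets, debt_to_assets and so on) is a hypothetical placeholder, and mapping a zero denominator to 0.0 is an illustrative choice rather than part of the described method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Statements:
    """Raw statement items for one enterprise and one period (hypothetical keys)."""
    balance_sheet: Dict[str, float]
    profit_statement: Dict[str, float]
    cash_flow_statement: Dict[str, float]


def safe_ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def derived_indicators(s: Statements) -> Dict[str, float]:
    """Illustrative derived ratios; the patent does not enumerate its indicators."""
    bs, ps, cf = s.balance_sheet, s.profit_statement, s.cash_flow_statement
    return {
        "debt_to_assets": safe_ratio(bs["total_liabilities"], bs["total_assets"]),
        "net_margin": safe_ratio(ps["net_profit"], ps["revenue"]),
        "cash_to_profit": safe_ratio(cf["operating_cash_flow"], ps["net_profit"]),
    }


def feature_vector(s: Statements, keys: List[str]) -> List[float]:
    """Merge raw and derived items, then read them out in a fixed key order."""
    merged = {**s.balance_sheet, **s.profit_statement,
              **s.cash_flow_statement, **derived_indicators(s)}
    return [merged[k] for k in keys]


KEYS = ["total_assets", "total_liabilities", "revenue", "net_profit",
        "operating_cash_flow", "debt_to_assets", "net_margin", "cash_to_profit"]
acme = Statements(  # a hypothetical enterprise
    balance_sheet={"total_assets": 1000.0, "total_liabilities": 600.0},
    profit_statement={"revenue": 500.0, "net_profit": 50.0},
    cash_flow_statement={"operating_cash_flow": 20.0},
)
vec = feature_vector(acme, KEYS)
```

Reading values out through a single shared key list keeps every enterprise's vector in the same layout, which any downstream classifier requires.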
In this embodiment, the sample balancing module 302 is configured to balance the number of positive samples and the number of negative samples in all feature vector samples according to a preset sampling parameter, so as to obtain a training sample set, specifically:
the sample balancing module increases the number of the positive samples through random oversampling, decreases the number of the negative samples through random undersampling, and balances the data of the positive samples and the negative samples according to a sampling parameter r to obtain a training sample set;
wherein the sampling parameter r is defined by a formula that appears as an image in the original publication and is not reproduced here;
N_neg is the number of positive samples; N_pos is the number of negative samples.
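The combination of random oversampling and random undersampling can be sketched in Python as follows. The patent's formula for r is not available in this text, so the sketch uses a hypothetical interpretation: r in [0, 1] sets a common target size between the minority (positive) count and the majority (negative) count, with r = 0 meaning pure undersampling and r = 1 meaning pure oversampling. The function name and seed handling are likewise assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balance_samples(positives: Sequence[T], negatives: Sequence[T], r: float,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random oversampling of positives plus random undersampling of negatives.

    ASSUMPTION: r in [0, 1] interpolates the common target size between the
    minority count (r = 0) and the majority count (r = 1). This is not the
    patent's own formula for r, which is not reproduced here.
    """
    n_pos, n_neg = len(positives), len(negatives)
    if not 0 < n_pos <= n_neg:
        raise ValueError("expects a non-empty positive class no larger than the negatives")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    target = round(n_pos + r * (n_neg - n_pos))
    rng = random.Random(seed)
    # Oversampling: keep every positive, then add duplicates drawn with replacement.
    pos_out = list(positives) + rng.choices(positives, k=target - n_pos)
    # Undersampling: draw a subset of negatives without replacement.
    neg_out = rng.sample(list(negatives), k=target)
    return pos_out, neg_out


pos = [f"viol-{i}" for i in range(10)]   # enterprises with financial violations
neg = [f"norm-{i}" for i in range(90)]   # enterprises without violations
bal_pos, bal_neg = balance_samples(pos, neg, r=0.5, seed=42)
training_set = [(x, 1) for x in bal_pos] + [(x, 0) for x in bal_neg]
```

Meeting in the middle limits both costs of imbalance handling: fewer exact duplicates of the rare violation cases than full oversampling, and less discarded normal data than full undersampling.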
In this embodiment, the model construction module 304 is configured to construct a financial anomaly detection model based on a multi-label learning algorithm according to a training sample set labeled with a label, specifically:
the model construction module 304 configures a classifier for each label category according to a one-vs-all strategy, and constructs the financial anomaly detection model from the labeled training sample set.
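In multi-label learning, the one-vs-all strategy amounts to training one independent binary classifier per label and stacking their outputs into the multi-label result vector. The sketch below shows that structure in dependency-free Python. The base learner (a small logistic regression trained by batch gradient descent), its learning rate and its epoch count are assumptions for illustration; the patent does not name the classifier used for each label. In practice, a library wrapper such as scikit-learn's OneVsRestClassifier, which accepts a multi-label indicator matrix, would fill the same role.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryLogistic:
    """Batch gradient-descent logistic regression (stand-in base classifier)."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(self.b + sum(w * v for w, v in zip(self.w, x)))

    def predict(self, x: Sequence[float]) -> int:
        return int(self.predict_proba(x) >= 0.5)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "BinaryLogistic":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # d(log loss)/dz for this sample
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self


class OneVsAllMultiLabel:
    """One independent binary classifier per label column (binary relevance)."""

    def fit(self, X: Sequence[Sequence[float]],
            Y: Sequence[Sequence[int]]) -> "OneVsAllMultiLabel":
        n_labels = len(Y[0])
        self.models = [BinaryLogistic().fit(X, [row[k] for row in Y])
                       for k in range(n_labels)]
        return self

    def predict(self, x: Sequence[float]) -> List[int]:
        """Return the multi-label result vector for one sample input vector."""
        return [m.predict(x) for m in self.models]


# Toy data: label 0 follows feature 0, label 1 follows feature 1.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
model = OneVsAllMultiLabel().fit(X, Y)
predictions = [model.predict(x) for x in X]
```

With the 14 labels of claim 4, the same wrapper would hold 14 binary models, and the vector returned by `predict` is the result vector L whose components are summed into the overall score S.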
In summary, in the financial anomaly detection method and device based on multi-label learning provided by the invention, a feature vector sample is first generated for each enterprise from the financial information of existing enterprises, and sample balancing is then performed according to preset sampling parameters to obtain a training sample set; the training sample set is labeled with a plurality of labels obtained from historical accountability data, and a financial anomaly detection model is constructed based on a multi-label learning algorithm; finally, the financial information of the enterprise to be detected is acquired, a sample input vector is constructed and input into the financial anomaly detection model, and a detection result is obtained. In the prior art, accuracy is low because samples are scarce, or the motive behind financial whitewashing cannot be determined. By contrast, the technical solution of the invention balances the number of samples, which improves model accuracy; it also obtains multiple types of labels from historical accountability data, infers the whitewashing methods with a multi-label learning approach, and uses the labels jointly to judge the final financial whitewashing event, further improving detection accuracy. In addition, the invention can assist audit experts in analyzing and judging the financial information disclosed by companies, reducing labor cost and judgment errors and improving the efficiency of financial audits.
Furthermore, combining the financial anomaly detection method of the invention with the inferences of financial auditing experts can address many applications related to corporate financial disclosure, such as analysis of a company's operating condition and stock price manipulation, reduce the cost, time and resources required to train auditing experts, and broaden the application range of the invention.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A financial anomaly detection method based on multi-label learning is characterized by comprising the following steps:
acquiring financial information of a plurality of enterprises, and generating a feature vector sample of each enterprise according to the financial information of each enterprise; the feature vector samples are divided into positive samples and negative samples according to whether the enterprises have financial violations;
balancing the quantity of positive samples and negative samples in all the feature vector samples according to preset sampling parameters to obtain a training sample set;
labeling the training sample set according to a plurality of labels obtained from historical accountability data, and constructing a financial anomaly detection model based on a multi-label learning algorithm according to the training sample set after labeling;
acquiring financial information of an enterprise to be detected, and constructing a sample input vector according to the financial information of the enterprise to be detected;
and inputting the sample input vector into the financial anomaly detection model to obtain a multi-label result vector of the enterprise to be detected, and obtaining a detection result of the enterprise to be detected according to the multi-label result vector.
2. The financial anomaly detection method based on multi-label learning according to claim 1, wherein the financial information of a plurality of enterprises is obtained, and a feature vector sample of each enterprise is generated according to the financial information of each enterprise, specifically:
acquiring financial information of a plurality of enterprises within a time period; the financial information includes: a balance sheet, a profit statement and a cash flow statement;
generating a financial indicator table, a derived financial indicator table and a derived financial indicator supplementary table according to the balance sheet, the profit statement and the cash flow statement;
and extracting various indicators from the balance sheet, the profit statement, the cash flow statement, the financial indicator table, the derived financial indicator table and the derived financial indicator supplementary table, and generating a feature vector sample of each enterprise.
3. The financial anomaly detection method based on multi-label learning according to claim 1, wherein the number of positive samples and negative samples in all feature vector samples is balanced according to preset sampling parameters to obtain a training sample set, specifically:
increasing the number of the positive samples through random oversampling, reducing the number of the negative samples through random undersampling, and balancing data of the positive samples and the negative samples according to a sampling parameter r to obtain a training sample set;
wherein the sampling parameter r is defined by a formula that appears as an image in the original publication and is not reproduced here;
N_neg is the number of positive samples; N_pos is the number of negative samples.
4. The financial anomaly detection method based on multi-label learning according to claim 1, wherein the plurality of labels derived from historical accountability data are specifically:
the historical accountability data comprises: the disclaimer and disclosure material published by the regulatory body, the proprietary data owned by the financial institution, and the legal data owned by the third party data company;
the plurality of labels are respectively: in-table validation, out-of-table disclosure, revenue, fees, assets, liabilities, cash flows, withheld security, withheld party financing, withheld association transactions, withheld major litigation, performance disclosure modification, administrative penalties, and internal control issues.
5. The financial anomaly detection method based on multi-label learning according to claim 4, wherein a financial anomaly detection model is constructed based on a multi-label learning algorithm according to the training sample set labeled with labels, specifically:
configuring a classifier for each label category according to a one-vs-all strategy, and constructing the financial anomaly detection model from the labeled training sample set.
6. A financial anomaly detection device based on multi-label learning, the financial anomaly detection device comprising: a training sample acquisition module, a sample balancing module, a label labeling module, a model construction module, a to-be-detected sample acquisition module and a detection module;
the training sample acquisition module is used for acquiring financial information of a plurality of enterprises and generating a feature vector sample of each enterprise according to the financial information of each enterprise; the feature vector samples are divided into positive samples and negative samples according to whether the enterprises have financial violations;
the sample balancing module is used for balancing the number of positive samples and negative samples in all the feature vector samples according to preset sampling parameters to obtain a training sample set;
the label labeling module is used for labeling the training sample set according to a plurality of labels obtained from historical accountability data;
the model construction module is used for constructing a financial anomaly detection model based on a multi-label learning algorithm according to the training sample set labeled with the labels;
the to-be-detected sample acquisition module is used for acquiring financial information of an enterprise to be detected and constructing a sample input vector according to the financial information of the enterprise to be detected;
the detection module is used for inputting the sample input vector into the financial anomaly detection model, obtaining a multi-label result vector of the enterprise to be detected, and obtaining a detection result of the enterprise to be detected according to the multi-label result vector.
7. The multi-label learning-based financial anomaly detection device according to claim 6, wherein said training sample acquisition module comprises a first acquisition unit, a first generation unit and a second generation unit;
the first acquisition unit is used for acquiring financial information of a plurality of enterprises within a time period; the financial information includes: a balance sheet, a profit statement and a cash flow statement;
the first generation unit is used for generating a financial indicator table, a derived financial indicator table and a derived financial indicator supplementary table according to the balance sheet, the profit statement and the cash flow statement;
the second generation unit is used for extracting various indicators from the balance sheet, the profit statement, the cash flow statement, the financial indicator table, the derived financial indicator table and the derived financial indicator supplementary table, and generating a feature vector sample of each enterprise.
8. The financial anomaly detection device based on multi-label learning according to claim 6, wherein the sample balancing module is configured to balance the number of positive samples and negative samples in all feature vector samples according to preset sampling parameters to obtain a training sample set, specifically:
the sample balancing module increases the number of the positive samples through random oversampling, decreases the number of the negative samples through random undersampling, and balances the data of the positive samples and the negative samples according to a sampling parameter r to obtain a training sample set;
wherein the sampling parameter r is defined by a formula that appears as an image in the original publication and is not reproduced here;
N_neg is the number of positive samples; N_pos is the number of negative samples.
9. The financial anomaly detection device based on multi-label learning according to claim 6, wherein the plurality of labels derived from historical accountability data are specifically:
the historical accountability data comprises: the disclaimer and disclosure material published by the regulatory body, the proprietary data owned by the financial institution, and the legal data owned by the third party data company;
the plurality of labels are respectively: in-table validation, out-of-table disclosure, revenue, fees, assets, liabilities, cash flows, withheld security, withheld party financing, withheld association transactions, withheld major litigation, performance disclosure modification, administrative penalties, and internal control issues.
10. The financial anomaly detection device based on multi-label learning according to claim 9, wherein the model construction module is configured to construct a financial anomaly detection model based on a multi-label learning algorithm according to the training sample set labeled with labels, specifically:
the model construction module configures a classifier for each label category according to a one-vs-all strategy, and constructs the financial anomaly detection model from the labeled training sample set.
CN202010474735.3A 2020-05-29 2020-05-29 Financial anomaly detection method and device based on multi-label learning Pending CN111783829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010474735.3A CN111783829A (en) 2020-05-29 2020-05-29 Financial anomaly detection method and device based on multi-label learning

Publications (1)

Publication Number Publication Date
CN111783829A true CN111783829A (en) 2020-10-16

Family

ID=72754553

Country Status (1)

Country Link
CN (1) CN111783829A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018049400A (en) * 2016-09-20 2018-03-29 株式会社ココペリインキュベート Financial information analysis system, and program
CN108230131A (en) * 2017-12-29 2018-06-29 国信优易数据有限公司 A kind of data processing method and device
CN109376995A (en) * 2018-09-18 2019-02-22 平安科技(深圳)有限公司 Financial data methods of marking, device, computer equipment and storage medium
CN109657721A (en) * 2018-12-20 2019-04-19 长沙理工大学 A kind of multi-class decision-making technique of combination fuzzy set and random forest tree
CN110298741A (en) * 2019-06-27 2019-10-01 广发证券股份有限公司 A kind of Financial Fraud risk recognition system
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
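Liu Jun; Wang Liping: "A financial fraud identification model based on probabilistic neural networks", Journal of Harbin University of Commerce (Social Science Edition), no. 03 *
Wen Yongjun; Zhu Wenjie: "An empirical study on identifying financial reporting fraud by listed companies", Communication of Finance and Accounting, no. 02 *
Deng Qingshan; Mei Guoping: "Identification of false financial reports based on a BP neural network", Systems Engineering, no. 10 *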

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446774A (en) * 2020-10-30 2021-03-05 杭州衡泰软件有限公司 Financial statement quality early warning method
CN112767106A (en) * 2021-01-14 2021-05-07 中国科学院上海高等研究院 Automatic auditing method, system, computer readable storage medium and auditing equipment
CN112767106B (en) * 2021-01-14 2023-11-07 中国科学院上海高等研究院 Automatic auditing method, system, computer readable storage medium and auditing equipment
CN113269626A (en) * 2021-06-03 2021-08-17 北京航空航天大学 Financial manipulation behavior identification method and device, electronic equipment and medium
CN114022053A (en) * 2022-01-05 2022-02-08 鲁信科技股份有限公司 Auditing system and equipment based on risk factors
CN114022053B (en) * 2022-01-05 2022-04-12 鲁信科技股份有限公司 Auditing system and equipment based on risk factors
CN117151906A (en) * 2023-08-15 2023-12-01 广东省地质调查院 Financial accounting audit supervision collaborative supervision method based on association network establishment
CN117151906B (en) * 2023-08-15 2024-02-13 广东省地质调查院 Financial accounting audit supervision collaborative supervision method based on association network establishment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination