CN114612253A - Financial data counterfeiting identification method - Google Patents


Info

Publication number
CN114612253A
Authority
CN
China
Prior art keywords
financial
enterprise
counterfeiting
investment
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210264201.7A
Other languages
Chinese (zh)
Inventor
张良均
王宏刚
施兴
张敏
张尚佳
刘名军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Teddy Intelligent Technology Co ltd
Original Assignee
Guangdong Teddy Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Teddy Intelligent Technology Co ltd
Priority to CN202210264201.7A
Publication of CN114612253A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application provides a financial data counterfeiting identification method comprising the following steps: classifying enterprises to obtain enterprise categories; mining, with an association mining algorithm, the financial categories in which each enterprise category frequently commits financial counterfeiting; predicting the counterfeiting probability with a logistic regression algorithm from the counterfeiting-prone financial categories and the missing values and abnormal values of the financial statement, which specifically comprises filling missing values by interpolation, handling abnormal values with a box plot, and calculating the overall enterprise counterfeiting probability with the logistic regression algorithm; performing prediction verification based on online sales performance for enterprises that are investment candidates but whose financial data counterfeiting probability exceeds a preset threshold; providing an investment decision according to the verification result; and adjusting the investment proportion according to the degree to which the counterfeiting can be made up in the future and the real rate of return after investment. The invention can verify financial counterfeiting and maximize the benefits of investors.

Description

Financial data counterfeiting identification method
[ technical field ]
The invention relates to the technical field of intelligent equipment, in particular to a financial data counterfeiting identification method.
[ background of the invention ]
During the investment process, analysis often relies on the financial reports of each enterprise. Although an enterprise can provide a financial report, the report may contain counterfeit data and cannot be fully trusted. In fact, different types of enterprises are prone to counterfeiting different items. For example, education enterprises usually collect money from customers before teaching begins, so the probability that they falsify collection-related items is low, whereas some sales-type enterprises let customers try products first and collect money afterwards, so the probability that they falsify collection-related items is high. If the short-term potential of an enterprise is judged by its accounts receivable, a sales-type enterprise therefore has a stronger motive to falsify them, and judging which financial information is more likely to be falsified in each industry and field is a difficult problem. In addition, some enterprises intentionally hide the real situation behind null values, and some provide false values to win more investor trust, which often victimizes inexperienced investors. The financial data of an enterprise therefore cannot be completely trusted. In many cases, however, the counterfeit information is not groundless: an enterprise may risk providing false information because it expects, and has the ability, to reach the stated goal and has only temporarily fallen short.
Ordinary investors walk away from an enterprise as soon as they detect counterfeiting, which is not necessarily the most rational approach. If a system can compare the actual situation it predicts with the situation stated in the enterprise's financial report to obtain the probability and magnitude of counterfeiting, then the potential of a capable enterprise can still be recognized even when a small part of its data is counterfeit, and the investment proportion can be adjusted accordingly, which avoids many wrong investment decisions.
[ summary of the invention ]
The invention provides a financial data counterfeiting identification method, which mainly comprises the following steps:
classifying enterprises to obtain enterprise categories; mining, with an association mining algorithm, the financial categories in which each enterprise category frequently commits financial counterfeiting; predicting the counterfeiting probability with a logistic regression algorithm from the counterfeiting-prone financial categories and the missing values and abnormal values of the financial statement, which specifically comprises: filling missing values by interpolation, handling abnormal values with a box plot, and calculating the overall enterprise counterfeiting probability with the logistic regression algorithm; performing prediction verification based on online sales performance for enterprises that are investment candidates but whose financial data counterfeiting probability exceeds a preset threshold; providing an investment decision according to the verification result; and adjusting the investment proportion according to the degree to which the counterfeiting can be made up in the future and the real rate of return after investment.
Further optionally, the classifying the enterprises to obtain the enterprise categories includes:
establishing an enterprise category database, acquiring basic information of an enterprise, judging whether an industry category comprises preset industry category keywords or not through the basic information to obtain the enterprise category of the enterprise to be judged, and giving a plurality of classification marking values to the enterprise to obtain a first-dimension enterprise classification if the enterprise category has the property of multi-industry integration; and classifying the enterprises according to economic types except for the industry classification to obtain a second-dimension enterprise classification.
Further optionally, the mining the financial categories of the enterprises in which financial counterfeiting frequently occurs by using the association mining algorithm includes:
establishing a financial category database which comprises notes receivable, long-term equity investment, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations, construction in progress, intangible assets and surplus reserves; computing the associations between enterprise categories and financial counterfeiting categories with an association mining algorithm, taking the resulting association items as the association result, and obtaining, for each enterprise category, the financial category in which counterfeiting occurs most frequently.
Further optionally, predicting the counterfeiting probability through a logistic regression algorithm according to the counterfeiting-prone financial categories and the missing values and abnormal values of the financial statement comprises:
acquiring the enterprise and its associated financial information from the enterprise category database and the financial category database, and determining the financial category item most likely to be counterfeit for that enterprise category; acquiring the enterprise financial statement, and performing missing value processing and abnormal value processing; the missing value processing mainly comprises judging whether the proportion of missing financial report items is larger than a preset threshold, and if so, the financial report data has too many missing values and the financial report is classified as the missing-data type; for financial report data whose missing proportion is smaller than the preset threshold, filling the missing values by interpolation according to the historical data of the financial report; the interpolation calculates the most suitable value from the data before and after the missing value;
the abnormal value processing comprises observing the maximum and minimum values with a box plot, and checking whether the deviation of abnormal financial report data from the mean far exceeds the standard deviation; finally, calculating the enterprise counterfeiting probability with a logistic regression algorithm;
processing the abnormal values with the box plot further comprises:
computing the upper quartile, the median, the lower quartile, the lower edge and the abnormal values, so that abnormal values are identified efficiently; comparing the distribution characteristics of several groups of data, taking the upper and lower boundaries as the limits of the data distribution, and treating data points above the upper boundary or below the lower boundary as outliers or abnormal values;
calculating the overall enterprise counterfeiting probability with the logistic regression algorithm further comprises:
acquiring financial data including the enterprise type, accounts receivable, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations and construction in progress, filling the missing values in these items by interpolation, and then predicting the overall counterfeiting probability of the enterprise financial report with the logistic regression algorithm.
Further optionally, the performing of the online sales performance-based predictive verification on the enterprise to be invested but with the possibility of counterfeiting the financial data exceeding a preset threshold comprises:
after the counterfeiting probability of the enterprise financial data is obtained, the enterprise is further verified to determine whether counterfeiting really exists and whether the specific counterfeiting magnitude can be made up in the future; for enterprises with online e-commerce sales, a crawler acquires the enterprise's sales information from the enterprise's official website, JD.com (Jingdong), Taobao, Tmall and Alibaba, and crawls different sales data according to the enterprise category;
the crawling specifically comprises the steps that the website hyperlink is added into a downloading queue according to a preset enterprise entry address, the downloading queue is sequentially ordered according to financial category priority, pages which relate to a large amount of financial data are preferentially captured, and pages which do not have financial information correlation are excluded;
forecasting the possibility and the time period of the enterprise for making up the counterfeiting vacancy in a future preset time period by combining the difference between the sales data and the counterfeiting data; and judging whether the appearing financial counterfeiting data can fill the counterfeiting amount in the future preset time or not by comparing the difference between the online data trend and the financial data.
Further optionally, the providing an investment decision according to the verification result includes:
according to the income of the enterprise, the investment scheme is evaluated with discounted and non-discounted indicators, mainly the net present value, the present value index and the internal rate of return; the payback period is determined; the internal rate of return calculation is corrected by examining the relation between net present value and discount rate on the net present value curve and by using the MIRR function; the differences between economic years are compared; and finally whether the enterprise is worth investing in is determined in combination with the enterprise counterfeiting probability.
Further optionally, the predicting the investment amount based on the investment decision comprises:
judging the future production capacity of the enterprise and converting it according to the financial counterfeiting probability; predicting the investment amount with a unit production capacity estimation algorithm, which estimates the investment amount from the unit production capacity investment of similar projects and the production capacity of the proposed project, where production capacity means the annual output reached after the investment project is built and put into operation; the investment amount is predicted from the converted investment decision with the following formula: total project investment = unit production capacity investment of same-type enterprises × production capacity of the proposed project × financial counterfeiting probability.
Further optionally, the adjusting the investment proportion according to the future repairable counterfeiting degree and the real profit rate after investment of the enterprise comprises:
acquiring enterprise counterfeiting data and amplitude, acquiring online sales data of enterprises, and acquiring a simulated investment amount for the enterprises; and inputting the counterfeit amount and the sales amount as characteristics, taking the real return rate after investment in the historical investment process as a marking value, adopting a support vector machine classifier as a training model, training a binary classifier, predicting the investment return rate, and multiplying the confidence values of the two classifications by the investment amount according to the prediction result and the support vector machine to obtain the final investment amount.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
Predicting counterfeit data in financial reports makes investment decisions more accurate. The method distinguishes the items that different industries tend to falsify, and it also takes the real condition of the enterprise into account: instead of rejecting an enterprise outright as soon as counterfeiting is found, it gives a more accurate prediction of whether to invest and how much, so that investors' interests are protected and the investment return is better secured and maximized.
[ description of the drawings ]
FIG. 1 is a flow chart of a method for identifying fraud in financial data according to the present invention.
[ detailed description ] embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a method for identifying fraud in financial data according to the present invention. As shown in fig. 1, the method for identifying counterfeit financial data in this embodiment may specifically include:
classifying the enterprises; analyzing the financial category of the financial counterfeiting of the enterprise according to the enterprise category; forecasting enterprise counterfeiting probability by combining financial counterfeiting categories and enterprise financial reports; carrying out prediction verification based on online sales performance on enterprises with the financial data counterfeiting possibility exceeding a preset threshold; providing an investment decision according to the verification result; and adjusting the investment proportion by combining the investment amount and the enterprise financial counterfeiting category and degree.
Step 101, distinguishing the classifications of different enterprises, including:
establishing an enterprise category database, acquiring basic information of an enterprise, and judging from the basic information whether its industry category contains preset industry category keywords, to obtain the enterprise category of the enterprise to be judged, wherein the enterprise categories include the manufacturing industry, construction industry, real estate industry, financial industry, and education and technical service industry; if the enterprise spans multiple industries, giving the enterprise a plurality of labels; in addition to the industry classification, the enterprises are classified according to economic type, including whether the enterprise is state-owned, private or collectively owned, whether it is a joint-stock company, a sole proprietorship or a partnership, whether it is a large or small enterprise, and whether it meets listing standards. Because different types of enterprises are inclined to falsify different items, the classification is used to analyze which financial counterfeiting categories each type of enterprise tends toward and to predict the most probable counterfeiting category according to the enterprise type.
Step 102, judging financial categories of financial counterfeiting of different enterprises according to the enterprise categories, which comprises the following steps:
establishing a financial category database which comprises accounts receivable, long-term equity investment, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations, construction in progress, intangible assets and surplus reserves; the enterprise category and the financial counterfeiting category are associated through an association mining algorithm, and the financial counterfeiting categories that occur under each enterprise category are obtained according to the enterprise category. The association means that historical financial counterfeiting data is collected, the most probable financial counterfeiting category in each industry is counted, and each industry is associated with its most probable financial counterfeiting items; the association can be calculated with the Apriori association mining algorithm. An association rule is an implication expression of the form X → Y, where X and Y are disjoint itemsets, i.e. X ∩ Y = ∅.
The strength of an association rule can be measured in terms of its support and confidence. Support determines how often the rule applies to a given data set, while confidence determines how often Y occurs in transactions that contain X. Support (s) and confidence (c) are defined as follows:
s(X→Y)=σ(X∪Y)/N
c(X→Y)=σ(X∪Y)/σ(X)
where σ(X∪Y) is the support count of X∪Y, N is the total number of transactions, and σ(X) is the support count of X.
For example, the real estate industry is looked up in the enterprise category database, and each record pairing the type of an enterprise that frequently commits financial counterfeiting with the type of counterfeiting is treated as one item, so that Apriori can calculate the associations between multiple enterprises and multiple financial items. The financial items associated with counterfeiting, including but not limited to notes receivable, prepayments and monetary funds, are associated with an industry such as private real estate rental agencies. Because Apriori calculates a confidence value, it indicates how likely a private real estate rental agency is to falsify each of these items.
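The support and confidence measures defined above can be sketched in a few lines of Python. This is a minimal illustration of the two formulas only, not a full Apriori implementation; the record contents and the names `support_count`, `support` and `confidence` are hypothetical.

```python
def support_count(transactions, itemset):
    """sigma(itemset): number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, x, y):
    """s(X -> Y) = sigma(X u Y) / N."""
    return support_count(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions, x, y):
    """c(X -> Y) = sigma(X u Y) / sigma(X); 0.0 when X never occurs."""
    count_x = support_count(transactions, x)
    if count_x == 0:
        return 0.0
    return support_count(transactions, set(x) | set(y)) / count_x


# Hypothetical history: each record pairs an enterprise category
# with the financial item that was found to be counterfeit.
records = [
    {"real_estate", "accounts_receivable"},
    {"real_estate", "accounts_receivable"},
    {"real_estate", "prepayments"},
    {"education", "prepayments"},
]

s = support(records, {"real_estate"}, {"accounts_receivable"})
c = confidence(records, {"real_estate"}, {"accounts_receivable"})
print(f"support={s:.2f} confidence={c:.2f}")
```

In this toy data the rule "real_estate → accounts_receivable" holds in 2 of 4 records and in 2 of the 3 real-estate records, which is the kind of ranking the text uses to pick the most counterfeiting-prone item per enterprise category.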
Step 103, predicting the enterprise counterfeiting probability by combining the financial counterfeiting category and the missing values of the enterprise financial report, which comprises:
first, acquiring basic enterprise information, and determining the enterprise category and the financial categories likely to be counterfeit from the enterprise category database and the financial category database; acquiring the enterprise financial report, and processing missing values and abnormal values. Missing value processing is one of the main means of judging whether a missing value arose normally or results from counterfeiting. It mainly comprises judging whether the proportion of missing financial report items is larger than 0.5%; if so, too much financial report data is missing, and the report is unqualified regardless of whether the data is missing for normal reasons or is being deliberately concealed. Such a financial report can be determined to have no reference value. If the missing proportion is less than 0.5%, the missing data is filled by interpolation, which calculates a plausible value from the data before and after the missing value. Interpolation fits a continuous function to discrete data so that the continuous curve passes through all given data points. It is an important method for approximating a discrete function, and it estimates the value of the function at other points from its values at a limited number of points.
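The missing-value rule above can be sketched as follows. The text does not say which interpolation method is used, so linear interpolation between the nearest known neighbours is an assumption here, as are the function names and the sample series; only the 0.5% threshold comes from the text.

```python
def missing_ratio(values):
    """Share of entries that are missing (represented as None)."""
    return sum(v is None for v in values) / len(values)


def interpolate_linear(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Gaps at either end copy the nearest known value (an assumption).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def process_missing(values, threshold=0.005):
    """Return None for an unqualified report, else the series with gaps filled."""
    if missing_ratio(values) > threshold:
        return None  # too much data missing: the report has no reference value
    return interpolate_linear(values)


# Hypothetical series: 400 data points with a single gap (0.25% missing).
series = [float(i) for i in range(400)]
series[10] = None
filled = process_missing(series)

# A short series with one gap out of four exceeds the 0.5% threshold.
rejected = process_missing([100.0, None, 140.0, 160.0])
```

The threshold is a parameter so that the same sketch also covers the "preset threshold" wording used in the summary.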
On the other hand, for abnormal value processing, the maximum and minimum values are observed with a box plot, and it is checked whether the deviation of abnormal financial report data from the mean far exceeds the standard deviation; finally, the enterprise counterfeiting probability is calculated with a logistic regression algorithm;
processing the abnormal values with the box plot further comprises:
computing the upper quartile, the median, the lower quartile, the lower edge and the abnormal values, so that abnormal values are identified efficiently; comparing the distribution characteristics of several groups of data, taking the upper and lower boundaries as the limits of the data distribution, and treating data points above the upper boundary or below the lower boundary as outliers or abnormal values;
finally, the overall enterprise counterfeiting probability is calculated through a logistic regression algorithm, which further comprises:
obtaining financial data including the enterprise type, accounts receivable, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations, construction in progress and the like; filling the missing values in these items by interpolation; computing a predicted value with the Sigmoid function in the logistic regression algorithm to judge the counterfeiting probability; and finally predicting the overall counterfeiting probability of the enterprise financial report by using a loss function.
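The box-plot and logistic-regression steps of Step 103 can be sketched together as below. Several choices are assumptions not stated in the text: the 1.5 × IQR whisker rule for the box-plot boundaries, plain gradient descent on the log loss as the fitting procedure, and the two toy features (missing-value ratio and outlier ratio) standing in for the financial items the method actually lists.

```python
import math
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Predicted counterfeiting probability for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(weights, bias, X, y):
    """Mean cross-entropy loss of the model on (X, y)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(weights, bias, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def fit(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Hypothetical receivable figures with one abnormal entry.
receivables = [10, 12, 11, 13, 12, 11, 95]
outliers = boxplot_outliers(receivables)

# Hypothetical training rows: [missing-value ratio, outlier ratio]; label 1 = counterfeit.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.8, 0.6], [0.9, 0.7], [0.7, 0.9]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit(X, y)
p_suspicious = predict_proba(weights, bias, [0.85, 0.8])
p_clean = predict_proba(weights, bias, [0.05, 0.05])
```

A production system would more likely use an established library for the regression; the hand-written version only makes the Sigmoid and loss-function roles mentioned in the text visible.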
Step 104, carrying out prediction verification based on online sales performance on enterprises with the possibility of financial data counterfeiting exceeding a preset threshold, wherein the prediction verification comprises the following steps:
after the counterfeiting probability of the enterprise financial data is obtained, the enterprise is further verified to determine whether counterfeiting really exists and whether the specific counterfeiting magnitude can be made up in the future; for enterprises with online e-commerce sales, a crawler acquires the enterprise's sales information from the enterprise's official website, JD.com (Jingdong), Taobao, Tmall and Alibaba, and crawls different sales data for different enterprise categories;
the crawling further comprises the steps of adding the website hyperlink into a download queue according to a preset enterprise entry address, sequencing the download queue according to financial category priority, preferentially grabbing pages with more financial data, and excluding pages without financial information correlation;
predicting, from the difference between the sales data and the counterfeit data, the possibility that the enterprise makes up the counterfeiting gap within a preset future period and how long this would take; and judging, by comparing the online data trend with the financial data, whether the counterfeit amount can be made up within the preset future period. For example, for an enterprise in the house leasing business, the relevant item is accounts receivable, and the financial statement shows accounts receivable of 1 million yuan. Suppose the contracts actually signed in the current year total less than 1 million yuan and the supporting data provided by the enterprise has many missing values, and the logistic regression classification algorithm, using the enterprise type, accounts receivable, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations, construction in progress and the like, finds that the enterprise has a high counterfeiting probability. The online sales data of the enterprise is then obtained. If, for example, the online unit price multiplied by the sales volume gives a total of more than 1 million yuan, the enterprise can be judged capable of remedying the financial counterfeiting risk. That is, even if the current financial data is counterfeit, the gap can be made up quickly in future development, and the enterprise is therefore investable. After the enterprise is judged investable, the investment decision can be further refined.
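Two parts of Step 104 lend themselves to a short sketch: ordering the download queue by financial-category priority, and the unit price × sales volume comparison from the example. No network access is performed here; the priority table, the URLs, the listing figures and the function names are all hypothetical.

```python
import heapq

# Hypothetical priorities: a lower number is downloaded earlier.
FINANCIAL_PRIORITY = {"sales": 0, "price": 1, "inventory": 2}


def crawl_order(links):
    """Order (url, category) pairs by financial priority, dropping unrelated pages."""
    heap = []
    for position, (url, category) in enumerate(links):
        if category not in FINANCIAL_PRIORITY:
            continue  # exclude pages with no financial relevance
        # position breaks ties so equally ranked pages keep discovery order
        heapq.heappush(heap, (FINANCIAL_PRIORITY[category], position, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def online_sales_total(listings):
    """Sum of unit price x sales volume over all crawled listings."""
    return sum(unit_price * volume for unit_price, volume in listings)


def can_make_up(reported_amount, listings):
    """True when online sales exceed the amount reported in the financial statement."""
    return online_sales_total(listings) > reported_amount


# Hypothetical links discovered from a preset enterprise entry address.
links = [
    ("https://example.com/about", None),
    ("https://example.com/item/1/price", "price"),
    ("https://example.com/item/1/sales", "sales"),
    ("https://example.com/stock", "inventory"),
]
order = crawl_order(links)

# Hypothetical listings as (unit price in yuan, units sold).
listings = [(2500.0, 300), (1800.0, 200)]
investable = can_make_up(1_000_000, listings)
```

The comparison follows the worked example literally (online total against the reported 1 million yuan); a forecast of how long the gap takes to close would need time-stamped sales data, which this sketch does not model.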
Step 105, providing an investment decision according to the verification result, comprising:
evaluating the investment scheme, according to the income of the enterprise, by using discounted and non-discounted indicators, mainly the net present value, the present value index and the internal rate of return; determining the payback period; correcting the calculation of the internal rate of return by observing the relation between the net present value and the discount rate on the net present value curve and by using the MIRR function; comparing the differences between different economic years; and finally determining whether the enterprise is worth investing in by taking the enterprise counterfeiting probability into account. For example, to provide an investment decision scheme for enterprise A, the difference between the present value of future cash inflows and the present value of future cash outflows is calculated by the net present value method; if the income in future years far exceeds the expenditure, the profitability of the enterprise is high. The return on investment of the enterprise is then calculated to see the ratio of future return to investment, where return on investment (ROI) = (ending assets - beginning assets) / beginning assets × 100%. Finally, the final investment scheme is given in combination with the financial data counterfeiting result.
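The indicators named in this step can be illustrated with a short sketch. The function names, the cash flows and the rates below are hypothetical; the formulas are the standard textbook definitions of net present value, return on investment, payback period and modified internal rate of return, and are not taken from the specification itself.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def roi_percent(beginning_assets, ending_assets):
    """ROI = (ending assets - beginning assets) / beginning assets x 100%."""
    return (ending_assets - beginning_assets) / beginning_assets * 100


def payback_period(cash_flows):
    """First period at which the cumulative cash flow is non-negative, else None."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


def mirr(cash_flows, finance_rate, reinvest_rate):
    """Modified internal rate of return (standard definition)."""
    n = len(cash_flows) - 1
    # Present value of the outflows, discounted at the financing rate.
    pv_out = sum(cf / (1 + finance_rate) ** t
                 for t, cf in enumerate(cash_flows) if cf < 0)
    # Future value of the inflows, compounded at the reinvestment rate.
    fv_in = sum(cf * (1 + reinvest_rate) ** (n - t)
                for t, cf in enumerate(cash_flows) if cf > 0)
    return (fv_in / -pv_out) ** (1 / n) - 1


# Hypothetical project: 1000 invested now, 500 returned in each of 3 years.
flows = [-1000, 500, 500, 500]
print(npv(0.10, flows), payback_period(flows), mirr(flows, 0.10, 0.08))
print(roi_percent(100, 125))
```

How these indicators are weighed against the counterfeiting probability is left open by the text, so no combined decision rule is shown here.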
Step 106, predicting the investment amount according to the investment decision, comprising:
judging the future production capacity of the enterprise, and converting it according to the financial counterfeiting probability; predicting the investment amount by using a unit production capacity estimation algorithm, in which the investment amount is estimated from the unit production capacity investment of similar projects and the production capacity of the proposed project, the production capacity being the annual output reached after the investment project is built and put into operation; and predicting the investment amount according to the converted investment decision by the following formula: total project investment = unit production capacity investment of similar enterprises × production capacity of the proposed project × financial counterfeiting probability. The total investment amount is thus influenced by the financial counterfeiting probability, which is calculated by the logistic regression classification algorithm from the enterprise type, accounts receivable, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations, construction in progress and the like. This avoids encouraging large numbers of companies to counterfeit and removes the possibility of obtaining a large investment after counterfeiting.
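As a rough sketch of the unit production capacity estimate, the formula can be read as a plain product of the three listed factors; the multiplication is an assumption where the text is ambiguous. The function names and the figures are hypothetical, and deriving the unit investment from one comparable project is the usual form of the method rather than a detail stated in the text.

```python
def unit_capacity_investment(comparable_investment, comparable_capacity):
    """Investment per unit of annual output for a comparable, already built project."""
    if comparable_capacity <= 0:
        raise ValueError("comparable_capacity must be positive")
    return comparable_investment / comparable_capacity


def estimate_total_investment(unit_investment, proposed_capacity, fraud_probability):
    """Product of the three factors listed in the formula (assumed multiplication)."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must lie in [0, 1]")
    return unit_investment * proposed_capacity * fraud_probability


# Hypothetical comparable project: 5,000,000 yuan for 1,000 units per year.
unit = unit_capacity_investment(5_000_000, 1_000)
# Hypothetical proposed project: 800 units per year, third factor 0.25.
total = estimate_total_investment(unit, 800, 0.25)
print(unit, total)
```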
Step 107, adjusting the investment proportion according to the future compensatable counterfeiting degree and the real gain rate after investment of the enterprise, comprising:
acquiring the counterfeit data of the enterprise and its magnitude, acquiring the online sales data of the enterprise, and acquiring the proposed investment amount for the enterprise; taking the counterfeit amount and the sales amount as input features and the real post-investment return rate from historical investments as the label, training a binary classifier with a support vector machine as the model, and predicting the investment return rate; and multiplying the confidence value of the binary classification, obtained from the prediction result of the support vector machine, by the investment amount to obtain the final investment amount. Because the investment process involves a number of uncertain variables, including financial authenticity, future sales potential, investment amount and enterprise development, the investment amount needs to be determined on the basis of the real return on investment. These uncertain quantities can therefore be used as feature values, and the machine learning model can be trained on the final return on investment. A support vector machine is suitable for the classification because it is not only efficient but can also provide the weight probability, i.e. the confidence, of a classification. The result of the binary classification is taken as the predicted value of the final investment proportion. In this way the investment amount can be predicted well.
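The following sketch illustrates the idea with the Python standard library only. It swaps in a hand-rolled linear support vector machine trained by subgradient descent on the hinge loss in place of a library classifier, and it maps the decision value to a confidence with a logistic function; both choices, along with all names, hyperparameters and the training data, are assumptions made for illustration.

```python
import math


def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=500):
    """Train a linear SVM by subgradient descent on the regularised hinge loss.

    samples: list of feature tuples; labels: +1 or -1. Returns (weights, bias).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: follow the hinge-loss subgradient.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regulariser contributes.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def decision_value(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def confidence_good_return(model, x):
    """Squash the decision value into (0, 1); an assumed stand-in for calibration."""
    return 1.0 / (1.0 + math.exp(-decision_value(model, x)))


def final_investment(model, x, proposed_amount):
    """Scale the proposed amount by the confidence of the 'good return' class."""
    return confidence_good_return(model, x) * proposed_amount


# Hypothetical history: (counterfeit amount, online sales amount), millions of yuan.
history = [(0.1, 2.0), (0.2, 1.5), (0.3, 1.8), (1.5, 0.2), (1.2, 0.4), (1.8, 0.3)]
# +1: the real post-investment return met the target; -1: it did not.
outcomes = [1, 1, 1, -1, -1, -1]

model = train_linear_svm(history, outcomes)
print(final_investment(model, (0.2, 1.9), 1_000_000))
```

In practice a library implementation with calibrated probability outputs would likely be preferable; the sketch only shows how the confidence scales the proposed amount.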
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Programs for implementing the information governance of the present invention may be written as computer program code in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Python and C++, as well as conventional procedural programming languages such as the C language or similar programming languages.
The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative; the division of the units is only one logical functional division, and other divisions are possible in practice.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention.
The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (8)

1. A method for identifying fraud in financial data, the method comprising:
classifying enterprises to obtain enterprise categories; mining, according to the enterprise categories and by using an association mining algorithm, the financial categories in which the enterprises frequently commit financial counterfeiting; predicting the counterfeiting probability by a logistic regression algorithm according to the financial categories of financial counterfeiting and the missing values and abnormal values of the financial statements; wherein predicting the enterprise counterfeiting probability by combining the financial counterfeiting categories with the enterprise financial statements specifically comprises: processing missing values by interpolation, processing abnormal values by using a box plot, and calculating the overall enterprise counterfeiting probability by a logistic regression algorithm; performing prediction verification based on online sales performance on enterprises in which investment is planned but whose probability of financial data counterfeiting exceeds a preset threshold; providing an investment decision according to the verification result; and adjusting the investment proportion according to the degree to which the counterfeiting can be made up in the future and the real post-investment return rate of the enterprise.
2. The method of claim 1, wherein the classifying the business to obtain a business category comprises:
establishing an enterprise category database; acquiring basic information of an enterprise, and judging from the basic information whether the industry category contains preset industry category keywords, so as to obtain the enterprise category of the enterprise to be judged; if the enterprise spans multiple industries, assigning a plurality of classification label values to the enterprise, thereby obtaining a first-dimension enterprise classification; and, in addition to the industry classification, classifying the enterprises by economic type to obtain a second-dimension enterprise classification.
3. The method of claim 1, wherein said mining the financial categories in which the enterprises frequently commit financial counterfeiting by using an association mining algorithm comprises:
establishing a financial category database comprising notes receivable, long-term equity investments, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations, construction in progress, intangible assets and surplus reserves; computing the association items between the enterprise categories and the financial counterfeiting categories by the association mining algorithm as the association result, and obtaining, for each enterprise category, the financial category in which financial counterfeiting occurs most frequently;
wherein an association rule is an implication expression in the form X → Y, where X and Y are disjoint sets of terms, i.e. X ∩ Y = ∅;
the strength of an association rule may be measured in terms of its support and confidence; the support determines how often the rule applies to a given data set, while the confidence determines how often Y occurs in transactions containing X;
the support (s) and the confidence (c) are defined by the following two measures:
s(X→Y)=σ(X∪Y)/N
c(X→Y)=σ(X∪Y)/σ(X)
where σ(X∪Y) is the support count of X∪Y, N is the total number of transactions, and σ(X) is the support count of X.
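The two measures can be checked on a toy data set. In this sketch each transaction is assumed to be the set of an enterprise category together with the financial categories flagged as counterfeit; the function names and the sample transactions are hypothetical.

```python
def support_count(transactions, itemset):
    """sigma(itemset): number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, x, y):
    """s(X -> Y) = sigma(X u Y) / N."""
    return support_count(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions, x, y):
    """c(X -> Y) = sigma(X u Y) / sigma(X)."""
    return support_count(transactions, set(x) | set(y)) / support_count(transactions, x)


# Hypothetical transactions: enterprise category plus counterfeit financial categories.
transactions = [
    {"real_estate", "accounts_receivable"},
    {"real_estate", "accounts_receivable", "prepayments"},
    {"real_estate", "construction_in_progress"},
    {"retail", "cost_of_sales"},
]

x, y = {"real_estate"}, {"accounts_receivable"}
print(support(transactions, x, y))     # sigma(X u Y) = 2, N = 4
print(confidence(transactions, x, y))  # sigma(X u Y) = 2, sigma(X) = 3
```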
4. The method of claim 1, wherein said predicting the counterfeiting probability by a logistic regression algorithm according to the financial categories of financial counterfeiting and the missing values and abnormal values of the financial statements comprises:
acquiring enterprise information and the associated financial information from the enterprise category database and the financial category database, and determining the financial category item most likely to be counterfeited in the enterprise category; acquiring the enterprise financial statements, and performing missing value processing and abnormal value processing; wherein the missing value processing mainly comprises judging whether the proportion of missing financial statement items is greater than a preset threshold, and if so, determining that too much financial statement data is missing and classifying the financial statement as a missing-type statement; for financial statement data whose missing proportion is less than the preset threshold, filling the missing values by interpolation according to the historical data of the financial statement; the interpolation computes the most suitable value for a missing entry from the data before and after it;
the abnormal value processing comprises observing the maximum and minimum values by using a box plot, and observing whether the deviation of abnormal financial statement data from the mean far exceeds the standard deviation; and finally calculating the enterprise counterfeiting probability by using a logistic regression algorithm;
the processing of abnormal values by using the box plot further comprises:
computing the upper quartile, the median, the lower quartile, the lower edge and the abnormal values, so that abnormal values are obtained efficiently; and comparing the distribution characteristics of multiple groups of data, wherein the upper and lower boundaries serve as the limits of the data distribution, and any data point above the upper boundary or below the lower boundary is regarded as an outlier or abnormal value;
the calculating of the overall enterprise counterfeiting probability by the logistic regression algorithm further comprises:
acquiring financial data including the enterprise type, accounts receivable, prepayments, taxes payable, total assets, cost of sales, total current assets, employee compensation payable, net profit from continuing operations and construction in progress, and, after the missing values in these items have been filled by interpolation, predicting the overall counterfeiting probability of the enterprise financial statements by the logistic regression algorithm.
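The preprocessing and scoring steps of this claim can be sketched as follows. Linear interpolation between the nearest known neighbours, the 1.5 × IQR fences for the box plot, and the weights passed to the logistic function are all assumptions chosen for illustration; the claim does not fix any of them, and a real model would learn its weights from labelled statements.

```python
import math
import statistics


def missing_ratio(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def interpolate_missing(values):
    """Fill None entries linearly from the nearest known earlier and later values."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]   # leading gap: copy the next known value
        elif right is None:
            filled[i] = values[left]    # trailing gap: copy the last known value
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def boxplot_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def fraud_probability(features, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical yearly accounts receivable with gaps.
series = [100, None, 120, None, None, 150]
print(missing_ratio(series), interpolate_missing(series))
print(boxplot_outliers([10, 12, 11, 13, 12, 11, 95]))
```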
5. The method of claim 1, wherein said performing prediction verification based on online sales performance on enterprises in which investment is planned but whose probability of financial data counterfeiting exceeds a preset threshold comprises:
after the probability of enterprise financial data counterfeiting has been obtained, further verifying the enterprise to determine whether counterfeiting really exists and whether the specific counterfeiting magnitude can be made up in the future; for enterprises with online e-commerce sales, using a crawler to acquire the sales information of the enterprise from its official website, JD.com, Taobao, Tmall and Alibaba, and crawling different sales data according to the enterprise category;
the crawling specifically comprises: adding website hyperlinks to a download queue starting from a preset enterprise entry address, ordering the download queue by financial category priority, preferentially fetching pages that contain a large amount of financial data, and excluding pages unrelated to financial information;
predicting, by combining the difference between the sales data and the counterfeit data, the likelihood that the enterprise will make up the counterfeiting shortfall within a preset future period and the time this will take; and judging, by comparing the trend of the online data with the financial data, whether the detected counterfeit amount can be filled within the preset future period.
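The prioritised download queue described in this claim can be sketched with a heap. Ranking pages by keywords found in the URL is an assumption made purely for illustration, as are the keyword table, the class name `DownloadQueue` and the example.com addresses; no network access is performed.

```python
import heapq
from itertools import count

# Hypothetical financial-category keywords; a lower rank is fetched earlier.
FINANCIAL_PRIORITY = {"sales": 0, "revenue": 1, "price": 2}


def page_priority(url):
    """Best (lowest) rank of any keyword in the URL, or None if unrelated."""
    ranks = [rank for kw, rank in FINANCIAL_PRIORITY.items() if kw in url.lower()]
    return min(ranks) if ranks else None


class DownloadQueue:
    """Download queue ordered by financial-category priority."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = count()  # tie-breaker that keeps insertion order

    def add(self, url):
        """Queue a hyperlink; skip duplicates and pages with no financial relevance."""
        rank = page_priority(url)
        if rank is None or url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (rank, next(self._order), url))
        return True

    def pop(self):
        """Return the queued URL with the highest priority."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = DownloadQueue()
for link in ["https://example.com/about",
             "https://example.com/price-list",
             "https://example.com/sales/2021",
             "https://example.com/price-list"]:
    queue.add(link)
print(len(queue), queue.pop())
```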
6. The method of claim 1, wherein said providing investment decisions based on validation results comprises:
evaluating the investment scheme, according to the income of the enterprise, by using discounted and non-discounted indicators, mainly the net present value, the present value index and the internal rate of return; determining the payback period; correcting the calculation of the internal rate of return by observing the relation between the net present value and the discount rate on the net present value curve and by using the MIRR function; comparing the differences between different economic years; and finally determining whether the enterprise is worth investing in by taking the enterprise counterfeiting probability into account.
7. The method of claim 1, wherein said forecasting an investment based on investment decisions comprises:
judging the future production capacity of the enterprise, and converting it according to the financial counterfeiting probability; predicting the investment amount by using a unit production capacity estimation algorithm, in which the investment amount is estimated from the unit production capacity investment of similar projects and the production capacity of the proposed project, the production capacity being the annual output reached after the investment project is built and put into operation; and predicting the investment amount according to the converted investment decision by the following formula: total project investment = unit production capacity investment of similar enterprises × production capacity of the proposed project × financial counterfeiting probability.
8. The method of claim 1, wherein said adjusting the investment proportion according to the degree to which the counterfeiting can be made up in the future and the real post-investment return rate of the enterprise comprises:
acquiring the counterfeit data of the enterprise and its magnitude, acquiring the online sales data of the enterprise, and acquiring the proposed investment amount for the enterprise; taking the counterfeit amount and the sales amount as input features and the real post-investment return rate from historical investments as the label, training a binary classifier with a support vector machine as the model, and predicting the investment return rate; and multiplying the confidence value of the binary classification, obtained from the prediction result of the support vector machine, by the investment amount to obtain the final investment amount.
CN202210264201.7A 2022-03-17 2022-03-17 Financial data counterfeiting identification method Pending CN114612253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210264201.7A CN114612253A (en) 2022-03-17 2022-03-17 Financial data counterfeiting identification method

Publications (1)

Publication Number Publication Date
CN114612253A true CN114612253A (en) 2022-06-10

Family

ID=81865213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210264201.7A Pending CN114612253A (en) 2022-03-17 2022-03-17 Financial data counterfeiting identification method

Country Status (1)

Country Link
CN (1) CN114612253A (en)

Similar Documents

Publication Publication Date Title
Huang et al. A hybrid financial analysis model for business failure prediction
US20110166987A1 (en) Evaluating Loan Access Using Online Business Transaction Data
CN109711955B (en) Poor evaluation early warning method and system based on current order and blacklist base establishment method
US8775291B1 (en) Systems and methods for enrichment of data relating to consumer credit collateralized debt and real property and utilization of same to maximize risk prediction
CN112801529B (en) Financial data analysis method and device, electronic equipment and medium
Doumpos et al. Explaining qualifications in audit reports using a support vector machine methodology
Karaa et al. Credit-risk assessment using support vectors machine and multilayer neural network models: a comparative study case of a tunisian bank
Haider et al. Predicting corporate failure for listed shipping companies
CN107133862A (en) Dynamic produces the method and system of the detailed transaction payment experience of enhancing credit evaluation
Kumar et al. Forecasting credit ratings Using ANN and statistical techniques
Dimitras et al. Evaluation of empirical attributes for credit risk forecasting from numerical data
Chimonaki et al. Identification of financial statement fraud in Greece by using computational intelligence techniques
CN115526700A (en) Risk prediction method and device and electronic equipment
CN113269629A (en) Credit limit determining method, electronic equipment and related product
Paul et al. Artificial intelligence in predictive analysis of insurance and banking
CN110910002B (en) Account receivables default risk identification method and system
CN116629998A (en) Automatic information counting method and device, electronic equipment and readable storage medium
CN114612253A (en) Financial data counterfeiting identification method
CN114119107A (en) Steel trade enterprise transaction evaluation method, device, equipment and storage medium
CN113870007A (en) Product recommendation method, device, equipment and medium
Laitinen et al. Why does an auditor not issue a going concern opinion for a failing company? Impact of financial risk, time to bankruptcy, and cognitive style
KR102133668B1 (en) Lending Meditation Platform System and Credit Estimating Apparatus
Sharma et al. Assessing Regulatory Responses to Banking Crises
CN111612602A (en) Suspected financial risk distinguishing method and device for listed company
Ansari et al. Financial Performance Evaluation of Companies Using Decision Trees Algorithm and Multi-Criteria Decision-Making Techniques with an Emphasis on Investor’s Risk-Taking Behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination