CN116308829B - Supply chain financial risk assessment processing method and device - Google Patents

Publication number
CN116308829B
Authority
CN
China
Prior art keywords
data
strip
multidimensional
multidimensional data
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310558781.5A
Other languages
Chinese (zh)
Other versions
CN116308829A (en
Inventor
胡娜
施婉瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengye Information Technology Service Shenzhen Co ltd
Original Assignee
Shengye Information Technology Service Shenzhen Co ltd
Application filed by Shengye Information Technology Service Shenzhen Co ltd
Priority to CN202310558781.5A
Publication of CN116308829A
Application granted
Publication of CN116308829B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a supply chain financial risk assessment processing method and device. When the risk level prediction result output by a risk assessment model indicates that the risk assessment result of a target enterprise is a fail (that is, the risk score is too high), the multidimensional data strip having the highest similarity with the multidimensional data strip of the target enterprise (the first multidimensional data strip) is obtained as a reference multidimensional data strip; the first multidimensional data strip and the reference multidimensional data strip are then converted into a preset format and displayed on an interface. The method thus provides reference data (namely, the reference multidimensional data strip converted into the preset format and displayed on the interface) for comparison when staff of the enterprise, a bank or a guarantee institution analyze why the assessment failed. Because the reference data are highly similar to the first multidimensional data strip, the most probable reasons for the failure can be found conveniently and in a targeted manner, which speeds up the processing of complaints.

Description

Supply chain financial risk assessment processing method and device
Technical Field
The invention relates to the technical field of supply chain financial risk assessment, in particular to a supply chain financial risk assessment processing method and device.
Background
Supply chain finance is a structured financial innovation aimed at small and medium-sized enterprises, in which a bank or a guarantee institution provides funding support and risk guarantees for such enterprises by integrating and optimizing the logistics flow, capital flow and information flow along the supply chain. However, because supply chain finance involves many participants, flexible financing modes and complex contract designs, its operation carries risk. A guarantee institution or bank therefore needs to perform risk assessment on supply chain finance to prevent and control the credit risk, operational risk, market risk and other risks that may arise.
Conventional supply chain financial risk assessment is performed purely manually: professional staff of a bank or a guarantee institution collate and calculate the various index data of an enterprise and assess its risk on that basis. With the development of artificial intelligence technology, and in particular with the growth of computing power and the support of big data, using a trained neural network model to assess or predict the risk of enterprises that need supply chain financial services has become a trend. Chinese patent publication No. CN115907937A discloses a neural-network-based method and system for monitoring supply chain financial risk, in which multidimensional data strips composed of a plurality of index data are obtained from enterprise data to form a training set, and a neural network is trained on the training set to obtain a risk assessment (prediction) model. Patent application publication No. CN115860924A discloses a supply chain financial credit risk early warning method and related equipment, in which data of a plurality of target indexes are extracted from the supply chains of a plurality of sample enterprises to form a training set, and a neural network is trained on the training set to obtain a risk assessment (prediction) model. In practice, staff of a bank or a guarantee institution can apply these neural-network-based risk assessment (prediction) models to the preliminary risk assessment of supply chain finance, which greatly improves working efficiency.
However, a neural network model has a "black box" nature: between the data input to the neural network and the answer it outputs lie "hidden layers" whose workings cannot be directly observed. That is, it is unclear how the neural network learns, infers and makes decisions from the data, and the network cannot explain the reasons and logic behind its output. Thus, when a neural-network-based risk assessment (prediction) model returns a fail (that is, the risk score is too high) to an enterprise that needs supply chain financial services, the enterprise cannot know why it received that risk score and, if it disagrees with the assessment result, cannot make a targeted complaint. Similarly, although the staff of a bank or a guarantee institution use the neural network model, its black-box nature prevents them from reviewing an enterprise's complaint in a targeted manner, so complaints are processed inefficiently.
In addition, the inventors have found that the result of a risk assessment depends on the plurality of index data constituting a multidimensional data strip and on the relationships among those index data. The more index data are used in training, the more detail the model can learn and, in general, the more accurate the risk assessment. As the volume of data grows, however, it becomes difficult for an evaluator to analyze manually how these index data and their relationships influence the assessment result. Moreover, the same item of index data does not affect the risk assessment of enterprises in different industries in the same way. For example, the asset-liability ratio is an important index of an enterprise's long-term solvency, reflecting the proportion of the enterprise's liabilities to its total assets. Generally, the lower this index, the stronger the long-term solvency of the enterprise. However, in some highly leveraged industries, such as the financial industry and the real estate industry, the asset-liability ratio may be higher than the general level. This does not mean that the long-term solvency of these industries is poor; rather, it must be analyzed comprehensively in combination with other indexes such as the interest coverage ratio, return on net assets and liability structure. In other words, the model may not have learned the rules of an industry that is not included in the training set, so predictions for such an industry may be inaccurate. For niche industries in particular, banks or guarantee institutions have not been able to collect enough data to update the model.
Disclosure of Invention
The invention aims to solve at least one of the technical problems in the prior art by providing a supply chain financial risk assessment processing method and device. When a neural network model returns a fail as the risk assessment result of an enterprise, the method and device help the enterprise analyze the reasons for the failure in a targeted manner so that it can make an efficient complaint, and help evaluators review the complaint file in a targeted manner so as to improve review efficiency.
In a first aspect, the present invention provides a supply chain financial risk assessment processing method, the method comprising:
acquiring a first multidimensional data strip; the first multidimensional data strip is a multidimensional data strip of a target enterprise, and the multidimensional data strip is composed of a plurality of index data obtained based on the target enterprise;
inputting the first multidimensional data strip into a risk assessment model to obtain a risk level prediction result of the target enterprise; the risk assessment model is trained by using a training data set, wherein the training data set comprises a plurality of groups of training data, and each group of training data comprises a group of multidimensional data strips and a label for marking the risk level corresponding to the group of multidimensional data strips;
when the risk level prediction result of the target enterprise indicates that the risk assessment result of the target enterprise is a fail, acquiring, according to a preset rule, the multidimensional data strip with the highest similarity with the first multidimensional data strip as a reference multidimensional data strip;
converting the first multidimensional data strip and the reference multidimensional data strip into a preset format and displaying them on an interface; after the reference multidimensional data strip is converted into the preset format, the description information of each item of index data forming the reference multidimensional data strip is displayed on the interface.
In a second aspect, the present invention provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the supply chain financial risk assessment processing method according to the first aspect of the present invention.
In a third aspect, the present invention provides an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the supply chain financial risk assessment processing method according to the first aspect of the present invention when executing the program.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the supply chain financial risk assessment processing method according to the first aspect of the present invention.
The invention provides a supply chain financial risk assessment processing method, an electronic device, a computer-readable storage medium and a computer program product. In the method, when the risk level prediction result output by the risk assessment model indicates that the risk assessment result of the target enterprise is a fail (that is, the risk score is too high), the multidimensional data strip having the highest similarity with the multidimensional data strip of the target enterprise is obtained as a reference multidimensional data strip; the first multidimensional data strip and the reference multidimensional data strip are then converted into a preset format and displayed on an interface. The method thus provides reference data (namely, the reference multidimensional data strip converted into the preset format and displayed on the interface) for comparison when staff of the enterprise, a bank or a guarantee institution analyze why the assessment failed. Because the reference data are highly similar to the first multidimensional data strip, the most probable reasons for the failure can be found conveniently and in a targeted manner, which speeds up the processing of complaints.
In another aspect, on the basis of the first aspect, the method provided by the present invention further includes:
when, after a complaint, the evaluation result of the first multidimensional data strip is adjusted from fail to pass, and it is detected that the industry type of the target enterprise is not contained in the training data set and is not marked, adding the multidimensional data strip of the target enterprise into a target domain and executing a transfer learning operation;
performing the transfer learning operation includes:
acquiring the risk assessment model trained on a source domain; the source domain is the training data set;
acquiring a first data set containing the target domain; the format of each group of data in the target domain is the same as that of each group of training data in the training data set, and the index data indicating the industry type of an enterprise in each group of data in the target domain does not appear in the training data set;
and applying the risk assessment model to the first data set and performing transfer learning to update the risk assessment model.
In the invention, to address the small volume of supply chain financial data available for niche industries, a transfer learning method is adopted to update the model. Transfer learning can use the knowledge learned by a model trained on the original task to accelerate training and improve performance on the new task, which is particularly effective when data for the new task are scarce.
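The update step can be illustrated with a minimal sketch. The patent does not specify the network architecture or the fine-tuning procedure, so everything below is an assumption made for illustration: a tiny network with one hidden layer, where the hidden weights learned on the source domain are frozen and only the output layer is fine-tuned on the target domain (one common form of transfer learning). The names `should_update`, `fine_tune` and the toy data are hypothetical; label 1 stands for a fail and label 0 for a pass.

```python
import math
from typing import List, Set, Tuple

def should_update(complaint_upheld: bool, industry: int, seen_industries: Set[int]) -> bool:
    """Trigger: result adjusted from fail to pass AND industry unseen in training."""
    return complaint_upheld and industry not in seen_industries

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def hidden(x: List[float], W: List[List[float]]) -> List[float]:
    """Frozen feature layer learned on the source domain (one tanh unit per row)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def predict(x: List[float], W: List[List[float]], v: List[float], b: float) -> float:
    """Predicted probability of a fail (high risk)."""
    h = hidden(x, W)
    return sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + b)

def fine_tune(W: List[List[float]], v: List[float], b: float,
              target_domain: List[Tuple[List[float], int]],
              lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Update only the output weights v and bias b; W is never modified."""
    v = list(v)
    for _ in range(epochs):
        for x, y in target_domain:
            h = hidden(x, W)
            err = sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + b) - y  # log-loss gradient
            v = [vj - lr * err * hj for vj, hj in zip(v, h)]
            b -= lr * err
    return v, b

# Hypothetical source-domain model and a two-record target domain.
source_W = [[1.0, 0.0], [0.0, 1.0]]
source_v, source_b = [1.0, 1.0], 0.0
target_domain = [([2.0, 0.0], 0),   # overturned on complaint: now labelled pass
                 ([0.0, 2.0], 1)]   # assumed failing record from the same industry
new_v, new_b = fine_tune(source_W, source_v, source_b, target_domain)
```

In this toy run the source model scores the first record as a fail, and after fine-tuning on the target domain the updated output layer scores it as a pass while the frozen feature layer is reused unchanged.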
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further described below with reference to the drawings and embodiments:
FIG. 1 is a flowchart of a supply chain financial risk assessment processing method according to a first embodiment;
FIG. 2 is a flowchart of a supply chain financial risk assessment processing method according to a second embodiment;
FIG. 3 is a flowchart of a supply chain financial risk assessment processing method according to a third embodiment;
FIG. 4 is a flowchart of a supply chain financial risk assessment processing method according to a fourth embodiment;
FIG. 5 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The drawings supplement the written description so that each technical feature and the overall technical scheme of the present invention can be understood intuitively; they are not to be construed as limiting the scope of the present invention.
A neural network model has a "black box" nature: between the data input to the neural network and the answer it outputs lie "hidden layers" whose workings cannot be directly observed. That is, it is unclear how the neural network learns, infers and makes decisions from the data, and the network cannot explain the reasons and logic behind its output. Thus, when a neural-network-based risk assessment (prediction) model returns a fail (that is, the risk score is too high) to an enterprise that needs supply chain financial services, the enterprise cannot know why it received that risk score and, if it disagrees with the assessment result, cannot make a targeted complaint. Similarly, although the staff of a bank or a guarantee institution use the neural network model, its black-box nature prevents them from reviewing an enterprise's complaint in a targeted manner, so complaints are processed inefficiently.
Accordingly, as shown in fig. 1, the present embodiment provides a supply chain financial risk assessment processing method, which includes:
Step S202, acquiring a first multidimensional data strip; the first multidimensional data strip is a multidimensional data strip of a target enterprise, and the multidimensional data strip is composed of a plurality of index data obtained based on the target enterprise.
Step S204, inputting the first multidimensional data strip into a risk assessment model to obtain a risk level prediction result of the target enterprise; the risk assessment model is trained by using a training data set, wherein the training data set comprises a plurality of groups of training data, and each group of training data comprises a group of multidimensional data strips and a label for marking the risk level corresponding to the group of multidimensional data strips.
Step S206, when the risk level prediction result of the target enterprise indicates that the risk assessment result of the target enterprise is a fail, acquiring, according to a preset rule, the multidimensional data strip with the highest similarity with the first multidimensional data strip as a reference multidimensional data strip.
Step S208, converting the first multidimensional data strip and the reference multidimensional data strip into a preset format and displaying them on an interface; after the reference multidimensional data strip is converted into the preset format, the description information of each item of index data forming the reference multidimensional data strip is displayed on the interface.
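Steps S202 to S208 can be sketched as a single function. This is only an illustrative outline under stated assumptions: the risk model and the reference lookup are passed in as plain callables, the "preset format" is taken to be a list of rows pairing each description with the two values, and the names `assess`, `risk_model`, `find_reference` and the two index names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Strip = List[float]  # one multidimensional data strip: a list of index values

def assess(first_strip: Strip,
           risk_model: Callable[[Strip], float],
           find_reference: Callable[[Strip], Strip],
           descriptions: Sequence[str],
           risk_threshold: float) -> Dict[str, object]:
    """Return what an interface would display for the target enterprise."""
    risk_level = risk_model(first_strip)              # S204: predict the risk level
    if risk_level <= risk_threshold:                  # pass: no reference strip needed
        return {"passed": True, "risk_level": risk_level, "rows": []}
    reference = find_reference(first_strip)           # S206: most similar strip
    rows = [                                          # S208: assumed display format
        {"index": desc, "target": t, "reference": r}
        for desc, t, r in zip(descriptions, first_strip, reference)
    ]
    return {"passed": False, "risk_level": risk_level, "rows": rows}

# Toy usage with stand-in callables (S202: the strip is already acquired).
result = assess(
    first_strip=[0.9, 0.8],
    risk_model=lambda s: sum(s) / len(s),             # stand-in for the trained model
    find_reference=lambda s: [0.9, 0.2],              # stand-in for the similarity search
    descriptions=["asset-liability ratio", "overdue rate"],
    risk_threshold=0.5,
)
```

Here the toy strip fails, so `result["rows"]` holds one row per index with the target value beside the reference value, ready to be rendered side by side.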
According to the supply chain financial risk assessment processing method provided by this embodiment, when the risk level prediction result output by the risk assessment model indicates that the risk assessment result of the target enterprise is a fail (that is, the risk score is too high), the multidimensional data strip having the highest similarity with the multidimensional data strip of the target enterprise is obtained as a reference multidimensional data strip; the first multidimensional data strip and the reference multidimensional data strip are then converted into a preset format and displayed on an interface. The method thus provides reference data (namely, the reference multidimensional data strip converted into the preset format and displayed on the interface) for comparison when staff of the enterprise, a bank or a guarantee institution analyze why the assessment failed. Because the reference data are highly similar to the first multidimensional data strip, the most probable reasons for the failure can be found conveniently and in a targeted manner, which speeds up the processing of complaints.
Specifically, the plurality of pieces of index data for constituting the first multidimensional data strip include index data for indicating an industry type to which the target enterprise belongs.
In one example, the acquiring, according to a preset rule, the multidimensional data strip having the highest similarity with the first multidimensional data strip as the reference multidimensional data strip specifically includes:
when it is detected that the industry type of the target enterprise is not contained in the training data set, acquiring, from the passing data set, the multidimensional data strip with the highest similarity with the first multidimensional data strip as the reference multidimensional data strip; the passing data set is the data set formed by the multidimensional data strips in the training data set whose risk levels are not higher than a threshold value.
It can be understood that when the result fed back is a fail, the enterprise will try to work out which index data in the submitted enterprise data (that is, the first multidimensional data strip) do not meet the requirements of the bank or guarantee institution. However, because the bank or guarantee institution uses a neural network to learn automatically how each index data in a multidimensional data strip, and the relationships among them, influence the assessment result, it generally cannot explain the assessment result in detail. In the invention, the multidimensional data strip with the highest similarity with the multidimensional data strip of the target enterprise is selected as the reference multidimensional data strip and converted into a format that can be viewed on the interface, so that the enterprise can refer to it in its analysis and the analysis becomes more targeted.
In this embodiment, a reference multidimensional data strip whose risk evaluation result is a pass (that is, a multidimensional data strip with a risk level not higher than the threshold) is selected, so that during analysis the enterprise can focus on the index data that differ. Suppose a multidimensional data strip has 100 index data, of which 93 are almost equal between the first multidimensional data strip and the reference multidimensional data strip (that is, for each of these 93 index data the difference between the two strips is within a set threshold range), and only 7 differ markedly. The reason the enterprise did not pass the evaluation is then most likely to be found in those 7 index data. By looking up the description information, displayed on the interface, of the 7 index data in the reference multidimensional data strip, enterprise personnel can quickly find the approximate reasons and, if they do not accept the evaluation result, write a targeted complaint. For example, the complaint may argue that although these 7 index data differ greatly from those of the reference multidimensional data strip, whose evaluation result is a pass, the difference reflects a difference between industries: in the industry of the reference multidimensional data strip, high values of these 7 index data indicate good repayment capability, whereas in the industry to which the target enterprise belongs, low values indicate good repayment capability, or these 7 index data do not affect repayment capability at all.
Similarly, when reviewing the complaint, an evaluator can learn more about the characteristics of the enterprise's industry by comparing the reference multidimensional data strip with the first multidimensional data strip, and can review and investigate the reasons given in the enterprise's complaint in a more targeted manner, which speeds up the review.
In one example, to improve the efficiency with which enterprise personnel or evaluators find the 7 pieces of index data, the 7 pieces of index data may be displayed on the interface in a different manner than the other 93 pieces of index data, for example, highlighting the area corresponding to the 7 pieces of index data, and so on.
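One way to pick out the index data to highlight is sketched below. It assumes one preset threshold per index and treats an index as "markedly different" when the absolute difference reaches that threshold; the `*` marker stands in for whatever highlighting the interface uses. The function names and the sample index names are hypothetical.

```python
from typing import List, Sequence

def differing_items(first_strip: Sequence[float],
                    reference_strip: Sequence[float],
                    thresholds: Sequence[float]) -> List[int]:
    """Positions whose difference is NOT within the per-index threshold."""
    return [i for i, (a, b, t) in enumerate(zip(first_strip, reference_strip, thresholds))
            if abs(a - b) >= t]

def render(first_strip: Sequence[float],
           reference_strip: Sequence[float],
           descriptions: Sequence[str],
           thresholds: Sequence[float]) -> str:
    """Plain-text view; differing rows are prefixed with '*' as a stand-in for highlighting."""
    flagged = set(differing_items(first_strip, reference_strip, thresholds))
    lines = []
    for i, desc in enumerate(descriptions):
        mark = "*" if i in flagged else " "
        lines.append(f"{mark} {desc}: target={first_strip[i]} reference={reference_strip[i]}")
    return "\n".join(lines)

# Toy data: only the second index differs by more than its threshold.
first = [10, 50, 3]
reference = [11, 80, 3]
limits = [2, 5, 1]
view = render(first, reference, ["current ratio", "debt ratio", "years trading"], limits)
```

With these toy values only the second row is flagged, so a reader's attention goes straight to the one index that separates the target from its reference.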
It should be noted that the industry type to which an enterprise belongs is generally determined by an evaluator of the bank or guarantee institution from an analysis carried out when the enterprise data are collected. The evaluator will generally choose an industry type that already appears in the index data of the multidimensional data strips in the training data set, so as to improve the prediction effect of the model. In some cases, however, certain industries or new industries do not yet appear in the training data set; the evaluator may then assign the enterprise to a new industry according to the evaluation result and, when collecting the enterprise's information, use a new number to represent the industry that does not yet appear in the training data set.
Further, in another example, the acquiring, according to a preset rule, the multidimensional data strip having the highest similarity with the first multidimensional data strip as the reference multidimensional data strip specifically includes:
when it is detected that the industry type to which the target enterprise belongs is contained in the training data set, acquiring, from the non-passing data set, the multidimensional data strip with the highest similarity with the first multidimensional data strip as the reference multidimensional data strip; the non-passing data set is the data set formed by the multidimensional data strips in the training data set whose risk levels are higher than the threshold value.
It should be noted that, when the target enterprise is detected to be an enterprise of the industry type included in the training data set, the index data of the industry type is added into similarity calculation, and the multidimensional data strip with the highest similarity is selected from the multidimensional data strips of the same industry type as the reference multidimensional data strip.
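The two preset rules for choosing where to search can be sketched together. The layout is an assumption for illustration: each training record is a pair of a strip and its labelled risk level, and the industry code is stored at position 0 of the strip. The name `candidate_strips` is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[List[float], float]  # (strip, labelled risk level); strip[0] = industry code

def candidate_strips(first_strip: List[float],
                     training_set: List[Record],
                     risk_threshold: float) -> List[List[float]]:
    """Pick the data set in which the reference strip will be searched."""
    industry = first_strip[0]
    seen_industries = {strip[0] for strip, _ in training_set}
    if industry not in seen_industries:
        # Unseen industry: search the passing data set (risk level not above threshold).
        return [s for s, level in training_set if level <= risk_threshold]
    # Known industry: search the non-passing data set, restricted to the same industry.
    return [s for s, level in training_set if level > risk_threshold and s[0] == industry]

# Toy training set covering industries 1 and 2.
training_set = [([1, 10.0], 0.2), ([1, 12.0], 0.9), ([2, 11.0], 0.8), ([2, 9.0], 0.1)]
unseen = candidate_strips([3, 10.5], training_set, risk_threshold=0.5)
known = candidate_strips([1, 10.5], training_set, risk_threshold=0.5)
```

The similarity search itself then runs only over the returned candidates, so an enterprise from an unseen industry is compared with passing strips and an enterprise from a known industry with failing strips of its own industry.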
In this embodiment, a reference multidimensional data strip whose risk evaluation result is a fail (that is, a multidimensional data strip with a risk level higher than the threshold) is selected. Suppose a multidimensional data strip has 50 index data, and all 50 are almost the same between the first multidimensional data strip and the reference multidimensional data strip (that is, each of the 50 differences between the two is within a set threshold range). The enterprise is then evidently a typical failing case, and enterprise personnel can quickly find the approximate reason by checking the description information, displayed on the interface, of the index data forming the reference multidimensional data strip. It should be noted that the description information analyzes the influence of each index data on repayment capability separately, which is not comprehensive; the neural network model can also learn how the relationships among index data influence repayment capability, which makes its evaluation more comprehensive. Although the influence of those relationships cannot be seen in the description information, the description information is still enough to give enterprise personnel a preliminary understanding of the reason for the failure, because the index data are almost the same as those of a typical failing case.
As shown in fig. 2, in one embodiment, the acquiring, as a reference multidimensional data strip, a multidimensional data strip having the highest similarity with the multidimensional data strip of the target enterprise specifically includes:
Step S302, in the corresponding data set, calculating the difference between each item of index data of each multidimensional data strip in the data set and the corresponding item of index data of the multidimensional data strip of the target enterprise.
Step S304, for each multidimensional data strip, counting the number of approximate pairs formed by its index data and the corresponding items of index data of the multidimensional data strip of the target enterprise, and judging that the greater the number of approximate pairs, the higher the similarity between that multidimensional data strip and the multidimensional data strip of the target enterprise; an approximate pair is a pair of index data for which the difference between the index data in the multidimensional data strip and the corresponding item of index data of the multidimensional data strip of the target enterprise is smaller than a corresponding preset threshold value.
Step S306, taking the multidimensional data strip with the highest similarity with the multidimensional data strip of the target enterprise as the reference multidimensional data strip.
For example, assume that a multidimensional data strip has 21 index data, where the first multidimensional data strip A is denoted [a0, a1, a2, …, a20], multidimensional data strip B is denoted [b0, b1, b2, …, b20], and multidimensional data strip C is denoted [c0, c1, c2, …, c20]. Here a0, b0 and c0 are the first index data of each strip, namely the index data indicating the industry type of the enterprise to which the strip corresponds. After the first index data are excluded, the difference between each index data of B and C and the corresponding item of the first multidimensional data strip is calculated; that is, the differences between an and bn and between an and cn are calculated for every integer n with 1 ≤ n ≤ 20, so 20 differences are calculated between A and each of B and C. If |a1-b1| is less than the corresponding preset threshold, a1 and b1 are considered to form an approximate pair. Assuming that 17 approximate pairs are formed between A and B and 15 between A and C, multidimensional data strip B is taken as the reference multidimensional data strip.
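The counting in this example can be reproduced with a short sketch of steps S302 to S306. The strip values and thresholds below are made up so that the counts come out as 17 and 15; the function names and the `skip` parameter (used to leave out the industry code at position 0) are hypothetical.

```python
from typing import Iterable, List, Sequence

def count_approximate_pairs(target: Sequence[float],
                            candidate: Sequence[float],
                            thresholds: Sequence[float],
                            skip: Iterable[int] = (0,)) -> int:
    """S302 + S304: count items whose absolute difference is below the item's threshold."""
    skipped = set(skip)
    return sum(
        1
        for n, (a, b, t) in enumerate(zip(target, candidate, thresholds))
        if n not in skipped and abs(a - b) < t
    )

def most_similar(target: Sequence[float],
                 dataset: List[Sequence[float]],
                 thresholds: Sequence[float],
                 skip: Iterable[int] = (0,)) -> Sequence[float]:
    """S306: the strip with the most approximate pairs (first one wins on a tie)."""
    skipped = tuple(skip)
    return max(dataset, key=lambda s: count_approximate_pairs(target, s, thresholds, skipped))

# 21 items per strip; position 0 is the industry code and is excluded.
A = [9] + [0] * 20
B = [1] + [0] * 17 + [10] * 3   # 17 items close to A, 3 far away
C = [2] + [0] * 15 + [10] * 5   # 15 items close to A, 5 far away
thresholds = [1] * 21
pairs_ab = count_approximate_pairs(A, B, thresholds)
pairs_ac = count_approximate_pairs(A, C, thresholds)
reference = most_similar(A, [B, C], thresholds)
```

With these values `pairs_ab` is 17 and `pairs_ac` is 15, so B is returned as the reference strip, matching the example.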
In one example, multidimensional data strip B belongs to the passing data set, that is, the risk assessment result of B is pass, whereas the risk assessment result of the first multidimensional data strip A is fail. Once B, the strip with the highest similarity to A, has been selected by the above method, it is known that among all the multidimensional data strips B differs least from A in its index data, yet B is judged to pass and A to fail. Following the reasoning of the controlled-variable method, the index data that differ between the two are what make the difference in the risk assessment result. These differing index data comprise the 1 index data indicating the industry type of the enterprise and 3 other index data, so the approximation calculation provided by this embodiment helps enterprise staff or evaluators quickly focus on these index data and analyse, using economic knowledge, how they affect the enterprise's risk level.
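The selection of the reference strip in steps S304 and S306 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy strips with four index data instead of twenty, and the threshold values are all hypothetical and are not taken from the embodiment.

```python
def approx_pair_indices(strip, target, thresholds):
    """Indices (excluding industry-type index 0) whose values differ by less than the threshold."""
    return [
        i for i in range(1, len(target))
        if abs(strip[i] - target[i]) < thresholds[i]
    ]


def pick_reference(target, candidates, thresholds):
    """Name of the candidate strip with the most approximate pairs (steps S304 and S306)."""
    return max(
        candidates,
        key=lambda name: len(approx_pair_indices(candidates[name], target, thresholds)),
    )


# Toy strips: 1 industry code + 4 index data (the example above uses 1 + 20).
thresholds = [None, 1.0, 1.0, 1.0, 1.0]   # entry 0 is unused: industry codes are never compared
A = [7, 10.0, 20.0, 30.0, 40.0]           # first multidimensional data strip (target enterprise)
candidates = {
    "B": [3, 10.5, 20.2, 30.1, 45.0],
    "C": [3, 12.0, 20.1, 35.0, 41.5],
}

reference = pick_reference(A, candidates, thresholds)
shared = approx_pair_indices(candidates[reference], A, thresholds)
differing = [i for i in range(1, len(A)) if i not in shared]
print(reference, shared, differing)   # B [1, 2, 3] [4]
```

The `differing` list is what an evaluator would inspect first: together with the industry type, it holds the index data on which the failed strip and its closest passing strip disagree.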
It should be noted that, when it is detected that the industry type to which the target enterprise belongs is not included in the training data set, the index data indicating the industry type in the first multidimensional data strip does not take part in the similarity calculation with other multidimensional data strips. The reason is that the closeness of the values of the other index data between two multidimensional data strips reflects, to some degree, how close the two enterprises are on the same index, whereas the index data for the industry type is an artificial code. For example, it may be encoded in one byte, with different industries taking different values; the similarity of such values can only be 0 (different) or 1 (same), and any other difference has no practical meaning, so this index data must be skipped. Each multidimensional data strip in the training set also contains the index data for the industry type of the corresponding enterprise. Because of this index data, as the neural network model iterates during training it learns how the relation between an industry and its other index data affects the enterprise risk assessment result; in other words, introducing the industry-type index data lets the model learn the risk assessment rules of different industries.
On the one hand, learning this rule matters because every enterprise uses the same items of index data to form its multidimensional data strip for training, yet enterprises in different industries may differ in repayment capability (that is, risk level) even when their index data are identical. Learning how the relation between the industry and each index data affects the risk assessment therefore further improves the accuracy of risk assessment when enterprises apply for supply chain financial loans or guarantees.
On the other hand, treating the industry-type index data as a factor in training draws the attention of the bank or guarantee institution to a finer distinction of the industry of the applying enterprise, and helps it attend to the weight and influence of each index data when assessing risk in different industries. This matters particularly for niche industries. A traditional neural network whose training data does not distinguish industries treats the multidimensional data strip of a niche-industry enterprise as though its industry had appeared in the training set, and predicts its risk assessment result from rules learned in other industries. If the risk assessment rules of the niche industry differ greatly from those of the industries in the training set, the prediction is inaccurate and the trained neural network cannot be applied to niche-industry enterprises, which makes risk assessment of niche industries difficult.
In practical applications, the industry type of an enterprise may well not be among the industry types included in the training set. In general, to give the neural network model good generality, banks or guarantee institutions collect data from as many industries as possible when training the model, so the trained model usually covers most industries in the market. In other words, the industries not covered by the trained model are generally niche industries, meaning industries that have only recently emerged or that currently have little involvement in supply chain finance or lending. What they have in common is that banks or guarantee institutions cannot obtain enough training data from them to train the model comprehensively. Only a few cases are available for a niche industry, too few to train a model that fully learns the relation between its index data and risk assessment results. At the current stage, evaluators and enterprise staff therefore usually assess niche-industry enterprises manually, relying on expert experience and investigation results, and manual assessment has the drawback of low efficiency.
In summary, the following technical contradiction currently exists: using the neural network model to assess the risk of niche-industry enterprises would improve assessment efficiency; however, for niche industries not included in the training set, the model may not have learned some of their rules, so its predictions for them may be inaccurate.
In order to solve the technical contradiction, the present embodiment provides the following method, which includes:
when the evaluation result of the first multidimensional data strip is adjusted from fail to pass after an appeal, and it is detected that the industry type of the target enterprise is not contained in the training data set and is not marked, adding the multidimensional data strip of the target enterprise to a target domain and executing a transfer learning operation;
as shown in fig. 3, executing the transfer learning operation includes:
step S402, acquiring the risk assessment model trained on a source domain; the source domain is the training data set.
Step S404, a first data set containing a target domain is acquired; the format of each group of data in the target domain is the same as the format of each group of training data in the training data set, and index data used for indicating the industry type of the enterprise in each group of data in the target domain does not appear in the training data set.
Step S406, applying the risk assessment model to the first data set, and performing transfer learning to update the risk assessment model.
In this embodiment, the industry to which the target enterprise belongs is a niche industry that is absent from the training data set, so the existing risk assessment model assesses the target enterprise inaccurately.
Therefore, the model needs to be updated with niche-industry data to improve assessment accuracy, but the volume of such data is small. To address the small data volume of niche industries in the supply chain, this embodiment updates the model by transfer learning. Transfer learning reuses the knowledge learned by a model trained on the original task to speed up training on the new task and improve performance, and is particularly effective when data for the new task is scarce.
Specifically, in transfer learning, the source domain and the target domain are two key concepts that describe the relation between the original task (on which the model has already been trained) and the new task (on which the model is expected to perform better). The source domain is the data domain of the original task and comprises the data set and labels of the original task. In this embodiment, the source domain is the training data set used by the bank or guarantee institution to train the risk assessment model. The target domain is the data domain of the new task and comprises the data set and labels of the new task. The data in the target domain is used for transfer learning on top of the trained risk assessment model. It should also be noted that the similarity of data between the source domain and the target domain is critical to the success of transfer learning. When the data of the two domains have similar features and structure, the knowledge learned by the pre-trained model on the original task transfers more easily to the new task, so in this embodiment the format of each group of data in the target domain is kept the same as the format of each group of training data in the training data set. For example, if each training datum in the source domain is built from 100 index data, the target domain builds each training datum from the same 100 indexes to ensure that the two are similar.
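The two conditions that step S404 places on the target domain (same format as the training data, and an industry type that does not appear in the training data set) lend themselves to a simple check before transfer learning starts. The following Python sketch is illustrative only: representing each group of data as a dictionary, the field names and the function `check_target_domain` are assumptions made for this example.

```python
def check_target_domain(source_rows, target_rows, industry_key="industry"):
    """Return (row_number, problem) pairs for target rows that break the S404 conditions."""
    source_fields = set(source_rows[0])
    source_industries = {row[industry_key] for row in source_rows}
    problems = []
    for n, row in enumerate(target_rows):
        if set(row) != source_fields:
            problems.append((n, "format differs from the source domain"))
        elif row[industry_key] in source_industries:
            problems.append((n, "industry type already appears in the source domain"))
    return problems


source = [
    {"industry": 1, "index_1": 0.4, "index_2": 12.0},
    {"industry": 2, "index_1": 0.7, "index_2": 9.5},
]
target = [
    {"industry": 9, "index_1": 0.5, "index_2": 11.0},   # acceptable
    {"industry": 2, "index_1": 0.6, "index_2": 10.0},   # industry already in the source domain
    {"industry": 9, "index_1": 0.5},                     # one index is missing
]
problems = check_target_domain(source, target)
for row_number, problem in problems:
    print(row_number, problem)
```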
Specifically, the method for executing the migration learning operation includes:
Step A, fixing certain layers in the neural network model trained on the source domain, that is, retaining the knowledge those layers learned on the source domain.
The fixation refers to keeping parameters of certain layers in the neural network model trained by the source domain unchanged and not carrying out gradient update on the parameters of the layers in the transfer learning process. The specific implementation method is that when training the target domain data, only the parameters of the newly added layer are updated, and the parameters of the fixed layer are not updated.
Taking PyTorch as an example of the neural network framework, fixing can be implemented in PyTorch by setting the requires_grad attribute of a parameter to False. For example, to fix the convolutional layer named conv1 in the model, the following Python code can be used:
for param in model.conv1.parameters():
    param.requires_grad = False
Step B, adding one or more new layers after the retained layers to adapt to the risk assessment requirements of the target domain.
New layers may be added by inserting new fully connected layers, convolutional layers, or other types of layers into the neural network model. The specific implementation depends on the deep learning framework used. The newly added layers help the model learn the characteristics of risk assessment in the niche industry, so that the model is better suited to the task of the target domain.
In PyTorch, new layers can be added by defining new neural network layers and adding them to the model. For example, to add a new fully connected layer (nn.Linear) to the model, the following Python code can be used:
model.new_fc = torch.nn.Linear(in_features, out_features)
Step C, training the newly added layers with the training set of the target domain so as to learn the risk assessment rules of the niche industry.
The training of the newly added layer is to perform forward propagation and backward propagation on the training set of the target domain, and update the parameters of the newly added layer. The specific implementation method depends on the deep learning framework used, and training is performed by designating layers which need to be updated.
In PyTorch, newly added layers may be trained by updating only the parameters of the newly added layers in a training loop. For example, we can create an optimizer that only optimizes the parameters of the newly added layer:
optimizer = torch.optim.Adam(model.new_fc.parameters(), lr=learning_rate)
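Steps A to C can be combined in one small, self-contained PyTorch sketch. One point the snippets above leave implicit is that a newly attached layer only has an effect if the model's forward pass actually goes through it. The wrapper class `FineTuneWrapper`, the two-layer stand-in backbone and all sizes below are hypothetical choices for illustration, not the structure of the risk assessment model itself.

```python
import torch
from torch import nn


class FineTuneWrapper(nn.Module):
    """Frozen backbone followed by a trainable head (hypothetical structure)."""

    def __init__(self, backbone: nn.Module, backbone_out: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for param in self.backbone.parameters():
            param.requires_grad = False                      # step A: fix the retained layers
        self.new_fc = nn.Linear(backbone_out, num_classes)   # step B: newly added layer

    def forward(self, x):
        # The new layer is applied to the backbone output, so it takes part in prediction.
        return self.new_fc(self.backbone(x))


torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(20, 16), nn.ReLU())       # stand-in for the source-domain model
model = FineTuneWrapper(backbone, backbone_out=16, num_classes=2)
optimizer = torch.optim.Adam(model.new_fc.parameters(), lr=1e-4)  # step C: only the new layer

frozen_before = backbone[0].weight.detach().clone()
head_before = model.new_fc.weight.detach().clone()

inputs = torch.randn(8, 20)            # 8 made-up strips with 20 index data each
labels = torch.randint(0, 2, (8,))     # made-up pass/fail labels
loss = nn.CrossEntropyLoss()(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(frozen_before, backbone[0].weight)
head_changed = not torch.equal(head_before, model.new_fc.weight)
print(backbone_unchanged, head_changed)
```

After one optimisation step the backbone weights are identical to their starting values while the head weights have moved, which is the behaviour that fixing and selective optimisation are meant to produce.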
Step D, adopting a smaller learning rate during training so as to avoid a large impact on the knowledge learned from the source domain.
The learning rate is the magnitude of parameter updates during neural network training; here, a learning rate in the range 0.0001 to 0.0005 is regarded as small. A smaller learning rate means smaller parameter updates, which avoids a large impact on the knowledge learned from the source domain. The specific setting method depends on the deep learning framework used; a small learning rate can be set in the training configuration.
In PyTorch, a smaller learning rate can be achieved by setting a small learning rate when creating the optimizer. For example, we can create an optimizer with a small learning rate (e.g., 0.0001):
learning_rate = 0.0001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
Step E, applying early stopping, that is, stopping training when performance on the validation set no longer improves significantly, so as to prevent overfitting.
Early stopping means ending training ahead of schedule, to prevent overfitting, when performance on the validation set stops improving significantly during training. In practice, a performance index on the validation set (such as accuracy or loss) is monitored, and training is stopped if the index does not improve significantly over a certain number of consecutive iterations (such as 10 or 20 rounds). Whether an improvement is significant is decided by a threshold, for example a validation accuracy gain of no more than 0.1% or a loss reduction of no more than 0.001. The specific implementation depends on the deep learning framework used; early stopping can be realised through callback functions or through a condition check in the training loop.
In PyTorch, early-stop rules may be implemented by monitoring the validation set performance and setting early-stop conditions during a training cycle. For example, we can check the loss value on the validation set after each training round, stopping training in advance when the loss value decreases by no more than a threshold (e.g., 0.001) for several consecutive rounds (e.g., 10 rounds):
early_stop_rounds = 10
min_delta = 0.001
best_loss = float('inf')
no_improvement_rounds = 0
for epoch in range(num_epochs):
    # train and validate the model here; validation must set valid_loss
    # ...
    # check the early-stopping condition
    if best_loss - valid_loss > min_delta:
        best_loss = valid_loss
        no_improvement_rounds = 0
    else:
        no_improvement_rounds += 1
    if no_improvement_rounds >= early_stop_rounds:
        print("Early stopping triggered")
        break
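To see how the rule behaves on concrete numbers, the same logic can be packaged in a small helper and run on a made-up sequence of validation losses, without any model. The class name `EarlyStopper`, the patience of 3 rounds and the loss values are assumptions chosen to keep the trace short.

```python
class EarlyStopper:
    """Stop when the loss has not dropped by more than min_delta for `patience` rounds."""

    def __init__(self, patience=10, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_rounds = 0

    def should_stop(self, valid_loss):
        if self.best_loss - valid_loss > self.min_delta:
            self.best_loss = valid_loss      # significant improvement: reset the counter
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1             # improvement too small (or none)
        return self.bad_rounds >= self.patience


losses = [1.0, 0.8, 0.7995, 0.7993, 0.7991, 0.5]   # made-up validation losses
stopper = EarlyStopper(patience=3, min_delta=0.001)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)   # 4 0.8
```

Rounds 2 to 4 each improve on the best loss of 0.8 by less than 0.001, so training stops at round 4 and the later value 0.5 is never reached. Note that the comparison is always against the best loss recorded so far, not against the previous round.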
Specifically, in PyTorch, transfer learning can be achieved by using the trained risk assessment model as the initial model and training it on a new data set (the target domain). For example, the model may undergo transfer learning using the following Python code:
import torch

# load the trained risk assessment model (the whole model object was saved, so
# recent PyTorch versions need weights_only=False; only do this for a trusted file)
pretrained_model = torch.load("pretrained_model.pth", weights_only=False)
# use the risk assessment model as the starting point for the target domain
target_model = pretrained_model
# step A: fix the desired layer of the target model
for param in target_model.conv1.parameters():
    param.requires_grad = False
# step B: add a new layer to the target model; in_features and out_features are
# placeholders, and the model's forward() must pass through new_fc for it to take effect
target_model.new_fc = torch.nn.Linear(in_features, out_features)
# steps C and D: optimise only the new layer, with a small learning rate
learning_rate = 0.0001
optimizer = torch.optim.Adam(target_model.new_fc.parameters(), lr=learning_rate)
criterion = torch.nn.CrossEntropyLoss()
num_epochs = 50
early_stop_rounds = 10
min_delta = 0.001
best_loss = float('inf')
no_improvement_rounds = 0
for epoch in range(num_epochs):
    # train the target model on the new data set
    target_model.train()
    for batch in target_domain_train_dataloader:
        inputs, labels = batch
        optimizer.zero_grad()
        outputs = target_model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    # validate the target model on the new data set
    target_model.eval()
    valid_loss = 0
    with torch.no_grad():
        for batch in target_domain_valid_dataloader:
            inputs, labels = batch
            outputs = target_model(inputs)
            loss = criterion(outputs, labels)
            valid_loss += loss.item()
    # step E: check the early-stopping condition
    if best_loss - valid_loss > min_delta:
        best_loss = valid_loss
        no_improvement_rounds = 0
    else:
        no_improvement_rounds += 1
    if no_improvement_rounds >= early_stop_rounds:
        print("Early stopping triggered")
        break
# save the adjusted target model; the resulting target model is the updated risk assessment model
torch.save(target_model, "fine_tuned_model.pth")
The above code demonstrates how to implement transfer learning in PyTorch. First, the trained risk assessment model is loaded and used as the initial target model. Next, certain layers in the target model are fixed and a new layer is added. Then the training loop and optimizer are set up to train with a small learning rate and the early-stopping rule. Finally, after training on the new data set is complete, the adjusted target model is saved. In this way, transfer learning can be used to perform risk assessment on the data set of a niche industry.
In summary, through steps A, B, C, D and E, transfer learning of the risk assessment model on the first data set is completed and a target model is obtained. The target model assesses risk more accurately on niche-industry data, and the risk assessment rules of the niche industry are learned by the neural network from only a small amount of data.
It should be noted that the first data set may contain only the target domain, or may also contain data obtained by expanding the target domain. It can be understood that the data in the target domain is all actual data collected when niche-industry enterprises apply for supply chain financial services. For such applications, the accuracy of a risk assessment that relies only on the trained risk assessment model is clearly insufficient, so the risk is generally assessed manually through in-depth analysis and investigation of the industry's characteristics; in other words, labels, namely risk levels, must be assigned manually to the multidimensional data strips of niche-industry enterprises.
Transfer learning allows the risk assessment model to be updated with a small volume of data so that it is better suited to risk assessment of niche industries. Further, to increase the volume of data available for transfer learning, a method is needed for expanding the first data set required for transfer learning.
In one embodiment, the method further comprises: the target domain expansion policy is executed before the transfer learning operation is executed.
As shown in fig. 4, the target domain expansion policy (i.e. a policy for expanding the first data set required for the migration learning) specifically includes:
Step S502, when the evaluation result of the first multidimensional data strip is adjusted from fail to pass after an appeal, and it is detected that the industry type of the target enterprise is not included in the training data set and is not marked, acquiring from the passing data set the multidimensional data strips whose similarity to the first multidimensional data strip is not smaller than the expansion threshold as first target multidimensional data strips.
Step S504, for each first target multidimensional data strip, generating a corresponding first extended multidimensional data strip as follows: replacing the index data of the corresponding items in the first multidimensional data strip with the index data in the first target multidimensional data strip that form approximate pairs with the first multidimensional data strip, to obtain the first extended multidimensional data strip corresponding to that first target multidimensional data strip.
Step S506, adding the obtained first extended multidimensional data strip to the first data set.
Table 1 compares each item of index data, and the resulting similarity, between some of the first target multidimensional data strips and the first multidimensional data strip. In Table 1, A is the first multidimensional data strip and B to M are first target multidimensional data strips; in the example corresponding to the table, the expansion threshold is 80%. Taking the first target multidimensional data strip B as an example to explain the table: every multidimensional data strip in the table contains 21 index data, numbered 0 to 20 in sequence, where index data No. 0 expresses the industry type of the enterprise to which the strip corresponds and is not compared when the similarity of two multidimensional data strips is calculated. An "x" mark in the table means that the difference between the index data with that number in the first target multidimensional data strip and the index data with the same number in the first multidimensional data strip is larger than the set range; for example, the set range is 1 for index data No. 1 to No. 10 and 3 for index data No. 11 to No. 20. For the first target multidimensional data strip B, index data No. 19 and No. 20 differ from index data No. 19 and No. 20 of the first multidimensional data strip by more than 3, while index data No. 1 to No. 18 differ from the corresponding index data of the first multidimensional data strip by no more than the set range. Two index data whose difference does not exceed the set range are called an approximate pair, so A and B have 18 approximate pairs in total.
The similarity between B and A equals the ratio of their number of approximate pairs to the number of index data remaining after the industry-type index data is excluded, that is, 18/20 = 90%, which is not less than the expansion threshold of 80%, so B is determined to be a first target multidimensional data strip.
Table 1. Comparison of index data and similarity between some first target multidimensional data strips and the first multidimensional data strip
In this embodiment, by executing the target domain expansion policy, the number of training data used for performing migration learning on the risk identification model in the target domain may be increased, and the increase in the data volume may improve the generalization capability of the model after the migration learning.
From the previous analysis, consider the case where the multidimensional data strip of a niche-industry enterprise is highly similar to that of an existing enterprise in the training data set, for example 17 index data fall within the same range and only 3 index data differ greatly, and yet both enterprises are ultimately judged to pass (the reference data strip is judged to pass by the risk identification model, while the multidimensional data strip of the target enterprise is judged to fail by the model but, after an appeal, is judged to pass by the evaluator following industry analysis and research). In that case the combination of these 3 index data with the industry difference may be the main reason for the difference in the evaluation results.
The principle behind it is as follows: after the appeal, the enterprise and the bank evaluator may conclude, from the characteristics of the industry and their professional knowledge, that although the 3 indexes differ markedly, their influence on the repayment capability of the niche industry is not negative, or they have no influence at all. That is, the influence of these 3 indexes on risk in the niche industry differs from their influence in existing industries, and the existing model has not learned the rule relating risk to the index data of the niche industry. For example, in industry A a larger value of index data a may mean stronger repayment capability and lower risk, whereas in industry B index data a may not affect repayment capability at all.
In other words, the combination of the 3 markedly different index data may contain a rule of risk assessment in the niche industry that the trained risk identification model has not mastered. The combinations of index data that differ markedly from the first multidimensional data strip are represented by the positions of the crosses (the "x" marks) in each row of Table 1. Table 1 shows that the 12 first target multidimensional data strips B to M all have different cross positions, which means that the rule behind the result for the first multidimensional data strip may be distributed among these 12 combinations. In other words, once the neural network model has learned these 12 combinations of index data, it may learn why the first multidimensional data strip is still judged to pass despite its respective differences from B to M. In essence, the neural network is made to learn the rules relating these 12 index data combinations to risk assessment in the industry.
When the first data set is expanded, if every index data in every multidimensional data strip were fabricated, the data could deviate greatly from the actual operation of enterprises and the prediction results could be unusable. The comparison above between the niche industry and existing industries shows that what decisively influences an enterprise's risk assessment is usually certain index data or combinations of index data; that is, these index data or their combinations have the major influence on the enterprise's repayment capability. The idea is therefore as follows: the index data or combinations of index data that mainly influence risk assessment of a niche-industry enterprise must use real data from enterprises in that industry, while the other, secondary index data can be transplanted directly from the multidimensional data strips of other industries. For example, of 20 index data, 3 have a primary influence and the other 17 a secondary influence. For the model to learn, during transfer learning, the rules corresponding to the real data of the 3 indexes, those 3 data are left unchanged when the data is expanded and only the other 17 are modified; the modified data is then combined with the 3 to obtain multiple multidimensional data strips for the target domain. The following principle must be observed when modifying the other 17 index data:
The relations among the 17 index data may themselves contain rules arising from the real operation of enterprises, so these data must not be fabricated. Instead, they are matched from the multidimensional data strips in the source domain by similarity calculation: multidimensional data strips in the training data set that form approximate pairs with the 17 index data are selected, and the index data of those approximate pairs are used directly as data for the multidimensional data strips in the target domain.
For example, when the first target multidimensional data strip is B, the index data in B that form approximate pairs with the first multidimensional data strip replace the index data of the corresponding items in the first multidimensional data strip, giving the first extended multidimensional data strip corresponding to B. Specifically, index data No. 1 to No. 18 of A are all replaced with index data No. 1 to No. 18 of B (the common part of A and B), while the industry type of A and index data No. 19 and No. 20 of A (the part in which A differs from B) are retained. In short, the data set is expanded by replacing the common part, and the distinctive part is retained so that the model learns the risk assessment knowledge of the niche industry.
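The expansion strategy of steps S502 to S506 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the six-element toy strips and the set ranges are hypothetical, and the sketch leaves out how labels are attached to the extended strips.

```python
def approx_indices(first, other, ranges):
    """Indices (excluding industry-type index 0) that differ by no more than the set range."""
    return [
        i for i in range(1, len(first))
        if abs(first[i] - other[i]) <= ranges[i]
    ]


def expand_target_domain(first, passed_strips, ranges, expansion_threshold=0.8):
    """Build extended strips from passed strips that are similar enough to `first`."""
    comparable = len(first) - 1            # the industry-type index is not compared
    extended = []
    for other in passed_strips:
        shared = approx_indices(first, other, ranges)
        if len(shared) / comparable < expansion_threshold:
            continue                       # S502: not a first target strip
        new_strip = list(first)            # keeps the industry type and the differing indices
        for i in shared:                   # S504: swap in the common part
            new_strip[i] = other[i]
        extended.append(new_strip)
    return extended                        # S506: the caller adds these to the first data set


ranges = [None, 1, 1, 1, 3, 3]             # entry 0 is unused
A = [99, 10, 20, 30, 40, 50]               # first multidimensional data strip (niche industry 99)
passed = [
    [12, 10.5, 19.5, 30, 41, 60],          # 4 of 5 indices close: similarity 80%
    [12, 15, 25, 30, 41, 60],              # 2 of 5 indices close: similarity 40%
]
extended = expand_target_domain(A, passed, ranges)
print(extended)   # [[99, 10.5, 19.5, 30, 41, 50]]
```

Only the first passed strip reaches the 80% threshold. The resulting extended strip takes its common part (indices 1 to 4) from that strip and keeps A's industry type and A's own value at index 5, the position where the two strips differ.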
In summary, by executing the expansion strategy this embodiment generates more training data for transfer learning in the target domain, which addresses the shortage of training data for niche industries. In line with the controlled-variable method, the expanded data retains the main index data for assessing niche-industry risk, so that during transfer learning the model can accurately learn the rules relating those index data to niche-industry risk assessment, improving its assessment accuracy for niche industries. Meanwhile, the secondary index data is not randomly generated but replaced with actual index data from other industries. Because part of the source-domain data is introduced into the first extended multidimensional data strips, the data similarity between the target domain and the source domain is well preserved, and this similarity is important to the success of transfer learning: when the data of the two domains have similar features and structure, the knowledge learned by the pre-trained model on the original task transfers more easily to the new task.
Marking an industry type refers to the statistics that the computer service system of the bank or guarantee institution keeps on industry types, covering both the industry types appearing in the training data set of the risk identification model and the industry types of enterprises whose applications for risk assessment it receives. It will be appreciated that no count is kept for industry types that appear in the training data set; for industry types not included in the training data set, the system counts the occurrences of the industry type of each enterprise applying for supply chain financial services, that is, it marks the number of occurrences of that industry type. The specific marking procedure is as follows. When an enterprise applies for a service it fills in a form with its index data, from which a multidimensional data strip is generated; when the evaluator enters the form and the other data submitted by the enterprise into the system, the evaluator classifies the enterprise's industry type on the basis of professional knowledge and the results of investigation and analysis. The system calls the risk identification model to assess the enterprise's risk, and the enterprise may appeal if it believes the model's assessment is wrong. When the evaluation result of the enterprise's multidimensional data strip is adjusted from fail to pass after the appeal, the model can be considered not to have learned the risk assessment rules of the industry to which the enterprise belongs.
Therefore, the foregoing embodiments provide a transfer learning method in which a target domain is constructed from multidimensional data strips of an industry that does not appear in the training data set, and transfer learning is performed on the original model, so that the target model obtained after transfer learning learns the risk assessment rules of the new industry. We refer to this new industry as industry A. Each time the model undergoes one round of transfer learning using a multidimensional data strip of industry A, the mark count of the industry type corresponding to that data strip is incremented by one. The mark count thus represents the number of times the model has learned the risk assessment rules of that industry through transfer learning. If the mark count of industry B is 0, the model has not yet learned the rules of industry B from any multidimensional data strip containing industry B index data. Industry A and industry B are both new or niche industries that are not contained in the training data set, so in order to learn the risk prediction rules of these industries and improve prediction accuracy for them, the model also needs to learn the rules of industry B from multidimensional data strips containing industry B index data. Furthermore, each round of transfer learning on an industry's data suffers from an insufficient amount of data in the target domain; that is, each round can only learn part of the rules of an industry. It is therefore necessary to perform transfer learning repeatedly on the model produced by the previous round, so that the model grasps more of the industry's rules over multiple rounds.
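The mark-count bookkeeping described above can be illustrated with a minimal sketch. It assumes that the mark count is incremented once per completed round of transfer learning and that industry types already in the training data set are never counted; the class name `IndustryMarkRegistry`, its methods, and the industry names are hypothetical and do not appear in the original text.

```python
from collections import Counter


class IndustryMarkRegistry:
    """Tracks how many transfer-learning rounds each unseen industry has had.

    Hypothetical helper for illustration only.
    """

    def __init__(self, training_industries):
        # Industry types already present in the training data set are never counted.
        self.training_industries = set(training_industries)
        self.marks = Counter()

    def mark_count(self, industry):
        # Counter returns 0 for an industry that has never been marked.
        return self.marks[industry]

    def needs_first_transfer(self, industry):
        # True for an industry outside the training set whose mark count is 0.
        return industry not in self.training_industries and self.marks[industry] == 0

    def record_transfer(self, industry):
        # Called once after each completed round of transfer learning.
        if industry in self.training_industries:
            return 0
        self.marks[industry] += 1
        return self.marks[industry]


registry = IndustryMarkRegistry(training_industries={"retail", "logistics"})
registry.record_transfer("industry_A")
```

In this sketch, industry A has a mark count of 1 after one round, while industry B, which has never been used for transfer learning, still has a mark count of 0 and would follow the first-time path.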
The following provides a method of continually iterating the model.
Step S602, when it is detected that the evaluation result of a second multidimensional data strip is adjusted from fail to pass after a complaint, acquiring, from the pass data set, the multidimensional data strips whose similarity to the second multidimensional data strip is not smaller than an expansion threshold as second target multidimensional data strips; the second multidimensional data strip is a multidimensional data strip whose industry type has been marked at least once and is not included in the training data of the model.
Step S604, for each second target multidimensional data strip, generating a corresponding second extended multidimensional data strip as follows: replacing the index data of the corresponding items in the second multidimensional data strip with the index data in the second target multidimensional data strip that forms approximate pairs with the second multidimensional data strip, to obtain the second extended multidimensional data strip corresponding to that second target multidimensional data strip.
Step S606, adding the obtained second extended multidimensional data strips to a second data set, and adding to each second extended multidimensional data strip in the second data set a label indicating its corresponding risk level.
Step S608, acquiring the latest version of the risk assessment model, and performing the transfer learning operation on it with the second data set as the target domain.
For details of how to perform the transfer learning operation on the latest version of the risk assessment model with the second data set as the target domain, refer to steps S402 to S406 above; they are not repeated here.
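Steps S602 to S606 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the embodiment: a multidimensional data strip is represented as a dictionary of numeric index values, similarity is taken to be the number of approximate pairs, and the risk label is passed in by the caller. The function names, the index names, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: index name -> numeric index value.
Strip = Dict[str, float]


def approximate_pairs(a: Strip, b: Strip, thresholds: Dict[str, float]) -> List[str]:
    """Names of indices whose absolute difference is below the per-index threshold."""
    return [
        name
        for name in thresholds
        if name in a and name in b and abs(a[name] - b[name]) < thresholds[name]
    ]


def expand_target_domain(
    second_strip: Strip,
    pass_set: List[Strip],
    thresholds: Dict[str, float],
    expansion_threshold: int,
    risk_label: str,
) -> List[Tuple[Strip, str]]:
    """Select similar pass-set strips, substitute index data, and label the result."""
    second_data_set = []
    for candidate in pass_set:
        pairs = approximate_pairs(second_strip, candidate, thresholds)
        # S602: keep candidates whose similarity (assumed here to be the number
        # of approximate pairs) is not smaller than the expansion threshold.
        if len(pairs) < expansion_threshold:
            continue
        # S604: copy the second strip and overwrite the approximate-pair items
        # with the candidate's real values.
        extended = dict(second_strip)
        for name in pairs:
            extended[name] = candidate[name]
        # S606: attach a risk-level label to the extended strip.
        second_data_set.append((extended, risk_label))
    return second_data_set


thresholds = {"current_ratio": 0.2, "debt_ratio": 0.05, "overdue_days": 5.0}
second = {"current_ratio": 1.5, "debt_ratio": 0.40, "overdue_days": 10.0}
pass_set = [
    {"current_ratio": 1.6, "debt_ratio": 0.42, "overdue_days": 30.0},
    {"current_ratio": 3.0, "debt_ratio": 0.10, "overdue_days": 90.0},
]
second_data_set = expand_target_domain(second, pass_set, thresholds, 2, "pass")
```

Here only the first candidate forms at least two approximate pairs with the second strip, so a single extended strip is produced. Its `overdue_days` value, which does not form an approximate pair, is kept from the second strip, and the original strip is left unmodified.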
Unlike conventional transfer learning, this embodiment learns the same set of rules through multiple rounds of transfer learning. In conventional transfer learning, enough training data is collected at one time to form the target domain, so that the rules are learned in a single round. The conventional method is therefore only suitable for mainstream industries with enough training data: when transfer learning is used to expand the task scope of an existing model (for example, so that the existing model performs better risk assessment on other industries), those other industries must be mainstream industries with enough training data for transfer learning to be completed in one round. The conventional method is clearly unsuitable for niche or new industries with insufficient training data. This embodiment provides a transfer learning method for such industries. The model is first used to perform risk assessment on the niche or new industry; each time it is detected that the evaluation result of a second multidimensional data strip is adjusted from fail to pass after a complaint, training data is collected once to construct a target domain and one round of transfer learning is triggered, so that the accuracy of the model's risk assessment for the niche or new industry improves continuously. The method does not require a large amount of training data to be collected in advance, and is suitable for completing transfer learning in scenarios where a niche or new industry cannot provide massive training data.
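The trigger-driven iteration described above, in which each fail-to-pass adjustment starts one round of transfer learning on the latest model version, might be organized as in the following sketch. The class `ModelIterator` and the stand-in function `fake_transfer_learn` are hypothetical; the actual transfer learning routine (steps S402 to S406) is not reproduced here and is represented only by a caller-supplied function.

```python
from typing import Callable, List


class ModelIterator:
    """Keeps every model version and runs one transfer-learning round per trigger."""

    def __init__(self, base_model, transfer_learn: Callable):
        self.versions: List = [base_model]
        # Placeholder for the real transfer learning routine (steps S402 to S406).
        self.transfer_learn = transfer_learn

    @property
    def latest(self):
        return self.versions[-1]

    def on_appeal_overturned(self, target_domain):
        # One fail-to-pass adjustment -> one small target domain -> one round,
        # always starting from the latest model version.
        new_model = self.transfer_learn(self.latest, target_domain)
        self.versions.append(new_model)
        return new_model


def fake_transfer_learn(model, target_domain):
    # Stand-in for real fine-tuning: the "model" only counts rounds and samples.
    return {"rounds": model["rounds"] + 1, "seen": model["seen"] + len(target_domain)}


iterator = ModelIterator({"rounds": 0, "seen": 0}, fake_transfer_learn)
iterator.on_appeal_overturned([("strip_1", "pass"), ("strip_2", "pass")])
iterator.on_appeal_overturned([("strip_3", "pass")])
```

Keeping every version in a list is a design choice of this sketch only; it makes it easy to see that the second round builds on the result of the first rather than on the original model.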
FIG. 5 illustrates an internal block diagram of a computer device in one embodiment. The computer device may in particular be a terminal (or a server). As shown in fig. 5, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by a processor, causes the processor to implement a supply chain financial risk assessment processing method. The internal memory may also have stored therein a computer program which, when executed by the processor, causes the processor to perform the supply chain financial risk assessment processing method. It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, there is provided an electronic device including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the supply chain financial risk assessment processing method described above. The steps of the supply chain financial risk assessment processing method here may be the steps of the supply chain financial risk assessment processing method of the respective embodiments described above.
In one embodiment, a computer-readable storage medium is provided having stored thereon computer-executable instructions for causing a computer to perform the steps of the supply chain financial risk assessment processing method described above. The steps of the supply chain financial risk assessment processing method here may be the steps of the supply chain financial risk assessment processing method of the respective embodiments described above.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.

Claims (9)

1. A supply chain financial risk assessment processing method, the method comprising:
acquiring a first multidimensional data strip; the first multidimensional data strip is a multidimensional data strip of a target enterprise, and the multidimensional data strip is composed of a plurality of index data obtained based on the target enterprise;
inputting the first multidimensional data strip into a risk assessment model to obtain a risk level prediction result of the target enterprise; the risk assessment model is trained by using a training data set, wherein the training data set comprises a plurality of groups of training data, and each group of training data comprises a group of multidimensional data strips and a label for marking the risk level corresponding to the group of multidimensional data strips;
when the risk level prediction result of the target enterprise indicates that the risk assessment result of the target enterprise is failed, acquiring a multidimensional data strip with highest similarity with the first multidimensional data strip as a reference multidimensional data strip according to a preset rule;
converting the first multidimensional data strip and the reference multidimensional data strip into a preset format and displaying them on an interface; after the reference multidimensional data strip is converted into the preset format, displaying on the interface the description information of each item of index data forming the reference multidimensional data strip;
wherein, the index data used for forming the first multidimensional data strip comprises index data used for indicating the industry type of the target enterprise;
the step of obtaining the multidimensional data strip with the highest similarity with the first multidimensional data strip as the reference multidimensional data strip according to a preset rule comprises the following steps:
and acquiring a multidimensional data strip with highest similarity with the first multidimensional data strip from the corresponding data set as a reference multidimensional data strip according to whether the industry type of the target enterprise is contained in the training data set.
2. The method according to claim 1, wherein the plurality of index data for forming the first multi-dimensional data bar includes index data for indicating an industry type to which the target enterprise belongs;
the step of acquiring the multidimensional data strip with the highest similarity with the first multidimensional data strip as the reference multidimensional data strip according to a preset rule comprises the following steps:
when the industry type of the target enterprise is detected not to be contained in the training data set, acquiring a multidimensional data strip with highest similarity with the first multidimensional data strip from the passing data set as a reference multidimensional data strip; the pass data set is a data set formed by multidimensional data strips with risk levels not higher than a threshold value in the training data set.
3. The method for processing financial risk assessment of a supply chain according to claim 2, wherein the step of obtaining, as the reference multidimensional data strip, the multidimensional data strip having the highest similarity to the first multidimensional data strip according to a preset rule comprises:
when the target enterprise is detected to be an enterprise whose industry type is contained in the training data set, acquiring a multidimensional data strip with highest similarity with the first multidimensional data strip from the non-passing data set as a reference multidimensional data strip; the non-passing data set is a data set formed by multidimensional data strips with risk levels higher than a threshold value in the training data set.
4. A supply chain financial risk assessment processing method according to any one of claims 1 to 3, wherein the acquiring, as a reference multidimensional data strip, a multidimensional data strip having the highest similarity to the multidimensional data strip of the target enterprise, specifically comprises:
in the corresponding data set, calculating the difference values between each index data of each multidimensional data strip in the data set and the index data of the corresponding item of the multidimensional data strip of the target enterprise;
detecting the number of approximate pairs formed by the index data of any multidimensional data strip and the index data of the corresponding items of the multidimensional data strip of the target enterprise, and judging that the greater the number of approximate pairs, the higher the similarity between that multidimensional data strip and the multidimensional data strip of the target enterprise; wherein an approximate pair refers to a pair of index data for which the difference value between the index data in the multidimensional data strip and the index data of the corresponding item of the multidimensional data strip of the target enterprise is smaller than a corresponding preset threshold value;
and taking the multidimensional data strip with the highest similarity with the multidimensional data strip of the target enterprise as a reference multidimensional data strip.
5. The supply chain financial risk assessment processing method of claim 4, further comprising:
when the evaluation result of the first multidimensional data strip is adjusted from fail to pass after a complaint, and it is detected that the industry type of the target enterprise is not contained in the training data set and has not been marked, adding the multidimensional data strip of the target enterprise to a target domain and performing a transfer learning operation;
the performing of the transfer learning operation comprises:
acquiring the risk assessment model trained on a source domain; the source domain is the training data set;
acquiring a first data set containing a target domain; the format of each group of data in the target domain is the same as that of each group of training data in the training data set, and index data used for indicating the industry type of an enterprise in each group of data in the target domain does not appear in the training data set;
and applying the risk assessment model to a first data set, and performing transfer learning to update the risk assessment model.
6. The supply chain financial risk assessment processing method of claim 5, further comprising:
before performing the transfer learning operation, performing a target domain expansion policy;
the target domain expansion strategy specifically comprises the following steps:
when the evaluation result of the first multidimensional data strip is adjusted from fail to pass after a complaint, and it is detected that the industry type of the target enterprise is not contained in the training data set and has not been marked, acquiring, from the pass data set, the multidimensional data strips whose similarity to the first multidimensional data strip is not smaller than an expansion threshold as first target multidimensional data strips;
for each first target multidimensional data strip, generating a corresponding first extended multidimensional data strip as follows: replacing the index data of the corresponding items in the first multidimensional data strip with the index data in the first target multidimensional data strip that forms approximate pairs with the first multidimensional data strip, to obtain the first extended multidimensional data strip corresponding to that first target multidimensional data strip;
the resulting first extended multidimensional data strip is added to the first data set.
7. The supply chain financial risk assessment processing method of claim 6, further comprising:
when it is detected that the evaluation result of a second multidimensional data strip is adjusted from fail to pass after a complaint, acquiring, from the pass data set, the multidimensional data strips whose similarity to the second multidimensional data strip is not smaller than an expansion threshold as second target multidimensional data strips; the second multidimensional data strip is a multidimensional data strip whose industry type has been marked at least once and is not included in the training data of the model;
for each second target multidimensional data strip, generating a corresponding second extended multidimensional data strip as follows: replacing the index data of the corresponding items in the second multidimensional data strip with the index data in the second target multidimensional data strip that forms approximate pairs with the second multidimensional data strip, to obtain the second extended multidimensional data strip corresponding to that second target multidimensional data strip;
adding the obtained second extended multidimensional data strips to a second data set, and adding to each second extended multidimensional data strip in the second data set a label indicating its corresponding risk level;
and acquiring the risk assessment model of the latest version, and executing transfer learning operation on the risk assessment model of the latest version by taking the second data set as a target domain.
8. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the supply chain financial risk assessment processing method according to any one of claims 1 to 7 when the program is executed by the processor.
9. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the supply chain financial risk assessment processing method of any one of claims 1 to 7.
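By way of illustration only, the reference-strip selection recited in claims 1 to 4 could be sketched as follows. This is not the claimed implementation: it assumes that data strips are dictionaries of numeric index values, that the industry type is supplied separately rather than read from the strip, and that ties in similarity are resolved in favor of the first strip encountered. All function names, index names, and values are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: index name -> numeric index value.
Strip = Dict[str, float]


def similarity(a: Strip, b: Strip, thresholds: Dict[str, float]) -> int:
    """Number of approximate pairs: items whose difference is below the threshold."""
    return sum(
        1
        for name in thresholds
        if name in a and name in b and abs(a[name] - b[name]) < thresholds[name]
    )


def select_reference(
    first_strip: Strip,
    industry: str,
    training_industries: Set[str],
    pass_set: List[Strip],
    fail_set: List[Strip],
    thresholds: Dict[str, float],
) -> Optional[Strip]:
    """Pick the most similar strip from the data set chosen by industry coverage."""
    # Industry seen in training -> search the non-passing data set;
    # industry not seen in training -> search the pass data set.
    data_set = fail_set if industry in training_industries else pass_set
    if not data_set:
        return None
    # max() returns the first strip with the highest approximate-pair count.
    return max(data_set, key=lambda s: similarity(first_strip, s, thresholds))


thresholds = {"current_ratio": 0.2, "debt_ratio": 0.05}
first = {"current_ratio": 1.0, "debt_ratio": 0.70}
pass_set = [
    {"current_ratio": 1.1, "debt_ratio": 0.72},
    {"current_ratio": 2.5, "debt_ratio": 0.30},
]
fail_set = [
    {"current_ratio": 0.4, "debt_ratio": 0.90},
    {"current_ratio": 0.9, "debt_ratio": 0.68},
]
ref_unseen = select_reference(first, "new_energy", {"retail"}, pass_set, fail_set, thresholds)
ref_seen = select_reference(first, "retail", {"retail"}, pass_set, fail_set, thresholds)
```

For the unseen industry the sketch returns the closest strip from the pass data set, and for the industry already in the training data set it returns the closest strip from the non-passing data set.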
CN202310558781.5A 2023-05-18 2023-05-18 Supply chain financial risk assessment processing method and device Active CN116308829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310558781.5A CN116308829B (en) 2023-05-18 2023-05-18 Supply chain financial risk assessment processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310558781.5A CN116308829B (en) 2023-05-18 2023-05-18 Supply chain financial risk assessment processing method and device

Publications (2)

Publication Number Publication Date
CN116308829A CN116308829A (en) 2023-06-23
CN116308829B true CN116308829B (en) 2023-09-01

Family

ID=86794547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310558781.5A Active CN116308829B (en) 2023-05-18 2023-05-18 Supply chain financial risk assessment processing method and device

Country Status (1)

Country Link
CN (1) CN116308829B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097459A (en) * 2019-05-08 2019-08-06 重庆斐耐科技有限公司 A kind of financial risks appraisal procedure and system based on big data technology
CN113159482A (en) * 2021-01-05 2021-07-23 航天信息股份有限公司广州航天软件分公司 Method and system for evaluating information security risk
CN114495267A (en) * 2022-01-04 2022-05-13 哈尔滨工业大学(威海) Old people falling risk assessment method based on multi-dimensional data fusion
CN114549154A (en) * 2022-01-28 2022-05-27 南京科融数据系统股份有限公司 Financial data early warning method and system
CN115713403A (en) * 2022-11-16 2023-02-24 中证数智科技(深圳)有限公司 Enterprise risk identification method, device and equipment based on self-coding neural network
CN115907937A (en) * 2022-11-18 2023-04-04 广东海术云电子科技有限公司 Supply chain financial risk monitoring method and system based on neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11538236B2 (en) * 2019-09-16 2022-12-27 International Business Machines Corporation Detecting backdoor attacks using exclusionary reclassification

Also Published As

Publication number Publication date
CN116308829A (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant