JP6855604B2

JP6855604B2 - Method, device, computer device, program and storage medium for predicting short-term profit

Info

Publication number: JP6855604B2
Application number: JP2019570544A
Authority: JP
Inventors: ▲義▼文王; 健宗王; 京肖
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-04-17
Filing date: 2018-07-12
Publication date: 2021-04-07
Anticipated expiration: 2038-07-12
Also published as: JP2020524346A; CN108710965A; WO2019200742A1

Description

This application claims priority to Chinese Patent Application No. 2018103452579, entitled "Method, Device, Computer Device and Storage Medium for Predicting Short-Term Profit," filed with the China Patent Office on April 17, 2018, the entire contents of which are incorporated herein by reference.
The application relates to the field of Internet technology, and in particular to a method, device, computer device, program and storage medium for predicting short-term profit.

Blockchain is a new, decentralized, trustless data architecture that is shared, managed and monitored by all nodes in the network and is not controlled by any single party. Because blockchain is a new data architecture, little data is available in the early stage of laying out a blockchain, which makes it difficult for financial institutions such as banks to make short-term profit forecasts from the current "small data"; as a result, they cannot determine an appropriate loan amount.

The main purpose of this application is to provide a method, device, computer device, program and storage medium for predicting the short-term profit of a company when little data related to the company is available in the early stage of laying out the blockchain.

The present application provides a method for predicting short-term profit, which is used when the amount of data related to a loan target obtained from a blockchain is less than a preset amount. The prediction method includes the steps of:
acquiring, from the blockchain, first related data related to the loan target;
inputting the first related data into the K-means algorithm and performing a first clustering calculation;
performing regression prediction on each cluster obtained by the first clustering calculation using a preset method, to obtain a first prediction result; and
determining the short-term profitability of the loan target according to the first prediction result.

The present application further provides a short-term profit prediction device, which is used when the amount of data related to a loan target obtained from a blockchain is less than a preset amount. The prediction device includes:
an acquisition means for acquiring, from the blockchain, first related data related to the loan target;
a clustering means for inputting the first related data into the K-means algorithm and performing a first clustering calculation;
a regression means for performing regression prediction on each cluster obtained by the first clustering calculation using a preset method, to obtain a first prediction result; and
a determination means for determining the short-term profitability of the loan target according to the first prediction result.

The present application further provides a computer device including a memory and a processor, wherein computer-readable instructions (that is, a computer program) are stored in the memory, and the processor implements the steps of the above prediction method when executing the computer-readable instructions.
The present application further provides a non-volatile computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the steps of the above prediction method.

The method, device, computer device, program and storage medium for predicting short-term profit according to the present application first cluster the small amount of acquired data with the K-means algorithm, then make predictions with a regression algorithm to obtain a prediction result, and finally determine the short-term profitability of the loan target from that result. This solves the problem that, when little relevant data is available in the early stage of laying out each company's data links, financial institutions such as banks cannot accurately predict the short-term profitability of borrowing companies. It allows the loan amount for the loan target to be set relatively accurately and helps reduce the lending risk of banking institutions.

FIG. 1 is a flowchart showing a method for predicting short-term profit according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a method for predicting short-term profit according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the structure of a short-term profit prediction device according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the structure of a regression unit according to an embodiment of the present invention.
FIG. 5 is a block diagram showing the structure of a clustering unit according to an embodiment of the present invention.
FIG. 6 is a block diagram showing the structure of a short-term profit prediction device according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the structure of a computer device according to an embodiment of the present invention.

With reference to FIG. 1, the present application provides a method for predicting short-term profit, which is used when the amount of data related to a loan target obtained from the blockchain is less than a preset amount.
In this application, working capital loans from financial institutions such as banks are usually divided into temporary loans, short-term loans and medium-term loans. A short-term loan is a working capital loan whose term is usually longer than 3 months and up to 1 year inclusive. Because market changes are irregular, rules extracted from historical data remain accurate for a certain period, but their accuracy declines once that period has passed. According to the length of the forecast horizon, forecasts can be divided into three types: short-term, medium-term and long-term. In general, the shorter the forecast horizon, the higher the forecast quality; conversely, the longer the horizon, the lower the accuracy of the forecast. In this application, the condition that the amount of data on the blockchain is smaller than the preset amount restricts the method to the early stage of laying out each company's data links, when relatively little data of any kind is available. In this application, "an amount of data less than the preset amount" may be referred to as "small data", in contrast to the current "big data".

The above prediction method includes:
S1, acquiring, from the blockchain, first related data related to the loan target;
S2, inputting the first related data into the K-means algorithm and performing a first clustering calculation;
S3, performing regression prediction on each cluster obtained by the first clustering calculation using a preset method, to obtain a first prediction result; and
S4, determining the short-term profitability of the loan target according to the first prediction result.
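The four steps S1 to S4 can be read as a small pipeline. The following Python sketch is illustrative only: the function name `predict_short_term_profit`, the `(features, value)` record layout, the pluggable `cluster_fn`/`regress_fn`/`decide_fn` callables and the averaging of per-cluster predictions into one result are all assumptions, not details given in the patent.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# A record is (feature vector, observed value); this layout is an assumption.
Record = Tuple[Sequence[float], float]


def predict_short_term_profit(
    records: List[Record],
    preset_amount: int,
    cluster_fn: Callable[[List[Sequence[float]]], List[int]],
    regress_fn: Callable[[List[Record]], float],
    decide_fn: Callable[[float], int],
) -> int:
    """Hypothetical orchestration of steps S1-S4.

    `records` stands in for the first related data already read from the
    blockchain in S1.
    """
    if len(records) >= preset_amount:
        # The method targets only "small data" below the preset amount.
        raise ValueError("data volume is not below the preset amount")
    # S2: cluster the feature vectors; labels[i] is the cluster of records[i].
    labels = cluster_fn([features for features, _ in records])
    # S3: apply the preset regression method to each cluster separately.
    per_cluster = []
    for label in sorted(set(labels)):
        members = [rec for rec, lab in zip(records, labels) if lab == label]
        per_cluster.append(regress_fn(members))
    # Combine cluster-level predictions into the first prediction result
    # (a plain average; the combination rule is an assumption).
    first_result = mean(per_cluster)
    # S4: turn the first prediction result into a profitability decision.
    return decide_fn(first_result)


# Usage with trivial stand-in components (all hypothetical).
records = [([1.0], 10.0), ([1.2], 12.0), ([5.0], 50.0), ([5.5], 54.0)]
level = predict_short_term_profit(
    records,
    preset_amount=100,
    cluster_fn=lambda xs: [0 if x[0] < 3 else 1 for x in xs],
    regress_fn=lambda rs: mean(value for _, value in rs),
    decide_fn=lambda v: min(10, max(1, int(v // 10))),
)
print(level)  # -> 3
```

Keeping the components pluggable mirrors the separate acquisition, clustering, regression and determination means of the device described above.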

As described in step S1 above, the loan target is a company or individual that needs to borrow from a financial institution such as a bank. The first related data may be all data related to the loan target on the blockchain, or data retrieved according to specified requirements; for example, different data on the blockchain may be acquired for different companies or projects. Taking a company seeking procurement-agency financing as an example, financial institution block data, core enterprise block data, warehousing and logistics block data, dealer block data and so on can be acquired.

As described in step S2 above, the K-means algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that satisfy the minimum-variance criterion. The algorithm accepts the input k and then divides the n data objects into k clusters such that objects in the same cluster are highly similar while objects in different clusters have low similarity. The principle is as follows: first set the positions of several centers, calculate the distance from every point to each center, and assign each point to a center; for example, point A belongs to cluster 1 if center 1 is the closest center to A. Average all the points belonging to cluster 1 to obtain its new center. Repeat until the centers no longer change, which yields the final center positions and completes the clustering.

In this application, the specific process of step S2 above is as follows:
S21, given a data set of related data (the first related data) $X = \{x_1, x_2, \dots, x_n\}$ containing n d-dimensional data points, where $x_i \in \mathbb{R}^d$, select K points in the data set as the initial cluster centers, each of which represents one class center $\mu_k$ ($k = 1, 2, \dots, K$).
S22, calculate the Euclidean distance from each point to each center $\mu_k$ and assign each point to the class represented by its nearest (most similar) cluster center, forming K clusters $C = \{c_k, k = 1, 2, \dots, K\}$. Each cluster $c_k$ represents one class. Calculate the sum of squared distances $J(c_k)$ from each point in the class to the cluster center $\mu_k$, that is,
$$J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$$
S23, minimize the total sum of squared distances from the samples of each class to the center $\mu_k$ of the class to which they belong:
$$J(C) = \sum_{k=1}^{K} J(c_k) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$$
For fixed assignments, $J(C)$ is minimized by taking the mean of all the objects in each class as the new cluster center of that class:
$$\mu_k = \frac{1}{\lvert c_k \rvert} \sum_{x_i \in c_k} x_i$$
S24, determine whether the cluster centers have changed; if they have, return to step S22; if not, end the clustering.
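Steps S21 to S24 describe the standard Lloyd iteration for K-means. Below is a minimal pure-Python sketch, assuming the first related data has already been converted to numeric d-dimensional vectors; the function name `kmeans`, the random choice of initial centers with a fixed `seed`, and the `max_iter` cap are illustrative choices, not taken from the patent.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """Return (labels, centers, J) following steps S21-S24."""
    rng = random.Random(seed)
    # S21: choose K points of the data set as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # S22: assign every point to the center with the smallest Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # S23: the mean of each cluster becomes its new center.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # S24: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # J(C): sum of squared distances from each point to its assigned center.
    j_total = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, j_total


pts = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
labels, centers, j_total = kmeans(pts, k=2)
print(labels, round(j_total, 3))
```

Because K-means only reaches a local optimum, the result can depend on the initial centers; fixing `seed` makes runs reproducible.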

The present application clusters data with the K-means algorithm, which is simple and fast; the algorithm is scalable and efficient, and it works better when the clusters are close to Gaussian distributions.
As described in step S3 above, regression prediction follows the correlation principle of forecasting: it identifies the factors that influence the prediction target, then uses mathematical methods to find an approximate expression of the functional relationship between these factors and the target. The first prediction result is obtained by applying the preset regression method to each cluster produced by the first clustering calculation; because the first related data is data related to the loan target, the first prediction result reflects, to some extent, the profitability of the loan target over a short period. The basic steps of regression prediction are as follows:
1. Determine the independent and dependent variables according to the prediction target. Specifically, determine the specific target to be predicted, which also determines the dependent variable; for example, if the target is next year's sales volume, the sales volume Y is the dependent variable. Through market research and data research, find the factors that influence the prediction target, that is, the independent variables, and select the main ones among them.
2. Establish a regression prediction model. Specifically, perform calculations on historical statistical data of the independent and dependent variables and, on that basis, establish a regression equation, that is, the regression prediction model.
3. Perform correlation analysis. Regression analysis is a mathematical-statistical analysis of influencing factors (independent variables) and prediction targets (dependent variables) that have a causal relationship, and the resulting regression equation is meaningful only if the independent variables are actually related to the dependent variable. Therefore, whether a factor used as an independent variable is related to the prediction target used as the dependent variable, how strongly it is related, and how confidently that degree of relation can be judged are questions that must be resolved during regression analysis. Correlation analysis usually requires computing the correlation coefficient, whose magnitude indicates the degree of association between the independent and dependent variables.
4. Test the regression prediction model and calculate the prediction error. Whether the model can be used for actual prediction is decided by testing it and calculating its prediction error; the regression equation can be used as a prediction model only if it passes the various tests and its prediction error is small.
5. Calculate and determine the predicted value. Use the regression prediction model to calculate predicted values, analyze them comprehensively, and determine the final predicted value.
In the present application, the data is first clustered and the clustered data is then used for regression prediction, which makes prediction faster.
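Steps 2 to 4 of the regression procedure (fit a model, check the correlation, measure the prediction error) can be illustrated with single-variable least squares. This is a hedged sketch only: the embodiment described later uses SVR as its preset method, and the function names, the correlation cut-off of 0.8 and the choice of mean absolute percentage error (MAPE) as the error measure are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_simple_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Steps 2-3: least-squares fit of y = a + b*x and the Pearson correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Step 4: mean absolute percentage error on held-out observations."""
    return sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical history: x is an influencing factor, y the quantity to predict.
xs, ys = [1, 2, 3, 4, 5], [3.0, 5.0, 7.0, 9.0, 11.0]
a, b, r = fit_simple_regression(xs, ys)
usable = abs(r) >= 0.8                      # assumed correlation threshold
held_out_x, held_out_y = [6, 7], [13.5, 14.5]
err = mape(held_out_y, [a + b * x for x in held_out_x])
print(a, b, round(r, 6), round(err, 4))     # -> 1.0 2.0 1.0 0.0358
```

Only if the correlation check passes and the held-out error is acceptably small would the fitted equation be used for step 5.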

As described in step S4 above, the short-term profitability of the loan target is determined according to the first prediction result, and a financial institution such as a bank can then determine the loan amount for the loan target, that is, its loan ceiling, according to that profitability. The first prediction result may be a number representing a level, for example on a scale of 1 to 10: the higher the level, the higher the short-term profitability of the loan target and, correspondingly, the higher its loan amount. In this embodiment, the loan amount additionally depends on data such as the registered capital and market value of the loan target.
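The text does not specify how a prediction is mapped to a level or how the level, registered capital and market value combine into a loan ceiling. Purely as a hypothetical illustration, the sketch below bands a predicted profit linearly into levels 1 to 10 and scales a ceiling by level; the band limits `low`/`high`, the `base_ratio` of 0.05 and the use of the smaller of registered capital and market value are invented parameters, not part of the patent.

```python
def profit_level(predicted_profit: float, low: float, high: float) -> int:
    """Map a predicted short-term profit to a level from 1 to 10 (linear bands, assumed)."""
    if high <= low:
        raise ValueError("high must exceed low")
    frac = (predicted_profit - low) / (high - low)
    frac = min(max(frac, 0.0), 1.0)          # clamp to [0, 1]
    return min(10, 1 + int(frac * 10))       # 0 -> level 1, 1 -> level 10


def loan_ceiling(level: int, registered_capital: float, market_value: float,
                 base_ratio: float = 0.05) -> float:
    """Hypothetical ceiling: grows with the level, bounded by the smaller asset figure."""
    return level * base_ratio * min(registered_capital, market_value)


lvl = profit_level(50.0, low=0.0, high=100.0)
print(lvl, loan_ceiling(lvl, registered_capital=1_000_000, market_value=2_000_000))
```

Any real mapping would come from the institution's own credit policy; the point here is only that a higher level yields a higher ceiling, as the text describes.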

In this embodiment, step S3 of performing regression prediction on each cluster obtained by the first clustering calculation using a preset method includes:
S31, inputting each calculated cluster into a preset SVR prediction model to perform regression prediction.

As described in step S31 above, SVR (Support Vector Regression) is an important application branch of the support vector machine (SVM). In this embodiment, the regression function $f(x) = wx + b$ is determined by minimizing an objective function. The specific process is as follows:
Similar to ν-SVC, it was shown in 2002 that in ν-SVR the inequality
$$e^T(\alpha + \alpha^*) \le C\nu$$
can be replaced by an equality. Moreover, users often choose a small constant such as C = 1, so C/l may be too small. Therefore, in LIBSVM (a simple, easy-to-use, fast and efficient software package for SVM pattern recognition and regression developed by Professor Lin Chih-Jen and colleagues at National Taiwan University), the user-specified parameter is treated as C/l, that is,
$$\bar{C} = C/l$$
is specified by the user, and LIBSVM solves the following problem:
$$\min_{\alpha,\alpha^*}\ \tfrac{1}{2}(\alpha - \alpha^*)^T Q (\alpha - \alpha^*) + z^T(\alpha - \alpha^*)$$
$$\text{subject to}\quad e^T(\alpha - \alpha^*) = 0,\quad e^T(\alpha + \alpha^*) = \bar{C} l \nu,\quad 0 \le \alpha_i, \alpha_i^* \le \bar{C},\ i = 1, \dots, l,$$
where $Q_{ij} = K(x_i, x_j)$, $z$ is the vector of target values and $e$ is the all-ones vector. The solution obtained by ε-SVR with parameters $(\bar{C}, \varepsilon)$ is the same as the solution obtained by ν-SVR with parameters $(l\bar{C}, \nu)$.

Here, $l$ is the number of training samples, where $l = k$; $C$ is the weight parameter that balances the model complexity $\tfrac{1}{2}w^T w$ against the training error term; $\varepsilon$ is the parameter of the ε-insensitive loss function; $\zeta$ is the slack variable; and $K(x_i, x)$ is the kernel function.
SVR (the support vector regression algorithm) mainly maps the clustering result into a higher-dimensional space and realizes linear regression by constructing a linear decision function in that space; it is built mainly on the ε-insensitive loss function and kernel functions. If the fitted mathematical model represents a curve in a multidimensional space, the result obtained with the ε-insensitive loss function includes that curve together with an "ε-tube" around the training points. Of all the sample points, only those lying on the "tube wall" determine the position of the tube; these training samples are called "support vectors". To handle nonlinearity in the training sample set, traditional fitting methods usually append higher-order terms to the linear equation. This is effective, but the added adjustable parameters increase the risk of overfitting. SVR resolves this contradiction by adopting kernel functions: replacing the linear terms in the linear equation with a kernel function makes the original linear algorithm nonlinear, so that nonlinear regression can be performed. At the same time, introducing the kernel function achieves the "dimension raising", while overfitting from the additional adjustable parameters remains controllable. This application uses SVR, a mature technique, so the calculation results are reliable and more accurate prediction can be achieved.
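To make the ε-insensitive loss and the tube concrete, here is a minimal sketch of a linear ε-SVR trained by full-batch subgradient descent on the primal objective (1/2)||w||^2 + C Σ max(0, |y_i − f(x_i)| − ε). This is a deliberately simple, swapped-in solver: it uses no kernel and is not the dual method that LIBSVM solves. The learning rate, epoch count, toy data and the `tol` used to flag support vectors are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def eps_insensitive(residual: float, eps: float) -> float:
    """ε-insensitive loss: zero inside the tube |r| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear regression function f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X: List[Sequence[float]], y: List[float], C: float = 1.0,
                     eps: float = 0.1, lr: float = 0.001,
                     epochs: int = 5000) -> Tuple[List[float], float]:
    """Subgradient descent on 0.5*||w||^2 + C * sum(eps_insensitive(y_i - f(x_i)))."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0          # gradient of 0.5*||w||^2 is w
        for xi, yi in zip(X, y):
            r = yi - predict(w, b, xi)
            if abs(r) > eps:                   # only points outside the tube contribute
                s = -1.0 if r > 0 else 1.0     # d/df of |y - f(x)| outside the tube
                grad_w = [g + C * s * xj for g, xj in zip(grad_w, xi)]
                grad_b += C * s
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def support_vectors(X, y, w, b, eps, tol=0.02):
    """Indices of points on or outside the tube wall (|r| >= eps - tol)."""
    return [i for i, (xi, yi) in enumerate(zip(X, y))
            if abs(yi - predict(w, b, xi)) >= eps - tol]


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]                  # hypothetical data on y = 2x + 1
w, b = train_linear_svr(X, y)
loss = sum(eps_insensitive(yi - predict(w, b, xi), 0.1) for xi, yi in zip(X, y))
print([round(v, 2) for v in w], round(b, 2), support_vectors(X, y, w, b, eps=0.1), round(loss, 4))
```

The regularizer pulls the slope below 2 until the end points touch the tube wall, so in this toy data the flagged points are expected to be the two end points while the interior points, strictly inside the tube, do not affect the fit.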

In one embodiment, step S2 of inputting the first related data into the K-means algorithm and performing the first clustering calculation includes:
S201, performing feature extraction on the first related data;
S202, performing relevance analysis on the extracted feature data to obtain uncorrelated feature data that is not related to the other feature data; and
S203, removing from the first related data the data corresponding to the uncorrelated feature data, then inputting the remaining data into the K-means algorithm and performing the first clustering calculation.

As described in steps S201 to S203 above, feature extraction is performed on the first related data of the loan target, and relevance analysis is performed to find, among the feature data, uncorrelated feature data that is not related to the other feature data. The first related data corresponding to this uncorrelated feature data is then removed, and the remaining first related data is used for the clustering calculation. This makes the resulting clusters more accurate, and because the removed data no longer has to be processed, it also makes the clustering calculation more efficient.

In this embodiment, feature extraction on the first related data is performed with the Relief algorithm. Relief is a feature weighting algorithm: it assigns each feature a weight according to the relevance between that feature and the classes, and features whose weight is below a given threshold are removed. The Relief algorithm randomly selects a sample R from the training set D, then searches the samples of the same class as R for the nearest neighbor H, called the NearHit, and searches the samples of a different class for the nearest neighbor M, called the NearMiss. It then updates the weight of each feature according to the following rule: if, on a given feature, the distance between R and NearHit is smaller than the distance between R and NearMiss, the feature helps distinguish nearest neighbors of the same class from those of different classes, and its weight is increased; conversely, if the distance between R and NearHit is greater than the distance between R and NearMiss, the feature hinders this distinction, and its weight is decreased. The above process is repeated m times, and finally the average weight of each feature is obtained. The larger a feature's weight, the stronger its classification ability; the smaller the weight, the weaker its classification ability. The running time of the Relief algorithm increases linearly with the number of sampling iterations m and the number of original features N, so it is very efficient. The specific algorithm is as follows:

Input: training data set D, number of sampling iterations m, feature-weight threshold δ. Output: the set T of features selected by their feature weights.
1. Set all feature weights W(A) to 0 and let T be the empty set.
2. for i = 1 to m do
   1) randomly select a sample R;
   2) find the nearest neighbor H of R among the samples of the same class, and the nearest neighbor M among the samples of a different class;
   3) for A = 1 to N do
        W(A) = W(A) - diff(A, R, H)/m + diff(A, R, M)/m
3. for A = 1 to N do
     if W(A) ≥ δ
       add the A-th feature to T
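A runnable Python version of the Relief pseudocode above, as a minimal sketch. It assumes numeric features and class labels supplied by the caller (basic Relief is defined for two classes, and the patent does not say which classes are used for the first related data). It also assumes the common choice of diff(A, R1, R2) = |R1[A] − R2[A]| normalized by the range of feature A, with nearest neighbors found by the sum of those diffs; the function name and `seed` parameter are illustrative.

```python
import random
from typing import List, Sequence, Set, Tuple


def relief(X: List[Sequence[float]], labels: List[int], m: int, delta: float,
           seed: int = 0) -> Tuple[List[float], Set[int]]:
    """Two-class Relief: returns (feature weights W, selected feature set T)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_features)]
    hi = [max(row[a] for row in X) for a in range(n_features)]

    def diff(a: int, r1: Sequence[float], r2: Sequence[float]) -> float:
        # Difference on feature a, normalized to [0, 1] by the feature's range.
        span = hi[a] - lo[a]
        return abs(r1[a] - r2[a]) / span if span else 0.0

    def dist(r1: Sequence[float], r2: Sequence[float]) -> float:
        return sum(diff(a, r1, r2) for a in range(n_features))

    W = [0.0] * n_features                         # 1. all weights start at 0
    for _ in range(m):                             # 2. repeat m times
        i = rng.randrange(len(X))                  # 1) random sample R
        R, cls = X[i], labels[i]
        hits = [X[j] for j in range(len(X)) if j != i and labels[j] == cls]
        misses = [X[j] for j in range(len(X)) if labels[j] != cls]
        H = min(hits, key=lambda s: dist(R, s))    # 2) NearHit
        M = min(misses, key=lambda s: dist(R, s))  # 2) NearMiss
        for a in range(n_features):                # 3) weight update
            W[a] += (-diff(a, R, H) + diff(a, R, M)) / m
    T = {a for a in range(n_features) if W[a] >= delta}  # 3. threshold on W
    return W, T


# Feature 0 separates the classes; feature 1 repeats the same pattern in both.
X = [[0, 5], [1, 0], [2, 10], [10, 5], [11, 0], [12, 10]]
labels = [0, 0, 0, 1, 1, 1]
W, T = relief(X, labels, m=20, delta=0.1)
print([round(w, 3) for w in W], T)
```

In this toy data the separating feature receives a clearly positive weight and the noise feature a negative one, so only feature 0 passes the threshold δ.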

In one embodiment, step S202 of performing relevance analysis on the extracted feature data to obtain uncorrelated feature data not related to the other feature data includes:
S2021, plotting the feature data as a scatter diagram and recording the feature data corresponding to isolated points in the scatter diagram as the uncorrelated feature data.

As described in step S2021 above, a scatter diagram shows the distribution of data points on a Cartesian coordinate plane in regression analysis and is typically used to compare aggregated data across categories. The more data a scatter diagram contains, the better the comparison. In this embodiment, the feature data is generally a matrix, in which case a scatter plot matrix can be used to draw scatter plots between every pair of independent variables at once, making it possible to quickly find the main relationships among multiple variables. Plotting the feature data as a scatter diagram is a visualization step: once the feature data is visualized, isolated points on the graph or image can be identified intuitively by eye and then selected, and the computer device records the feature data corresponding to the selected points as uncorrelated feature data.

In another embodiment, step S202 of performing relevance analysis on the extracted feature data to obtain uncorrelated feature data not related to the other feature data includes:
S2022, performing correlation matrix analysis on the feature data and extracting the uncorrelated feature data that is not related to the other feature data.

As described in step S2022 above, the correlation matrix, also called the correlation coefficient matrix, consists of the correlation coefficients between the columns of a matrix: the element in row i, column j of the correlation matrix is the correlation coefficient between column i and column j of the original matrix. In this embodiment, the analysis is usually performed with the covariance matrix. Covariance measures how two variables vary together: if they tend to change in the same direction, the covariance is positive, indicating a positive correlation; if they change in opposite directions, the covariance is negative, indicating a negative correlation; and if they are independent of each other, the covariance is 0, indicating that they are uncorrelated. When there are three or more variables, the corresponding covariance matrix is used.
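As a hedged illustration of step S2022, the sketch below builds the correlation (normalized covariance) matrix of the feature columns in pure Python and flags as uncorrelated any feature whose absolute correlation with every other feature stays below a cut-off. The function names and the cut-off of 0.3 are assumptions; the patent does not state a threshold.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlation coefficient: covariance divided by both standard deviations."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def correlation_matrix(columns: List[Sequence[float]]) -> List[List[float]]:
    """Element [i][j] is the correlation between feature column i and column j."""
    return [[pearson(a, b) for b in columns] for a in columns]


def uncorrelated_features(columns: List[Sequence[float]], threshold: float = 0.3) -> List[int]:
    """Indices of features whose |r| with every other feature is below threshold."""
    corr = correlation_matrix(columns)
    k = len(columns)
    return [i for i in range(k)
            if all(abs(corr[i][j]) < threshold for j in range(k) if j != i)]


f0 = [1, 2, 3, 4, 5, 6]
f1 = [2, 4, 6, 8, 10, 12]       # moves with f0 (r = 1)
f2 = [1, -1, -1, -1, -1, 1]     # no linear relation to f0 or f1
print(uncorrelated_features([f0, f1, f2]))  # -> [2]
```

Normalizing the covariance into a correlation coefficient keeps the cut-off independent of each feature's units.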

With reference to FIG. 2, in this embodiment, the method further includes the following steps after step S4 of determining the short-term profitability of the loan target according to the first prediction result:
S5: acquiring second related data related to the loan target from outside the blockchain.
S6: inputting the second related data into the K-means algorithm and performing a second clustering calculation.
S7: performing regression prediction, by a preset method, on each cluster obtained by the second clustering calculation to obtain a second prediction result.
S8: determining whether the difference between the first prediction result and the second prediction result is smaller than a preset threshold.
S9: if the difference is smaller than the threshold, determining that the short-term profitability determined according to the first prediction result is a usable result.

As described in steps S5 to S9 above, the second related data from outside the blockchain refers to data that is not recorded on the blockchain, usually data held in big-data networks. The clustering algorithm and regression prediction method applied to the second related data are the same as those applied to the first related data, so their description is not repeated here. In this embodiment, the first prediction result obtained from the first related data is compared with the second prediction result obtained from the second related data; that is, a verification step is added to determine whether the first prediction result is usable. Because the present application mainly targets the early stage of deploying a blockchain, much of each company's historical data still resides on the Internet, for example on the company's own servers or on the servers of related companies, and can be obtained from any Internet-connected environment. In this step, the second prediction result, obtained mainly from "big data" on the Internet, is used to verify the first prediction result obtained from the "small data" on the blockchain; only when the difference between the two results is smaller than the preset threshold is the first prediction result judged substantially correct and usable.
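A minimal sketch of the verification in steps S5 to S9, assuming the shared clustering-and-regression pipeline can be passed in as a single function that maps a data set to one numeric prediction. `verify_first_prediction` and the toy mean-based pipeline in the example are hypothetical stand-ins, not the patent's implementation.

```python
from typing import Callable, Sequence, Tuple

Pipeline = Callable[[Sequence[float]], float]


def verify_first_prediction(pipeline: Pipeline, onchain_data: Sequence[float],
                            offchain_data: Sequence[float],
                            threshold: float) -> Tuple[float, bool]:
    """Run the same pipeline on both data sources (steps S6/S7 mirror S2/S3).

    Returns the first (on-chain) prediction and whether it is usable, i.e.
    whether it differs from the second (off-chain) prediction by less than
    the preset threshold (steps S8/S9).
    """
    first = pipeline(onchain_data)    # first prediction result ("small data")
    second = pipeline(offchain_data)  # second prediction result ("big data")
    return first, abs(first - second) < threshold
```

With a toy pipeline that returns the mean, on-chain data `[1, 2, 3]` and off-chain data `[1, 2, 3, 4]` give predictions 2.0 and 2.5, so the first result is usable for a threshold of 1.0 but not for 0.4.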

In one embodiment, the method includes the following steps before step S2 of inputting the first related data into the K-means algorithm and performing the first clustering calculation:
S201: determining whether the amount of the first related data is larger than a preset data threshold.
S202: if so, inputting the first related data into a preset prediction algorithm designed for big data to make the prediction.

As described in steps S201 and S202 above, a data threshold is set. If the amount of first related data acquired exceeds the data threshold, the data falls outside the "small data" range to which the short-term profit prediction method above applies, so the subsequent steps such as clustering and regression prediction are skipped and a different prediction method is used instead. Specifically, the acquired first related data may be input into an existing, relatively mature, preset prediction model, such as a corporate profit model based on time-driven activity-based costing (TD-ABC).
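The switch in steps S201 and S202 amounts to a routing decision. The sketch below assumes the amount of data is measured as the number of records and that both predictors are supplied by the caller; `predict_short_term_profit` and its predictor arguments are hypothetical names, and the TD-ABC model itself is not implemented here.

```python
from typing import Callable, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def predict_short_term_profit(first_related_data: Sequence[float], data_threshold: int,
                              small_data_predictor: Predictor,
                              big_data_predictor: Predictor) -> Tuple[str, float]:
    """Steps S201/S202: route the data to the small-data or big-data predictor."""
    if len(first_related_data) > data_threshold:
        # S202: too much data for the "small data" method; use the preset big-data model
        return "big_data", big_data_predictor(first_related_data)
    # Otherwise continue with K-means clustering and regression (step S2 onwards)
    return "small_data", small_data_predictor(first_related_data)
```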

In one embodiment, it may further be analyzed whether the first related data contains fraudulent data. Specifically, feature extraction is performed on the acquired first related data to obtain feature data; uncorrelated feature data that is not related to the other feature data is extracted from the feature data; and outlier detection based on the Voronoi algorithm is then performed on the uncorrelated feature data to identify fraudulent data. The amount of fraudulent data can be used to assess the reputation value of the loan target, and the loan amount for the loan target is then determined based on the reputation value and the short-term profitability.
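The source does not spell out the Voronoi-based outlier test. As an illustration only, the sketch below restricts it to a single feature dimension, where each point's Voronoi cell is bounded by the midpoints to its adjacent points, so the adjacent points are exactly its Voronoi neighbours; a point is flagged when the distance to its nearest Voronoi neighbour is much larger than typical. The 3x-median cutoff and the reputation formula (share of unflagged records) are assumptions, not the patent's definitions.

```python
from statistics import median
from typing import List, Sequence


def voronoi_outlier_scores(values: Sequence[float]) -> List[float]:
    """Distance from each value to its nearest Voronoi neighbour (1-D case).

    On a line, the Voronoi cell of a point is the interval bounded by the
    midpoints to its immediate predecessor and successor, so those adjacent
    points are its Voronoi neighbours (the two extreme points have only one).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0.0] * len(values)
    for rank, idx in enumerate(order):
        gaps = []
        if rank > 0:
            gaps.append(values[idx] - values[order[rank - 1]])
        if rank + 1 < len(order):
            gaps.append(values[order[rank + 1]] - values[idx])
        scores[idx] = min(gaps) if gaps else 0.0
    return scores


def flag_fraud_candidates(values: Sequence[float], factor: float = 3.0) -> List[int]:
    """Indices whose score exceeds `factor` times the median score."""
    scores = voronoi_outlier_scores(values)
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


def reputation(values: Sequence[float], factor: float = 3.0) -> float:
    """Hypothetical reputation value: share of records not flagged as outliers."""
    return 1.0 - len(flag_fraud_candidates(values, factor)) / len(values)
```

This simple score flags isolated values; a tight group of several outlying values would not be flagged, which is one reason a real implementation would need a more careful Voronoi-based criterion.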

In a specific example, company a needs a loan from bank P, and bank P needs to evaluate company a. The evaluation process is as follows:
1. Collect from the blockchain all data related to company a, such as its sales data, production data and financial data. Then perform feature extraction on the acquired data and delete unnecessary data in advance, which increases the speed and efficiency of the subsequent clustering calculation. Specifically, the extracted data is first plotted as a scatter diagram, and the isolated points in the scatter diagram are then deleted.
2. Perform a clustering calculation with the K-means algorithm on the data of company a acquired from the blockchain.
3. Perform SVR regression prediction on the clustering result to obtain results such as the profitability of company a.
4. Further assess the credit of company a using the fraudulent-data recognition method described above.
5. Bank P determines, according to the credit, profitability and other results for company a, whether it can lend to company a and what the maximum loan limit is. Specifically, if the credit of company a is lower than a preset value, the loan to company a is refused; if the credit of company a reaches the preset value, the loan can be granted, and the maximum loan limit is calculated in combination with the profitability of company a, which effectively improves bank P's ability to avoid risk.
Specifically, the data acquired from company a's data link includes the types of goods procured and the corresponding procurement funds, customs exports and imports and the corresponding duties, domestic sales data, sales product data, loan data, repayment credit data, inventory data, and logistics-related data (number of warehouses, geographical distribution of warehouses, storage data of each warehouse, and distribution of sales regions).
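Step 5 of the example can be sketched as a simple rule. The credit threshold comes from the text; the linear scaling of the limit with predicted profitability (`limit_per_unit_profit`) is a hypothetical placeholder, since the source does not give the formula for the maximum loan limit.

```python
from typing import Optional


def loan_decision(credit: float, credit_threshold: float, profitability: float,
                  limit_per_unit_profit: float) -> Optional[float]:
    """Return the maximum loan limit, or None when the loan is refused."""
    if credit < credit_threshold:
        return None  # credit below the preset value: refuse the loan
    # Hypothetical rule: the limit grows linearly with predicted profitability
    return max(0.0, profitability) * limit_per_unit_profit
```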

In the short-term profit prediction method according to the present application, the acquired "small data" is first clustered with the K-means algorithm, a prediction result is then obtained with a regression algorithm, and finally the short-term profitability of the loan target is determined according to the prediction result. This solves the problem that, when little relevant data is available in the early stage of deploying each company's data links, banks and other financial institutions cannot accurately predict the short-term profitability of borrowing companies; it allows the loan amount for the loan target to be bounded relatively accurately and helps reduce the lending risk of banks.

With reference to FIG. 3, an embodiment of the present application further provides a short-term profit prediction device, which is used when the amount of data related to the loan target acquired from the blockchain is less than a preset amount.
In the present application, working capital loans from banks and other financial institutions are usually classified into temporary loans, short-term loans and medium-term loans; short-term loans are working capital loans with a term of more than 3 months and up to 1 year. Because market changes are irregular, rules extracted from historical data are accurate for a certain period, but their accuracy declines once that period has passed. Depending on the length of the forecast horizon, forecasts can be divided into three types: short-term, medium-term and long-term. Generally, the shorter the forecast horizon, the higher the forecast quality; conversely, the longer the horizon, the lower the accuracy of the prediction. In the present application, the condition that the amount of data on the blockchain is less than the preset amount restricts the method to the early stage of deploying each company's data links, when relatively little data of any kind is available; in this application, such an "amount of data less than the preset amount" is sometimes called "small data", in contrast to today's "big data".

The prediction device includes:
an acquisition unit 10 for acquiring first related data related to the loan target from the blockchain;
a clustering unit 20 for inputting the first related data into the K-means algorithm and performing a first clustering calculation;
a regression unit 30 for performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation to obtain a first prediction result; and
a determination unit 40 for determining the short-term profitability of the loan target according to the first prediction result.

In the acquisition unit 10, the loan target is a company or an individual that needs a loan from a financial institution such as a bank. The first related data may be all data related to the loan target on the blockchain, or data retrieved according to specified requirements; for example, different data on the blockchain is acquired for different companies or projects. Taking a procurement-agency lending company as an example, financial institution block data, core company block data, warehouse logistics block data, dealer block data and the like can be acquired.

In the clustering unit 20, the K-means algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that minimize the within-cluster variance. The algorithm partitions the n data objects into k clusters such that objects within the same cluster are highly similar while objects in different clusters are dissimilar. Its principle is as follows: first set the positions of several centers, calculate the distance from every point to each center, and assign each point to the center it belongs to; for example, if point A is closest to center 1, it belongs to cluster 1. Average all points belonging to cluster 1 to obtain its new center. Repeat until the centers no longer change, which yields the final center positions and completes the data clustering.

In the present application, the specific process of the clustering unit 20 is as follows:
(1) Given a data set of related data (the first related data) $X = \{x_1, x_2, \dots, x_n\}$ containing n d-dimensional data points, where $x_i \in \mathbb{R}^d$, select K points of the data set as the initial cluster centers; each selected point represents one cluster center $\mu_k$ ($k = 1, 2, \dots, K$).
(2) Calculate the Euclidean distance from each point to each center $\mu_k$ and, by the nearest-distance criterion, assign each point to the class represented by the most similar cluster center, forming K clusters $C = \{c_k,\ k = 1, 2, \dots, K\}$. Each cluster $c_k$ represents one class. Calculate the sum of squared distances $J(c_k)$ from each point in the class to the cluster center $\mu_k$, that is,

$$J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$$

(3) Minimize the total sum of squared distances from the samples of each class to the cluster center $\mu_k$ of the class to which they belong:

$$J(C) = \sum_{k=1}^{K} J(c_k) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2 = \sum_{k=1}^{K} \sum_{i=1}^{n} d_{ki} \lVert x_i - \mu_k \rVert^2$$

where

$$d_{ki} = \begin{cases} 1, & x_i \in c_k \\ 0, & x_i \notin c_k \end{cases}$$

Since $J(C)$ is minimized with respect to the centers when

$$\mu_k = \frac{1}{\lvert c_k \rvert} \sum_{x_i \in c_k} x_i,$$

the mean of all objects in each class is calculated as the new cluster center of that class.
(4) Determine whether the cluster centers have changed; if they have, return to step (2); if not, terminate the clustering.
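Steps (1) to (4) map directly onto the standard K-means loop. Below is a minimal, dependency-free sketch, assuming points are tuples of floats and that the K initial centres are drawn at random from the data set (the source does not say how they are chosen). It also returns the total within-cluster sum of squared distances minimized in step (3).

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Return (centres, clusters, total within-cluster sum of squares)."""
    rng = random.Random(seed)
    # (1) choose K points of the data set as initial centres (requires k <= len(data))
    centres = [tuple(p) for p in rng.sample(data, k)]
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # (2) assign each point to the nearest centre by Euclidean distance
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda j: squared_distance(p, centres[j]))
            clusters[nearest].append(p)
        # (3) for a fixed assignment the cluster mean minimises J(c_k): use it as the new centre
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centres.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(dim)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        # (4) stop when the centres no longer change
        converged = new_centres == centres
        centres = new_centres
        if converged:
            break
    sse = sum(squared_distance(p, centres[j])
              for j, members in enumerate(clusters) for p in members)
    return centres, clusters, sse
```

Random initialization can lead to different local minima on harder data; practical implementations often run several seeds or use a smarter seeding scheme and keep the partition with the lowest objective.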

The present application clusters data with the K-means algorithm, which is simple and fast, scales well and remains efficient, and works best when the clusters are close to Gaussian distributions.
In the regression unit 30, regression prediction, based on the principle of correlation in forecasting, identifies the factors that influence the prediction target and then finds, by mathematical methods, an approximate expression of the functional relationship between these factors and the prediction target. The first prediction result is the result obtained by applying regression prediction, by a preset method, to each cluster produced by the first clustering calculation; because the first related data is data related to the loan target, the first prediction result can reflect, to some extent, the profitability of the loan target over a short period. The basic steps of regression prediction are as follows.
(1) Determine the independent and dependent variables according to the prediction target. Specifically, determining the specific goal of the forecast also determines the dependent variable; for example, if the goal is next year's sales volume, the sales volume Y is the dependent variable. Through market research and data research, find the influencing factors related to the prediction target, that is, the independent variables, and select the main ones.
(2) Establish the regression prediction model. Specifically, compute from the historical statistical data of the independent and dependent variables, and on that basis establish the regression equation, that is, the regression prediction model.
(3) Perform correlation analysis. Regression analysis is a mathematical-statistical analysis of influencing factors (independent variables) and prediction targets (dependent variables) that have a causal relationship, and the regression equation is meaningful only if the independent variables are actually related to the dependent variable. Therefore, whether the factors used as independent variables are related to the prediction target used as the dependent variable, how strongly they are related, and how confidently that strength can be judged are questions that must be answered when performing regression analysis. Correlation analysis usually requires calculating the correlation coefficient, whose magnitude indicates the degree of association between the independent and dependent variables.
(4) Validate the regression prediction model and calculate the prediction error. Whether the model can be used for actual prediction depends on validating it and calculating its prediction error; the regression equation can be used as a prediction model only if it passes the various tests and its prediction error is small.
(5) Calculate and determine the predicted value. Use the regression prediction model to calculate predicted values, analyze them comprehensively, and determine the final predicted value.
In the present application, the data is clustered first and regression prediction is then performed on the clustered data, which makes prediction faster.
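A minimal sketch of steps (2) to (5) for the data of one cluster, assuming a single independent variable and an ordinary least-squares line. The source does not fix the regression form at this point (in one embodiment it is SVR), and the acceptance thresholds `min_r` and `max_rmse` are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Step (2): least-squares regression equation y = a + b*x (x must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Step (3): Pearson correlation between the independent and dependent variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy) if sxx * syy > 0 else 0.0


def rmse(x: Sequence[float], y: Sequence[float], a: float, b: float) -> float:
    """Step (4): root-mean-square error of the fitted equation on the history."""
    return math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))


def regression_forecast(x: Sequence[float], y: Sequence[float], x_new: float,
                        min_r: float = 0.8, max_rmse: float = math.inf) -> Optional[float]:
    """Step (5): forecast only if the model passes the checks in steps (3) and (4)."""
    if abs(correlation(x, y)) < min_r:
        return None  # independent variable not related strongly enough to the target
    a, b = fit_line(x, y)
    if rmse(x, y, a, b) > max_rmse:
        return None  # prediction error too large
    return a + b * x_new
```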

In the determination unit 40, the short-term profitability of the loan target is determined according to the first prediction result. A financial institution such as a bank can then determine, according to that profitability, the loan amount for the loan target, that is, the upper limit of its loan amount. The first prediction result may be a number representing a level, for example levels 1 to 10: the higher the level, the stronger the short-term profitability of the loan target and the higher its loan amount accordingly. In this embodiment, the loan amount further depends on data such as the registered capital and market value of the loan target.
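One way to produce the 1 to 10 level, sketched under assumptions: the predicted profitability score is binned into equal-width levels between an assumed lower and upper bound, and the cap is a hypothetical share of registered capital that grows with the level. Neither the binning nor `capital_multiple` comes from the source.

```python
def profitability_level(score: float, low: float, high: float, levels: int = 10) -> int:
    """Map a predicted profitability score to a level in 1..levels (equal-width bins)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    frac = min(max((score - low) / (high - low), 0.0), 1.0)  # clamp to [0, 1]
    return min(levels, int(frac * levels) + 1)


def loan_cap(level: int, registered_capital: float, levels: int = 10,
             capital_multiple: float = 0.5) -> float:
    """Hypothetical cap: a share of registered capital that rises with the level."""
    return registered_capital * capital_multiple * level / levels
```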

With reference to FIG. 4, in this embodiment, the regression unit 30 includes:
an SVR prediction module 31 for inputting each calculated cluster into a preset SVR prediction model to perform regression prediction.

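In the SVR prediction module 31, SVR (Support Vector Regression) is an important application branch of the support vector machine (SVM). In this embodiment, the regression function is determined by minimizing an objective function; the regression function is $f(x) = w^{T}\phi(x) + b$, which is the linear form $f(x) = wx + b$ when no feature mapping is used. The specific process, following the ν-SVR formulation used in LIBSVM, is as follows:

$$\min_{w,\,b,\,\zeta,\,\zeta^{*},\,\varepsilon}\quad \frac{1}{2}w^{T}w + C\left(\nu\varepsilon + \frac{1}{l}\sum_{i=1}^{l}\left(\zeta_i + \zeta_i^{*}\right)\right)$$

$$\text{s.t.}\quad \left(w^{T}\phi(x_i) + b\right) - y_i \le \varepsilon + \zeta_i,\qquad y_i - \left(w^{T}\phi(x_i) + b\right) \le \varepsilon + \zeta_i^{*},\qquad \zeta_i,\ \zeta_i^{*} \ge 0,\ i = 1, \dots, l,\qquad \varepsilon \ge 0$$

Its dual problem is

$$\min_{\alpha,\,\alpha^{*}}\quad \frac{1}{2}\left(\alpha - \alpha^{*}\right)^{T}Q\left(\alpha - \alpha^{*}\right) + y^{T}\left(\alpha - \alpha^{*}\right)$$

$$\text{s.t.}\quad e^{T}\left(\alpha - \alpha^{*}\right) = 0,\qquad e^{T}\left(\alpha + \alpha^{*}\right) \le C\nu,\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C/l,\ i = 1, \dots, l$$

where $Q_{ij} = K(x_i, x_j)$, and the resulting regression function is

$$f(x) = \sum_{i=1}^{l}\left(-\alpha_i + \alpha_i^{*}\right)K(x_i, x) + b$$

As with ν-SVC, it was shown in 2002 that the inequality $e^{T}(\alpha + \alpha^{*}) \le C\nu$ can be replaced by an equality. Moreover, because users often choose C to be a small constant such as $C = 1$, $C/l$ may be too small. Therefore, LIBSVM treats the user-specified parameter as $C/l$; that is,

$$\bar{C} = \frac{C}{l}$$

is specified by the user, and LIBSVM solves the following problem:

$$\min_{\alpha,\,\alpha^{*}}\quad \frac{1}{2}\left(\alpha - \alpha^{*}\right)^{T}Q\left(\alpha - \alpha^{*}\right) + y^{T}\left(\alpha - \alpha^{*}\right)$$

$$\text{s.t.}\quad e^{T}\left(\alpha - \alpha^{*}\right) = 0,\qquad e^{T}\left(\alpha + \alpha^{*}\right) = \bar{C}l\nu,\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le \bar{C},\ i = 1, \dots, l$$

The solution obtained by ε-SVR with parameters $(\bar{C}, \varepsilon)$ is the same as the solution obtained by ν-SVR with parameters $(l\bar{C}, \nu)$, for correspondingly chosen ε and ν.

In the above equations, l is the number of training samples (here $l = k$); C is the weight parameter that balances the model complexity $\frac{1}{2}w^{T}w$ against the training error term; ε is the width of the ε-insensitive loss function; $\zeta_i$ and $\zeta_i^{*}$ are slack variables; $\nu \in (0, 1]$ controls the number of support vectors; $\phi$ is the mapping into the higher-dimensional feature space, with $K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)$; $K(x_i, x)$ is the kernel function; and e is the vector of all ones.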

SVR (the support vector regression algorithm) essentially maps the clustering results into a higher-dimensional space and constructs a linear decision function there, thereby realizing regression; when the ε-insensitive loss function is used, its foundations are mainly the ε-insensitive loss function and the kernel-function method. If the fitted mathematical model is represented as a curve in a multidimensional space, the result obtained from the ε-insensitive loss function is an "ε-tube" that contains the curve and the training points. Of all the sample points, only those lying on or outside the "tube wall" determine the position of the tube; this part of the training samples is called the "support vectors". To handle nonlinearity in the training samples, conventional fitting methods usually add higher-order terms to a linear equation. This is effective, but the additional adjustable parameters increase the risk of overfitting. SVR resolves this conflict by using a kernel function: replacing the linear terms of the linear equation with a kernel function makes the original linear algorithm nonlinear, so that nonlinear regression can be performed. At the same time, introducing the kernel function achieves the "dimension raising" while keeping the added adjustable parameters under control against overfitting.
The present application uses the mature SVR algorithm, whose calculation results are reliable and which achieves more accurate prediction.
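To make the ε-insensitive loss concrete, the sketch below fits a linear ε-SVR (no kernel) by subgradient descent on the primal objective (1/2)||w||^2 + C * sum(max(0, |y_i - (w.x_i + b)| - eps)). This is an illustrative swap-in, not LIBSVM's solver, which optimizes the dual and supports kernels; the hyperparameter values are arbitrary assumptions. Because subgradient steps do not decrease the objective monotonically, the best iterate is kept.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear regression function f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svr_objective(w: Sequence[float], b: float, xs: List[Sequence[float]],
                  ys: Sequence[float], c: float, eps: float) -> float:
    """(1/2)||w||^2 + C * sum of epsilon-insensitive losses."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = sum(max(0.0, abs(y - predict(w, b, x)) - eps) for x, y in zip(xs, ys))
    return reg + c * loss


def fit_linear_svr(xs: List[Sequence[float]], ys: Sequence[float], c: float = 10.0,
                   eps: float = 0.1, lr: float = 0.001,
                   epochs: int = 5000) -> Tuple[List[float], float]:
    w, b = [0.0] * len(xs[0]), 0.0
    best = (svr_objective(w, b, xs, ys, c, eps), list(w), b)
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # the gradient of (1/2)||w||^2 is w
        for x, y in zip(xs, ys):
            r = y - predict(w, b, x)
            if abs(r) > eps:  # only samples outside the epsilon-tube contribute
                s = 1.0 if r > 0 else -1.0
                grad_w = [g - c * s * xi for g, xi in zip(grad_w, x)]
                grad_b -= c * s
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
        obj = svr_objective(w, b, xs, ys, c, eps)
        if obj < best[0]:  # subgradient steps are not monotone, so keep the best iterate
            best = (obj, list(w), b)
    return best[1], best[2]
```

After fitting, only the samples whose residual reaches or exceeds eps influence the solution, which matches the role of the support vectors described above.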

With reference to FIG. 5, in one embodiment, the clustering unit 20 includes:
an extraction module 21 for performing feature extraction on the first related data;
an analysis module 22 for performing relevance analysis on the extracted feature data to obtain uncorrelated feature data that is not related to the other feature data; and
a clustering module 23 for removing, from the first related data, the first related data corresponding to the uncorrelated feature data, inputting the remaining data into the K-means algorithm, and performing the first clustering calculation.

In the extraction module 21, the analysis module 22 and the clustering module 23, feature extraction is performed on the first related data related to the loan target, and relevance analysis is performed to find the uncorrelated feature data that is not related to the other feature data. The first related data corresponding to this uncorrelated feature data is then deleted from the first related data, and the remaining first related data is used for the clustering calculation. This makes the resulting clusters more accurate and, because the deleted data no longer has to be processed, improves the efficiency of the clustering calculation. In this embodiment, feature extraction on the first related data is specifically performed with the Relief algorithm, a feature weighting algorithm that assigns each feature a weight according to its relevance to the classes and deletes features whose weight is below a given threshold. The Relief algorithm randomly selects a sample R from the training set D, searches the samples of the same class as R for its nearest neighbor H, called the NearHit, searches the samples of a different class for its nearest neighbor M, called the NearMiss, and then updates the weight of each feature according to the following rule. If, on a given feature, the distance between R and NearHit is smaller than the distance between R and NearMiss, the feature helps distinguish nearest neighbors of the same class from those of different classes, and its weight is increased. Conversely, if the distance between R and NearHit is greater than the distance between R and NearMiss, the feature hinders that distinction, and its weight is decreased. This process is repeated m times, and the average weight of each feature is finally obtained. The larger a feature's weight, the stronger its classification ability; the smaller its weight, the weaker its classification ability. The running time of the Relief algorithm grows linearly with the number of sampling iterations m and the number of original features N, so it runs very efficiently. The specific algorithm has already been described in the method embodiment, so its description is not repeated here.
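The Relief update described above can be sketched as follows for a two-class problem. This sketch assumes numeric features, scales each per-feature difference by that feature's range, and uses the sum of the scaled differences as the nearest-neighbour distance; `m`, `seed` and the function name are illustrative.

```python
import random
from typing import List, Sequence


def relief(samples: List[Sequence[float]], labels: Sequence[int], m: int,
           seed: int = 0) -> List[float]:
    """Relief feature weights for a two-class problem.

    Every class needs at least two samples so that a NearHit exists.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    lo = [min(s[f] for s in samples) for f in range(n_features)]
    hi = [max(s[f] for s in samples) for f in range(n_features)]

    def diff(f: int, a: Sequence[float], b: Sequence[float]) -> float:
        span = hi[f] - lo[f]
        return abs(a[f] - b[f]) / span if span else 0.0

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(f, a, b) for f in range(n_features))

    weights = [0.0] * n_features
    for _ in range(m):
        i = rng.randrange(len(samples))  # randomly pick a sample R
        r = samples[i]
        hits = [s for j, s in enumerate(samples) if j != i and labels[j] == labels[i]]
        misses = [s for j, s in enumerate(samples) if labels[j] != labels[i]]
        near_hit = min(hits, key=lambda s: dist(r, s))
        near_miss = min(misses, key=lambda s: dist(r, s))
        for f in range(n_features):
            # reward features that separate R from NearMiss, penalise those separating it from NearHit
            weights[f] += (diff(f, r, near_miss) - diff(f, r, near_hit)) / m
    return weights
```

Features whose weight falls below a chosen threshold would then be dropped, for example `[f for f, wt in enumerate(weights) if wt >= threshold]`.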

In one embodiment, the analysis module 22 includes a visual analysis submodule for plotting the feature data as a scatter diagram and recording the feature data corresponding to isolated points in the scatter diagram as the uncorrelated feature data.
In the visual analysis submodule, a scatter diagram shows the distribution of data points on a plane in a Cartesian coordinate system; in regression analysis it is typically used to compare aggregated data across classes. The more data a scatter diagram contains, the more useful the comparison. In this embodiment, the feature data is generally a matrix, so a scatter plot matrix can be used to draw the pairwise scatter plots between all independent variables at once, which makes it possible to quickly find the main relationships among the variables. Plotting the feature data as a scatter diagram is a visualization process: once the feature data is visualized, isolated points on the graph can be identified intuitively by eye and selected, and the computer device records the feature data corresponding to the selected points as uncorrelated feature data.

別の実施例において、上記の分析モジュール２２は、前記特徴データに対して関連行列分析を行い、他の特徴データに関連しない前記無相関特徴データを抽出するための行列分析サブモジュールを含む。
上記の行列分析サブモジュールにおいて、上記の関連行列は、相関係数行列とも呼ばれ、行列の各列間の相関係数から構成される。つまり、関連行列のｉ行目のｊ列目の要素は、元の行列のｉ列目とｊ列目の相関係数である。本実施例において、一般に共分散行列を用いて分析し、共分散は、２つの変数の全体誤差を測定するために使用され、２つの変数の変化傾向が一致する場合、共分散は正の値であり、２つの変数が正の相関であることが示される。２つの変数が反対方向に変化する場合、共分散は負の値であり、２つの変数が負の相関であることが示される。２つの変数が互いに独立している場合、共分散は０であり、２つの変数が無関係であることが示され、変数が３組以上である場合、対応する共分散行列が使用される。 In another embodiment, the analysis module 22 includes a matrix analysis submodule for performing a related matrix analysis on the feature data and extracting the uncorrelated feature data that is not related to the other feature data.
In the matrix analysis submodule, the correlation matrix (also called the correlation coefficient matrix) is composed of the correlation coefficients between the columns of a matrix: the element in row i, column j of the correlation matrix is the correlation coefficient between column i and column j of the original matrix. In this embodiment, the analysis is generally performed with a covariance matrix. Covariance measures how two variables vary together: if the two variables tend to change in the same direction, the covariance is positive, indicating a positive correlation; if they change in opposite directions, the covariance is negative, indicating a negative correlation; and if the two variables are independent of each other, the covariance is 0, indicating that they are uncorrelated. When there are three or more variables, the corresponding covariance matrix is used.
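A minimal sketch of the matrix analysis, assuming the feature data is a NumPy array with one feature per column and at least two columns. `np.corrcoef(..., rowvar=False)` builds exactly the matrix described above (element (i, j) is the correlation between columns i and j). The cut-off `threshold=0.1` and the function name `uncorrelated_features` are hypothetical choices, not values from the source.

```python
import numpy as np


def uncorrelated_features(features: np.ndarray, threshold: float = 0.1) -> list[int]:
    """Return the indices of columns whose |correlation| with every other column
    is below `threshold` (hypothetical cut-off)."""
    # rowvar=False: each column is a variable, so corr[i, j] = corr(column i, column j).
    corr = np.corrcoef(features, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore each column's correlation with itself
    # Constant columns produce NaN correlations and are therefore never returned.
    return np.flatnonzero(np.abs(corr).max(axis=1) < threshold).tolist()
```

With columns `a = [1, 2, 3, 4]`, `b = 2 * a` and `c = [1, -1, -1, 1]`, columns a and b are perfectly correlated while c has zero covariance with both, so only index 2 is returned.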

Referring to FIG. 6, in this embodiment the short-term profit forecasting device further includes:
a data acquisition unit 50 for acquiring second related data related to the loan target from outside the blockchain;
a data clustering unit 60 for inputting the second related data into the K-means algorithm to perform a second clustering calculation;
a clustering regression unit 70 for performing regression prediction, by a preset method, on each cluster obtained by the second clustering calculation to obtain a second prediction result;
a comparison unit 80 for determining whether the difference between the first prediction result and the second prediction result is smaller than a preset threshold; and
a judgment unit 90 for determining, when the difference is smaller than the threshold, that the short-term profitability of the loan target determined according to the first prediction result is a usable result.

The second related data from outside the blockchain refers to data that is not recorded on the blockchain, usually data in a big data network. The clustering algorithm and regression prediction method applied to the second related data are the same as those applied to the first related data, so their description is not repeated here. In this embodiment, the first prediction result obtained from the first related data is compared with the second prediction result obtained from the second related data; that is, a verification step is provided to determine whether the first prediction result is usable. Because this application mainly targets the initial stage of laying out a blockchain, much of each company's historical data resides on the big-data Internet, for example on the company's own servers or on the servers of other companies related to it, and can be obtained from any Internet-connected environment. In this step, the second prediction result, obtained mainly from "big data" on the Internet, is used to verify the first prediction result, obtained from "small data" on the blockchain. Only when the difference between the second prediction result and the first prediction result is smaller than the preset threshold is the first prediction result judged to be substantially correct and usable.
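The verification step can be sketched as follows, assuming each prediction result is a single number (for example, forecast short-term profit) and that "difference" means the absolute difference; the source does not pin down either point. `verified_forecast` is a hypothetical name.

```python
from typing import Optional


def verified_forecast(first: float, second: float, threshold: float) -> Optional[float]:
    """Return the on-chain ("small data") forecast only if the off-chain
    ("big data") forecast confirms it; otherwise return None."""
    # Assumption: absolute difference, compared strictly against the preset threshold.
    if abs(first - second) < threshold:
        return first  # the first prediction result is treated as usable
    return None  # not verified; the caller decides how to proceed
```

A relative difference (for example, `abs(first - second) / abs(second)`) would be an equally plausible reading when forecasts for companies of very different sizes share one threshold.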

In one embodiment, the short-term profit forecasting device further includes:
a determination unit for determining whether the amount of the first related data is larger than a preset data threshold; and
a switching unit for inputting the first related data, when its amount is larger than the data threshold, into a preset prediction algorithm based on big data to make the prediction.

The determination unit and the switching unit use a data threshold. If the amount of acquired first related data is larger than the data threshold, the data falls outside the "small data" range for which the short-term profit forecasting device is intended, so the subsequent prediction process (clustering, regression prediction, and so on) is stopped and the prediction method is switched. Specifically, the acquired first related data may be input into an existing, relatively mature prediction model set in advance, such as a corporate profit model based on the TD-ABC model.
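The switching logic amounts to a dispatch on data volume. In the sketch below, both models are passed in as callables because the source names no concrete interface: `small_data_model` stands for the K-means plus regression path and `big_data_model` for a preset big-data model such as the TD-ABC-based profit model. All names are hypothetical.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def forecast(data: Sequence[float], data_threshold: int,
             small_data_model: Model, big_data_model: Model) -> float:
    """Route the first related data to the appropriate (hypothetical) model."""
    if len(data) > data_threshold:
        # Outside the "small data" range: skip clustering/regression and switch models.
        return big_data_model(data)
    return small_data_model(data)
```

For instance, with `data_threshold=5`, three records go to the small-data path and six records go to the big-data model.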

In one embodiment, the short-term profit forecasting device further includes a fraud analysis unit, which analyzes whether the first related data contains fraudulent data. Specifically, feature extraction is performed on the acquired first related data to obtain feature data; uncorrelated feature data that is not related to the other feature data is extracted from the feature data; and outlier recognition is then performed on the uncorrelated feature data with a Voronoi algorithm to obtain the fraudulent data. The reputation value of the loan target can be analyzed from the amount of fraudulent data, and the loan amount for the loan target is then determined based on the reputation value and the short-term profitability.
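The source does not say how the Voronoi algorithm turns cells into outlier decisions, so the following is only one possible reading under stated assumptions: the uncorrelated feature data is two-dimensional (for example, one scatter plot panel), the points are distinct, two points are Voronoi neighbours when their cells share an edge, and a point is flagged when the median distance to its Voronoi neighbours exceeds `factor` times the median of that score over all points. `voronoi_neighbours`, `voronoi_outliers` and `factor=3.0` are hypothetical. The brute-force adjacency test is O(n^3) and meant for small data; for larger inputs, `scipy.spatial.Voronoi(points).ridge_points` lists the same neighbouring pairs.

```python
import numpy as np


def voronoi_neighbours(points: np.ndarray, eps: float = 1e-12) -> list[set[int]]:
    """Brute-force Voronoi adjacency for distinct 2-D points (O(n^3)).

    Points i and j share a Voronoi edge iff some circle through both has no
    other point strictly inside. Such circle centres lie on the perpendicular
    bisector m + t*u; each other point forbids a half-line of t.
    """
    n = len(points)
    nbrs: list[set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p, q = points[i], points[j]
            m, v = (p + q) / 2, q - p
            u = np.array([-v[1], v[0]]) / np.linalg.norm(v)  # unit normal of pq
            others = np.delete(points, [i, j], axis=0)
            # Point k is strictly inside circle(t) iff a_k + b_k * t < 0.
            a = ((m - others) ** 2).sum(axis=1) - ((m - p) ** 2).sum()
            b = 2 * (m - others) @ u
            if np.any((np.abs(b) < eps) & (a < 0)):
                continue  # another point lies on the segment pq itself
            pos, neg = b > eps, b < -eps
            lower = np.max(-a[pos] / b[pos]) if pos.any() else -np.inf
            upper = np.min(-a[neg] / b[neg]) if neg.any() else np.inf
            if lower < upper:  # a non-empty range of empty circles exists
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def voronoi_outliers(points: np.ndarray, factor: float = 3.0) -> list[int]:
    """Hypothetical rule: flag points that are far from most of their Voronoi neighbours."""
    nbrs = voronoi_neighbours(points)
    scores = np.array([
        np.median([np.linalg.norm(points[i] - points[j]) for j in nbrs[i]])
        for i in range(len(points))
    ])
    return np.flatnonzero(scores > factor * np.median(scores)).tolist()
```

Using the median rather than the mean keeps cluster points that border a single distant point from being flagged themselves, since only one of their several neighbours is far away.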

In a specific example, company a needs a loan from bank P, so bank P must evaluate company a. The evaluation process is as follows:
(1) Collect from the blockchain all data related to company a, such as its sales data, production data, and financial data. Then perform feature extraction on the acquired data and delete unnecessary data in advance to increase the speed and efficiency of the subsequent clustering calculation. Specifically, the extracted data is first rendered visually as a scatter plot, and the discrete points in the scatter plot are then deleted.
(2) Perform a clustering calculation with the K-means algorithm on the data of company a acquired from the blockchain.
(3) Perform SVR regression prediction on the results of the clustering calculation to obtain results such as the profitability of company a.
(4) Further assess the creditworthiness of company a with the fraudulent-data recognition method described above.
(5) Bank P decides, according to the creditworthiness, profitability, and so on of company a, whether it can lend to company a and what the maximum loan limit is. Specifically, if the creditworthiness of company a is lower than a preset value, the loan to company a is refused. If the creditworthiness of company a reaches the preset value, the loan can be made; in this case the maximum loan limit and related figures are calculated in combination with the profitability of company a, which effectively improves bank P's ability to avoid risk.
Specifically, the data acquired from company a's data link includes the types of procured goods and the corresponding procurement funds, customs export items and duties, import items and duties, domestic sales data, sold-product data, loan data, repayment credit data, inventory data, and logistics data (number of warehouses, geographic distribution of warehouses, storage data of each warehouse, and distribution of sales regions).
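Step (5) can be sketched as a small decision function. The source only says that a creditworthiness below a preset value leads to refusal and that otherwise the limit is computed together with profitability; the reputation formula (share of records not flagged as fraudulent), `min_reputation=0.9`, and the limit formula (a multiple of forecast profit scaled by reputation) are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class LoanDecision:
    approved: bool
    max_limit: float


def decide_loan(fraud_count: int, record_count: int, profit_forecast: float,
                min_reputation: float = 0.9, profit_multiple: float = 2.0) -> LoanDecision:
    """Hypothetical loan rule combining a reputation value with forecast profit."""
    # Hypothetical reputation value: share of records NOT flagged as fraudulent.
    reputation = 1.0 - fraud_count / record_count
    if reputation < min_reputation:
        # Creditworthiness below the preset value: refuse the loan.
        return LoanDecision(approved=False, max_limit=0.0)
    # Hypothetical limit: a multiple of forecast short-term profit, scaled by reputation.
    limit = max(0.0, profit_forecast) * profit_multiple * reputation
    return LoanDecision(approved=True, max_limit=limit)
```

With no flagged records and a forecast profit of 100.0, this returns an approved decision with a limit of 200.0; with half of the records flagged, the loan is refused.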

The short-term profit forecasting device according to this application first clusters the acquired "small data" with the K-means algorithm, then makes a prediction with a regression algorithm to obtain a prediction result, and finally determines the short-term profitability of the loan target according to the prediction result. This solves the problem that, when little data is available in the initial stage of laying out each company's data link, financial institutions such as banks cannot accurately predict the short-term profitability of a borrowing company. It allows the loan amount for the loan target to be set relatively accurately and helps reduce the lending risk of banking institutions.
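The cluster-then-regress flow can be sketched in plain NumPy. Two substitutions are made and should be read as assumptions: centres are seeded deterministically by farthest-first selection rather than randomly, and each cluster is fitted with ordinary least squares as a stand-in for the SVR model named in the claims (scikit-learn's `sklearn.svm.SVR` could be dropped in per cluster instead). `kmeans`, `fit_per_cluster` and `predict` are hypothetical names.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 100):
    """Lloyd's K-means with farthest-first seeding; x has shape (n, d)."""
    centres = [x[0]]
    for _ in range(1, k):
        # Next seed: the row farthest from all centres chosen so far.
        d = np.min([((x - c) ** 2).sum(1) for c in centres], axis=0)
        centres.append(x[d.argmax()])
    centres = np.array(centres)
    for _ in range(iters):
        # Assign each row to its nearest centre (squared Euclidean distance).
        labels = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([x[labels == c].mean(0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def fit_per_cluster(x: np.ndarray, y: np.ndarray, labels: np.ndarray, k: int) -> dict:
    """Least-squares line per cluster (stand-in for the per-cluster SVR model)."""
    models = {}
    for c in range(k):
        mask = labels == c
        a = np.c_[x[mask], np.ones(mask.sum())]  # append an intercept column
        models[c], *_ = np.linalg.lstsq(a, y[mask], rcond=None)
    return models


def predict(x_new: np.ndarray, centres: np.ndarray, models: dict) -> float:
    """Route a new sample to its nearest cluster and apply that cluster's model."""
    c = int(((centres - x_new) ** 2).sum(1).argmin())
    return float(np.r_[x_new, 1.0] @ models[c])
```

For two well-separated groups, one following y = 2x + 1 near x = 0 to 3 and one following y = 500 - x near x = 100 to 103, each query is routed to its own cluster's line.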

Referring to FIG. 7, an embodiment of the present invention further provides a computer device, which may be a server and whose internal structure is shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an operating environment for the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores data such as the acquired first related data, the second related data, and the K-means algorithm model. The network interface of the computer device communicates with external terminals over a network connection. The computer-readable instructions, when executed by the processor, implement the flow of each of the method embodiments described above.

An embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the flow of each of the method embodiments described above.
The above description covers only preferred embodiments of the present application and does not limit the scope of its claims. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, and any direct or indirect application to other related technical fields, likewise falls within the scope of the claims of this application.

10 Acquisition unit
20 Clustering unit
21 Extraction module
22 Analysis module
23 Clustering module
30 Regression unit
31 SVR prediction module
40 Decision unit
50 Data acquisition unit
60 Data clustering unit
70 Clustering regression unit
80 Comparison unit
90 Judgment unit

Claims

A method for forecasting short-term profit, used when the amount of data related to a loan target obtained from a blockchain is less than a preset amount, the method comprising:
a step of acquiring first related data related to the loan target from the blockchain;
a step of inputting the first related data into a K-means algorithm to perform a first clustering calculation;
a step of performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation to obtain a first prediction result; and
a step of determining the short-term profitability of the loan target according to the first prediction result.

The method for forecasting short-term profit according to claim 1, wherein the step of performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation comprises:
a step of inputting each calculated cluster into a preset SVR prediction model to perform regression prediction.

The step of inputting the first related data into the K-means algorithm to perform the first clustering calculation comprises:
a step of performing feature extraction on the first related data;
a step of performing a relevance analysis on the extracted feature data to obtain uncorrelated feature data, that is, feature data not related to the other feature data; and
a step of clearing, from the first related data, the target data corresponding to the uncorrelated feature data, and then inputting the remaining data into the K-means algorithm to perform the first clustering calculation. The method for forecasting short-term profit described in.

The method for forecasting short-term profit according to claim 3, wherein the step of performing a relevance analysis on the extracted feature data to obtain uncorrelated feature data not related to the other feature data comprises:
a step of rendering the feature data as a scatter plot and recording the feature data corresponding to discrete points in the scatter plot as the uncorrelated feature data.

The method for forecasting short-term profit according to claim 3, wherein the step of performing a relevance analysis on the extracted feature data to obtain uncorrelated feature data not related to the other feature data comprises:
a step of performing a correlation matrix analysis on the feature data and extracting the uncorrelated feature data that is not related to the other feature data.

The method for forecasting short-term profit according to claim 1, further comprising, after the step of determining the short-term profitability of the loan target according to the first prediction result:
a step of acquiring second related data related to the loan target from outside the blockchain;
a step of inputting the second related data into the K-means algorithm to perform a second clustering calculation;
a step of performing regression prediction, by a preset method, on each cluster obtained by the second clustering calculation to obtain a second prediction result;
a step of determining whether the difference between the first prediction result and the second prediction result is smaller than a preset threshold; and
a step of determining, when the difference is smaller than the threshold, that the short-term profitability of the loan target determined according to the first prediction result is a usable result.

The method for forecasting short-term profit according to claim 1, further comprising, before the step of inputting the first related data into the K-means algorithm to perform the first clustering calculation:
a step of determining whether the amount of the first related data is larger than a preset data threshold; and
a step of, if it is, inputting the first related data into a preset prediction algorithm based on big data to make a prediction.

A short-term profit forecasting device, used when the amount of data related to a loan target obtained from a blockchain is less than a preset amount, the device comprising:
acquisition means for acquiring first related data related to the loan target from the blockchain;
clustering means for inputting the first related data into a K-means algorithm to perform a first clustering calculation;
regression means for performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation to obtain a first prediction result; and
determination means for determining the short-term profitability of the loan target according to the first prediction result.

The short-term profit forecasting device according to claim 8, wherein the regression means includes:
an SVR prediction module for inputting each calculated cluster into a preset SVR prediction model to perform regression prediction.

The short-term profit forecasting device according to claim 8, wherein the clustering means includes:
an extraction module for performing feature extraction on the first related data;
an analysis module for performing a relevance analysis on the extracted feature data to obtain uncorrelated feature data not related to the other feature data; and
a clustering module for clearing, from the first related data, the data corresponding to the uncorrelated feature data and then inputting the remaining data into the K-means algorithm to perform the first clustering calculation.

The short-term profit forecasting device according to claim 10, wherein the analysis module includes:
a matrix analysis submodule for performing a correlation matrix analysis on the feature data and extracting the uncorrelated feature data not related to the other feature data.

The short-term profit forecasting device according to claim 10, wherein performing the relevance analysis on the extracted feature data to obtain uncorrelated feature data not related to the other feature data comprises:
performing a correlation matrix analysis on the feature data and extracting the uncorrelated feature data not related to the other feature data.

A computer device comprising:
acquisition means for acquiring first related data related to a loan target from a blockchain;
clustering means for inputting the first related data into a K-means algorithm to perform a first clustering calculation;
regression means for performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation to obtain a first prediction result; and
determination means for determining the short-term profitability of the loan target according to the first prediction result.

A program that causes a computer to execute:
a function of acquiring first related data related to a loan target from a blockchain;
a function of inputting the first related data into a K-means algorithm to perform a first clustering calculation;
a function of performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation to obtain a first prediction result; and
a function of determining the short-term profitability of the loan target according to the first prediction result.

A computer-readable storage medium storing a program that causes a computer to execute:
a function of acquiring first related data related to a loan target from a blockchain;
a function of inputting the first related data into a K-means algorithm to perform a first clustering calculation;
a function of performing regression prediction, by a preset method, on each cluster obtained by the first clustering calculation to obtain a first prediction result; and
a function of determining the short-term profitability of the loan target according to the first prediction result.