CN113177596A - Block chain address classification method and device - Google Patents

Block chain address classification method and device

Info

Publication number
CN113177596A
CN113177596A (application CN202110480968.9A)
Authority
CN
China
Prior art keywords
sample
classifier
error rate
block chain
iteration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110480968.9A
Other languages
Chinese (zh)
Other versions
CN113177596B (en)
Inventor
穆长春 (Mu Changchun)
吕远 (Lü Yuan)
卿苏德 (Qing Sude)
王艳辉 (Wang Yanhui)
吴浩 (Wu Hao)
刘睿 (Liu Rui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Currency Institute of the People's Bank of China
Original Assignee
Digital Currency Institute of the People's Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Currency Institute of the People's Bank of China
Priority to CN202110480968.9A
Publication of CN113177596A
Application granted
Publication of CN113177596B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a block chain address classification method and device, relating to the field of computer technology. One embodiment of the method comprises: iteratively training selected classifiers using a set of first block chain address samples, wherein in each iteration a classifier is selected and trained with the goal of minimizing the weighted average error rate, the classifier weight of that classifier is calculated from the weighted average error rate of the current iteration, and the sample weight of each first block chain address sample in the current iteration is calculated from either its initial sample weight or its sample weight in the previous iteration; and determining a block chain address classification model from the iteratively trained classifiers and their classifier weights, which is then used to determine the category of a block chain address to be classified. This embodiment improves the accuracy and reliability of block chain address classification and overcomes defects of the prior art such as error propagation, insufficient algorithm generalization, and high hardware-resource and time costs.

Description

Block chain address classification method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for block chain address classification.
Background
Among the common methods for classifying blockchain addresses, the common-input heuristic rests on a key assumption: if a transaction has multiple input addresses, those input addresses are assumed to belong to the same entity; in short, different addresses paying into the same transaction are treated as one entity. When this assumption fails, for example when a transaction mixes in inputs from other entities or has only a single input, the address labeling results are no longer reliable. Moreover, the initial address labels must themselves be accurate, otherwise every result derived from them by subsequent propagation is unreliable. The change-address heuristic is a classical classification method for the UTXO (unspent transaction output) data model. A UTXO cannot be split once created, so blockchain transactions frequently produce "change": when the amount the payer spends (typically in cryptocurrency) exceeds the amount the recipient is owed, the excess is transferred to a change address. This algorithm must first judge whether a change action exists and then judge which address is the change address, so an error in the first decision propagates into the second; it also depends heavily on rules and assumptions, generalizes poorly, and requires multiple traversals of the data, making its hardware-resource and time costs high. Existing linear decision boundary classifiers perform well in many application scenarios but are unsatisfactory for blockchain address classification, suffering low accuracy due to factors such as model misspecification, fixed sample weights, and uneven distribution of samples across labels.
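For concreteness, the common-input heuristic criticized above can be sketched as a simple union-find clustering over transaction inputs. This is an illustrative sketch of the baseline the invention improves on, not the claimed method, and the `transactions` input format is an assumption for the example:

```python
class UnionFind:
    """Minimal union-find structure for grouping addresses into entities."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_common_input(transactions):
    """All input addresses of one transaction are assumed to be one entity."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            uf.find(addr)  # register address (covers single-input txs)
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in uf.parent:
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())
```

As the background notes, a single foreign input in one transaction is enough to merge two unrelated entities here, which is exactly the fragility the invention targets.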
In the process of implementing the present invention, the inventors found that the prior art has at least the following problems:
existing block chain address classification methods suffer from low accuracy and poor reliability, error propagation, insufficient algorithm generalization, and high hardware-resource and time costs.
Disclosure of Invention
In view of this, embodiments of the present invention provide a block chain address classification method and apparatus, which can improve the accuracy and reliability of block chain address classification and overcome defects of the prior art such as error propagation, insufficient algorithm generalization, and high hardware-resource and time costs.
To achieve the above object, according to one aspect of the embodiments of the present invention, a block chain address classification method is provided.
A block chain address classification method, comprising: determining a set of first blockchain address samples, each first blockchain address sample including a first number of features representing a blockchain address; iteratively training selected classifiers using the set of first blockchain address samples, wherein, in each iteration: a classifier is selected and trained with the goal of minimizing the weighted average error rate, the weighted average error rate of the current iteration being related to the sample weight of each first blockchain address sample in the current iteration; the classifier weight of the classifier is calculated based on the weighted average error rate of the current iteration; and the sample weight of each first blockchain address sample in the current iteration is calculated based on either the initial value of its sample weight or its sample weight in the previous iteration; and determining a blockchain address classification model based on each iteratively trained classifier and its corresponding classifier weight, the blockchain address classification model being used to determine the category of a blockchain address to be classified.
Optionally, the determining of a set of first blockchain address samples comprises: obtaining a set of second blockchain address samples, each second blockchain address sample comprising a second number of features representing a blockchain address, the second number being greater than or equal to the first number; and performing multiple rounds of feature screening to select the first number of features from the second number of features, and constructing the corresponding first blockchain address sample from the selected first number of features in each second blockchain address sample, so as to determine the set of first blockchain address samples.
Optionally, the performing multiple rounds of feature screening to select the first number of features from the second number of features includes performing the following steps in each round of feature screening: training a simplified version classifier by using single features in a feature set to be screened, wherein the simplified version classifier corresponds to the features in the feature set to be screened one by one, and the feature set to be screened is a set formed by features which are not selected in the second number of features; calculating the error rate of each simplified version classifier to obtain the minimum error rate; if the minimum error rate is smaller than or equal to a preset threshold value, selecting the features used by the simplified version classifier corresponding to the minimum error rate, and updating the feature set to be screened; if the minimum error rate is greater than the preset threshold, ending the multi-round feature screening process to obtain the first number of features.
Optionally, the error rate of the simplified version classifier is calculated by: calculating a predicted value and a label value for each training sample corresponding to the simplified version classifier, the training samples comprising the single feature used to train the simplified version classifier; processing the absolute value of the difference between the predicted value and the label value of each training sample through a sign function to obtain corresponding first sign function values; and performing a weighted summation of the first sign function values according to the weights of the corresponding training samples in the current screening round to obtain the error rate of the simplified version classifier.
Optionally, if the minimum error rate is less than or equal to the preset threshold, after selecting the features used by the simplified version classifier corresponding to the minimum error rate, the method further includes: processing the absolute value of the difference between the label value and the predicted value of each training sample, as obtained by the simplified version classifier corresponding to the minimum error rate, through a sign function to obtain a second sign function value, and calculating a coefficient for updating the weight of the training sample using the minimum error rate calculated in the current round of feature screening and the second sign function value; and multiplying the weight of the training sample in the current round of screening by the coefficient, so as to update the current weight of the training sample to its weight in the next round of screening.
Optionally, before iteratively training the selected classifiers using the set of first blockchain address samples, the optimal number of iterations of the iterative training is determined by: dividing the set of first blockchain address samples into K subsets, training the classifier each time on K-1 of the subsets, and calculating a cross-validation error rate estimate after each training; calculating the variance of the cross-validation error rate estimates obtained from the trainings, and selecting the target iteration number corresponding to the minimum cross-validation error rate estimate; and calculating the sum of the minimum cross-validation error rate estimate and the standard deviation of the cross-validation error rate estimates, and selecting, from a preset set of iteration numbers, the smallest iteration number whose corresponding cross-validation error rate estimate does not exceed that sum as the optimal iteration number.
Optionally, at each iteration, the weighted average error rate is calculated by: and processing the absolute value of the difference between the label value and the predicted value of each first block chain address sample through a sign function to obtain a corresponding third sign function value, performing weighted summation on each third sign function value according to the sample weight of the corresponding first block chain address sample in the current iteration to obtain a weighted sum, and taking the ratio of the weighted sum to the sum of all the sample weights in the current iteration as the weighted average error rate.
Optionally, when the current iteration is a first iteration, a sample weight of the first blockchain address sample in the current iteration is a sample weight initial value of the first blockchain address sample; under the condition that the current iteration is not the first iteration, the sample weight of the first block chain address sample in the current iteration is obtained by calculating the product of the following three terms: the sample weight of the first block chain address sample in the previous iteration, a value obtained by a classifier weight in the current iteration through a preset function operation, and the third sign function value corresponding to the first block chain address sample, wherein the preset function is an exponential function with e as a base.
Optionally, the classifiers selected for each iteration are the same or different linear decision boundary classifiers.
According to another aspect of the embodiments of the present invention, a block chain address classification apparatus is provided.
A block chain address classification apparatus, comprising: a first set of blockchain address samples determining module for determining a set of first blockchain address samples, each of the first blockchain address samples comprising a first number of features representing blockchain addresses; a classifier iterative training module to iteratively train a selected classifier using the set of first blockchain address samples, wherein for each iteration: selecting a classifier to train with the goal of minimizing weighted average error rate, wherein the weighted average error rate of the current iteration is related to the sample weight of each first block chain address sample in the current iteration, calculating the classifier weight of the classifier based on the weighted average error rate of the current iteration, and the sample weight of the first block chain address sample in the current iteration is calculated based on the initial value of the sample weight of the first block chain address sample or the sample weight of the first block chain address sample in the previous iteration; and the block chain address classification model determining module is used for determining a block chain address classification model based on each iteratively trained classifier and the corresponding classifier weight, and the block chain address classification model is used for determining the category of the block chain address to be classified.
Optionally, the first block chain address sample set determining module is further configured to: obtaining a set of second blockchain address samples, each of the second blockchain address samples comprising a characteristic representing a second number of blockchain addresses, the second number being greater than or equal to the first number; performing a plurality of rounds of feature screening to select the first number of features from the second number of features, and constructing the corresponding first blockchain address sample according to the selected first number of features in each second blockchain address sample to determine the set of first blockchain address samples.
Optionally, the first blockchain address sample set determining module includes a feature filtering sub-module, configured to: in each round of feature screening, the following steps are performed: training a simplified version classifier by using single features in a feature set to be screened, wherein the simplified version classifier corresponds to the features in the feature set to be screened one by one, and the feature set to be screened is a set formed by features which are not selected in the second number of features; calculating the error rate of each simplified version classifier to obtain the minimum error rate; if the minimum error rate is smaller than or equal to a preset threshold value, selecting the features used by the simplified version classifier corresponding to the minimum error rate, and updating the feature set to be screened; if the minimum error rate is greater than the preset threshold, ending the multi-round feature screening process to obtain the first number of features.
Optionally, the feature filtering sub-module calculates the error rate of the simplified version classifier by: calculating a predicted value and a label value for each training sample corresponding to the simplified version classifier, the training samples comprising the single feature used to train the simplified version classifier; processing the absolute value of the difference between the predicted value and the label value of each training sample through a sign function to obtain corresponding first sign function values; and performing a weighted summation of the first sign function values according to the weights of the corresponding training samples in the current screening round to obtain the error rate of the simplified version classifier.
Optionally, the first blockchain address sample set determining module further includes a weight updating sub-module, configured to: process the absolute value of the difference between the label value and the predicted value of each training sample, as obtained by the simplified version classifier corresponding to the minimum error rate, through a sign function to obtain a second sign function value, and calculate a coefficient for updating the weight of the training sample using the minimum error rate calculated in the current round of feature screening and the second sign function value; and multiply the weight of the training sample in the current round of screening by the coefficient, so as to update the current weight of the training sample to its weight in the next round of screening.
Optionally, the method further includes an optimal iteration number determining module, configured to determine an optimal iteration number of the iterative training by: dividing the set of the first block chain address samples into K subsets, training a classifier by using the K-1 subsets each time, and calculating a cross validation error rate estimation value after each training; calculating the variance of the cross validation error rate estimated value by using the cross validation error rate estimated value obtained by each training so as to select the target iteration times corresponding to the minimum cross validation error rate estimated value; and calculating the sum of the minimum cross validation error rate estimation value and the standard deviation of the cross validation error rate estimation value, and selecting the minimum value of each iteration number of which the corresponding cross validation error rate estimation value is not more than the sum from a preset iteration number set as the optimal iteration number.
Optionally, the classifier iterative training module calculates the weighted average error rate at each iteration by: and processing the absolute value of the difference between the label value and the predicted value of each first block chain address sample through a sign function to obtain a corresponding third sign function value, performing weighted summation on each third sign function value according to the sample weight of the corresponding first block chain address sample in the current iteration to obtain a weighted sum, and taking the ratio of the weighted sum to the sum of all the sample weights in the current iteration as the weighted average error rate.
Optionally, when the current iteration is a first iteration, a sample weight of the first blockchain address sample in the current iteration is a sample weight initial value of the first blockchain address sample; under the condition that the current iteration is not the first iteration, the sample weight of the first block chain address sample in the current iteration is obtained by calculating the product of the following three terms: the sample weight of the first block chain address sample in the previous iteration, a value obtained by a classifier weight in the current iteration through a preset function operation, and the third sign function value corresponding to the first block chain address sample, wherein the preset function is an exponential function with e as a base.
Optionally, the classifier selected by the classifier iterative training module at each iteration is the same or a different linear decision boundary classifier.
According to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device, comprising: one or more processors; a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for block chain address classification provided by embodiments of the present invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer readable medium, on which a computer program is stored, which when executed by a processor implements the method for block chain address classification provided by an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: selected classifiers are iteratively trained using a set of first blockchain address samples, wherein, in each iteration, a classifier is selected and trained with the goal of minimizing the weighted average error rate, the weighted average error rate of the current iteration being related to the sample weight of each first blockchain address sample in the current iteration; the classifier weight of the classifier is calculated based on the weighted average error rate of the current iteration; and the sample weight of each first blockchain address sample in the current iteration is calculated based on either the initial value of its sample weight or its sample weight in the previous iteration. A blockchain address classification model is then determined based on each iteratively trained classifier and its corresponding classifier weight, and is used to classify the blockchain address to be classified. The method improves the accuracy and reliability of blockchain address classification and overcomes defects of the prior art such as error propagation, insufficient algorithm generalization, and high hardware-resource and time costs.
Further effects of the above non-conventional implementation will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a block chain address classification method according to an embodiment of the present invention;
FIG. 2 is a block chain address classification flow diagram according to an embodiment of the invention;
FIG. 3 is a block diagram of a device for sorting blockchain addresses according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a schematic block diagram of a computer system suitable for use with a server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram illustrating the main steps of a block chain address classification method according to an embodiment of the present invention.
As shown in fig. 1, the block chain address classification method according to an embodiment of the present invention mainly includes the following steps S101 to S103.
Step S101: a set of first blockchain address samples is determined, each first blockchain address sample including a first number of features representing a blockchain address.
Step S102: the selected classifiers are iteratively trained using the set of first blockchain address samples, wherein, in each iteration: a classifier is selected and trained with the goal of minimizing the weighted average error rate, the weighted average error rate of the current iteration being related to the sample weight of each first blockchain address sample in the current iteration; the classifier weight of the classifier is calculated based on the weighted average error rate of the current iteration; and the sample weight of each first blockchain address sample in the current iteration is calculated based on either the initial value of its sample weight or its sample weight in the previous iteration;
step S103: and determining a block chain address classification model based on each classifier after iterative training and the corresponding classifier weight, wherein the block chain address classification model is used for determining the category of the block chain address to be classified.
The step of determining the set of first blockchain address samples may specifically include: obtaining a set of second blockchain address samples, each second blockchain address sample comprising a second number of features representing a blockchain address, the second number being greater than or equal to the first number; and performing multiple rounds of feature screening to select a first number of features from the second number of features, and constructing the corresponding first blockchain address sample from the selected first number of features in each second blockchain address sample, so as to determine the set of first blockchain address samples.
The step of performing a plurality of rounds of feature screening to select a first number of features from a second number of features may specifically include, in each round of feature screening, performing the steps of: training a simplified version classifier by using single features in the feature set to be screened, wherein the simplified version classifier corresponds to the features in the feature set to be screened one by one, and the feature set to be screened is a set formed by features which are not selected in the second number of features; calculating the error rate of each simplified version classifier to obtain the minimum error rate; if the minimum error rate is less than or equal to a preset threshold value, selecting the features used by the simplified version classifier corresponding to the minimum error rate, and updating a feature set to be screened; if the minimum error rate is greater than the preset threshold value, ending the multi-round feature screening process to obtain a first number of features.
The error rate of the simplified version classifier can be calculated as follows: calculate a predicted value and a label value for each training sample corresponding to the simplified version classifier, where the training samples contain the single feature used to train that simplified version classifier; process the absolute value of the difference between the predicted value and the label value of each training sample through a sign function to obtain corresponding first sign function values; and perform a weighted summation of the first sign function values according to the weights of the corresponding training samples in the current screening round to obtain the error rate of the simplified version classifier.
If the obtained minimum error rate is less than or equal to the preset threshold, after selecting the features used by the simplified version classifier corresponding to the minimum error rate, the method may further include: processing the absolute value of the difference between the label value and the predicted value of each training sample, as obtained by the simplified version classifier corresponding to the minimum error rate, through a sign function to obtain a second sign function value; calculating a coefficient for updating the weight of the training sample using the minimum error rate calculated in the current round of feature screening and the second sign function value; and multiplying the weight of the training sample in the current round of screening by this coefficient, thereby updating the current weight of the training sample to its weight in the next round of screening.
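The steps above — a single-feature simplified classifier per remaining feature, minimum-error selection with a stopping threshold, and sample re-weighting — can be sketched as follows. This is a hedged illustration, not the patent's exact formulas: the `stump_predict` median-threshold learner and the exponential form of the weight-update coefficient are assumptions standing in for the unspecified simplified version classifier and update rule, and labels are taken as 0/1:

```python
import numpy as np


def stump_predict(x, y):
    """Assumed stand-in for the 'simplified version classifier': a
    one-feature threshold stump fit at the median of the feature."""
    t = np.median(x)
    above = y[x > t].mean() >= 0.5 if (x > t).any() else False
    pred_above = 1 if above else 0
    return np.where(x > t, pred_above, 1 - pred_above)


def screen_features(X, y, threshold=0.3):
    """Greedy multi-round feature screening (labels assumed 0/1).

    Each round trains one single-feature classifier per remaining feature,
    keeps the feature with the lowest weighted error rate, updates the
    sample weights, and stops once even the best error exceeds `threshold`.
    """
    n_samples, n_features = X.shape
    weights = np.ones(n_samples) / n_samples        # initial sample weights
    remaining = set(range(n_features))
    selected = []
    while remaining:
        errors, misses = {}, {}
        for j in remaining:
            pred = stump_predict(X[:, j], y)
            # sign(|prediction - label|): 1 where misclassified, else 0
            miss = np.sign(np.abs(pred - y))
            errors[j] = np.dot(weights, miss) / weights.sum()
            misses[j] = miss
        best = min(errors, key=errors.get)
        if errors[best] > threshold:
            break                                   # best feature too weak
        selected.append(best)
        remaining.remove(best)
        # up-weight the samples the chosen classifier misclassified
        # (assumed exponential form of the weight-update coefficient)
        eps = max(errors[best], 1e-10)
        weights = weights * np.exp(np.log((1 - eps) / eps) * misses[best])
    return selected
```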
Before the step of iteratively training the selected classifiers using the set of first blockchain address samples, the optimal number of iterations of the iterative training may be determined as follows: divide the set of first blockchain address samples into K subsets, train the classifier each time on K-1 of the subsets, and calculate a cross-validation error rate estimate after each training; calculate the variance of the cross-validation error rate estimates obtained from the trainings, and select the target iteration number corresponding to the minimum cross-validation error rate estimate; then calculate the sum of the minimum cross-validation error rate estimate and the standard deviation of the cross-validation error rate estimates, and select, from a preset set of iteration numbers, the smallest iteration number whose corresponding cross-validation error rate estimate does not exceed that sum, as the optimal iteration number.
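The selection rule described above resembles the classic one-standard-error rule for cross-validation. A minimal sketch, assuming the K per-fold error estimates for each candidate iteration number have already been computed:

```python
import numpy as np


def pick_optimal_iterations(cv_errors):
    """One-standard-error-style selection over K-fold estimates.

    `cv_errors` maps each candidate iteration count m to the K per-fold
    error-rate estimates (assumed already computed by training on K-1
    folds and validating on the held-out fold).
    """
    means = {m: float(np.mean(errs)) for m, errs in cv_errors.items()}
    m_best = min(means, key=means.get)       # minimum CV error estimate
    std = float(np.std(cv_errors[m_best]))   # spread at the minimum
    bound = means[m_best] + std
    # smallest iteration count whose CV error does not exceed min + std
    return min(m for m in means if means[m] <= bound)
```

Preferring the smallest qualifying iteration count trades a statistically insignificant error increase for a cheaper, less overfit ensemble.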
At each iteration, the weighted average error rate may be calculated as follows: and processing the absolute value of the difference between the label value of each first block chain address sample and the predicted value through a sign function to obtain a corresponding third sign function value, performing weighted summation on each third sign function value according to the sample weight of the corresponding first block chain address sample in the current iteration to obtain a weighted sum, and taking the ratio of the weighted sum to the sum of all sample weights in the current iteration as a weighted average error rate.
For each iteration, under the condition that the current iteration is the first iteration, the sample weight of the first block chain address sample in the current iteration is the initial value of the sample weight of the first block chain address sample; under the condition that the current iteration is not the first iteration, the sample weight of the first block chain address sample in the current iteration is obtained by calculating the product of the following three terms: the sample weight of the first block chain address sample in the previous iteration, a value obtained by the classifier weight in the current iteration through the operation of a preset function and a third symbol function value corresponding to the first block chain address sample, wherein the preset function is an exponential function taking e as a base.
The classifiers selected for each iteration may be the same or different linear decision boundary classifiers.
Fig. 2 is a block chain address classification flow diagram according to an embodiment of the invention.
The block chain address classification process of the embodiment of the invention realizes the classification of the block chain address to be classified through the self-adaptive enhanced block chain address classification algorithm based on the linear decision boundary classifier provided by the embodiment of the invention, and mainly comprises the following steps: extracting a block chain address on a block chain, performing feature engineering to select more than one hundred features including cross section and time sequence features, and randomly dividing sample data into a training set and a test set according to a self-defined proportion; screening the features by using a training set to obtain a set of first block chain address samples; determining an optimal iteration number M by using K-fold cross validation by using the set of first block chain address samples; training a block chain address classification model by using the set of the first block chain address samples according to the optimal iteration times M; predicting the test set by using the trained block chain address classification model, and verifying the accuracy of the model; and classifying the blockchain address to be classified by using a blockchain address classification model.
The specific process of the block chain address classification according to the embodiment of the present invention is described in detail below.
In the data preparation step, relevant addresses (block chain addresses that already carry classification labels and those that need to be labeled) are extracted from the block chain, and feature engineering is performed: the whole block chain is traversed, and as many features as possible, including cross-sectional and time-series information, are extracted from the historical transaction information stored on the chain to obtain block chain address sample data.
And splitting the sample data of the block chain address into a training set and a test set, and performing feature screening, wherein the training set used in the feature screening stage is a set of second block chain address samples.
The following describes a specific process of feature screening according to an embodiment of the present invention.
For example, in a binary classification scenario, assume the blockchain address samples (i.e., the second blockchain address samples) are (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where x_j ∈ R^P, j = 1, ..., n, is the P-dimensional covariate of each blockchain address sample; y_j ∈ {1, −1} is the label of the second blockchain address sample, which in embodiments of the present invention may be 1 or −1; P (i.e., the second number) is the number of features of a second blockchain address sample (i.e., the dimension of the covariate); n is the sample size of the training set; n+ is the number of positive samples in the training set and n− is the number of negative samples in the training set; W_{i,j} is the weight of the j-th sample at the i-th iteration (i.e., the i-th round of screening); β is the maximum acceptable error rate; h_p is the linear classifier obtained by training with the p-th feature alone, p = 1, ..., P.
Initial weights are assigned to the positive and negative samples (labels y_j = 1 and y_j = −1) respectively: a second blockchain address sample with y_j = 1 receives the initial weight W_{1,j} = 1/(2·n+), and a second blockchain address sample with y_j = −1 receives the initial weight W_{1,j} = 1/(2·n−). The weights are normalized as follows:

W_{i,j} ← W_{i,j} / Σ_{j=1}^{n} W_{i,j}
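The initialization and normalization just described can be sketched in a few lines of NumPy (the function name is ours, not from the embodiment):

```python
import numpy as np

def init_sample_weights(y):
    """Assign initial weights 1/(2*n_pos) to positive samples (y_j = 1)
    and 1/(2*n_neg) to negative samples (y_j = -1), then normalize so
    the weights sum to one."""
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == -1))
    w = np.where(y == 1, 1.0 / (2 * n_pos), 1.0 / (2 * n_neg))
    return w / w.sum()   # W_1j <- W_1j / sum_j W_1j
```

When n+ = n− this reduces to uniform weights; otherwise the two classes start with equal total mass, which compensates for class imbalance in the training set.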
For each feature p (a single feature), a simplified version classifier is trained using that feature alone: the simplified version linear classifier h_p is trained on training samples whose features include only that single feature of the second blockchain address sample, with the label y_j and the weight W_{i,j} described above. In the feature screening stage, this weight is called the training sample weight.
The following error rate is then calculated:

ε_p = Σ_{j=1}^{n} W_{i,j} · sign(|h_p(x_j) − y_j|)

ε_p is called the error rate of the simplified version classifier h_p, and sign(x) is a sign function:

sign(x) = 1 if x > 0; sign(x) = 0 if x = 0.

h_p(x_j) is the predicted value of the j-th training sample, y_j is the label value of the j-th training sample, and sign(|h_p(x_j) − y_j|) is the first sign function value.
In each round of feature screening, the number of simplified version classifiers trained equals the number of features remaining in the feature set to be screened: for each candidate feature, one simplified version classifier is trained on training samples containing that feature, so the simplified version classifiers correspond one to one with the features in the feature set to be screened. Each simplified version classifier trained with a feature yields an error rate, from which the minimum error rate can be selected.
In the first round of feature screening (i = 1), the P error rates ε_p, p = 1, ..., P, are calculated, and the minimum error rate min{ε_1, ..., ε_{P−i+1}} among them is determined. It is then judged whether the minimum error rate is less than or equal to β (i.e., the preset threshold representing the maximum acceptable error rate). If so, the feature p_min = argmin_p ε_p that attains the minimum error rate is selected and added to the feature list. That is, the feature used by the simplified version classifier h_{p_min} corresponding to the minimum error rate is selected, and h_{p_min} is the simplified version classifier corresponding to the minimum error rate, i.e., the optimal linear decision boundary classifier.
After this round's feature is selected, the training sample weights are updated for j = 1, ..., n:

W_{i+1,j} = W_{i,j} × c_{i,j}

where the coefficient c_{i,j} used to update the weight of the training sample is calculated from the minimum error rate ε_{p_min} obtained in this round of feature screening and the second sign function value sign(|h_{p_min}(x_j) − y_j|); h_{p_min}(x_j) is the predicted value of the classifier h_{p_min}, and y_j is the label value of the j-th training sample.
The next round of feature screening then begins: simplified version classifiers are again trained with single features, and features are screened in the same way, i.e., by computing the minimum error rate. The training sample weights are updated once per iteration of the algorithm.
In any round of feature screening, if the minimum error rate min{ε_1, ..., ε_{P−i+1}} > β, the calculation stops and the next round of feature screening is not performed, i.e., the whole multi-round feature screening process ends.

After the multi-round feature screening process is completed, the number of features in the resulting feature list is denoted P* (i.e., the first number): feature screening selects P* features from the original P features, and ε_1, ..., ε_{P*} are the error rates corresponding to the screened features.
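The multi-round screening loop described above can be sketched as follows. The sketch uses a weighted one-dimensional threshold rule (a decision stump) as the single-feature simplified version classifier, and a Viola-Jones-style coefficient ε/(1 − ε) that shrinks the weights of correctly classified samples; both choices are assumptions on our part, since the embodiment leaves the concrete classifier and the exact coefficient open:

```python
import numpy as np

def stump_fit(x, y, w):
    """Best weighted threshold rule on a single feature: predict s when
    x > t and -s otherwise, choosing t and s to minimize the weighted
    error (a 1-D linear decision boundary)."""
    best_err, best_pred = np.inf, None
    for t in np.unique(x):
        for s in (1, -1):
            pred = np.where(x > t, s, -s)
            err = np.sum(w * (pred != y))
            if err < best_err:
                best_err, best_pred = err, pred
    return best_err, best_pred

def screen_features(X, y, w, beta):
    """Multi-round feature screening: each round trains one single-feature
    classifier per remaining feature, keeps the feature attaining the
    minimum weighted error rate if that rate is <= beta, updates the
    training-sample weights, and repeats; otherwise screening stops."""
    remaining = list(range(X.shape[1]))
    selected = []
    w = np.asarray(w, dtype=float).copy()
    while remaining:
        w = w / w.sum()                        # normalize weights each round
        results = {p: stump_fit(X[:, p], y, w) for p in remaining}
        p_min = min(results, key=lambda p: results[p][0])
        eps, pred = results[p_min]
        if eps > beta:                         # minimum error rate above threshold
            break
        selected.append(p_min)
        remaining.remove(p_min)
        eps = float(np.clip(eps, 1e-6, 1 - 1e-6))  # guard degenerate rates
        miss = (pred != y).astype(float)       # second sign function value
        # assumed coefficient: keep misclassified weights, shrink the rest
        w = w * np.where(miss == 1.0, 1.0, eps / (1.0 - eps))
    return selected
```

On a toy set where both features separate the classes, both survive screening; in practice the loop keeps only features whose single-feature classifiers beat the β threshold on the reweighted samples.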
The following describes a specific process for determining the optimal iteration number of the iterative training according to the embodiment of the present invention.
A set of possible optimal solutions M′ = (M_1, M_2, ..., M_U) is selected according to the requirements on computing resources, data size, and model accuracy; this set is the preset iteration number set. The iteration number set is traversed using the training set data, and a classifier is trained with the feature set selected by the feature screening process, using K-fold cross validation to select the optimal iteration number M*.
The training set is split into K blocks of equal size and the procedure loops K times; each time, the model f̂_{−k}^{(u)} is trained using the K−1 blocks other than the k-th block. The model f̂^{(u)} is the model built for a single classifier: in the set (M_1, M_2, ..., M_U), each M corresponds to a single classifier, i.e., M_1 corresponds to the first single classifier, M_2 to the second single classifier, ..., and M_U to the U-th single classifier. The samples used to train the model f̂_{−k}^{(u)} have the P* screened features and the label y_j; that is, the embodiment of the present invention trains the model using the set of first blockchain address samples.
For the u-th single classifier (u = 1, 2, ..., U), a cross validation error rate estimate err_cv(M_u) is calculated:

err_cv(M_u) = (1/K) Σ_{k=1}^{K} (1/|block k|) Σ_{j ∈ block k} L(y_j, f̂_{−k}^{(u)}(x_j))

where L(·,·) is a loss function determined according to the data characteristics and analysis requirements, and may take various common forms, such as the least squares loss function; block k denotes the samples in the k-th block, |block k| denotes the number of samples in the k-th block, and k = 1, ..., K.
For each single classifier, the loop is repeated K times, and finally one cross validation error rate estimate err_cv(M_u) can be calculated according to the above method. Since each M in the set (M_1, M_2, ..., M_U) corresponds to a single classifier, U estimates err_cv(M_1), ..., err_cv(M_U) are obtained in total. From these U estimates, the variance of the cross validation error rate estimate, Var(err_cv), can be calculated.
The optimal iteration number M* is then selected. Specifically, the iteration number M_err_min with the lowest cross validation error rate estimate (i.e., the target iteration number) is selected first:

M_err_min = argmin_{M ∈ M′} err_cv(M)

Then the smallest M in the set M′ whose cross validation error rate estimate is not higher than the sum of err_cv(M_err_min) and SE(err_cv) is taken as M*, where err_cv(M_err_min) is the cross validation error rate estimate corresponding to M_err_min, and SE(err_cv) = sqrt(Var(err_cv)) is the standard deviation of the cross validation error rate estimates. M* is given by:

M* = min{M_1, ..., M_U}  s.t.  err_cv(M) ≤ err_cv(M_err_min) + SE(err_cv)

That is, under the constraint (s.t.), the smallest of M_1, ..., M_U is found; in other words, from the preset iteration number set, the smallest iteration number whose corresponding cross validation error rate estimate is not greater than the sum (of the minimum cross validation error rate estimate and the standard deviation of the cross validation error rate estimates) is selected as the optimal iteration number.
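This selection is the familiar one-standard-error rule, which can be sketched as follows (the function name and the use of the sample standard deviation across the U estimates are our reading of the text):

```python
import numpy as np

def choose_optimal_iterations(candidate_m, cv_err):
    """One-standard-error rule: find the lowest cross-validation error
    estimate among the candidate iteration counts, then return the
    smallest M whose estimate does not exceed that minimum plus the
    standard deviation of the U estimates."""
    candidate_m = list(candidate_m)
    cv_err = np.asarray(cv_err, dtype=float)
    err_min = cv_err.min()                 # err_cv at M_err_min
    se = cv_err.std(ddof=1)                # std dev across the U estimates
    feasible = [m for m, e in zip(candidate_m, cv_err) if e <= err_min + se]
    return min(feasible)
```

Favoring the smallest feasible M trades a little accuracy (within one standard error) for a cheaper, less overfit model, which matches the stated concern for computing resources.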
The following describes a specific process of determining a block chain address classification model according to an embodiment of the present invention.
n denotes the sample size of the training set; x_j ∈ R^{P*} denotes the P* features selected from the original P features of the data (i.e., the first number of features of the first blockchain address sample, selected from the second number of features of the second blockchain address sample). The blockchain address samples with covariates x_j ∈ R^{P*} are the first blockchain address samples, each of which contains the P* features obtained through feature screening. y_j ∈ {1, −1} denotes the label of a first blockchain address sample (the same label as that of the corresponding second blockchain address sample in the feature screening stage above), j = 1, ..., n. W_{i,j} denotes the weight of the j-th sample at the i-th iteration (the same as the weight of the second blockchain address sample in the feature screening stage above), referred to as the sample weight in the embodiment of the present invention. M* denotes the optimal iteration number. g_i denotes the linear classifier (classifier for short) selected at the i-th iteration, i = 1, ..., M*; the classifiers selected in different iterations may be of the same type or of different types: for example, one iteration may select a classifier based on logistic regression, and the next iteration may again select a logistic regression classifier, or may instead select a classifier based on a support vector machine (the classifier types are merely examples). α_i denotes the classifier weight of the classifier g_i.

For the j-th sample, j = 1, ..., n, the initial weight (i.e., the initial value of the sample weight) is defined as:

W_{1,j} = 1/n
For the i-th iteration, i = 1, ..., M*:

The linear classifier g_i is trained by minimizing the weighted average error rate:

err_i = [ Σ_{j=1}^{n} W_{i,j} · sign{|y_j − g_i(x_j)|} ] / [ Σ_{j=1}^{n} W_{i,j} ]

From this equation, the weighted average error rate err_i of the current iteration is related to the sample weight W_{i,j} of each first blockchain address sample in the current iteration. Here sign{|y_j − g_i(x_j)|} is the third sign function value, and g_i(x_j) is the predicted value of the j-th first blockchain address sample at the i-th iteration. The denominator Σ_{j=1}^{n} W_{i,j} is the sum of all sample weights in the i-th iteration; the numerator is the weighted sum obtained by weighting the third sign function values with the sample weights of the corresponding first blockchain address samples in the i-th iteration.
Each iteration of the algorithm trains a linear decision boundary classifier (i.e., linear classifier) with the goal of minimizing the error rate, i.e., the weighted average error rate described above.
The classifier weight α_i of the classifier g_i is calculated from the weighted average error rate as follows:

α_i = log[(1 − err_i) / err_i]
For j = 1, ..., n, the sample weight is calculated as:

W_{i+1,j} = W_{i,j} × exp(α_i) × sign{|y_j − g_i(x_j)|},  i = 1, ..., M*

That is, the sample weight W_{i+1,j} of a first blockchain address sample in a certain iteration is calculated based on the initial value W_{1,j} of the sample weight of the first blockchain address sample or on the sample weight W_{i,j} of the first blockchain address sample in the previous iteration. The value exp(α_i) is obtained by applying the preset function, an exponential function with base e, to the classifier weight of the i-th iteration. The sample weights are updated once per iteration of the algorithm.
After M* iterations are completed, the block chain address classification model is determined, based on each iteratively trained classifier and the corresponding classifier weight, as:

f(x) = sign( Σ_{i=1}^{M*} α_i · g_i(x) )
and testing the accuracy of the model on the test set by using the trained block chain address classification model, and comparing the error rate of the test set with the error rate of the training set to confirm that the model has no overfitting problem.
In the above block chain address classification model, f(x) is the output of the block chain address classification model and indicates the category of the block chain address to be classified; for example, in binary classification it may indicate whether the block chain address to be classified is an exchange block chain address or a non-exchange block chain address.
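The iterative training loop and the final weighted combination described above can be sketched as follows. Decision stumps stand in for the linear classifiers g_i, and the classifier weight α_i = log((1 − err_i)/err_i) together with the update exp(α_i · 1{misclassified}) follow the standard AdaBoost.M1 forms; these are our assumptions where the translated formulas are ambiguous:

```python
import numpy as np

def stump_fit(X, y, w):
    """Weighted decision stump over all features, used here as the base
    classifier g_i (a 1-D linear decision boundary)."""
    best = None
    for p in range(X.shape[1]):
        for t in np.unique(X[:, p]):
            for s in (1, -1):
                pred = np.where(X[:, p] > t, s, -s)
                err = np.sum(w * (pred != y)) / np.sum(w)  # weighted average error rate
                if best is None or err < best[0]:
                    best = (err, p, t, s)
    return best

def stump_predict(X, p, t, s):
    return np.where(X[:, p] > t, s, -s)

def adaboost_train(X, y, m_star):
    """Each of the m_star iterations fits a base classifier minimizing
    the weighted average error rate, computes its classifier weight
    alpha_i, and updates the sample weights."""
    y = np.asarray(y)
    n = len(y)
    w = np.ones(n) / n                         # W_1j = 1/n
    ensemble = []
    for _ in range(m_star):
        err, p, t, s = stump_fit(X, y, w)
        err = float(np.clip(err, 1e-10, 1 - 1e-10))
        alpha = np.log((1 - err) / err)        # classifier weight
        miss = (stump_predict(X, p, t, s) != y).astype(float)
        w = w * np.exp(alpha * miss)           # sample-weight update
        ensemble.append((alpha, p, t, s))
    return ensemble

def classify(X, ensemble):
    """Final model: sign of the classifier-weighted sum of base outputs."""
    score = sum(a * stump_predict(X, p, t, s) for a, p, t, s in ensemble)
    return np.sign(score)
```

Because misclassified samples gain weight at each iteration, later base classifiers concentrate on the hard addresses, which is what lets the boosted combination carve a nonlinear decision boundary out of linear pieces.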
The classifiers used in the above stages (the stages of feature screening, determining the optimal iteration number, determining the block chain address classification model, etc.) of the embodiment of the present invention may be various linear decision boundary classifiers, including but not limited to logistic regression, decision tree, random forest, support vector machine, etc.
In machine learning, a large number of linear decision boundary classifiers such as linear kernel support vector machines, logistic regression, naive bayes and the like exist, although these methods have good performance in many application scenarios, the effect of applying these methods in block chain address classification in the prior art is not ideal, and the main reasons are as follows: firstly, many linear decision boundary classifiers need to assume a linear relationship between an address tag and a covariate; secondly, the XOR problem (namely the classification boundary is a nonlinear hyperplane) is obvious in a high-dimensional space, and the linear decision boundary cannot well distinguish different classes; thirdly, a plurality of classifiers need to assume that covariates are independent from each other, and each covariate is equally important for label division; fourthly, the classifier usually assumes that the weight of each sample is constant; and fifthly, most classifiers usually have better classification results when the number distribution of samples of each class is relatively balanced. The embodiment of the invention provides a self-adaptive enhanced block chain address classification method based on a linear decision boundary classification model, which overcomes the problems in the prior art. In addition, the sample weight of the embodiment of the invention is dynamically updated, so that mutual independence between covariates is not required to be assumed and each covariate is equally important for label division as in the prior art, the weight of each sample of the embodiment of the invention is not constant, and the embodiment of the invention determines the optimal iteration times through a K-fold cross validation method and then carries out iterative training, so that an accurate classification result can be obtained under the condition of no need of balanced quantity distribution of samples of various classes.
Fig. 3 is a schematic diagram of the main blocks of the device for sorting the blockchain address according to an embodiment of the present invention.
As shown in fig. 3, the apparatus 300 for sorting block chain addresses according to an embodiment of the present invention mainly includes: a first blockchain address sample set determining module 301, a classifier iterative training module 302, and a blockchain address classification model determining module 303.
A first set of blockchain address samples determining module 301 for determining a set of first blockchain address samples, each first blockchain address sample comprising a first number of features representing blockchain addresses;
a classifier iterative training module 302 to iteratively train a selected classifier using a set of first blockchain address samples, wherein for each iteration: selecting a classifier to train with the objective of minimizing the weighted average error rate, wherein the weighted average error rate of the current iteration is related to the sample weight of each first block chain address sample in the current iteration, calculating the classifier weight of the classifier based on the weighted average error rate of the current iteration, and the sample weight of the first block chain address sample in the current iteration is calculated based on the initial value of the sample weight of the first block chain address sample or the sample weight of the first block chain address sample in the last iteration;
and the block chain address classification model determining module 303 is configured to determine a block chain address classification model based on each iteratively trained classifier and a corresponding classifier weight, where the block chain address classification model is used to determine a category of a block chain address to be classified.
The first blockchain address sample set determining module 301 may be specifically configured to: obtaining a set of second blockchain address samples, each second blockchain address sample comprising a characteristic representing a second number of blockchain addresses, the second number being greater than or equal to the first number; and performing multiple rounds of feature screening to select a first number of features from the second number of features, and constructing corresponding first blockchain address samples according to the selected first number of features in each second blockchain address sample to determine a set of first blockchain address samples.
The first blockchain address sample set determination module 301 may include a feature screening submodule for: in each round of feature screening, the following steps are performed: training a simplified version classifier by using single features in the feature set to be screened, wherein the simplified version classifier corresponds to the features in the feature set to be screened one by one, and the feature set to be screened is a set formed by features which are not selected in the second number of features; calculating the error rate of each simplified version classifier to obtain the minimum error rate; if the minimum error rate is less than or equal to a preset threshold value, selecting the features used by the simplified version classifier corresponding to the minimum error rate, and updating a feature set to be screened; if the minimum error rate is greater than the preset threshold value, ending the multi-round feature screening process to obtain a first number of features.
The feature filtering sub-module may calculate the error rate of the simplified version classifier by: calculating a predicted value and a label value of each training sample corresponding to the simplified version classifier, wherein the training samples comprise single characteristics used for training the simplified version classifier; and processing the absolute value of the difference between the predicted value and the label value of each training sample through a symbol function to obtain corresponding first symbol function values, and performing weighted summation on the first symbol function values according to corresponding weights of the corresponding training samples in the round of screening to obtain the error rate of the simplified version classifier.
The first blockchain address sample set determination module may further include a weight update submodule for: processing the absolute value of the difference between the label value and the predicted value of the training sample obtained by the simplified version classifier corresponding to the minimum error rate through a symbol function to obtain a second symbol function value, and calculating a coefficient for updating the weight of the training sample by using the minimum error rate and the second symbol function value calculated in the feature screening of the round; and multiplying the corresponding weight of the training sample in the current round of screening by the coefficient so as to update the current weight of the training sample to the corresponding weight of the training sample in the next round of screening.
The block chain address classification apparatus 300 may further include an optimal iteration number determining module, configured to determine an optimal iteration number of the iterative training by: dividing a set of first block chain address samples into K subsets, training a classifier by using the K-1 subsets each time, and calculating a cross validation error rate estimation value after each training; calculating the variance of the cross validation error rate estimated value by using the cross validation error rate estimated value obtained by each training so as to select the target iteration times corresponding to the minimum cross validation error rate estimated value; and calculating the sum of the minimum cross validation error rate estimation value and the standard deviation of the cross validation error rate estimation value, and selecting the minimum value of each iteration number of which the corresponding cross validation error rate estimation value is not more than the sum from a preset iteration number set as the optimal iteration number.
Classifier iterative training module 302 calculates a weighted average error rate at each iteration by: and processing the absolute value of the difference between the label value of each first block chain address sample and the predicted value through a sign function to obtain a corresponding third sign function value, performing weighted summation on each third sign function value according to the sample weight of the corresponding first block chain address sample in the current iteration to obtain a weighted sum, and taking the ratio of the weighted sum to the sum of all sample weights in the current iteration as a weighted average error rate.
Under the condition that the current iteration is the first iteration, the sample weight of the first block chain address sample in the current iteration is the initial value of the sample weight of the first block chain address sample; under the condition that the current iteration is not the first iteration, the sample weight of the first block chain address sample in the current iteration is obtained by calculating the product of the following three terms: the method comprises the steps of obtaining a sample weight of a first block chain address sample in the previous iteration, a value obtained by the operation of a preset function by utilizing a classifier weight in the current iteration, and a third symbol function value corresponding to the first block chain address sample, wherein the preset function is an exponential function taking e as a base.
The classifier selected by the classifier iterative training module 302 at each iteration is the same or a different linear decision boundary classifier.
In addition, the detailed implementation of the block chain address classification apparatus in the embodiment of the present invention has been described in detail in the above block chain address classification method, and therefore, the repeated content is not described again.
Fig. 4 shows an exemplary system architecture 400 of a blockchain address classification method or a blockchain address classification apparatus to which embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as a web browser application, a search-type application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 401, 402, 403. The backend management server may analyze and perform other processing on the received data such as the block chain address classification request, and feed back a processing result (for example, an address category — just an example) to the terminal device.
It should be noted that the method for classifying a blockchain address provided by the embodiment of the present invention is generally performed by the server 405, and accordingly, the device for classifying a blockchain address is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown. The server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The driver 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is mounted into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or by hardware. The described modules may also be provided in a processor, which may, for example, be described as: a processor including a first blockchain address sample set determining module, a classifier iterative training module, and a blockchain address classification model determining module. The names of these modules do not, in some cases, limit the modules themselves; for example, the first blockchain address sample set determining module may also be described as a "module for determining a set of first blockchain address samples".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: determine a set of first blockchain address samples, each first blockchain address sample including a first number of features representing a blockchain address; iteratively train a selected classifier using the set of first blockchain address samples, wherein, for each iteration, a classifier is selected and trained with the goal of minimizing a weighted average error rate, the weighted average error rate of the current iteration being related to the sample weight of each first blockchain address sample in the current iteration, the classifier weight of the classifier being calculated based on the weighted average error rate of the current iteration, and the sample weight of each first blockchain address sample in the current iteration being calculated based on the initial value of that sample weight or on the sample weight of that sample in the previous iteration; and determine a blockchain address classification model based on each iteratively trained classifier and the corresponding classifier weight, the blockchain address classification model being used to determine the category of a blockchain address to be classified.
According to the technical solution of the embodiments of the present invention, a selected classifier is iteratively trained using a set of first blockchain address samples, wherein, for each iteration, a classifier is selected and trained with the goal of minimizing a weighted average error rate; the weighted average error rate of the current iteration is related to the sample weight of each first blockchain address sample in the current iteration; the classifier weight of the classifier is calculated based on the weighted average error rate of the current iteration; and the sample weight of each first blockchain address sample in the current iteration is calculated based on the initial value of that sample weight or on the sample weight of that sample in the previous iteration. A blockchain address classification model is then determined based on each iteratively trained classifier and the corresponding classifier weight, so as to classify the blockchain address to be classified. The method can improve the accuracy and reliability of blockchain address classification, and overcome defects of the prior art such as error propagation, insufficient algorithm generalization, and high demands on hardware resources and time.
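The iterative training summarized above follows the familiar AdaBoost pattern: train a weak classifier on weighted samples, compute its weighted error, derive a classifier weight from that error, and re-weight the samples. The sketch below illustrates this under stated assumptions — labels are 0/1 (so sign(|y − ŷ|) is 1 exactly when a sample is misclassified), the base learner is an exhaustive one-feature threshold stump, and the particular formula for the classifier weight `alpha` is one common choice; none of these details are fixed by the text above, and all function names are illustrative.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively pick the one-feature threshold stump (a simple
    linear decision boundary) with the lowest weighted error rate."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = (sign * (X[:, j] - thr) > 0).astype(int)
                err = np.dot(w, pred != y) / w.sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]  # (feature index, threshold, direction)

def stump_predict(stump, X):
    j, thr, sign = stump
    return (sign * (X[:, j] - thr) > 0).astype(int)

def train_boosted(X, y, n_rounds=5):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # initial sample weights
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)             # minimize the weighted average error rate
        pred = stump_predict(stump, X)
        miss = np.sign(np.abs(y - pred))       # 1 if misclassified, 0 otherwise
        eps = np.clip(np.dot(w, miss) / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)  # classifier weight from the error rate
        w = w * np.exp(alpha * miss)           # raise weights of misclassified samples
        model.append((alpha, stump))
    return model

def classify(model, X):
    """Weighted vote of all trained classifiers, mapped back to 0/1."""
    votes = sum(a * (2 * stump_predict(s, X) - 1) for a, s in model)
    return (votes > 0).astype(int)
```

On linearly separable toy data such as `X = [[0], [1], [2], [3]]` with `y = [0, 0, 1, 1]`, the weighted vote of the trained stumps reproduces the labels.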
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
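Besides the training loop itself, the claims below also recite a multi-round feature-screening step (claims 3–5): each round trains a one-feature "simplified version" classifier per remaining feature, keeps the feature whose classifier has the lowest weighted error rate if that error does not exceed a threshold, then re-weights the training samples before the next round. The following is a hypothetical sketch of that procedure; the 0/1 labels, the mean-threshold single-feature classifier, and the specific weight-update coefficient are all illustrative assumptions, not the patent's exact construction.

```python
import numpy as np

def screen_features(X, y, threshold=0.3):
    """Multi-round feature screening: select features whose
    one-feature classifier achieves a low weighted error rate,
    re-weighting samples between rounds; stop when no remaining
    feature is good enough."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # per-sample screening weights
    remaining = list(range(d))
    selected = []
    while remaining:
        errs, misses = [], []
        for j in remaining:
            thr = X[:, j].mean()               # illustrative one-feature classifier
            p = (X[:, j] > thr).astype(int)
            miss = np.sign(np.abs(y - p))      # 1 if misclassified, 0 otherwise
            errs.append(np.dot(w, miss) / w.sum())
            misses.append(miss)
        best = int(np.argmin(errs))
        if errs[best] > threshold:             # no feature good enough: stop
            break
        selected.append(remaining.pop(best))
        eps = np.clip(errs[best], 1e-10, 1 - 1e-10)
        # illustrative coefficient: raise weights of samples the chosen
        # feature misclassified, so later rounds focus on them
        w = w * ((1 - eps) / eps) ** misses[best]
    return selected
```

For instance, with one perfectly predictive feature and one noisy feature, the procedure selects the predictive one and stops when the noisy feature's error exceeds the threshold.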

Claims (20)

1. A method for blockchain address classification, comprising:
determining a set of first blockchain address samples, each first blockchain address sample including a first number of features representing a blockchain address;
iteratively training a selected classifier using the set of first blockchain address samples, wherein, for each iteration: a classifier is selected and trained with the goal of minimizing a weighted average error rate, the weighted average error rate of the current iteration being related to the sample weight of each first blockchain address sample in the current iteration; a classifier weight of the classifier is calculated based on the weighted average error rate of the current iteration; and the sample weight of a first blockchain address sample in the current iteration is calculated based on the initial value of that sample weight or on the sample weight of that sample in the previous iteration; and
determining a blockchain address classification model based on each iteratively trained classifier and the corresponding classifier weight, wherein the blockchain address classification model is used for determining the category of a blockchain address to be classified.
2. The method of claim 1, wherein determining the set of first blockchain address samples comprises:
obtaining a set of second blockchain address samples, each second blockchain address sample comprising a second number of features representing a blockchain address, the second number being greater than or equal to the first number;
performing a plurality of rounds of feature screening to select the first number of features from the second number of features, and constructing the corresponding first blockchain address sample according to the selected first number of features in each second blockchain address sample to determine the set of first blockchain address samples.
3. The method of claim 2, wherein performing multiple rounds of feature screening to select the first number of features from the second number of features comprises performing the following steps in each round of feature screening:
training a simplified version classifier using each single feature in a feature set to be screened, wherein the simplified version classifiers correspond one-to-one to the features in the feature set to be screened, and the feature set to be screened is the set of features not yet selected from the second number of features;
calculating the error rate of each simplified version classifier to obtain the minimum error rate;
if the minimum error rate is less than or equal to a preset threshold, selecting the feature used by the simplified version classifier corresponding to the minimum error rate, and updating the feature set to be screened;
if the minimum error rate is greater than the preset threshold, ending the multi-round feature screening process to obtain the first number of features.
4. The method of claim 3, wherein the error rate of the simplified version classifier is calculated by:
calculating a predicted value and a label value of each training sample corresponding to the simplified version classifier, wherein the training samples comprise the single feature used for training the simplified version classifier;
processing the absolute value of the difference between the predicted value and the label value of each training sample through a sign function to obtain corresponding first sign function values, and performing a weighted summation over the first sign function values according to the weights of the corresponding training samples in the current round of screening, to obtain the error rate of the simplified version classifier.
5. The method of claim 4, wherein, if the minimum error rate is less than or equal to the preset threshold, selecting the feature used by the simplified version classifier corresponding to the minimum error rate further comprises:
processing the absolute value of the difference between the label value and the predicted value of a training sample, as obtained by the simplified version classifier corresponding to the minimum error rate, through a sign function to obtain a second sign function value, and calculating a coefficient for updating the weight of the training sample using the minimum error rate calculated in the current round of feature screening and the second sign function value;
multiplying the weight of the training sample in the current round of screening by the coefficient, so as to update the current weight of the training sample to its weight in the next round of screening.
6. The method of claim 1, wherein, prior to iteratively training the selected classifier using the set of first blockchain address samples, the method comprises determining an optimal number of iterations for the iterative training by:
dividing the set of first blockchain address samples into K subsets, training a classifier using K-1 of the subsets each time, and calculating a cross-validation error rate estimate after each training;
calculating the variance of the cross-validation error rate estimates obtained from the trainings, so as to select the target iteration number corresponding to the minimum cross-validation error rate estimate;
calculating the sum of the minimum cross-validation error rate estimate and the standard deviation of the cross-validation error rate estimates, and selecting, from a preset set of iteration numbers, the smallest iteration number whose corresponding cross-validation error rate estimate does not exceed that sum, as the optimal number of iterations.
7. The method of claim 1, wherein, at each iteration, the weighted average error rate is calculated by:
processing the absolute value of the difference between the label value and the predicted value of each first blockchain address sample through a sign function to obtain a corresponding third sign function value, performing a weighted summation over the third sign function values according to the sample weights of the corresponding first blockchain address samples in the current iteration to obtain a weighted sum, and taking the ratio of the weighted sum to the sum of all sample weights in the current iteration as the weighted average error rate.
8. The method of claim 7, wherein:
in the case that the current iteration is the first iteration, the sample weight of a first blockchain address sample in the current iteration is the initial value of that sample weight;
in the case that the current iteration is not the first iteration, the sample weight of a first blockchain address sample in the current iteration is calculated as the product of the following three terms: the sample weight of the first blockchain address sample in the previous iteration, the value obtained by applying a preset function to the classifier weight in the current iteration, and the third sign function value corresponding to the first blockchain address sample, wherein the preset function is a base-e exponential function.
9. The method of claim 1, wherein the classifiers selected in each iteration are the same or different linear decision boundary classifiers.
10. An apparatus for blockchain address classification, comprising:
a first blockchain address sample set determining module, for determining a set of first blockchain address samples, each first blockchain address sample comprising a first number of features representing a blockchain address;
a classifier iterative training module, to iteratively train a selected classifier using the set of first blockchain address samples, wherein, for each iteration: a classifier is selected and trained with the goal of minimizing a weighted average error rate, the weighted average error rate of the current iteration being related to the sample weight of each first blockchain address sample in the current iteration; a classifier weight of the classifier is calculated based on the weighted average error rate of the current iteration; and the sample weight of a first blockchain address sample in the current iteration is calculated based on the initial value of that sample weight or on the sample weight of that sample in the previous iteration; and
a blockchain address classification model determining module, for determining a blockchain address classification model based on each iteratively trained classifier and the corresponding classifier weight, the blockchain address classification model being used for determining the category of a blockchain address to be classified.
11. The apparatus of claim 10, wherein the first blockchain address sample set determining module is further configured to:
obtain a set of second blockchain address samples, each second blockchain address sample comprising a second number of features representing a blockchain address, the second number being greater than or equal to the first number;
perform multiple rounds of feature screening to select the first number of features from the second number of features, and construct the corresponding first blockchain address sample from the selected first number of features in each second blockchain address sample, to determine the set of first blockchain address samples.
12. The apparatus of claim 11, wherein the first blockchain address sample set determining module comprises a feature filtering sub-module configured to perform the following steps in each round of feature screening:
training a simplified version classifier using each single feature in a feature set to be screened, wherein the simplified version classifiers correspond one-to-one to the features in the feature set to be screened, and the feature set to be screened is the set of features not yet selected from the second number of features;
calculating the error rate of each simplified version classifier to obtain the minimum error rate;
if the minimum error rate is less than or equal to a preset threshold, selecting the feature used by the simplified version classifier corresponding to the minimum error rate, and updating the feature set to be screened;
if the minimum error rate is greater than the preset threshold, ending the multi-round feature screening process to obtain the first number of features.
13. The apparatus of claim 12, wherein the feature filtering sub-module calculates the error rate of the simplified version classifier by:
calculating a predicted value and a label value of each training sample corresponding to the simplified version classifier, wherein the training samples comprise the single feature used for training the simplified version classifier;
processing the absolute value of the difference between the predicted value and the label value of each training sample through a sign function to obtain corresponding first sign function values, and performing a weighted summation over the first sign function values according to the weights of the corresponding training samples in the current round of screening, to obtain the error rate of the simplified version classifier.
14. The apparatus of claim 13, wherein the first blockchain address sample set determining module further comprises a weight update sub-module configured to:
process the absolute value of the difference between the label value and the predicted value of a training sample, as obtained by the simplified version classifier corresponding to the minimum error rate, through a sign function to obtain a second sign function value, and calculate a coefficient for updating the weight of the training sample using the minimum error rate calculated in the current round of feature screening and the second sign function value;
multiply the weight of the training sample in the current round of screening by the coefficient, so as to update the current weight of the training sample to its weight in the next round of screening.
15. The apparatus of claim 10, further comprising an optimal iteration number determining module configured to determine an optimal number of iterations for the iterative training by:
dividing the set of first blockchain address samples into K subsets, training a classifier using K-1 of the subsets each time, and calculating a cross-validation error rate estimate after each training;
calculating the variance of the cross-validation error rate estimates obtained from the trainings, so as to select the target iteration number corresponding to the minimum cross-validation error rate estimate;
calculating the sum of the minimum cross-validation error rate estimate and the standard deviation of the cross-validation error rate estimates, and selecting, from a preset set of iteration numbers, the smallest iteration number whose corresponding cross-validation error rate estimate does not exceed that sum, as the optimal number of iterations.
16. The apparatus of claim 10, wherein the classifier iterative training module calculates the weighted average error rate at each iteration by:
processing the absolute value of the difference between the label value and the predicted value of each first blockchain address sample through a sign function to obtain a corresponding third sign function value, performing a weighted summation over the third sign function values according to the sample weights of the corresponding first blockchain address samples in the current iteration to obtain a weighted sum, and taking the ratio of the weighted sum to the sum of all sample weights in the current iteration as the weighted average error rate.
17. The apparatus of claim 16, wherein:
in the case that the current iteration is the first iteration, the sample weight of a first blockchain address sample in the current iteration is the initial value of that sample weight;
in the case that the current iteration is not the first iteration, the sample weight of a first blockchain address sample in the current iteration is calculated as the product of the following three terms: the sample weight of the first blockchain address sample in the previous iteration, the value obtained by applying a preset function to the classifier weight in the current iteration, and the third sign function value corresponding to the first blockchain address sample, wherein the preset function is a base-e exponential function.
18. The apparatus of claim 10, wherein the classifier selected by the classifier iterative training module at each iteration is the same or a different linear decision boundary classifier.
19. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-9.
20. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.
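The optimal-iteration-count selection recited in claims 6 and 15 amounts to K-fold cross-validation combined with a one-standard-error-style rule. The sketch below illustrates one plausible reading of those claims; `train_fn` and `predict_fn` are assumed interfaces standing in for the boosted classifier of claim 1, and the exact use of the variance/standard deviation follows a common convention rather than a verbatim reproduction of the claim language.

```python
import numpy as np

def optimal_rounds(X, y, train_fn, predict_fn, candidates, K=5, seed=0):
    """Choose the boosting round count T from `candidates` by K-fold
    cross-validation: compute a CV error-rate estimate per T, take the
    minimum estimate plus the standard deviation of the estimates, and
    return the smallest T whose estimate does not exceed that sum."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    cv_err = []
    for T in candidates:
        fold_errs = []
        for k in range(K):                 # train on K-1 subsets, test on the K-th
            test_idx = folds[k]
            train_idx = np.hstack([folds[i] for i in range(K) if i != k])
            model = train_fn(X[train_idx], y[train_idx], T)
            fold_errs.append(np.mean(predict_fn(model, X[test_idx]) != y[test_idx]))
        cv_err.append(np.mean(fold_errs))  # CV error-rate estimate for this T
    cv_err = np.array(cv_err)
    bound = cv_err.min() + cv_err.std()    # minimum estimate + standard deviation
    return min(T for T, e in zip(candidates, cv_err) if e <= bound)
```

For example, with a dummy model that thresholds at the training-set mean and ignores `T`, every candidate produces the same cross-validation error, so the rule degenerates to picking the smallest candidate.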
CN202110480968.9A 2021-04-30 2021-04-30 Block chain address classification method and device Active CN113177596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110480968.9A CN113177596B (en) 2021-04-30 2021-04-30 Block chain address classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110480968.9A CN113177596B (en) 2021-04-30 2021-04-30 Block chain address classification method and device

Publications (2)

Publication Number Publication Date
CN113177596A true CN113177596A (en) 2021-07-27
CN113177596B CN113177596B (en) 2024-03-22

Family

ID=76925718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110480968.9A Active CN113177596B (en) 2021-04-30 2021-04-30 Block chain address classification method and device

Country Status (1)

Country Link
CN (1) CN113177596B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368911A (en) * 2020-03-03 2020-07-03 腾讯科技(深圳)有限公司 Image classification method and device and computer readable storage medium
CN111444232A (en) * 2020-01-03 2020-07-24 上海宓猿信息技术有限公司 Method for mining digital currency exchange address and storage medium
CN111754345A (en) * 2020-06-18 2020-10-09 天津理工大学 Bit currency address classification method based on improved random forest
CN111797942A (en) * 2020-07-23 2020-10-20 深圳壹账通智能科技有限公司 User information classification method and device, computer equipment and storage medium
US20200394471A1 (en) * 2019-06-12 2020-12-17 International Business Machines Corporation Efficient database maching learning verification


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAO HUA SUN YIN ET AL.: "Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain", Journal of Management Information Systems *
WENYOU GAO ET AL.: "Analysis on block chain financial transaction under artificial neural network of deep learning", Journal of Computational and Applied Mathematics *
YANG Xia: "Research on Security Challenges and Solutions of the Blockchain Ecosystem", Cyberspace Security *
MAO Hongliang et al.: "A heuristic-based Bitcoin address clustering method", Journal of Beijing University of Posts and Telecommunications *

Also Published As

Publication number Publication date
CN113177596B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN108520470B (en) Method and apparatus for generating user attribute information
CN108197652B (en) Method and apparatus for generating information
CN105761102B (en) Method and device for predicting commodity purchasing behavior of user
US11281999B2 (en) Predictive accuracy of classifiers using balanced training sets
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
CN110555451A (en) information identification method and device
CN112966701A (en) Method and device for classifying objects
US20180349476A1 (en) Evaluating theses using tree structures
CN113743971A (en) Data processing method and device
US11954910B2 (en) Dynamic multi-resolution processing for video classification
CN112231299B (en) Method and device for dynamically adjusting feature library
CN110555747A (en) method and device for determining target user
CN110619253A (en) Identity recognition method and device
CN110084255A (en) The detection method and device of abnormal data
CN111930858A (en) Representation learning method and device of heterogeneous information network and electronic equipment
CN113177596B (en) Block chain address classification method and device
US11556935B1 (en) Financial risk management based on transactions portrait
US20220245460A1 (en) Adaptive self-adversarial negative sampling for graph neural network training
CN113239259A (en) Method and device for determining similar stores
CN114066603A (en) Post-loan risk early warning method and device, electronic equipment and computer readable medium
CN113792952A (en) Method and apparatus for generating a model
CN113590754A (en) Big data analysis method and big data analysis server applied to community interaction
CN113190730A (en) Method and device for classifying block chain addresses
CN110895564A (en) Potential customer data processing method and device
CN112528103A (en) Method and device for recommending objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant