CN114429402A

CN114429402A - Risk identification method, device and equipment for accounts in the Ethereum blockchain

Info

Publication number: CN114429402A
Application number: CN202111607766.2A
Authority: CN
Inventors: 孙溢; 樊礼; 林昭文; 张引; 余恪平
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-12-23
Filing date: 2021-12-23
Publication date: 2022-05-03

Abstract

The embodiments of this specification relate to a risk identification method, device and equipment for accounts in the Ethereum blockchain. The method provided by the embodiments can identify and classify accounts in the Ethereum blockchain, which helps in holding fraudsters accountable and providing remedies to victims. The method is highly effective at detecting risk accounts and can rank the importance of each feature, providing reference and inspiration for further improving the method or for analyzing similar blockchains.

Description

Risk identification method, device and equipment for accounts in the Ethereum blockchain

Technical Field

The present invention relates to the field of computer technology, and in particular to a risk identification method, device and electronic device for accounts in the Ethereum blockchain.

Background

Since the Ethereum mainnet launched in 2015, it has received widespread attention. Compared with other blockchains, Ethereum lets developers easily write decentralized applications, which offers new solutions for many application scenarios and allowed Ethereum to gather a large number of users soon after the mainnet launch. As of February 2021, there were more than approximately 360,000 token contracts on Ethereum, reflecting its popularity.

However, any new technology may be used for illegal activities, and Ethereum is no exception. In recent years, more and more companies have begun to accept payment in virtual currency, which has brought blockchain technologies including Ethereum into public view and has also attracted more people seeking illegal profit, giving rise to many Ethereum-based scams. Scams on Ethereum include Ponzi schemes, donation scams, phishing and extortion. These illegal activities have caused losses to a large number of victims, damaged Ethereum's reputation, and become one of the obstacles to the promotion and development of Ethereum.

The prior art mainly addresses the identification of Ponzi schemes on Ethereum and anti-money-laundering detection. It lacks a way to distinguish normally used accounts from accounts used for scams when multiple kinds of scams coexist. This may affect the use of normal accounts and may also hinder subsequent efforts to hold risk accounts accountable.

Therefore, how to perform risk identification on accounts in the Ethereum blockchain and improve the security of those accounts has become a technical problem in urgent need of a solution.

Summary of the Invention

In view of the above problems in the prior art, the purpose of this document is to provide a risk identification method, device and electronic device for accounts in the Ethereum blockchain, which can classify accounts in the Ethereum blockchain, perform risk identification on them, and improve the security of Ethereum blockchain accounts.

To solve the above technical problems, the specific technical solutions herein are as follows:

In one aspect, this document provides a risk identification method for accounts in the Ethereum blockchain, the method comprising:

obtaining target transaction information of an account to be identified according to the target account address of the account to be identified in the Ethereum blockchain;

inputting the target transaction information into a pre-established account risk identification model, using the account risk identification model to extract account features of the account to be identified based on the target transaction information, and performing risk identification on the account to be identified according to the extracted account features; wherein the account risk identification model is obtained by model training based on transaction information of historical risk accounts and transaction information of historical normal accounts;

determining, according to the risk identification result output by the account risk identification model, whether the account to be identified is a risk account.

Further, the method for constructing the account risk identification model comprises:

collecting account addresses of historical risk accounts and account addresses of historical normal accounts;

obtaining transaction information of the historical risk accounts based on their account addresses, and obtaining transaction information of the historical normal accounts based on their account addresses;

performing feature extraction on the transaction information of the historical risk accounts to obtain a risk account feature set, and performing feature extraction on the transaction information of the historical normal accounts to obtain a normal account feature set;

performing model training on the account risk identification model using the risk account feature set and the normal account feature set to obtain the account risk identification model.

Further, the method also comprises:

said performing model training on the account risk identification model using the risk account feature set and the normal account feature set to obtain the account risk identification model comprises:

inputting the risk account feature set and the normal account feature set into the account risk identification model for model training, and using grid search to determine a target model parameter combination for the account risk identification model;

obtaining the account risk identification model based on the target model parameter combination.

Further, collecting the account addresses of the historical risk accounts comprises:

obtaining preliminary account addresses of risk accounts from a risk account database, or by querying risk keywords on a risk behavior monitoring platform or in an Ethereum blockchain explorer;

after obtaining the preliminary account addresses of the risk accounts, checking them for duplicates and deleting the duplicated preliminary account addresses to obtain the account addresses of the risk accounts.

Further, the method for collecting the account addresses of the historical normal accounts comprises:

collecting the Ethereum addresses of transaction senders from Ethereum blocks as candidate account addresses;

deduplicating the candidate account addresses and comparing them with the collected account addresses of the historical risk accounts, deleting any candidate account address that is identical to an account address of a historical risk account, and using the remaining candidate account addresses as the account addresses of the historical normal accounts.
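The two address collection procedures above amount to deduplication followed by a set difference. A minimal Python sketch, assuming addresses are hex strings that may be compared case-insensitively; the function names and sample addresses are hypothetical and are not taken from the specification:

```python
def dedupe_addresses(addresses):
    """Drop duplicate addresses, keeping first-seen order.

    Addresses are lower-cased before comparison (an assumption of this sketch:
    the same address may appear in mixed-case checksum form or in lower case).
    """
    seen = set()
    unique = []
    for addr in addresses:
        key = addr.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def select_normal_addresses(sender_addresses, risk_addresses):
    """Deduplicate sender addresses, then remove every known risk address."""
    risk = set(dedupe_addresses(risk_addresses))
    return [a for a in dedupe_addresses(sender_addresses) if a not in risk]


# Made-up sample addresses, for illustration only.
risk_accounts = dedupe_addresses(["0xAAA1", "0xaaa1", "0xBBB2"])
normal_accounts = select_normal_addresses(
    ["0xCCC3", "0xaaa1", "0xccc3", "0xDDD4"], risk_accounts
)
```

Removing risk addresses from the candidate set keeps the same address from appearing under both labels in the training data.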

Further, the method also comprises:

calculating the average gain rate of each account feature using the account risk identification model;

sorting the account features from high to low by average gain rate, and taking the account features ranked within a specified number of top positions as risk identification account features;

said using the account risk identification model to extract account features of the account to be identified based on the target transaction information, and performing risk identification on the account to be identified according to the extracted account features, comprises:

after the account risk identification model extracts the account features of the account to be identified based on the target transaction information, selecting from the extracted account features those that are the same as the risk identification account features, the account risk identification model then performing risk identification on the account to be identified using the selected account features.

Further, the account features include statistical features and transaction type features, and the statistical features include transaction amount features, transaction count features and transaction time features.
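To make the feature categories above concrete, the following Python sketch computes a few amount, count, time and type features for one account. The transaction record layout (keys `from`, `to`, `value`, `timestamp`, `type`), the feature names and the type labels are assumptions made for illustration; the specification does not define them in this passage.

```python
from collections import Counter
from statistics import mean


def extract_account_features(address, transactions):
    """Compute illustrative amount, count, time and type features for one account.

    Each transaction is a dict with hypothetical keys:
    "from", "to", "value" (in ether), "timestamp" (Unix seconds), "type".
    """
    sent = [t for t in transactions if t["from"] == address]
    received = [t for t in transactions if t["to"] == address]
    times = sorted(t["timestamp"] for t in transactions)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    features = {
        # Transaction count features
        "n_sent": len(sent),
        "n_received": len(received),
        # Transaction amount features
        "total_sent": sum(t["value"] for t in sent),
        "total_received": sum(t["value"] for t in received),
        "mean_sent": mean(t["value"] for t in sent) if sent else 0.0,
        # Transaction time features
        "active_seconds": times[-1] - times[0] if times else 0,
        "mean_gap_seconds": mean(gaps) if gaps else 0.0,
    }
    # Transaction type features: one counter per type label
    for tx_type, count in Counter(t["type"] for t in transactions).items():
        features[f"n_type_{tx_type}"] = count
    return features


# Made-up transactions involving the account "0xa".
sample = [
    {"from": "0xa", "to": "0xb", "value": 1.0, "timestamp": 100, "type": "transfer"},
    {"from": "0xc", "to": "0xa", "value": 3.0, "timestamp": 160, "type": "transfer"},
    {"from": "0xa", "to": "0xd", "value": 2.0, "timestamp": 400, "type": "contract_call"},
]
features = extract_account_features("0xa", sample)
```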

In another aspect, this document provides a risk identification device for accounts in the Ethereum blockchain, comprising:

an information acquisition module, configured to obtain target transaction information of an account to be identified according to the target account address of the account to be identified in the Ethereum blockchain;

a machine learning model identification module, configured to input the target transaction information into a pre-established account risk identification model, use the account risk identification model to extract account features of the account to be identified based on the target transaction information, and perform risk identification on the account to be identified according to the extracted account features; wherein the account risk identification model is obtained by model training based on transaction information of historical risk accounts and transaction information of historical normal accounts;

an account risk identification module, configured to determine, according to the risk identification result output by the account risk identification model, whether the account to be identified is a risk account.

Further, the device also comprises a model construction module, configured to construct the account risk identification model using the following method:

In another aspect, this document also provides an electronic device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, wherein the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement, using lightweight network middleware, the risk identification method for accounts in the Ethereum blockchain described above.

The risk identification method, device and electronic device for accounts in the Ethereum blockchain described herein use the transaction information of historical risk accounts and historical safe accounts for model training to construct an account risk identification model, and then use the constructed model to extract the account features of an account to be identified and predict its risk. This identifies risk accounts in the Ethereum blockchain and improves the security of account use in the Ethereum blockchain. The method provided by the embodiments of this specification can identify and classify accounts in the Ethereum blockchain, which helps in holding fraudsters accountable and providing remedies to victims. The method is highly effective at detecting risk accounts and can rank the importance of each feature, providing reference and inspiration for further improving the method or for analyzing similar blockchains.

To make the above and other objects, features and advantages of this document more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To illustrate the embodiments herein or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of this document; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a risk identification method for accounts in the Ethereum blockchain in an embodiment of this specification;

FIG. 2 is a schematic flowchart of account risk identification performed by the account risk identification model in an embodiment of this specification;

FIG. 3 is a schematic diagram of a feature selection strategy in an embodiment of this specification;

FIG. 4 is a schematic diagram of the transaction type features of accounts in the Ethereum blockchain in an embodiment of this specification;

FIG. 5 is a schematic structural diagram of a risk identification device for accounts in the Ethereum blockchain in an embodiment of this specification;

FIG. 6 is a schematic structural diagram of an electronic device for risk identification of accounts in the Ethereum blockchain provided by an embodiment of this document.

Detailed Description

The technical solutions in the embodiments herein are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments herein, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort fall within the scope of protection of this document.

It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.

With the advancement of technology, blockchain technology is gradually being applied in many fields, but the security of accounts in blockchain technology remains a key problem requiring study. Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database. As the underlying technology of Bitcoin, it is a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of Bitcoin network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. Ethereum is an open-source public blockchain platform with smart contract functionality that, through its dedicated cryptocurrency Ether, provides the decentralized Ethereum Virtual Machine (EVM) to process peer-to-peer contracts.

The embodiments of this specification aim to detect and distinguish risk accounts from legitimate accounts in the Ethereum blockchain based on data analysis. A machine learning model performs risk identification on accounts in the Ethereum blockchain and identifies the risk accounts among them, so as to ensure the security of transactions in the Ethereum blockchain.

FIG. 1 is a schematic flowchart of a risk identification method for accounts in the Ethereum blockchain in an embodiment of this specification. As shown in FIG. 1, the method provided in this specification can be applied to a server or to client terminal devices such as computers, smartphones, smart wearable devices and tablet computers. The method includes:

Step 102: obtain the target transaction information of the account to be identified according to the target account address of the account to be identified in the Ethereum blockchain.

In a specific implementation, each account in the Ethereum blockchain has a corresponding account address, through which all transaction information of the account can be obtained. When performing risk identification on an account to be identified, the target account address of that account can first be obtained from the Ethereum blockchain, and the target transaction information can then be obtained based on that address. The account to be identified can be understood as an account that requires risk identification or detection. The target transaction information may be the transaction data of the account within a specified time range, and this time range can be set according to actual application requirements. In actual use, an account risk identification frequency can also be set for the Ethereum blockchain, so that accounts in the Ethereum blockchain are checked in turn at specified intervals; alternatively, risk identification can be performed on specific accounts. The embodiments of this specification impose no specific limitation.
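Step 102 can be read as a filter over transaction records by address and time range. The sketch below assumes the transactions have already been fetched into a list of dicts. The keys `from`, `to` and `timestamp` and the function name are hypothetical, and how the records are retrieved from the chain is left open, as it is in the text.

```python
def select_target_transactions(transactions, address, start_ts, end_ts):
    """Return the transactions that involve `address` within [start_ts, end_ts].

    Timestamps are Unix seconds; addresses are compared case-insensitively.
    """
    address = address.lower()
    return [
        t for t in transactions
        if address in (t["from"].lower(), t["to"].lower())
        and start_ts <= t["timestamp"] <= end_ts
    ]


# Made-up records, for illustration only.
all_transactions = [
    {"from": "0xA1", "to": "0xb2", "timestamp": 50},
    {"from": "0xc3", "to": "0xa1", "timestamp": 150},
    {"from": "0xa1", "to": "0xd4", "timestamp": 900},
    {"from": "0xe5", "to": "0xf6", "timestamp": 200},
]
target = select_target_transactions(all_transactions, "0xa1", 100, 500)
```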

Step 104: input the target transaction information into a pre-established account risk identification model, use the account risk identification model to extract the account features of the account to be identified based on the target transaction information, and perform risk identification on the account to be identified according to the extracted account features; wherein the account risk identification model is obtained by model training based on transaction information of historical risk accounts and transaction information of historical normal accounts.

In a specific implementation, the embodiments of this specification can train a machine learning model in advance using the transaction information of historical risk accounts and of historical normal accounts, constructing an account risk identification model capable of performing risk identification on accounts in the Ethereum blockchain. When an account needs risk identification, the trained account risk identification model is applied to it. The machine learning algorithm used by the account risk identification model can be selected according to actual needs, for example a neural network algorithm or a random forest algorithm. The embodiments of this specification impose no specific limitation.

The embodiments of this specification mainly identify whether an account in the Ethereum blockchain is a safe account or a risk account. Accordingly, the account risk identification model may be a classifier model such as an XGBoost (Extreme Gradient Boosting) classifier, which is an efficient implementation of the gradient boosted tree algorithm. When constructing the XGBoost classifier, the objective function is constructed first and then expanded using a Taylor series, converting it into a polynomial function related to the prediction residual for computation. The core idea is to keep adding base classifiers to the model, each one fitting the residual of the previous prediction, and finally to combine the weighted results of all base classifiers into the final result. Of course, other machine learning algorithms may also be used to construct the account risk identification model according to actual usage requirements. The embodiments of this specification impose no specific limitation.
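For reference, the Taylor expansion mentioned above is usually written as follows in the published description of XGBoost. The specification does not state these formulas, so the notation is an assumption: $l$ is the loss, $\hat{y}_i^{(t-1)}$ the prediction for sample $i$ after $t-1$ rounds, $f_t$ the base learner added in round $t$, and $\Omega$ a regularization term over a tree with $T$ leaves and leaf weights $w$.

```latex
% Objective at boosting round t
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}

% Second-order Taylor expansion around the previous prediction
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i\, f_t(x_i) + \tfrac{1}{2} h_i\, f_t(x_i)^{2} \right] + \Omega(f_t)

% First and second derivatives of the loss with respect to the previous prediction
g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}
```

Because the first term inside the sum does not depend on $f_t$, each round only needs $g_i$ and $h_i$ to choose the new base learner. For squared-error loss $g_i$ is proportional to the residual, which is the sense in which each round fits the residual of the previous prediction.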

In some embodiments of this specification, the method for constructing the account risk identification model may include:

In a specific implementation, a number of historical risk accounts and a number of historical normal accounts can be selected as samples. The account addresses of the historical risk accounts and of the historical normal accounts are collected, and the corresponding transaction information is then obtained based on those addresses. Feature extraction is performed on the obtained transaction information to produce a risk account feature set and a normal account feature set, which serve as the positive sample set and the negative sample set respectively. To train the account risk identification model with these feature sets, the objective function, constraints and other elements of the model can be constructed in advance. The data in the risk account feature set and the normal account feature set are then input into the model, and the parameters in the objective function or constraints are adjusted until the identification accuracy of the model meets a preset requirement or the number of training iterations meets a preset requirement. This completes model training, and the resulting model is recorded as the account risk identification model.

Data from accounts historically known to be risky and from accounts confirmed to be safe are used as samples to train the machine learning model. Applying the trained model to accounts to be identified then identifies risk accounts in the Ethereum blockchain and ensures the security of account transactions in the Ethereum blockchain.

In some embodiments of this specification, said performing model training on the account risk identification model using the risk account feature set and the normal account feature set to obtain the account risk identification model includes:

In a specific implementation, after the data in the risk account feature set and the normal account feature set are input into the model during training, AUC (Area Under the ROC Curve) can be used as the model performance score, and grid search can be used to optimize the model parameters and determine the best parameter combination. Applying the best parameter combination to the model yields the trained model. Grid search is an exhaustive search over specified parameter values, in which the parameters of the estimator are optimized through cross-validation to obtain the best learner. That is, the possible values of each parameter are combined, all possible combinations are listed to form a "grid", each combination is then used for training, and its performance is evaluated using cross-validation. After the fitting function has tried all parameter combinations, it returns a suitable classifier automatically set to the best parameter combination. In the embodiments of this specification, the best model parameter combination is recorded as the target model parameter combination, and model parameters can be understood as the parameters of the machine learning model that affect model performance. For example, the model parameters of an XGBoost classifier may include important parameters such as the learning rate learning_rate, the maximum depth max_depth and the number of weak classifiers n_estimators.
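The search loop described above can be sketched in plain Python. This is a minimal illustration, not the specification's implementation: the grid values are made up, `evaluate` is a caller-supplied function assumed to train a model and return its mean cross-validated AUC, and `toy_evaluate` is a stand-in so the sketch runs without a model.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in the sample count, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grid_search(param_grid, evaluate):
    """Try every parameter combination and return the best one with its score.

    `evaluate(params)` is hypothetical: it should train a model with `params`
    and return its mean cross-validated AUC.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # strict comparison keeps the first of any ties
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative grid over the parameters named in the text; the values are made up.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 300],
}


def toy_evaluate(params):
    """Stand-in scorer so the sketch runs without a model; not a real AUC."""
    return 1.0 - abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 5) / 10


best_params, best_score = grid_search(param_grid, toy_evaluate)
```

In practice the same search is often delegated to a library. For example, scikit-learn's `GridSearchCV` accepts `scoring="roc_auc"` and a `cv` argument, which corresponds to the AUC scoring and cross-validation described in the text.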

Using grid search to determine the best model parameter combination yields the account risk identification model with the most accurate identification results, which lays the data foundation for subsequent risk identification of accounts in the Ethereum blockchain.

In some embodiments of this specification, the method further includes:

In a specific implementation, when model training ends, the account features input to the model can be ranked by importance according to their average gain rate. The average gain rate of a feature can be understood as the sum of the gain rates from every split of the tree model on that feature, divided by the number of such splits. For an XGBoost classifier, the importance of each feature can be obtained through get_score in the XGBoost Python package, and setting importance_type='gain' gives the ranking of the features by average gain rate.

The resulting feature ranking can be used to improve feature selection. For example, the account features can be sorted from high to low by average gain rate, and the account features ranked within a specified number of top positions can be taken as the risk identification account features. When computing resources are limited, an account can be judged using only the higher-ranked account features, that is, only those account features of the account to be identified that are risk identification account features, with the lower-ranked features removed to reduce the computation time of the model. In addition, the top-ranked features are the features shared by more abnormal accounts, so the feature ranking can also be used to analyze the characteristics of risky behavior.
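The ranking and top-k selection can be sketched as follows. The gain values, the feature names and the helper function names are made up for illustration. With a trained XGBoost `Booster`, the dictionary would come from `booster.get_score(importance_type="gain")`, which returns a mapping from feature name to importance score.

```python
def rank_features_by_gain(gain_by_feature):
    """Sort feature names by average gain, highest first."""
    return sorted(gain_by_feature, key=gain_by_feature.get, reverse=True)


def keep_top_features(account_features, ranked_names, top_k):
    """Keep only the features of one account that are among the top_k ranked."""
    selected = set(ranked_names[:top_k])
    return {name: value for name, value in account_features.items() if name in selected}


# With a trained XGBoost Booster this dict would come from
# booster.get_score(importance_type="gain"); the numbers here are made up.
gain_by_feature = {"n_sent": 12.5, "total_received": 40.2, "mean_gap_seconds": 7.1}

ranked = rank_features_by_gain(gain_by_feature)
reduced = keep_top_features(
    {"n_sent": 2, "total_received": 3.0, "mean_gap_seconds": 150}, ranked, top_k=2
)
```

One design note: a model trained on all features expects the full feature vector at prediction time, so a reduced feature set would normally be paired with a model retrained on those features. The specification does not say which approach it uses.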

Screening account features by average gain rate reduces the model's computational load and speeds up its computation, which in turn speeds up account risk identification in the Ethereum blockchain.

In some embodiments of this specification, collecting the account addresses of historical risk accounts may include:

In a specific implementation, the account addresses of risk accounts can be collected from a risk account database, which can be understood as a database storing information on accounts involved in illegal activity. Addresses can also be looked up by entering the corresponding risk keywords on a risk behavior monitoring platform dedicated to tracking risky behavior, or by entering risk keywords directly in an Ethereum blockchain explorer.

For example, CryptoScamDB can be used to collect the account addresses of risk accounts. CryptoScamDB (cryptoscamdb.org) is an open-source database dedicated to collecting reports of illegal cryptocurrency-related activity such as scams. It records information such as the URLs associated with cryptocurrency scams together with the associated cryptocurrency addresses, and Ethereum-related scams are included. Table 1 shows one scam record from the scam database provided by CryptoScamDB. Multiple Ethereum addresses used in scams can be collected through the site's interface. Alternatively, the addresses can be collected through AlienVault, an open cyber threat intelligence platform that provides a large number of reports on scams and attacks. Calling the platform's API, or searching the platform directly for keywords such as Ethereum, returns reports concerning Ethereum, from which the Ethereum addresses of scams, that is, the account addresses of risk accounts, can be obtained. In addition, searching the Ethereum explorer Etherscan for keywords such as phish, heist, scam, and hack returns malicious account addresses that have already been tagged with the corresponding labels.

Table 1

Different scams often use the same cryptocurrency address. After the addresses of risk accounts have been collected, they can be checked for duplicates, keeping only one copy of each repeated address so that all addresses are distinct. All of the collected addresses are then labeled as risk account addresses and added to the account address dataset.
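The duplicate check can be sketched as follows. Two choices here are assumptions rather than part of the specification: addresses are compared after lower-casing, because the same Ethereum address may be written with different letter case, and entries that do not look like a 20-byte hexadecimal address are skipped. The helper names and sample data are hypothetical.

```python
import re

# "0x" followed by 40 hexadecimal characters.
ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def dedupe_addresses(raw_addresses):
    """Return distinct, lower-cased addresses in first-seen order."""
    seen = set()
    unique = []
    for raw in raw_addresses:
        addr = raw.strip()
        if not ADDRESS_RE.fullmatch(addr):
            continue  # skip entries that are not Ethereum-style addresses
        key = addr.lower()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def label_as_risk(addresses):
    """Attach label 1 (risk account) to every address."""
    return [(addr, 1) for addr in addresses]


# Hypothetical addresses gathered from several scam reports.
reports = [
    "0x" + "ab" * 20,
    "0x" + "AB" * 20,         # same address, different letter case
    " 0x" + "cd" * 20 + " ",  # surrounding whitespace
    "not-an-address",
]
risk_rows = label_as_risk(dedupe_addresses(reports))
```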

Through a risk account database or related platforms, account addresses with known risk can be obtained quickly, providing a data foundation for training the subsequent machine learning model.

In some embodiments of this specification, the method for collecting the account addresses of historical normal accounts includes:

In a specific implementation, the account addresses of normal accounts can be collected directly from Ethereum blocks. The Ethereum addresses of the transaction senders in the blocks are collected as candidate account addresses; the candidates are deduplicated and compared against the account addresses of historical risk accounts; any risk account addresses among them are deleted; and the remaining candidates are recorded as the account addresses of historical normal accounts. In addition, the embodiments of this specification may involve identifying accounts with ERC20 transactions. The Ethereum ERC20 standard was formally incorporated into the Ethereum specification only in September 2017, and users likely needed some further time to become familiar with it before using it, so legitimate addresses in normal use should as far as possible be chosen from accounts that remained active after that point in time.

The Web3.py tool can be used to read the information stored by an Ethereum node. Ethereum blocks generated after September 2017 are selected at random, and the sender addresses of all transactions recorded in those blocks are collected as candidate legitimate account addresses in normal use; duplicates are removed so that every candidate address is distinct. The candidates are then compared with the account addresses of risk accounts already collected as scam addresses, and any scam addresses among the candidates are removed. All remaining addresses are treated as tentatively legitimate accounts, that is, the account addresses of normal accounts, and are used as negative samples. The number of normal account addresses collected can be close to the number of risk account addresses, and all collected legitimate account addresses can be labeled and added to the account address dataset. Table 2 can be understood as the set of account addresses. As shown in Table 2, each account address has a corresponding label: addresses labeled 0 are legitimate addresses, that is, account addresses of normal accounts, and addresses labeled 1 are scam addresses, that is, account addresses of risk accounts.
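The collection of negative samples can be sketched as follows. To keep the example self-contained it does not contact a node: each block is assumed to be a dictionary with a "transactions" list whose entries carry a "from" field, which is roughly the shape a block fetched with full transaction objects has. The function names, the fixed random seed, and the shortened placeholder addresses are hypothetical.

```python
import random


def collect_normal_addresses(blocks, risk_addresses, sample_size, seed=0):
    """Collect distinct sender addresses that are not known risk addresses."""
    risk = {a.lower() for a in risk_addresses}
    seen = set()
    candidates = []
    for block in blocks:
        for tx in block["transactions"]:
            sender = tx["from"].lower()
            if sender in seen or sender in risk:
                continue  # drop duplicates and known scam addresses
            seen.add(sender)
            candidates.append(sender)
    # Keep the negative class close in size to the risk class.
    rng = random.Random(seed)
    return rng.sample(candidates, min(sample_size, len(candidates)))


def build_address_dataset(normal_addresses, risk_addresses):
    """Label normal addresses 0 and risk addresses 1, as in Table 2."""
    rows = [(a.lower(), 0) for a in normal_addresses]
    rows += [(a.lower(), 1) for a in risk_addresses]
    return rows


# Placeholder data; real addresses are 40 hexadecimal characters long.
blocks = [
    {"transactions": [{"from": "0xAAA"}, {"from": "0xbbb"}]},
    {"transactions": [{"from": "0xaaa"}, {"from": "0xccc"}]},
]
risk = ["0xBBB"]

normal = collect_normal_addresses(blocks, risk, sample_size=len(risk))
dataset = build_address_dataset(normal, risk)
```

Restricting the input to blocks generated after September 2017 is left to the caller, since the block selection depends on the node being queried.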

Table 2

By collecting account addresses directly from the blockchain, comparing them with the risk account addresses, and deleting the scam addresses among them, the account addresses of normal accounts are obtained without any dedicated identification and retrieval of normal accounts. This improves the efficiency of data collection and provides a data foundation for subsequent model training.

Once the account addresses have been obtained, the transaction information of each account can be retrieved from its address. The Ethereum explorer Etherscan provides a large number of APIs for querying information on the Ethereum mainnet, including APIs for querying the historical data of Ethereum accounts. Through these APIs, information on the ordinary transactions, internal transactions, and ERC20 transactions of each Ethereum account address can be collected. Note that these APIs return only the most recent 10,000 transactions of a given type for the target address. The information in 10,000 transactions is sufficient for this method, so the limit has no substantive effect on its results.
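As an illustration of the three kinds of history query, the sketch below only builds request URLs and sends nothing. The base URL and the parameter names (module, action, address, sort, apikey) with the actions txlist, txlistinternal, and tokentx are assumptions based on Etherscan's public account API as commonly documented; they are not stated in the specification and should be checked against the current Etherscan documentation. The function name and the API key value are hypothetical.

```python
from urllib.parse import urlencode

# Assumed mapping from the three transaction types to Etherscan actions.
ETHERSCAN_ACTIONS = {
    "ordinary": "txlist",
    "internal": "txlistinternal",
    "erc20": "tokentx",
}


def build_history_url(address, tx_type, api_key,
                      base_url="https://api.etherscan.io/api"):
    """Build (but do not send) a transaction-history query for one address."""
    params = {
        "module": "account",
        "action": ETHERSCAN_ACTIONS[tx_type],
        "address": address,
        "sort": "desc",  # most recent transactions first
        "apikey": api_key,
    }
    return base_url + "?" + urlencode(params)


example_address = "0x" + "ab" * 20  # placeholder address
urls = {
    tx_type: build_history_url(example_address, tx_type, api_key="YOUR_KEY")
    for tx_type in ETHERSCAN_ACTIONS
}
```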

After the account risk identification model has been built and is used to assess an account to be identified, the target transaction information obtained for that account can be input into the model. The model extracts the account features of the account from the target transaction information and then performs risk identification on the account based on the extracted features.

Step 106: Determine, according to the risk identification result output by the account risk identification model, whether the account to be identified is a risk account.

In a specific implementation, FIG. 2 is a schematic flowchart of account risk identification by the account risk identification model in one embodiment of this specification. As shown in FIG. 2, the model has two main functions: extracting account features, and classifying accounts based on the extracted features. After the target account address of the account to be identified has been obtained, the address can be input directly into the model, or the corresponding transaction information can first be retrieved from the address, after which the model performs feature extraction and judges whether the address is abnormal. Whether the account to be identified is a risk account can then be determined from the model's output. The model can directly output a risk label for the account: for example, an output of 0 indicates a normal account and an output of 1 indicates a risk account. The model can also output a score for the account's degree of risk, with a higher score indicating a greater likelihood that the account is a risk account. The model can be configured according to actual requirements, and the embodiments of this specification impose no specific limitation.
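The two output modes can be related by a simple threshold, sketched below. The threshold of 0.5 and the assumption that the score lies between 0 and 1 are illustrative choices only; the specification does not fix either.

```python
def score_to_label(risk_score, threshold=0.5):
    """Map a risk score in [0, 1] to a label: 1 = risk account, 0 = normal."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score is assumed to lie in [0, 1]")
    return 1 if risk_score >= threshold else 0


# Hypothetical scores for three accounts to be identified.
labels = [score_to_label(s) for s in (0.03, 0.50, 0.97)]
```

A lower threshold flags more accounts at the cost of more false alarms, so the value would be chosen to suit the application.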

The risk identification method for accounts in the Ethereum blockchain provided by the embodiments of this specification trains a model on the transaction information of historical risk accounts and historical normal accounts to build an account risk identification model, and then uses that model to extract the account features of an account to be identified and predict its risk. This identifies risk accounts in the Ethereum blockchain and improves the security of account use there. The method can identify and classify accounts in the Ethereum blockchain, which helps in holding fraudsters to account and in providing remedies to victims. The method is highly effective in detecting risk accounts and can rank the importance of each feature, providing reference and inspiration for further improving the method or for analyzing similar blockchains.

In addition, the account features extracted in the embodiments of this specification mainly include statistical features and transaction type features. The statistical features include transaction amount features, transaction count features, and transaction time features.

In a specific implementation, the embodiments of this specification can extract account features along two dimensions. The first dimension consists of statistical features of the transaction history of the account address, selected under what is called the Width-Breadth-Time Strategy (WBTS). Table 3 lists some of the account features used in some embodiments of this specification, and FIG. 3 is a schematic diagram of the feature selection strategy in one embodiment. As shown in FIG. 3 and Table 3, the statistical features are extracted mainly from the following aspects:

(1) Width features of the account's historical transactions, that is, transaction amount features. Treating accounts as nodes and transactions as the channels connecting them, the amount transferred in a transaction is the transaction width, and the account's width features are obtained by computing statistics over these amounts.

Examples: the account balance, the total amount received, the total amount sent, the minimum amount received, the minimum amount sent, the maximum amount received, the maximum amount sent, the average amount received, the average amount sent, and so on.

(2) Breadth features of the account's historical transactions, that is, transaction count features. Treating accounts as nodes connected by transactions, counting how many different accounts are connected to a given account yields that account's breadth features.

Examples: the number of transfers received, the number of transfers sent, the number of distinct addresses from which the account has received transfers, the number of distinct addresses to which it has sent transfers, and so on.

(3) Time features of the account address's transactions, that is, transaction time features. Accounts differ in how frequently they transact and in the time range over which they are active, so time-related features of the account's transactions are computed.

Examples: the average time of received transfers, the average time of sent transfers, the time span from the first received transfer to the last received transfer, the time span from the first sent transfer to the last sent transfer, the time span from the first transfer of any kind to the last, and so on.

In addition, other kinds of features can be considered. These mainly supplement features that are not covered by the first three classes but are still meaningful, for example the number of successful transactions of the account, the number of failed transactions, and so on.
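The width, breadth, time, and other features listed above can be sketched as a single function over one account's transaction list. The record fields ("from", "to", "value", "timestamp", "failed") and the feature names are hypothetical, chosen for illustration; the account balance is omitted because it cannot be derived from a possibly truncated transaction list alone.

```python
from statistics import mean


def wbts_features(address, transactions):
    """Compute a few width/breadth/time/other features for one account."""
    address = address.lower()
    received = [t for t in transactions if (t.get("to") or "").lower() == address]
    sent = [t for t in transactions if t["from"].lower() == address]

    def amount_stats(txs, prefix):
        values = [t["value"] for t in txs]
        return {
            f"{prefix}_total": sum(values),
            f"{prefix}_min": min(values, default=0),
            f"{prefix}_max": max(values, default=0),
            f"{prefix}_mean": mean(values) if values else 0,
        }

    def time_span(txs):
        stamps = [t["timestamp"] for t in txs]
        return max(stamps) - min(stamps) if stamps else 0

    features = {}
    # Width: statistics over transferred amounts.
    features.update(amount_stats(received, "received"))
    features.update(amount_stats(sent, "sent"))
    # Breadth: transfer counts and distinct counterparties.
    features["received_count"] = len(received)
    features["sent_count"] = len(sent)
    features["unique_senders"] = len({t["from"].lower() for t in received})
    features["unique_receivers"] = len({(t.get("to") or "").lower() for t in sent})
    # Time: spans between first and last transfer.
    features["received_time_span"] = time_span(received)
    features["sent_time_span"] = time_span(sent)
    features["total_time_span"] = time_span(received + sent)
    # Other: failed transactions.
    features["failed_count"] = sum(1 for t in transactions if t.get("failed", False))
    return features


# Hypothetical history: three incoming transfers, then one outgoing transfer.
history = [
    {"from": "0xv1", "to": "0xacc", "value": 2, "timestamp": 100},
    {"from": "0xv2", "to": "0xacc", "value": 4, "timestamp": 160},
    {"from": "0xv1", "to": "0xacc", "value": 6, "timestamp": 220},
    {"from": "0xacc", "to": "0xout", "value": 12, "timestamp": 400},
]
features = wbts_features("0xACC", history)
```

In this made-up history the account receives several transfers in a short window and empties itself in a single outgoing transfer, so the received count exceeds the sent count while the totals match.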

Table 3

In general, an account used for a scam receives transfers from a large number of victims within a short period early in the scam, whereas when moving funds out, the scam address usually tries to transfer the proceeds in as few transfers as possible so as to pay less in gas fees. By extracting the transaction amount, transaction count, and transaction time features of accounts, the embodiments of this specification can reveal how risk accounts differ from accounts in normal use in aspects such as time of use, transfer amounts, and the in-degree and out-degree of transfers, so that risk accounts can be identified accurately.

The second dimension consists of transaction type features specific to Ethereum. FIG. 4 is a schematic diagram of the transaction type features of accounts in the Ethereum blockchain in one embodiment of this specification. As shown in FIG. 4, based on the characteristics of Ethereum itself, transactions between account addresses can be divided into ordinary transactions, internal transactions, and ERC20 transactions. Table 4 shows the account features extracted in some embodiments of this specification. As shown in Table 4, combining this with the first dimension, transaction width (amount) features, transaction breadth (count) features, transaction time features, and other features can be extracted separately for each of the three transaction types, giving twelve categories of features.

Here, ERC20 is a token protocol under which fungible tokens can be issued, and such tokens can represent many things, such as certificates, points, or currency tokens. ERC (Ethereum Request for Comments) denotes a protocol proposal submitted by Ethereum developers, and 20 is the number of the proposal.

Table 4

Feature class | Ordinary transactions | Internal transactions | ERC20 transactions
Transaction width | Ordinary transaction width features | Internal transaction width features | ERC20 transaction width features
Transaction breadth | Ordinary transaction breadth features | Internal transaction breadth features | ERC20 transaction breadth features
Transaction time | Ordinary transaction time features | Internal transaction time features | ERC20 transaction time features
Other | Other features of ordinary transactions | Other features of internal transactions | Other features of ERC20 transactions
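The twelve categories are the cross product of the three transaction types and the four feature classes, which can be sketched as below. The group naming scheme and the "type" field used to split transaction records are hypothetical.

```python
from itertools import product

TRANSACTION_TYPES = ("ordinary", "internal", "erc20")
FEATURE_CLASSES = ("width", "breadth", "time", "other")


def feature_groups():
    """Enumerate the feature categories row by row, as laid out in Table 4."""
    return [
        f"{tx_type}_{feature_class}"
        for feature_class, tx_type in product(FEATURE_CLASSES, TRANSACTION_TYPES)
    ]


def split_by_type(transactions):
    """Group transaction records by their (hypothetical) 'type' field."""
    grouped = {tx_type: [] for tx_type in TRANSACTION_TYPES}
    for tx in transactions:
        grouped[tx["type"]].append(tx)
    return grouped


groups = feature_groups()
grouped = split_by_type([
    {"type": "ordinary", "value": 1},
    {"type": "erc20", "value": 5},
    {"type": "ordinary", "value": 2},
])
```

Each per-type list can then be passed through the same statistical feature extraction, with the resulting feature names prefixed by the transaction type.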

The embodiments of this specification extract account features along different dimensions, comprehensively covering the features of risk accounts, so that the characteristics of risk accounts can be learned accurately. This lays a data foundation for accurately identifying risk accounts in the Ethereum blockchain.

To verify the usability of the risk identification method for accounts in the Ethereum blockchain provided in the embodiments of this specification, the embodiments also describe an experiment as a concrete application. The experiment was run on an Aspire A715-74G computer with an 8-core Intel Core i5-9300H 2.40 GHz CPU and 8 GB of memory, running Windows 10.

For data collection, the account addresses of risk accounts were extracted from the data provided by CryptoScamDB, and candidate legitimate addresses were obtained with the Web3.py tool from a remote Ethereum node provided by Infura. The transaction records of each address were queried through Etherscan to obtain historical data, and account features were then extracted for each account with the feature extraction algorithm of this method to form the training dataset.

Grid search shows that, on the dataset of this experiment, the mean AUC over 10-fold cross-validation is highest with a learning rate of 0.2, a maximum depth of 5, and 280 weak classifiers. The AUC scores for this setting are given in Table 5. The mean training time was 10.3756 seconds, the mean AUC score was 0.992034, and the standard deviation of the AUC score was 0.00230191. Table 5 shows the grid search results in one embodiment of this specification. As Table 5 shows, the method has high accuracy and feasibility and can distinguish scam addresses from legitimate addresses fairly accurately.

Table 5

Name | AUC
split0_test_score | 0.991337
split1_test_score | 0.995122
split2_test_score | 0.993343
split3_test_score | 0.987229
split4_test_score | 0.995007
split5_test_score | 0.989108
split6_test_score | 0.992601
split7_test_score | 0.992016
split8_test_score | 0.992684
split9_test_score | 0.991895
mean_test_score | 0.992034
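The reported summary figures can be cross-checked against the per-fold scores in Table 5. The sketch below recomputes the mean and the standard deviation from the ten fold scores, assuming that the reported standard deviation is the population standard deviation (dividing by the number of folds); the specification does not say which definition was used.

```python
from statistics import mean, pstdev

# Per-fold AUC scores copied from Table 5 (split0 to split9).
fold_auc = [
    0.991337, 0.995122, 0.993343, 0.987229, 0.995007,
    0.989108, 0.992601, 0.992016, 0.992684, 0.991895,
]

mean_auc = mean(fold_auc)
std_auc = pstdev(fold_auc)  # population standard deviation (assumed)

print(f"mean AUC = {mean_auc:.6f}, standard deviation = {std_auc:.6f}")
```

Under that assumption the recomputed values agree with the reported mean of 0.992034 and standard deviation of 0.00230191 to within the rounding of the tabulated fold scores.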

Comparing the average gain rates of the features gives a ranking of each feature's importance to the model. Results differ between runs, so the exact rankings may not be identical, but the range within which each feature's rank fluctuates is broadly consistent. The importance ranking can be used to analyze the behavioral characteristics of risk account addresses and to further refine feature selection and improve the method. Table 6 lists the top 20 account features in the experiment. It shows that features relating to time span are highly significant in determining whether an account is being used for a scam.

Table 6

Based on the risk identification method for accounts in the Ethereum blockchain described above, one or more embodiments of this specification further provide a risk identification apparatus for accounts in the Ethereum blockchain. The apparatus may include systems (including distributed systems), software (applications), modules, plug-ins, servers, clients, and the like that use the method described in the embodiments of this specification, combined with the necessary implementing hardware. Based on the same inventive concept, the apparatus in one or more embodiments provided by this specification is as described in the following embodiments. Because the way the apparatus solves the problem is similar to that of the method, the implementation of the apparatus may refer to the implementation of the foregoing method, and repeated content is not described again. As used below, the term "unit" or "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 5 is a schematic structural diagram of a risk identification apparatus for accounts in the Ethereum blockchain in one embodiment of this specification. As shown in FIG. 5, the apparatus includes:

an information acquisition module 501, configured to acquire target transaction information of an account to be identified according to the target account address of the account to be identified in the Ethereum blockchain;

a machine learning model identification module 502, configured to input the target transaction information into a pre-established account risk identification model, extract account features of the account to be identified from the target transaction information using the account risk identification model, and perform risk identification on the account to be identified according to the extracted account features, wherein the account risk identification model is obtained by model training based on transaction information of historical risk accounts and transaction information of historical normal accounts;

an account risk identification module 503, configured to determine, according to the risk identification result output by the account risk identification model, whether the account to be identified is a risk account.

In addition, in some embodiments of this specification, the apparatus further includes a model building module configured to build the account risk identification model using the following method:

For the apparatus embodiments described above, reference may be made to the method embodiments; other embodiments are also possible and are not described further here.

In another aspect, the embodiments of this specification provide a computer-readable storage medium storing at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by a processor to implement the risk identification method for accounts in the Ethereum blockchain described above.

In a further aspect, the embodiments of this specification provide an electronic device for risk identification of accounts in the Ethereum blockchain. FIG. 6 is a schematic structural diagram of such an electronic device as provided by the embodiments herein. As shown in FIG. 6, the device includes a processor, a memory, a communication interface, and a bus. The memory stores at least one instruction or at least one program, which is loaded and executed by the processor to implement any of the risk identification methods for accounts in the Ethereum blockchain described above.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. The test method provided by the embodiments of the present invention has the same implementation principle and technical effects as the foregoing system embodiments. For brevity, where the method embodiments are silent, reference may be made to the corresponding content of the foregoing system embodiments.

It should be understood that, in the various embodiments herein, the magnitude of the sequence numbers of the above processes does not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers do not limit the implementation of the embodiments herein in any way.

It should also be understood that, in the embodiments herein, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be regarded as going beyond the scope of this document.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided herein, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may also be electrical, mechanical, or other forms of connection.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments herein.

In addition, the functional units in the embodiments herein may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution herein in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments herein. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Specific embodiments have been used herein to explain the principles and implementations of this document. The description of the above embodiments is intended only to help in understanding the method and its core idea. At the same time, for a person of ordinary skill in the art, the specific implementations and the scope of application may vary in accordance with the idea of this document. In summary, the content of this specification should not be construed as a limitation on this document.

Claims

1. A risk identification method for an account in the Ethereum blockchain, wherein the method comprises:

According to the target account address of the account to be identified in the Ethereum blockchain, obtain the target transaction information of the account to be identified;

Input the target transaction information into a pre-established account risk identification model, use the account risk identification model to extract the account characteristics of the account to be identified based on the target transaction information, and perform risk identification on the account to be identified according to the extracted account characteristics; wherein the account risk identification model is obtained by model training based on transaction information of historical risk accounts and transaction information of historical normal accounts;

According to the risk identification result output by the account risk identification model, it is determined whether the account to be identified belongs to a risk account.

2. The risk identification method of an account in the Ethereum blockchain according to claim 1, wherein the construction method of the account risk identification model comprises:

Collect the account address of the historical risk account and the account address of the historical normal account;

Obtain the transaction information of the historical risk account based on the account address of the historical risk account, and obtain the transaction information of the historical normal account based on the account address of the historical normal account;

Perform feature extraction on the transaction information of the historical risk account to obtain a risk account feature set, and perform feature extraction on the transaction information of the historical normal account to obtain a normal account feature set;

Model training is performed on the account risk identification model by using the risk account feature set and the normal account feature set to obtain the account risk identification model.

3. The risk identification method for accounts in the Ethereum blockchain according to claim 2, wherein performing model training on the account risk identification model by using the risk account feature set and the normal account feature set to obtain the account risk identification model comprises:

Inputting the risk account feature set and the normal account feature set into the account risk identification model, carrying out model training, and using grid search to determine the target model parameter combination of the account risk identification model;

An account risk identification model is obtained based on the target model parameter combination.
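
The grid search recited in claim 3 can be illustrated with a minimal, library-free sketch. The parameter names `max_depth` and `learning_rate`, their candidate values, and the scoring function `toy_score` are hypothetical and chosen only for illustration; in practice the score would come from training the account risk identification model with each parameter combination and evaluating it on held-out accounts.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        # Keep the first combination that achieves the highest score.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for 'train with params, return a validation score'."""
    return -((params["max_depth"] - 6) ** 2) - 100 * (params["learning_rate"] - 0.1) ** 2


# Hypothetical candidate values; the claims do not specify any.
grid = {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_score)
```

The returned `best_params` plays the role of the "target model parameter combination"; the final model would then be trained once more with those values.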

4. The risk identification method of an account in the Ethereum blockchain according to claim 2, wherein the collection of the account address of the historical risk account comprises:

Obtain the primary account address of the risk account from the risk account database, or obtain the primary account address of the risk account by querying the risk keywords in the risk behavior monitoring platform or the Ethereum blockchain browser;

After obtaining the primary account addresses of the risk accounts, check the primary account addresses for duplicates, delete the duplicate primary account addresses, and obtain the account addresses of the historical risk accounts.

5. The risk identification method of an account in the Ethereum blockchain according to claim 4, wherein the method for collecting the account address of the historical normal account comprises:

Collect the Ethereum address of the transaction sender from the Ethereum block as the alternate account address;

After deduplication, the alternate account addresses are compared with the collected account addresses of the historical risk accounts, any alternate account address that is the same as an account address of a historical risk account is deleted, and the remaining alternate account addresses are used as the account addresses of the historical normal accounts.
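
The address handling in claims 4 and 5 amounts to deduplication followed by a set difference. The sketch below is one possible reading under stated assumptions: the function names are hypothetical, the sample addresses are shortened made-up strings rather than real accounts, and lowercasing is an added normalization step (hexadecimal Ethereum addresses differ in letter case only through checksum formatting) that the claims themselves do not mention.

```python
def dedupe_addresses(addresses):
    """Normalize addresses to lowercase and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for address in addresses:
        key = address.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def collect_normal_addresses(sender_addresses, risk_addresses):
    """Deduplicate candidate sender addresses, then remove any known risk address."""
    risk = set(dedupe_addresses(risk_addresses))
    return [a for a in dedupe_addresses(sender_addresses) if a not in risk]


# Made-up, shortened placeholder addresses for illustration only.
primary_risk = ["0xAbC1", "0xabc1", "0xdef2"]
block_senders = ["0x1111", "0xDEF2", "0x2222", "0x1111"]

risk_addresses = dedupe_addresses(primary_risk)
normal_addresses = collect_normal_addresses(block_senders, risk_addresses)
```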

6. The risk identification method for accounts in the Ethereum blockchain according to claim 1, wherein the method further comprises:

Calculate the average gain rate of each account feature by using the account risk identification model;

Sort the account features by average gain rate from high to low, and take the account features ranked within a specified number of top positions as the risk identification account features;

The use of the account risk identification model to extract the account characteristics of the to-be-identified account based on the target transaction information, and to perform risk identification on the to-be-identified account according to the extracted account characteristics, includes:

After using the account risk identification model to extract the account characteristics of the to-be-identified account based on the target transaction information, screen out, from the extracted account characteristics, those that are the same as the risk identification account features, and use the account risk identification model to perform risk identification on the to-be-identified account with the screened account characteristics.
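
The ranking and screening steps of claim 6 can be sketched as follows. This is an illustration only: it assumes that the "average gain rate" of a feature is the mean gain over all tree splits that use that feature, which is how gain-based importance is commonly defined for tree ensembles. The split records, feature names, and function names are hypothetical.

```python
from collections import defaultdict


def average_gain(split_records):
    """Mean gain per feature from (feature_name, gain) records, one per tree split."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, gain in split_records:
        totals[feature] += gain
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


def top_features(avg_gain, k):
    """Feature names sorted by average gain, high to low (ties broken by name)."""
    ranked = sorted(avg_gain, key=lambda name: (-avg_gain[name], name))
    return ranked[:k]


def screen_features(feature_vector, selected):
    """Keep only the extracted features that are among the selected ones."""
    return {name: feature_vector[name] for name in selected if name in feature_vector}


# Hypothetical split records and feature values.
splits = [
    ("total_received", 4.0),
    ("tx_count", 1.0),
    ("total_received", 2.0),
    ("avg_interval", 2.5),
    ("tx_count", 2.0),
]
gains = average_gain(splits)
selected = top_features(gains, k=2)
screened = screen_features(
    {"total_received": 12.5, "tx_count": 40, "avg_interval": 3600.0}, selected
)
```

Only the screened features would then be passed to the model for risk identification of the account to be identified.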

7. The risk identification method for accounts in the Ethereum blockchain according to claim 1, wherein the account features include statistical features and transaction type features, and the statistical features include transaction amount features, transaction count features, and transaction time features.
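
As a rough illustration of the statistical features named in claim 7, the sketch below derives amount, count, and time features from a list of transaction records. The record fields (`from`, `to`, `value`, `timestamp`), the feature names, and the sample data are assumptions made for this example; the claims do not fix a concrete feature list, and transaction type features are not covered here.

```python
def statistical_features(account, transactions):
    """Compute simple amount, count, and time features for one account."""
    sent = [t for t in transactions if t["from"] == account]
    received = [t for t in transactions if t["to"] == account]
    times = sorted(t["timestamp"] for t in sent + received)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        # Transaction count features
        "sent_count": len(sent),
        "received_count": len(received),
        # Transaction amount features
        "total_sent": sum(t["value"] for t in sent),
        "total_received": sum(t["value"] for t in received),
        # Transaction time features (seconds)
        "active_span": times[-1] - times[0] if times else 0,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Made-up transactions with placeholder addresses.
txs = [
    {"from": "0xa", "to": "0xb", "value": 2.0, "timestamp": 100},
    {"from": "0xc", "to": "0xa", "value": 5.0, "timestamp": 160},
    {"from": "0xa", "to": "0xc", "value": 1.0, "timestamp": 400},
    {"from": "0xb", "to": "0xc", "value": 9.0, "timestamp": 500},
]
features = statistical_features("0xa", txs)
```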

8. A risk identification device for accounts in the Ethereum blockchain, wherein the device comprises:

an information acquisition module, configured to acquire the target transaction information of the to-be-identified account according to the target account address of the to-be-identified account in the Ethereum blockchain;

a machine learning model identification module, configured to input the target transaction information into a pre-established account risk identification model, use the account risk identification model to extract the account characteristics of the to-be-identified account based on the target transaction information, and perform risk identification on the to-be-identified account according to the extracted account characteristics; wherein the account risk identification model is obtained by model training based on transaction information of historical risk accounts and transaction information of historical normal accounts;

an account risk identification module, configured to determine whether the to-be-identified account belongs to a risk account according to the risk identification result output by the account risk identification model.

9. The risk identification device for an account in the Ethereum blockchain according to claim 8, wherein the device further comprises a model building module for constructing the account risk identification model using the following method:

10. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method of any one of claims 1-7.