CN113052577A - Method and system for estimating category of virtual address of block chain digital currency - Google Patents

Info

Publication number
CN113052577A
Authority
CN
China
Prior art keywords
data set
address
data
class
transaction
Legal status
Pending
Application number
CN202110272026.1A
Other languages
Chinese (zh)
Inventor
何泾沙
何琳
朱娜斐
薛瑞昕
常瑞天
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN202110272026.1A
Publication of CN113052577A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/04 Payment circuits
    • G06Q20/06 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme
    • G06Q20/065 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme using e-cash
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for inferring the category of a virtual address of a blockchain digital currency. The method comprises the following steps: acquiring known-category addresses and unknown-category addresses of the digital currency; performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-category addresses to obtain a sample data set; dividing the sample data set into a training data set and a test data set and, after multiple iterations, selecting the optimal model as the classifier for digital currency address category inference; performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-category addresses to obtain a data set to be classified; and performing classification calculation on the input data set to be classified with the classifier to obtain the category to which it belongs. By learning the features of virtual addresses of known categories, the method and system infer the categories of virtual addresses whose category is unknown, addressing the problem that most virtual addresses in a blockchain network are in an unknown state.

Description

Method and system for estimating category of virtual address of block chain digital currency
Technical Field
The invention relates to the technical field of blockchain digital currency, and in particular to a method and a system for estimating the category of virtual addresses of blockchain digital currency.
Background
With the development of blockchain technology and digital currency, more and more people are attracted by the anonymity and decentralization of blockchain technology and by the high returns of digital currency, and are paying attention to blockchain digital currencies such as Bitcoin and Ethereum. The anonymity of blockchain digital currencies provides strong security guarantees for user nodes, but it also lets legitimate and illegitimate participants mix freely across the network, and illegal transaction activities occur frequently. Inferring entity types in a blockchain digital currency network while preserving the anonymity of user nodes, so as to delimit the categories of the virtual addresses controlled by each user node, is therefore of significant value for monitoring blockchain digital currency networks.
At present, research on entity type inference in blockchain digital currency networks mainly focuses on entity user address clustering, entity legitimacy detection, and entity user address classification:
1. For entity user address clustering, prior research mainly applies heuristic clustering to transaction addresses: the input addresses of a transaction are clustered into one class and identified as a single entity.
2. For entity legitimacy detection, prior research mainly analyzes labeled addresses and entities that participate in illegal activities, chiefly investment fraud, trading in prohibited goods, and money laundering.
3. For entity user address classification, prior research mainly collects address information published on forums, blogs and websites, defines classification criteria, and trains classification models.
Although these schemes can infer the entity types of a blockchain digital currency network to a certain extent, most of them rely on off-chain information about the virtual addresses. Such methods can accurately infer the category or real-world identity of an address, but not every virtual address can be linked to off-chain information; for most virtual addresses in a blockchain network this information is unknown. These methods also require searching large amounts of network resources, which is time-consuming and labor-intensive, and they suffer from low feature dimensionality, high research cost, and poor generality of the overall solution.
Disclosure of Invention
In view of the above problems in the prior art, the present invention provides a low-cost, highly general method and system for estimating the category of virtual addresses of blockchain digital currency.
The invention discloses a method for inferring the category of a virtual address of blockchain digital currency, comprising the following steps:
acquiring known-category addresses and unknown-category addresses of the blockchain digital currency;
performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-category addresses to obtain a sample data set;
dividing the sample data set into a training data set for model training and a test data set for model evaluation and, after multiple iterations, selecting the optimal model as the classifier for digital currency address category inference;
performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-category addresses to obtain a data set to be classified;
and performing classification calculation on the input data set to be classified with the classifier to obtain the category of the data set to be classified.
As a further improvement of the invention, the transaction retrieval and feature extraction comprise:
obtaining, from the ledger data of the blockchain digital currency, all transactions in which the known-category or unknown-category address participates;
performing feature extraction on the transaction information to obtain a basic data set, wherein the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coinbase (coin-generation) transactions it participates in, the time it first received bitcoin, the time it first spent bitcoin, and the output address count and input address count of each transaction;
and combining the feature data in the basic data set using feature engineering methods to derive new features and generate a feature data set.
As a further improvement of the present invention, the data normalization comprises:
normalizing the feature data set generated by feature extraction with min-max normalization, so that the processed data lie between 0 and 1.
As a further improvement of the invention, the feature contribution calculation and screening comprise:
for the known-category addresses, calculating the information gain of every feature attribute in the normalized data set, ranking the features, screening out those whose contribution value is below a threshold, recording the names of the screened-out feature attributes, and forming a new feature data set from the remaining feature attributes;
for the unknown-category addresses, directly deleting the screened-out feature attributes recorded for the known-category addresses from the normalized data set, and forming a new feature data set from the remaining feature attributes as the data set to be classified.
As a further improvement of the present invention, the sample class imbalance processing comprises:
processing the data set obtained after feature contribution calculation and screening with the Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) to obtain the sample data set.
The invention also discloses a system for inferring the category of a virtual address of blockchain digital currency, comprising:
a data processing module, configured for:
acquiring known-category addresses and unknown-category addresses of the blockchain digital currency;
performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-category addresses to obtain a sample data set;
performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-category addresses to obtain a data set to be classified;
a classifier generation module, configured for:
dividing the sample data set into a training data set for model training and a test data set for model evaluation and, after multiple iterations, selecting the optimal model as the classifier for digital currency address category inference;
a classification module, configured for:
performing classification calculation on the input data set to be classified with the classifier to obtain the category of the data set to be classified.
As a further improvement of the present invention, in the data processing module, the transaction retrieval and feature extraction comprise:
obtaining, from the ledger data of the blockchain digital currency, all transactions in which the known-category or unknown-category address participates;
performing feature extraction on the transaction information to obtain a basic data set, wherein the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coinbase (coin-generation) transactions it participates in, the time it first received bitcoin, the time it first spent bitcoin, and the output address count and input address count of each transaction;
and combining the feature data in the basic data set using feature engineering methods to derive new features and generate a feature data set.
As a further improvement of the present invention, in the data processing module, the data normalization comprises:
normalizing the feature data set generated by feature extraction with min-max normalization, so that the processed data lie between 0 and 1.
As a further improvement of the present invention, in the data processing module, the feature contribution calculation and screening comprise:
for the known-category addresses, calculating the information gain of every feature attribute in the normalized data set, ranking the features, screening out those whose contribution value is below a threshold, recording the names of the screened-out feature attributes, and forming a new feature data set from the remaining feature attributes;
for the unknown-category addresses, directly deleting the screened-out feature attributes recorded for the known-category addresses from the normalized data set, and forming a new feature data set from the remaining feature attributes as the data set to be classified.
As a further improvement of the present invention, in the data processing module, the sample class imbalance processing comprises:
processing the data set obtained after feature contribution calculation and screening with the Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) to obtain the sample data set.
Compared with the prior art, the invention has the following beneficial effects:
1. By learning the features of virtual addresses of known categories in order to infer the categories of virtual addresses of unknown category, the invention addresses the problem that the information of most virtual addresses in a blockchain network is unknown;
2. Through feature screening, the invention removes features whose contribution is below the threshold, which reduces the feature dimensionality and improves the classification efficiency of the overall model;
3. The invention trains classification models in a cyclic, iterative manner and selects the optimal algorithm and the most suitable parameters for each classification scenario, so that the final classifier predicts the category of the current virtual addresses more accurately;
4. The invention distinguishes virtual addresses layer by layer, and outputs a virtual address as the "other" category when its features do not match the features learned by the classifier.
Drawings
FIG. 1 is a flowchart of a method and system for category inference of blockchain digital currency virtual addresses according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
The invention provides a method and a system for estimating the category of virtual addresses of blockchain digital currency. First, data mining methods are used to analyze and extract the information of the transactions in which a virtual address participates (such as the number of transactions participated in, the number of times as an output, the amount of each transaction, and the transaction times) to obtain the feature data of the virtual address. Next, the feature data are screened and processed using feature engineering, and a data set is constructed. Finally, analysis, feature comparison and related work are performed on the feature data set to divide the virtual addresses of the blockchain digital currency into categories.
Specifically:
Example 1:
As shown in FIG. 1, the present invention provides a method for estimating the category of a virtual address of blockchain digital currency, comprising:
Step 1, acquiring known-category addresses and unknown-category addresses of the blockchain digital currency;
Specifically:
Taking bitcoin as the research object, bitcoin addresses of various known categories, as well as the unknown-category bitcoin addresses to be identified, are obtained from a website that publishes bitcoin address category labels; the category labels of the known-category addresses serve as the target output for subsequent classifier training.
Step 2, performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-category addresses to obtain a sample data set;
Specifically:
Transaction retrieval and feature extraction:
All transactions in which the known-category address participates are obtained from the ledger data of the blockchain digital currency; features are extracted from the transaction information to obtain a basic data set; and the feature data in the basic data set are combined using feature engineering methods to derive new features and generate a feature data set.
Taking bitcoin as the research object, transaction retrieval obtains all transactions in which a given bitcoin address participates from the official bitcoin ledger data. Feature extraction analyzes the transactions corresponding to the given address and extracts data such as the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coinbase transactions it participates in, the time it first received bitcoin, the time it first spent bitcoin, and the input address count of each transaction, as the basic data set of the address. The feature data in the basic data set are then combined using feature engineering methods to derive new features and generate a feature data set.
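The per-address feature extraction described above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Tx` record and its field names, summarizing per-transaction amounts and address counts by their mean, and the `hold_time` combined feature are all hypothetical choices, and parsing real bitcoin ledger data is out of scope.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Tx:
    """Hypothetical, simplified view of one ledger transaction (assumption)."""
    time: int                        # block timestamp (Unix seconds)
    inputs: list[tuple[str, int]]    # (address, value in satoshi) being spent
    outputs: list[tuple[str, int]]   # (address, value in satoshi) being paid
    coinbase: bool = False           # True for a coin-generation (coinbase) transaction


def extract_features(address: str, txs: list[Tx]) -> dict[str, float]:
    """Basic per-address features plus one combined (engineered) feature."""
    def spends(t: Tx) -> bool:
        return any(a == address for a, _ in t.inputs)

    def receives(t: Tx) -> bool:
        return any(a == address for a, _ in t.outputs)

    involved = [t for t in txs if spends(t) or receives(t)]  # transaction retrieval
    received = [t for t in involved if receives(t)]          # address used as an output
    spent = [t for t in involved if spends(t)]               # address used as an input
    # Net value moved by the address in each transaction (received minus spent).
    amounts = [sum(v for a, v in t.outputs if a == address)
               - sum(v for a, v in t.inputs if a == address) for t in involved]

    feats = {
        "total_tx": len(involved),
        "times_as_output": len(received),
        "coinbase_tx": sum(1 for t in received if t.coinbase),
        "first_receive_time": min((t.time for t in received), default=0),
        "first_spend_time": min((t.time for t in spent), default=0),
        "mean_amount": mean(amounts) if amounts else 0.0,
        "mean_input_count": mean(len(t.inputs) for t in involved) if involved else 0.0,
        "mean_output_count": mean(len(t.outputs) for t in involved) if involved else 0.0,
    }
    # Hypothetical combined feature: delay between first receipt and first spend.
    feats["hold_time"] = (feats["first_spend_time"] - feats["first_receive_time"]
                          if received and spent else 0)
    return feats
```

Calling `extract_features` once per address and stacking the resulting dictionaries row by row gives one possible form of the feature data set.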
Data normalization:
The feature data set generated by feature extraction is normalized with min-max normalization, which limits the processed data to the range [0, 1] and thereby reduces the adverse effect of singular sample data. Min-max normalization subtracts the minimum value min(X) of attribute X from each attribute value x_i and divides the result by the difference between the maximum value max(X) and the minimum value min(X) of that attribute:
$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)}$
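A minimal Python sketch of the min-max formula, assuming feature vectors are plain lists of numbers. The text does not say whether unknown-category addresses are scaled with their own minima and maxima or with those of the known-category data; the optional `bounds` argument (a hypothetical parameter) illustrates the second option and clips results to [0, 1].

```python
def min_max_normalize(rows, bounds=None):
    """Scale each column to [0, 1] via x' = (x - min) / (max - min).

    rows: list of feature vectors (one per address).
    bounds: optional per-column (min, max) pairs to reuse, e.g. from the
    known-category data (an assumption); results are clipped to [0, 1].
    Returns (scaled_rows, bounds).
    """
    if bounds is None:
        bounds = [(min(col), max(col)) for col in zip(*rows)]
    scaled = []
    for row in rows:
        out = []
        for x, (lo, hi) in zip(row, bounds):
            v = (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            out.append(min(1.0, max(0.0, v)))
        scaled.append(out)
    return scaled, bounds
```

Returning the bounds alongside the scaled rows makes it possible to apply the same scaling later, which keeps values from different data sets comparable.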
Feature contribution calculation and screening:
For the known-category addresses, the information gain of each feature attribute in the normalized data set is calculated, the features are ranked, those whose contribution value is below a threshold are screened out, the names of the screened-out feature attributes are recorded, and the remaining feature attributes form a new feature data set.
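The text does not specify how continuous features are discretized for the information gain calculation. The sketch below assumes equal-width binning of the normalized values into a hypothetical `bins` count and computes IG(Y; X) = H(Y) - H(Y | X); the `threshold` is left to the caller because the patent gives no value.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=10):
    """IG(Y; X) = H(Y) - H(Y | X) after equal-width binning of values in [0, 1]."""
    binned = [min(int(v * bins), bins - 1) for v in values]
    n = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def screen_features(rows, labels, names, threshold, bins=10):
    """Rank features by information gain and drop those below `threshold`.

    Returns (kept_names, dropped_names, gains). The dropped names are what
    would be recorded for later removal from the unknown-address data set.
    """
    gains = {name: information_gain([r[j] for r in rows], labels, bins)
             for j, name in enumerate(names)}
    ranked = sorted(names, key=lambda nm: gains[nm], reverse=True)
    kept = [nm for nm in ranked if gains[nm] >= threshold]
    dropped = [nm for nm in ranked if gains[nm] < threshold]
    return kept, dropped, gains
```

A feature that takes the same value for every address yields zero gain and is always screened out, while a feature that separates the classes perfectly yields gain equal to H(Y).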
Sample class imbalance processing:
For the identified and classified bitcoin addresses, the Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) is used to handle sample imbalance and generate the sample data set. Borderline-SMOTE divides the minority-class samples into three types according to their nearest neighbors: Safe, where more than half of the surrounding samples belong to the minority class; Danger, where more than half (but not all) of the surrounding samples belong to the majority class; and Noise, where all of the surrounding samples belong to the majority class. Borderline-SMOTE oversamples only the Danger minority samples. Oversampling consists of three steps:
a. For each Danger minority sample y, compute the Euclidean distance from y to every sample in the minority-class sample set and obtain the k nearest neighbors of y.
b. Set a sampling ratio according to the sample imbalance ratio to determine the sampling rate N, and randomly select several samples from the k nearest neighbors of each minority sample y; denote a selected neighbor by y_n.
c. For each randomly selected neighbor y_n, construct a new sample y_new as a random point on the line segment connecting y and y_n, that is, $y_{new} = y + \mathrm{rand}(0, 1) \cdot (y_n - y)$.
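A compact, pure-Python sketch of the three oversampling steps, for illustration only. Assumptions: `m` (neighbors drawn from the whole data set to tag Safe, Danger and Noise samples) and `k` (minority neighbors used for interpolation) are hypothetical parameters, and `n_per_sample` stands in for the sampling rate N rather than being derived from the imbalance ratio. In practice, the imbalanced-learn library offers a ready-made `imblearn.over_sampling.BorderlineSMOTE`.

```python
import math
import random


def borderline_smote(X, y, minority, k=5, m=5, n_per_sample=1, seed=0):
    """Oversample only the minority samples tagged Danger (sketch).

    X: list of feature vectors; y: list of labels; minority: label to oversample.
    Returns (X + synthetic, y + minority labels for the synthetic samples).
    """
    rng = random.Random(seed)

    def nearest(i, candidates, count):
        # Indices of the `count` candidates closest to X[i] (Euclidean distance).
        others = [j for j in candidates if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:count]

    everyone = range(len(X))
    minority_idx = [i for i, label in enumerate(y) if label == minority]

    danger = []
    for i in minority_idx:
        n_major = sum(1 for j in nearest(i, everyone, m) if y[j] != minority)
        # Safe: n_major <= m / 2; Noise: n_major == m; Danger: strictly in between.
        if m / 2 < n_major < m:
            danger.append(i)

    synthetic = []
    for i in danger:
        neighbours = nearest(i, minority_idx, k)  # step a: k minority neighbours
        if not neighbours:
            continue
        for _ in range(n_per_sample):             # step b: pick neighbours at random
            j = rng.choice(neighbours)
            gap = rng.random()                    # uniform in [0, 1)
            # Step c: y_new = y + gap * (y_n - y), a point on the segment y -> y_n.
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])

    return X + synthetic, y + [minority] * len(synthetic)
```

Because only Danger samples are interpolated, new minority points concentrate near the class boundary, which is where a classifier most often confuses the two classes.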
Step 3, dividing the sample data set into a training data set for model training and a test data set for model evaluation and, after multiple iterations, selecting the optimal model as the classifier for digital currency address category inference;
Specifically:
Model training: the training data set is used for parameter-tuning training of several machine learning classification algorithms, such as k-nearest neighbors, Bayesian classifiers, decision trees, random forests, and gradient boosting trees.
Model evaluation: the test data set is used to compare and evaluate the classification algorithms from model training on several evaluation metrics, such as accuracy, precision, recall, and F1 score. If no model reaches the preset evaluation targets, the process returns to model training to tune the parameters and retrain; if at least one model reaches every preset target, the model with the best training effect is selected to generate the classifier.
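The train, evaluate and retry loop of Step 3 might look like the following sketch. To stay self-contained it uses a small k-nearest-neighbor classifier as a stand-in for the algorithms listed above; `rounds` (successive sets of candidate models, such as parameter grids) and `targets` (the preset minimum scores) are hypothetical names, and macro-averaging precision, recall and F1 and picking the best model by F1 are assumptions.

```python
from collections import Counter
from math import dist
from statistics import mean


def knn(k):
    """Return a fit function for a k-nearest-neighbour classifier (stand-in model)."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda p: dist(x, p[0]))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": mean(prec),
            "recall": mean(rec), "f1": mean(f1)}


def select_classifier(rounds, X_train, y_train, X_test, y_test, targets):
    """Try each round of candidates until some model meets every target.

    rounds: list of {name: fit_function}; targets: {metric: minimum value}.
    Returns (name, predictor, scores) for the best accepted model, or None.
    """
    for candidates in rounds:  # each round = one re-tuning iteration
        accepted = []
        for name, fit in candidates.items():
            model = fit(X_train, y_train)
            scores = evaluate(y_test, [model(x) for x in X_test])
            if all(scores[metric] >= floor for metric, floor in targets.items()):
                accepted.append((name, model, scores))
        if accepted:
            return max(accepted, key=lambda item: item[2]["f1"])
    return None
```

Returning `None` signals that no round met the preset targets, leaving the caller to supply a further round of parameter settings.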
Step 4, performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-category addresses to obtain a data set to be classified;
Specifically:
Transaction retrieval and feature extraction:
same as in Step 2;
Data normalization:
same as in Step 2;
Feature contribution screening:
For the unknown-category addresses, the screened-out feature attributes recorded for the known-category addresses are deleted directly from the normalized data set, and the remaining feature attributes form a new feature data set that serves as the data set to be classified.
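Because unknown-category addresses have no labels, information gain cannot be computed for them; only the recorded list of screened-out feature names is applied. A minimal sketch, with hypothetical function and argument names:

```python
def drop_recorded_features(rows, names, dropped):
    """Remove the columns whose names were recorded as screened out.

    rows: feature vectors of the unknown-category addresses (after normalization).
    names: column names, in the same order as each row.
    dropped: names recorded during screening of the known-category addresses.
    Returns (filtered_rows, remaining_names).
    """
    removed = set(dropped)
    keep = [j for j, name in enumerate(names) if name not in removed]
    return [[row[j] for j in keep] for row in rows], [names[j] for j in keep]
```

This keeps the columns of the data set to be classified aligned, in name and order, with the features the classifier was trained on.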
Step 5, performing classification calculation on the input data set to be classified with the classifier to obtain its category;
Specifically:
The processed data set to be classified is used as the input of the classifier, which infers the category of each bitcoin address from its feature data. That is, the classifier performs calculations on the input feature data set, assigns each unidentified bitcoin address to the category whose feature data it most closely resembles, and outputs that category as the category estimate for the corresponding bitcoin address.
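A hypothetical sketch of Step 5, where `predict` is any trained classifier that takes one feature vector. The optional `reject_distance` check, which reports an address as "other" when it lies far from every training sample, is only one assumed way to realize the "other" output mentioned among the beneficial effects; the patent does not describe how that matching is performed.

```python
from math import dist


def infer_categories(unknown, train_X, predict, reject_distance=None, other="other"):
    """Map each unknown-category address to an estimated category.

    unknown: {address: feature_vector}, already normalized and screened.
    train_X: training feature vectors (used only by the optional reject check).
    predict: trained classifier taking a single feature vector.
    reject_distance: hypothetical threshold; if the nearest training vector is
    farther than this, the address is reported as `other`.
    """
    result = {}
    for address, x in unknown.items():
        if reject_distance is not None and min(dist(x, t) for t in train_X) > reject_distance:
            result[address] = other
        else:
            result[address] = predict(x)
    return result
```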
Example 2:
As shown in FIG. 1, the present invention provides a system for estimating the category of virtual addresses of blockchain digital currency, comprising:
a data processing module for implementing Steps 1, 2 and 4;
a classifier generation module for implementing Step 3;
and a classification module for implementing Step 5.
Example 3:
This embodiment provides an electronic device comprising a processor and a memory, wherein:
the memory stores code;
the processor executes the code to perform the category inference method of Example 1.
Example 4:
This embodiment provides a computer-readable storage medium storing a program that includes instructions which, when executed by a computer, cause the computer to perform the category estimation method of Example 1.
In the embodiments of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Such a processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules may reside in storage media well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, which may be volatile memory, nonvolatile memory, or both.
The nonvolatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory.
The volatile memory may be random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the present invention may be implemented by a combination of hardware and software. When implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for estimating the category of a virtual address of blockchain digital currency, comprising:
acquiring known-category addresses and unknown-category addresses of the blockchain digital currency;
performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-category addresses to obtain a sample data set;
dividing the sample data set into a training data set for model training and a test data set for model evaluation and, after multiple iterations, selecting the optimal model as the classifier for digital currency address category inference;
performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-category addresses to obtain a data set to be classified;
and performing classification calculation on the input data set to be classified with the classifier to obtain the category of the data set to be classified.
2. The category inference method of claim 1, wherein the transaction retrieval and feature extraction comprise:
obtaining, from the ledger data of the blockchain digital currency, all transactions in which the known-category or unknown-category address participates;
performing feature extraction on the transaction information to obtain a basic data set, wherein the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coinbase (coin-generation) transactions it participates in, the time it first received bitcoin, the time it first spent bitcoin, and the output address count and input address count of each transaction;
and combining the feature data in the basic data set using feature engineering methods to derive new features and generate a feature data set.
3. The category inference method of claim 1 or 2, wherein the data normalization comprises:
normalizing the feature data set generated by the feature extraction with a maximum value normalization method so that the processed values lie between 0 and 1.
4. The category inference method of claim 1, wherein the feature contribution calculation and screening comprise:
for the known-category addresses, calculating, with an information gain method, the information gain of every feature attribute in the normalized data set; ranking the feature attributes and removing those whose contribution value is below a threshold; recording the names of the removed feature attributes; and forming a new feature data set from the remaining feature attributes;
and, for the unknown-category addresses, directly deleting the feature attributes recorded for the known-category addresses from the normalized data set, and forming a new feature data set from the remaining feature attributes as the data set to be classified.
5. The category inference method of claim 1, wherein the sample class imbalance processing comprises:
processing the data set obtained after the feature contribution calculation and screening with a borderline synthetic minority oversampling technique (Borderline-SMOTE) to obtain the sample data set.
6. A system for inferring the category of a virtual address of a blockchain digital currency, comprising:
a data processing module, configured for:
acquiring known-category addresses and unknown-category addresses of the blockchain digital currency;
performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-category addresses to obtain a sample data set;
performing transaction retrieval, feature extraction, data normalization, and feature contribution screening on the unknown-category addresses to obtain a data set to be classified;
a classifier generation module, configured for:
dividing the sample data set into a training data set for model training and a test data set for model evaluation, and, after multiple iterations, selecting the optimal model as the classifier for inferring digital currency address categories;
a classification module, configured for:
classifying the input data set to be classified with the classifier to obtain the category of the data set to be classified.
7. The category inference system of claim 6, wherein, in the data processing module, the transaction retrieval and feature extraction comprise:
obtaining, from the ledger data of the blockchain digital currency, all transactions in which the known-category or unknown-category address participates;
performing feature extraction on the transaction information to obtain a basic data set, wherein the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coinbase (coin-minting) transactions it participates in, the time it first receives bitcoin, the time it first spends bitcoin, and the numbers of output addresses and input addresses of each transaction;
and combining the features in the basic data set using feature-engineering methods to derive new features and generate a feature data set.
8. The category inference system of claim 6 or 7, wherein, in the data processing module, the data normalization comprises:
normalizing the feature data set generated by the feature extraction with a maximum value normalization method so that the processed values lie between 0 and 1.
9. The category inference system of claim 6, wherein, in the data processing module, the feature contribution calculation and screening comprise:
for the known-category addresses, calculating, with an information gain method, the information gain of every feature attribute in the normalized data set; ranking the feature attributes and removing those whose contribution value is below a threshold; recording the names of the removed feature attributes; and forming a new feature data set from the remaining feature attributes;
and, for the unknown-category addresses, directly deleting the feature attributes recorded for the known-category addresses from the normalized data set, and forming a new feature data set from the remaining feature attributes as the data set to be classified.
10. The category inference system of claim 6, wherein, in the data processing module, the sample class imbalance processing comprises:
processing the data set obtained after the feature contribution calculation and screening with a borderline synthetic minority oversampling technique (Borderline-SMOTE) to obtain the sample data set.
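Claims 1 and 6 leave open both the model family and the iteration scheme used to pick the classifier. As a minimal sketch under stated assumptions, the code below treats k-nearest-neighbour classifiers with different k as the candidate models (a hypothetical choice; the patent does not name one), runs several random train/test splits per candidate, and keeps the candidate with the best mean test accuracy. The feature values and the category names `exchange` and `mixer` are made up for illustration.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = List[float]

def knn_predict(train_x: Sequence[Row], train_y: Sequence[str], x: Row, k: int) -> str:
    # Majority vote among the k training samples closest to x (Euclidean distance).
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k: int) -> float:
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def select_classifier(xs: List[Row], ys: List[str], candidate_ks=(1, 3, 5),
                      rounds: int = 5, test_ratio: float = 0.3,
                      seed: int = 0) -> Tuple[int, float]:
    """Pick the candidate with the best mean test accuracy over several random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    best_k, best_score = candidate_ks[0], -1.0
    for k in candidate_ks:
        scores = []
        for _ in range(rounds):  # one iteration = one fresh train/test split
            rng.shuffle(idx)
            cut = max(1, int(len(idx) * test_ratio))
            test, train = idx[:cut], idx[cut:]
            scores.append(accuracy([xs[i] for i in train], [ys[i] for i in train],
                                   [xs[i] for i in test], [ys[i] for i in test], k))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # on ties, keep the earlier (simpler) candidate
            best_k, best_score = k, mean_score
    return best_k, best_score

# Hypothetical, already-processed sample data set with two made-up categories.
xs = [[0.0, 0.0], [0.1, 0.1], [0.0, 0.2], [0.2, 0.0], [0.1, 0.0],
      [0.9, 1.0], [1.0, 0.9], [0.8, 0.9], [1.0, 1.0], [0.9, 0.8]]
ys = ["exchange"] * 5 + ["mixer"] * 5
best_k, best_score = select_classifier(xs, ys)
# The selected model then classifies the data set to be classified.
predicted = [knn_predict(xs, ys, x, best_k) for x in [[0.05, 0.05], [0.95, 0.95]]]
print(best_k, best_score, predicted)
```

Averaging over several splits, rather than trusting a single split, makes the selection less sensitive to which samples happen to land in the test set; any other model family could be dropped into the same loop.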
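The transaction retrieval and feature extraction of claims 2 and 7 can be sketched as follows. The `Tx` record (inputs and outputs as `(address, amount)` pairs, an `is_coinbase` flag, a timestamp) is a hypothetical simplification of real ledger data. Summarizing per-transaction quantities by their mean, and the two derived features `output_ratio` and `hold_before_first_spend`, are illustrative assumptions; the claims do not say which feature combinations are used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Tx:
    # Hypothetical, simplified transaction record; real ledger data is richer.
    time: int                                                       # timestamp in seconds
    inputs: List[Tuple[str, float]] = field(default_factory=list)   # (address, amount) spent
    outputs: List[Tuple[str, float]] = field(default_factory=list)  # (address, amount) received
    is_coinbase: bool = False                                       # coin-minting transaction

def _has(side: List[Tuple[str, float]], address: str) -> bool:
    return any(a == address for a, _ in side)

def _amount(side: List[Tuple[str, float]], address: str) -> float:
    return sum(v for a, v in side if a == address)

def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0

def basic_features(address: str, ledger: List[Tx]) -> Dict[str, Optional[float]]:
    # Transaction retrieval: every transaction in which the address appears.
    txs = [t for t in ledger if _has(t.inputs, address) or _has(t.outputs, address)]
    received = [t for t in txs if _has(t.outputs, address)]
    spent = [t for t in txs if _has(t.inputs, address)]
    n = len(txs)
    feats: Dict[str, Optional[float]] = {
        "total_tx_count": n,
        # Per-transaction quantities are summarized by their mean (an assumption).
        "mean_tx_amount": _mean([_amount(t.inputs, address) + _amount(t.outputs, address)
                                 for t in txs]),
        "times_as_output": len(received),
        "coinbase_tx_count": sum(1 for t in txs if t.is_coinbase),
        "first_receive_time": min((t.time for t in received), default=None),
        "first_spend_time": min((t.time for t in spent), default=None),
        "mean_output_count": _mean([len(t.outputs) for t in txs]),
        "mean_input_count": _mean([len(t.inputs) for t in txs]),
    }
    # Illustrative combined features (hypothetical feature engineering).
    feats["output_ratio"] = len(received) / n if n else 0.0
    first_in, first_out = feats["first_receive_time"], feats["first_spend_time"]
    feats["hold_before_first_spend"] = (
        first_out - first_in if first_in is not None and first_out is not None else None)
    return feats

ledger = [
    Tx(time=100, outputs=[("A", 50.0)], is_coinbase=True),
    Tx(time=200, inputs=[("A", 50.0)], outputs=[("B", 30.0), ("A", 19.9)]),
]
features_a = basic_features("A", ledger)
print(features_a)
```

Running `basic_features` once per address yields one fixed-width row per address, which is the shape the later normalization and screening steps expect.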
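Claims 3 and 8 call for a "maximum value normalization" that confines values to [0, 1], without defining it further. The sketch below assumes it means min-max scaling, x' = (x - min) / (max - min), applied column by column; dividing by the column maximum alone would also give [0, 1] for non-negative features. The toy matrix is made up.

```python
from typing import List

def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column of a row-major feature matrix into [0, 1]."""
    if not rows:
        return []
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            # A constant column carries no information; map it to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

data = [[2.0, 10.0, 5.0],
        [4.0, 30.0, 5.0],
        [6.0, 20.0, 5.0]]
scaled = min_max_normalize(data)
print(scaled)
```

One design choice the claims leave open: when the data set to be classified is normalized, reusing the minimum and maximum computed on the known-category data keeps both sets on the same scale, at the cost of some values possibly falling slightly outside [0, 1].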
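Claims 4 and 9 screen features by information gain, IG(Y; X) = H(Y) - Σ_v p(v) H(Y | X = v). Because the normalized features are continuous, this sketch assumes they are discretized into a few equal-width bins before the gain is computed; the bin count, the threshold, and the feature names are hypothetical. The unknown-category data reuses the recorded names instead of recomputing gains, as the claims describe, which also suits the fact that it has no labels.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values: Sequence[float], labels: Sequence[str], bins: int = 4) -> float:
    # Assumption: normalized values in [0, 1] are discretized into equal-width bins.
    binned = [min(int(v * bins), bins - 1) for v in values]
    n = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def screen_known(rows: List[List[float]], names: List[str], labels: List[str],
                 threshold: float) -> Tuple[List[List[float]], List[str], List[str]]:
    """Known-category data: rank features by gain and drop those below the threshold."""
    gains = {name: information_gain([r[j] for r in rows], labels)
             for j, name in enumerate(names)}
    ranked = sorted(names, key=lambda name: gains[name], reverse=True)
    dropped = [name for name in ranked if gains[name] < threshold]  # recorded names
    kept = [j for j, name in enumerate(names) if name not in dropped]
    return [[r[j] for j in kept] for r in rows], [names[j] for j in kept], dropped

def screen_unknown(rows: List[List[float]], names: List[str],
                   dropped: List[str]) -> List[List[float]]:
    """Unknown-category data: delete the recorded feature names directly."""
    kept = [j for j, name in enumerate(names) if name not in dropped]
    return [[r[j] for j in kept] for r in rows]

names = ["f_separating", "f_noise"]  # hypothetical feature names
rows = [[0.1, 0.5], [0.2, 0.6], [0.9, 0.5], [0.8, 0.6]]
labels = ["exchange", "exchange", "mixer", "mixer"]
known_rows, kept_names, dropped = screen_known(rows, names, labels, threshold=0.1)
unknown_rows = screen_unknown([[0.3, 0.7]], names, dropped)
print(kept_names, dropped, known_rows, unknown_rows)
```

Here `f_separating` splits the two classes perfectly (gain 1 bit) while `f_noise` falls in a single bin for every sample (gain 0), so only the latter is recorded and removed from both data sets.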
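Claims 5 and 10 rebalance the classes with borderline synthetic minority oversampling. The pure-Python sketch below follows the Borderline-SMOTE1 variant as commonly described: a minority sample whose m nearest neighbours are at least half, but not all, majority-class samples is marked DANGER, and new samples are interpolated only between DANGER samples and their k nearest minority neighbours. The values of m, k, and s and the toy points are assumptions; a production system would more likely use a library implementation such as `imblearn.over_sampling.BorderlineSMOTE`.

```python
import math
import random
from typing import List

Vector = List[float]

def danger_indices(minority: List[Vector], majority: List[Vector], m: int = 3) -> List[int]:
    """Indices of minority samples on the class border (the DANGER set)."""
    labelled = [(x, True) for x in minority] + [(x, False) for x in majority]
    danger = []
    for i, p in enumerate(minority):
        # m nearest neighbours of p in the whole training set, excluding p itself.
        others = [item for j, item in enumerate(labelled) if j != i]
        others.sort(key=lambda item: math.dist(p, item[0]))
        n_major = sum(1 for _, is_minority in others[:m] if not is_minority)
        # All-majority neighbourhoods count as noise; mostly-majority ones as DANGER.
        if m / 2 <= n_major < m:
            danger.append(i)
    return danger

def borderline_smote(minority: List[Vector], majority: List[Vector], m: int = 3,
                     k: int = 2, s: int = 1, seed: int = 0) -> List[Vector]:
    """Generate synthetic minority samples only around DANGER samples."""
    rng = random.Random(seed)
    synthetic = []
    for i in danger_indices(minority, majority, m):
        p = minority[i]
        # k nearest minority-class neighbours of the DANGER sample.
        peers = sorted((x for j, x in enumerate(minority) if j != i),
                       key=lambda x: math.dist(p, x))[:k]
        for nb in rng.sample(peers, min(s, len(peers))):
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(p, nb)])
    return synthetic

# Toy data: the minority point at x = 0.8 sits right next to the majority class.
minority = [[0.0, 0.0], [0.2, 0.0], [0.8, 0.0]]
majority = [[1.0, 0.0], [1.1, 0.0], [1.5, 0.0], [2.0, 0.0]]
new_samples = borderline_smote(minority, majority)
balanced_minority = minority + new_samples
print(danger_indices(minority, majority), new_samples)
```

Unlike plain SMOTE, which interpolates around every minority sample, this variant concentrates the synthetic samples near the decision boundary, where misclassification of the minority category is most likely.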
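Claim 6 splits the method into three cooperating modules. The skeleton below shows one way to wire them; the class names, the callable-based interfaces, and the identity and nearest-centroid stand-ins are hypothetical placeholders for the real processing and training steps, not the patent's implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Rows = List[List[float]]
Model = Callable[[Sequence[float]], str]

@dataclass
class DataProcessingModule:
    # Pluggable stages (hypothetical): known addresses -> sample data set,
    # unknown addresses -> data set to be classified.
    prepare_known: Callable[[Rows, List[str]], Tuple[Rows, List[str]]]
    prepare_unknown: Callable[[Rows], Rows]

@dataclass
class ClassifierGenerationModule:
    fit: Callable[[Rows, List[str]], Model]  # trains and selects the classifier

class ClassificationModule:
    def classify(self, model: Model, rows: Rows) -> List[str]:
        return [model(r) for r in rows]

@dataclass
class AddressCategorySystem:
    processing: DataProcessingModule
    generation: ClassifierGenerationModule
    classification: ClassificationModule

    def run(self, known_rows: Rows, known_labels: List[str], unknown_rows: Rows) -> List[str]:
        sample_x, sample_y = self.processing.prepare_known(known_rows, known_labels)
        model = self.generation.fit(sample_x, sample_y)
        to_classify = self.processing.prepare_unknown(unknown_rows)
        return self.classification.classify(model, to_classify)

def fit_nearest_centroid(xs: Rows, ys: List[str]) -> Model:
    # Placeholder classifier: assign each row to the closest class centroid.
    centroids: Dict[str, List[float]] = {}
    for label in set(ys):
        points = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

system = AddressCategorySystem(
    DataProcessingModule(prepare_known=lambda xs, ys: (xs, ys),  # identity stand-ins
                         prepare_unknown=lambda xs: xs),
    ClassifierGenerationModule(fit=fit_nearest_centroid),
    ClassificationModule(),
)
result = system.run([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]],
                    ["exchange", "exchange", "mixer", "mixer"],
                    [[0.2, 0.1], [0.8, 1.0]])
print(result)
```

Passing the stages in as callables keeps the module boundaries of the claim visible while letting each stage be replaced or tested on its own.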
CN202110272026.1A 2021-03-12 2021-03-12 Method and system for estimating category of virtual address of block chain digital currency Pending CN113052577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110272026.1A CN113052577A (en) 2021-03-12 2021-03-12 Method and system for estimating category of virtual address of block chain digital currency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110272026.1A CN113052577A (en) 2021-03-12 2021-03-12 Method and system for estimating category of virtual address of block chain digital currency

Publications (1)

Publication Number Publication Date
CN113052577A true CN113052577A (en) 2021-06-29

Family

ID=76512365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110272026.1A Pending CN113052577A (en) 2021-03-12 2021-03-12 Method and system for estimating category of virtual address of block chain digital currency

Country Status (1)

Country Link
CN (1) CN113052577A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114520739A (en) * 2022-02-14 2022-05-20 东南大学 Phishing address identification method based on cryptocurrency transaction network node classification
CN114615009A (en) * 2022-01-18 2022-06-10 北京邮电大学 Gateway flow-based digital currency detection method
CN115967525A (en) * 2022-10-25 2023-04-14 淮阴工学院 Virtual currency abnormal address detection method and device based on capsule network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918584A (en) * 2019-03-25 2019-06-21 中国科学院自动化研究所 Bit coin exchange Address Recognition method, system, device
CN111259924A (en) * 2020-01-07 2020-06-09 吉林大学 Boundary synthesis, mixed sampling, anomaly detection algorithm and data classification method
CN111444232A (en) * 2020-01-03 2020-07-24 上海宓猿信息技术有限公司 Method for mining digital currency exchange address and storage medium
CN111754345A (en) * 2020-06-18 2020-10-09 天津理工大学 Bit currency address classification method based on improved random forest
CN112435032A (en) * 2020-10-22 2021-03-02 江苏大学 Bit currency address incremental clustering method based on multi-input address clustering

Similar Documents

Publication Publication Date Title
CN109873812B (en) Anomaly detection method and device and computer equipment
CN107025596B (en) Risk assessment method and system
WO2019179403A1 (en) Fraud transaction detection method based on sequence width depth learning
CN113052577A (en) Method and system for estimating category of virtual address of block chain digital currency
CN106778241B (en) Malicious file identification method and device
CN108345794A (en) The detection method and device of Malware
CN116595463B (en) Construction method of electricity larceny identification model, and electricity larceny behavior identification method and device
CN108712453A (en) Detection method for injection attack, device and the server of logic-based regression algorithm
CN110069545B (en) Behavior data evaluation method and device
CN111915437A (en) RNN-based anti-money laundering model training method, device, equipment and medium
CN108022146A (en) Characteristic item processing method, device, the computer equipment of collage-credit data
CN114844840B (en) Method for detecting distributed external network flow data based on calculated likelihood ratio
CN110084609B (en) Transaction fraud behavior deep detection method based on characterization learning
CN110493262B (en) Classification-improved network attack detection method and system
CN111275416A (en) Digital currency abnormal transaction detection method and device, electronic equipment and medium
CN113657896A (en) Block chain transaction topological graph analysis method and device based on graph neural network
CN114048468A (en) Intrusion detection method, intrusion detection model training method, device and medium
CN114662602A (en) Outlier detection method and device, electronic equipment and storage medium
CN115577357A (en) Android malicious software detection method based on stacking integration technology
CN110009012B (en) Risk sample identification method and device and electronic equipment
CN111310531A (en) Image classification method and device, computer equipment and storage medium
CN112801784A (en) Bit currency address mining method and device for digital currency exchange
CN113344469B (en) Fraud identification method and device, computer equipment and storage medium
CN114021716A (en) Model training method and system and electronic equipment
CN112766320B (en) Classification model training method and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination