CN113837303A

CN113837303A - Black product user identification method, TEE node and computer readable storage medium

Info

Publication number: CN113837303A
Application number: CN202111153184.1A
Authority: CN
Inventors: 史金雨; 徐雷; 陶冶; 高泽恺; 张立彤; 边林; 刘伟
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2021-12-24

Abstract

The invention provides a black product user identification method, a TEE node and a computer readable storage medium, wherein the method comprises the following steps: acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data; establishing a decision tree model based on the sample data set; and receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified. The method, the TEE node and the computer readable storage medium can solve the problems that the existing black product user identification method only depends on the internal features of an operator to perform modeling analysis, the identification result is difficult to avoid, and the identification is inaccurate or the identification rate is low.

Description

Black product user identification method, TEE node and computer readable storage medium

Technical Field

The invention relates to the technical field of network security, in particular to a black product user identification method, a TEE node and a computer readable storage medium.

Background

In recent years, various types of blackouts for operator services are increasingly performed, which seriously affect the image of an operating brand and cause a great amount of economic loss for operators and users. Because of more types (such as wool, fraud phone, etc.), the current common mode is to perform modeling analysis on features available in an operator, and this mode only depends on features in the operator, so that the recognition result is inevitable, and the problems of inaccurate recognition or low recognition rate exist.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a black product user identification method, a TEE node, and a computer readable storage medium, aiming at the above defects in the prior art, so as to solve the problems that the existing black product user identification method only relies on the features inside the operator to perform modeling analysis, the identification result is inevitable, and the identification is inaccurate or the identification rate is not high.

In a first aspect, the present invention provides a black product user identification method, which is applied to any one TEE node in a TEE cluster in a trusted execution environment, and the method includes:

acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data;

establishing a decision tree model based on the sample data set;

and receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.

Preferably, the characteristic data of the operator side comprises a mobile phone number, the number of mobile phone numbers owned by users corresponding to the mobile phone number, a first online frequency, a first average online time and an IP provincial crossing frequency; the bank side characteristic data comprises the mobile phone number, a second online frequency, a second average online time, registration days, account balance and credit card overdue frequency;

the acquiring of the sample data set specifically includes:

aligning the characteristic data of the operator side and the characteristic data of the bank side according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm to obtain a sample data set;

wherein, the operator side characteristic data and the bank side characteristic data after alignment specifically include: the mobile phone number, the number of mobile phone numbers owned by a user corresponding to the mobile phone number, the first online times, the first average online time, the IP provincial crossing times, the second online times, the second average online time, the registration days, the account balance and the overdue times of the credit card.

Preferably, the TEE cluster at least comprises an operator-side TEE node and a bank-side TEE node, and any one TEE node is the operator-side TEE node or the bank-side TEE node.

Preferably, any one TEE node is an operator-side TEE node;

before aligning the operator side feature data and the bank side feature data according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm, the method further comprises the following steps:

acquiring all fixed network data in one day, wherein the fixed network data comprises a mobile phone number, a user identification, an IP address, an online time, an offline time, an online time and a province;

acquiring the number of mobile phone numbers owned by a user corresponding to each mobile phone number according to the user identification in all the fixed network data;

counting the online times of each mobile phone number in the day according to all the fixed network data to obtain the corresponding first online times;

calculating the first average online time corresponding to each mobile phone number according to the following formula:

wherein, the Time _ online_iThe ith Time of on-line Time of the mobile phone number, Time _ offline_iThe ith off-line time of the mobile phone number is T _ DaysOnline, and the T _ DaysOnline is the first on-line times corresponding to the mobile phone number;

and acquiring the IP provincial crossing times corresponding to each mobile phone number according to the IP addresses and the provinces in which the IP addresses are located in all the fixed network data.

Preferably, any one TEE node is a bank-side TEE node;

acquiring bank APP data in one day, wherein the bank APP data comprises a mobile phone number, a user identifier, a registration date, online time, offline time, account balance and overdue times of a credit card;

counting the online times of each mobile phone number in the day according to the bank APP data to obtain the corresponding second online times;

calculating the second average online time corresponding to each mobile phone number according to the following formula:

wherein, the Time _ online_iThe ith Time of on-line Time of the mobile phone number, Time _ offline_iThe ith off-line time of the mobile phone number is B _ DaysOnline, and the B _ DaysOnline is the second on-line times corresponding to the mobile phone number;

calculating the registration days corresponding to each mobile phone number according to the following formula:

RegisterDays＝DateToday-RegisterDate

wherein datedate is the current date and register date is the registration date.

Preferably, the establishing a decision tree model based on the sample data set specifically includes:

traversing all the features in the sample data set, and calculating the information gain of the traversed features according to the following formula:

g(D,A)＝H(D)-H(D|A)

wherein the content of the first and second substances,

is the empirical entropy of the sample data set D, | D | represents the number of samples in the sample data set D, | C_kI represents the number of the partial data set samples with the category k;

is the empirical condition entropy of the traversed feature A on the sample data set D, and D can be divided into n subsets D according to the feature A₁，D₂，…，D_n，|D_iIs the subset D_iThe number of samples of (a);

and (4) dividing by using the characteristic with the maximum information gain, and repeating the process until all samples in the sample data set are divided or the maximum training times is reached to obtain the decision tree model.

In a second aspect, the present invention provides a TEE node, comprising:

the data set acquisition module is used for acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data;

the model establishing module is connected with the data set acquisition module and used for establishing a decision tree model based on the sample data set;

and the identification module is connected with the model establishing module and used for receiving the user data to be identified and inputting the user data to be identified into the decision tree model to obtain the identification result of the user data to be identified.

In a third aspect, the present invention provides a TEE node, including a memory and a processor, where the memory stores a computer program, and the processor is configured to run the computer program to implement the black user identification method according to the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the black product user identification method according to the first aspect.

According to the black product user identification method, the TEE node and the computer readable storage medium, the sample data set is obtained, wherein the samples in the sample data set comprise black product user samples and normal user samples, each sample comprises the aligned operator side characteristic data and bank side characteristic data, a decision tree model is built based on the sample data set, the user data to be identified is received, the user data to be identified is input into the decision tree model, and the identification result of the user data to be identified can be obtained. Because the method is based on the trusted execution environment, the characteristic data of the operator side and the characteristic data of the bank side can be subjected to combined modeling analysis while the confidentiality and the integrity of the data are ensured, the training characteristic data are expanded, the accuracy and the coverage rate of the black product user identification are improved, and the problems that the existing black product user identification method only depends on the internal characteristics of the operator to perform modeling analysis, the identification result is difficult to avoid, and the identification is inaccurate or the identification rate is low are solved.

Drawings

FIG. 1: a flow chart of a black product user identification method in embodiment 1 of the present invention;

FIG. 2: the alignment diagram of the operator side characteristic data and the bank side characteristic data is shown in the embodiment of the invention;

FIG. 3: is a schematic structural diagram of a TEE node in embodiment 2 of the present invention;

FIG. 4: is a schematic structural diagram of a TEE node in embodiment 3 of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description will be made with reference to the accompanying drawings.

It is to be understood that the specific embodiments and figures described herein are merely illustrative of the invention and are not limiting of the invention.

It is to be understood that the embodiments and features of the embodiments can be combined with each other without conflict.

It is to be understood that, for the convenience of description, only parts related to the present invention are shown in the drawings of the present invention, and parts not related to the present invention are not shown in the drawings.

It should be understood that each unit and module related in the embodiments of the present invention may correspond to only one physical structure, may also be composed of multiple physical structures, or multiple units and modules may also be integrated into one physical structure.

It will be understood that, without conflict, the functions, steps, etc. noted in the flowchart and block diagrams of the present invention may occur in an order different from that noted in the figures.

It is to be understood that the flowchart and block diagrams of the present invention illustrate the architecture, functionality, and operation of possible implementations of systems, apparatus, devices and methods according to various embodiments of the present invention. Each block in the flowchart or block diagrams may represent a unit, module, segment, code, which comprises executable instructions for implementing the specified function(s). Furthermore, each block or combination of blocks in the block diagrams and flowchart illustrations can be implemented by a hardware-based system that performs the specified functions or by a combination of hardware and computer instructions.

It is to be understood that the units and modules involved in the embodiments of the present invention may be implemented by software, and may also be implemented by hardware, for example, the units and modules may be located in a processor.

Summary of the application

At present, a relatively common black product user identification method is a characteristic modeling analysis aiming at the internal acquirability of an operator, and as malicious users have the characteristics of dispersity, latency, complexity and the like, single data can hardly meet the requirement of black product user identification. The existing mode only depends on the internal characteristics of an operator to perform modeling analysis, so that the problems of inaccurate identification or low identification rate are inevitable.

In view of the above technical problems, the present application provides a black product user identification method, a TEE (Trusted Execution Environment) node, and a computer readable storage medium, in which feature data of an operator side and feature data of a bank side are subjected to joint modeling analysis in a Trusted Execution Environment, and accuracy and coverage rate of a final model for identifying black product users can be improved by expanding training feature data, so that industry can be better assisted in identifying black product users, network Environment can be purified, property loss of enterprise customers can be avoided, and confidentiality and integrity of the operator side feature data and the bank side feature data can be ensured in a process of establishing a model.

Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.

Example 1:

the embodiment provides a black product user identification method, which is applied to any one TEE node in a trusted execution environment TEE cluster, and as shown in fig. 1, the method includes:

step S102: and acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data.

It should be noted that the trusted execution environment is a secure area within the CPU. It runs in a separate environment and in parallel with the operating system. The CPU ensures that the confidentiality and integrity of the code and data in the TEE are protected. By using both hardware and software to protect data and code, TEE is more secure than operating systems. Trusted applications running in the TEE can access the full functionality of the device main processor and memory, while hardware isolation protects these components from user-installed applications running in the main operating system. Code and data running in the TEE are confidential and non-tamperable. The method for identifying the black product user can be applied to any TEE node in a TEE cluster, the TEE cluster comprises an operator side TEE node, a bank side TEE node and other TEE nodes, and the method is preferably applied to the operator side TEE node or the bank side TEE node in the TEE cluster.

Optionally, the operator-side feature data includes a mobile phone number, the number of mobile phones owned by users corresponding to the mobile phone number, a first online frequency, a first average online time length and an IP provincial crossing frequency; the bank side characteristic data comprises a mobile phone number, a second online frequency, a second average online time, registration days, account balance and credit card overdue frequency;

the obtaining of the sample data set specifically comprises:

wherein, the operator side characteristic data and the bank side characteristic data after aligning specifically include: the mobile phone number, the number of the mobile phone numbers corresponding to the user, the first online times, the first average online time, the IP province crossing times, the second online times, the second average online time, the registration days, the account balance and the overdue times of the credit card.

In this embodiment, the operator side feature data is obtained by the operator side TEE node, the bank side feature data is obtained by the bank side TEE node, and both the operator side feature data and the bank side feature data after alignment can be obtained by the sample alignment mode based on the RSA algorithm.

Specifically, the operator deploys the TEE node locally, the operator-side TEE node stores original operator-side data, that is, user fixed network data, the operator-side TEE node obtains all fixed network data in one day from the operator-side TEE node, the fixed network data include the fixed network data of black products users and the fixed network data of normal users, the fixed network data can include mobile phone numbers, user identifications, IP addresses, online time, offline time, online duration, provinces and the like, the operator-side TEE node preprocesses some of the indexes, converts the indexes into discrete characteristics, and obtains corresponding operator-side characteristic data:

(1) the number of mobile phone numbers (T _ PhoneCount) owned by the user: the user identification can be the identification card number after fuzzification processing, the same identification card number can handle 10 mobile phone numbers at most nationwide, and the number of the mobile phone numbers owned by the user corresponding to each mobile phone number can be obtained according to the user identification in all fixed network data;

(2) first number of line entries (T _ daysnonle): the number of times that the user surfs the internet through the corresponding mobile phone number is obtained, and the number of times of surfing the internet of each mobile phone number in one day is counted according to all fixed network data, so that the first frequency of surfing the internet corresponding to each mobile phone number can be obtained.

(3) First average online time (T _ TimeAvg): specifically, the first average online time corresponding to each mobile phone number can be calculated according to the following formula, where the unit is minute:

wherein, the Time _ online_iThe ith on-line Time of the mobile phone number, Time _ offline_iThe time is the ith off-line time of the mobile phone number, and T _ DaysOnline is the first on-line times corresponding to the mobile phone number;

(4) IP stride count (T _ CrossCount): the IP province crossing times corresponding to each mobile phone number can be obtained according to the IP addresses and the provinces in all the fixed network data, namely, the provinces where the IP addresses of each mobile phone number appear in one day are counted, generally, the fixed network user can change the IP once when going on and off the line (switching a router once), the changed IP provinces are not fixed, and the fixed network user can be the local province or the external province. The number of times of getting on and off the line (switching router) of a normal user in one day is few, but the mobile phone number of a black product user generally changes the IP continuously (some apps limit the number of times of IP logging) to perform black production to a greater extent, such as activities like weeding wool, and therefore the number of times of skipping over the IP is taken as a characteristic.

The finally obtained characteristic data of the operator side comprises a mobile phone number T _ PhoneNumber of the operator side, the number T _ PhoneCount of the mobile phone number corresponding to the mobile phone number owned by the user, a first online time T _ DaysOnline, a first average online time T _ TimeAvg and an IP provincial crossing time T _ Cross count.

Specifically, a bank deploys a TEE node locally, the TEE node at the bank side stores original bank side data, the original bank side data uses data of a user in a bank APP, namely bank APP data, and the data specifically comprises a mobile phone number, a user identifier, a registration date, online time, offline time, account balance and overdue times of a credit card; the bank side TEE node preprocesses some indexes and converts the indexes into discrete characteristics to obtain corresponding bank side characteristic data:

1) second number of line entries (B _ daysnoline): the number of times that the user logs in the bank APP through the corresponding mobile phone number is determined, and the online number of each mobile phone number in one day is counted according to the bank APP data, so that the corresponding second online number can be obtained.

2) Second average online time (B _ TimeAvg): that is, the average online time of the corresponding mobile phone number in the bank APP, specifically, the second average online time corresponding to each mobile phone number may be calculated according to the following formula, where the unit is minute:

wherein, the Time _ online_iThe ith on-line Time of the mobile phone number, Time _ offline_iAnd B _ DaysOnline is the second online time corresponding to the mobile phone number.

3) Registration days (B _ RegisterDays): the unit is day, and the calculation mode is as follows:

RegisterDays＝DateToday-RegisterDate

wherein datedate is the current date and RegisterDate is the registration date.

The finally obtained bank side characteristic data comprises a mobile phone number B _ PhoneNumber at the bank side, a second online time B _ DaysONline, a second average online time B _ TimeAvg, registration days B _ RegisterDays, an account balance B _ AcountBalance and credit card overdue times B _ ODTimes, wherein the account balance B _ AcountBalance is the balance of the mobile phone number on a bank APP or a corresponding bank card, and the credit card overdue times B _ ODTimes are counted overdue times of the mobile phone number corresponding to the credit card (if any).

Specifically, because the feature data of the operator side and the feature data of the bank side are not completely unified, the sample alignment is performed by using the mobile phone number as a standard, and specifically, a sample alignment mode based on an RSA algorithm can be adopted, so that the common user data of the operator side and the bank side can be confirmed on the premise that the data of the operator side and the data of the bank side are not disclosed.

Suppose that the operator side is A and the characteristic data of the operator side is X_A＝{a₁,a₂,……，a_mAnd suppose the bank side is B and the characteristic data of the bank side is X_B＝{b₁,b₂,……，b_nThe sample alignment method based on the RSA algorithm may include the following steps:

(a) b generates a public key pair (n, e) and a private key pair (n, d) through an RSA algorithm, wherein the public key pair (n, e) is sent to A.

(b) A pair of its characteristic data X_AEncrypting X_AGenerates a corresponding random number r for each element a. At this time, the random number r is encrypted by the public key pair (n, e) to obtain r^eSubstituting a into hash function (hash of H (x): x) to obtain H (a), and recording the multiplication result as Y_A，Y_A＝r^eH(a)。A is to encrypt the characteristic data Y_ASent to B while retaining Y in A_AAnd X_AA relational mapping table for each element a in the set.

(c) B receives Y_ALater, since the encryption uses a hash function and a random number r added in the process, it is difficult to get from Y_ABack push out of X_A. B, the characteristic data Y after the A is encrypted through a private key pair (n, d)_AThe operation was carried out and the result obtained was noted as Z_A＝(Y_A)^d＝(r^eH(a))^d＝r*(H(a))^dThe deduction of Fermat's theorem and Euler's theorem is used in this process: r is^ed≡r(mod n)。

At the same time, B is to own characteristic data X_BCarrying out encryption to obtain Y_B，Y_B＝H(H(b)^d) Also retain Y_BAnd X_BA relational mapping table for each element b in the set.

After the above steps are completed, B is to Z_AAnd Y_BSent to a together.

(d) A receives Y_BLater, the same principle can not push out X_B. A pairs of received Z_AThe operation was carried out and the results obtained are noted

At this point, we can find D_AAnd Y_BThe operations performed on the data are the same, namely D power is taken first and then the hash operation is performed, so that D_AAnd Y_BIn (D)_A＝Y_BThe corresponding element a of the data of (a) is the sample alignment result of a, i.e. the common data of a and B. The operator side A corresponds the result to Y_BAnd sending the data to a bank side B, and picking out the same part after the B receives the data, wherein an element B corresponding to the same part is the alignment result of the B. As shown in fig. 2, finally, the aligned operator-side feature data and bank-side feature data specifically include: the mobile phone number PhoneNumber, the number of the mobile phone numbers T _ PhoneCount corresponding to the mobile phone number owned by the user, the first online time T _ DaysOnline, the first average online time T _ TimeAvg and the IP provincial crossing timeT _ Cross count, a second online time B _ DaysOnline, a second average online time B _ TimeAvg, a registration day B _ RegisterDays, an account balance B _ AcountBalane and a credit card expiration time B _ ODTimes.

Step S104: and establishing a decision tree model based on the sample data set.

In the present embodiment, the sample data set D { (x)₁,y₁),(x₂,y₂),…,(x_n,y_n) In which x_i(PhoneNumber, T _ PhoneCount, …, B _ ODTimes), i.e. the 10 features mentioned above, y_iThe method comprises the following steps of determining a decision tree model based on a sample data set D, wherein the element belongs to {0,1}, wherein 0 is a normal user, and 1 is a black user, and the decision tree model specifically comprises the following steps:

traversing all features in the sample data set, and calculating information gain of the traversed features according to the following formula:

g(D,A)＝H(D)-H(D|A)

wherein the content of the first and second substances,

is the empirical entropy of the sample data set D, K is equivalent to y_iA general representation of each value in (i.e., {0,1}, | D | represents the sample size of the sample data set D, i.e., the number of samples in the data set D, | C_kI represents the number of the partial data set samples with the category k;

is the empirical condition entropy of the traversed feature A to the sample data set D, calculates H (D | A) of each feature A for each feature A, and divides D into n subsets D according to the feature A₁，D₂，…，D_n，|D_iIs the subset D_iI.e. the number of samples.

And (4) dividing by using the characteristic with the maximum information gain, and repeating the process until all samples in the sample data set are divided or the maximum training times is reached to obtain a decision tree model.

And after the final model is obtained, the TEE node outputs the model, and both the data of the operation process and the data of the encryption and decryption are destroyed immediately in a safe region of the TEE node, so that the safety of the data in the whole process is ensured.

Step S106: and receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.

In this embodiment, the TEE node may deploy the established decision tree model to the server, and provide a corresponding interface for the user to call, and specifically, the user may input user data to be identified (including a mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, a first online time, a first average online time, an IP provincial number, a second online time, a second average online time, registration days, an account balance, and a credit card expiration number) into the decision tree model, so as to obtain a corresponding identification result.

According to the black product user identification method provided by the embodiment of the invention, a sample data set is obtained, wherein samples in the sample data set comprise black product user samples and normal user samples, each sample comprises aligned operator side characteristic data and bank side characteristic data, a decision tree model is established based on the sample data set, user data to be identified is received, and the user data to be identified is input into the decision tree model, so that the identification result of the user data to be identified can be obtained. Because the method is based on the trusted execution environment, the characteristic data of the operator side and the characteristic data of the bank side can be subjected to combined modeling analysis while the confidentiality and the integrity of the data are ensured, the training characteristic data are expanded, the accuracy and the coverage rate of the black product user identification are improved, and the problems that the existing black product user identification method only depends on the internal characteristics of the operator to perform modeling analysis, the identification result is difficult to avoid, and the identification is inaccurate or the identification rate is low are solved.

Example 2:

as shown in fig. 3, the present embodiment provides a TEE node, including:

the data set acquisition module 12 is configured to acquire a sample data set, where samples in the sample data set include black product user samples and normal user samples, and each sample includes aligned operator-side feature data and bank-side feature data;

the model establishing module 14 is connected with the data set acquiring module 12 and is used for establishing a decision tree model based on the sample data set;

and the identification module 16 is connected with the model establishing module 14 and is used for receiving the user data to be identified and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.

the data set obtaining module 12 is specifically configured to align the operator-side feature data and the bank-side feature data according to the mobile phone number by using a sample alignment mode based on an RSA algorithm to obtain a sample data set;

Optionally, the TEE cluster at least includes an operator-side TEE node and a bank-side TEE node, and any one of the TEE nodes is the operator-side TEE node or the bank-side TEE node.

Optionally, any one TEE node is an operator-side TEE node, and the operator-side TEE node may further include:

the fixed network data acquisition module is used for acquiring all fixed network data in one day, wherein the fixed network data comprises a mobile phone number, a user identifier, an IP address, an online time, an offline time, an online time and a province;

the mobile phone number acquisition module is used for acquiring the number of the mobile phone numbers owned by the user corresponding to each mobile phone number according to the user identification in all the fixed network data;

the first online time acquisition module is used for counting the online times of each mobile phone number in one day according to all the fixed network data to obtain corresponding first online times;

the first average online time length obtaining module is used for calculating the first average online time length corresponding to each mobile phone number according to the following formula:

and the IP provincial crossing times acquisition module is used for acquiring the IP provincial crossing times corresponding to each mobile phone number according to the IP addresses in all the fixed network data and the provinces where the fixed network data is located.

Optionally, any one TEE node is a bank-side TEE node, and the bank-side TEE node may further include:

the bank APP data acquisition module is used for acquiring bank APP data in one day, wherein the bank APP data comprises a mobile phone number, a user identifier, a registration date, an online time, an offline time, an account balance and the overdue times of a credit card;

the second online time obtaining module is used for counting the online times of each mobile phone number in one day according to the bank APP data to obtain corresponding second online times;

the second average online time obtaining module is used for calculating a second average online time corresponding to each mobile phone number according to the following formula:

wherein, the Time _ online_iFor the ith time of the mobile phone numberTime, Time _ offset_iThe time is the ith off-line time of the mobile phone number, and B _ DaysOnline is the second on-line times corresponding to the mobile phone number;

the registration day acquisition module is used for calculating the registration days corresponding to each mobile phone number according to the following formula:

RegisterDays＝DateToday-RegisterDate

wherein datedate is the current date and RegisterDate is the registration date.

Optionally, the model building module 14 is specifically configured to:

g(D,A)＝H(D)-H(D|A)

wherein the content of the first and second substances,

Example 3:

referring to fig. 4, the present embodiment provides a TEE node, including a memory 22 and a processor 24, where the memory 22 stores a computer program, and the processor 24 is configured to run the computer program to execute the black user identification method in embodiment 1.

The memory 22 is connected to the processor 24, the memory 22 may be a flash memory, a read-only memory or other memories, and the processor 24 may be a central processing unit or a single chip microcomputer.

Example 4:

the present embodiment provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the black product user identification method in embodiment 1 described above.

The computer-readable storage media include volatile or nonvolatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, computer program modules or other data. Computer-readable storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other Memory technology, CD-ROM (Compact disk Read-Only Memory), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.

Embodiments 2 to 4 provide the TEE node and the computer-readable storage medium, wherein a sample data set is obtained, where samples in the sample data set include black user samples and normal user samples, each sample includes aligned operator-side feature data and bank-side feature data, a decision tree model is established based on the sample data set, user data to be identified is received, and the user data to be identified is input into the decision tree model, so that an identification result of the user data to be identified can be obtained. Because the method is based on the trusted execution environment, the characteristic data of the operator side and the characteristic data of the bank side can be subjected to combined modeling analysis while the confidentiality and the integrity of the data are ensured, the training characteristic data are expanded, the accuracy and the coverage rate of the black product user identification are improved, and the problems that the existing black product user identification method only depends on the internal characteristics of the operator to perform modeling analysis, the identification result is difficult to avoid, and the identification is inaccurate or the identification rate is low are solved.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A method for identifying a black product user is applied to any TEE node in a TEE cluster of a trusted execution environment, and comprises the following steps:

establishing a decision tree model based on the sample data set;

2. The black product user identification method according to claim 1, wherein the operator-side feature data includes a mobile phone number, the number of mobile phones owned by the user corresponding to the mobile phone number, a first online number, a first average online duration, and an IP provincial crossing number; the bank side characteristic data comprises the mobile phone number, a second online frequency, a second average online time, registration days, account balance and credit card overdue frequency;

the acquiring of the sample data set specifically includes:

3. The black product user identification method of claim 2, wherein the TEE cluster at least comprises an operator-side TEE node and a bank-side TEE node, and any one of the TEE nodes is the operator-side TEE node or the bank-side TEE node.

4. The black product user identification method according to claim 3, wherein the any TEE node is an operator-side TEE node;

5. The black product user identification method according to claim 3, wherein any one TEE node is a bank side TEE node;

RegisterDays＝DateToday-RegisterDate

6. The black product user identification method according to claim 2, wherein the establishing a decision tree model based on the sample data set specifically comprises:

g(D,A)＝H(D)-H(D|A)

wherein the content of the first and second substances,

7. A TEE node, comprising:

8. The TEE node of claim 7, wherein the operator-side characteristic data includes a phone number, a number of phones owned by a user corresponding to the phone number, a first number of online times, a first average online duration, and a number of IP provinces across provinces; the bank side characteristic data comprises the mobile phone number, a second online frequency, a second average online time, registration days, account balance and credit card overdue frequency;

the data set acquisition module is specifically used for aligning the operator side characteristic data and the bank side characteristic data according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm to obtain a sample data set;

9. A TEE node comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being configured to run the computer program to implement the black user identification method of any of claims 1-6.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the black user identification method according to any one of claims 1 to 6.