CN113837303A - Black product user identification method, TEE node and computer readable storage medium - Google Patents

Publication number
CN113837303A
Authority
CN
China
Prior art keywords
mobile phone
phone number
data
online
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111153184.1A
Other languages
Chinese (zh)
Inventor
史金雨
徐雷
陶冶
高泽恺
张立彤
边林
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202111153184.1A priority Critical patent/CN113837303A/en
Publication of CN113837303A publication Critical patent/CN113837303A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/018Certifying business or products

Abstract

The invention provides a black product user identification method, a TEE node and a computer readable storage medium. The method comprises the following steps: acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data; establishing a decision tree model based on the sample data set; and receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified. The method, the TEE node and the computer readable storage medium solve the problem that existing black product user identification methods rely only on the internal features of an operator for modeling analysis, so that their results inevitably suffer from inaccurate identification or a low identification rate.

Description

Black product user identification method, TEE node and computer readable storage medium
Technical Field
The invention relates to the technical field of network security, in particular to a black product user identification method, a TEE node and a computer readable storage medium.
Background
In recent years, black product activities (activities of the underground illicit industry) targeting operator services have become increasingly common. They seriously damage the operators' brand image and cause substantial economic losses for operators and users. Because such activities take many forms (for example "wool pulling", i.e. abusing promotional offers, and telephone fraud), the common approach at present is to perform modeling analysis on the features available inside an operator. Because this approach relies only on the operator's internal features, its results inevitably suffer from inaccurate identification or a low identification rate.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the above defects in the prior art, a black product user identification method, a TEE node, and a computer readable storage medium, so as to solve the problem that existing black product user identification methods rely only on the features inside the operator for modeling analysis, so that their results inevitably suffer from inaccurate identification or a low identification rate.
In a first aspect, the present invention provides a black product user identification method, which is applied to any one TEE node in a trusted execution environment (TEE) cluster, and the method includes:
acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data;
establishing a decision tree model based on the sample data set;
and receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.
Preferably, the characteristic data of the operator side comprises a mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, first online times, a first average online time and IP provincial crossing times; the bank side characteristic data comprises the mobile phone number, second online times, a second average online time, registration days, an account balance and credit card overdue times;
the acquiring of the sample data set specifically includes:
aligning the characteristic data of the operator side and the characteristic data of the bank side according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm to obtain a sample data set;
wherein, the operator side characteristic data and the bank side characteristic data after alignment specifically include: the mobile phone number, the number of mobile phone numbers owned by a user corresponding to the mobile phone number, the first online times, the first average online time, the IP provincial crossing times, the second online times, the second average online time, the registration days, the account balance and the overdue times of the credit card.
Preferably, the TEE cluster at least comprises an operator-side TEE node and a bank-side TEE node, and any one TEE node is the operator-side TEE node or the bank-side TEE node.
Preferably, any one TEE node is an operator-side TEE node;
before aligning the operator side feature data and the bank side feature data according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm, the method further comprises the following steps:
acquiring all fixed network data in one day, wherein the fixed network data comprises a mobile phone number, a user identification, an IP address, an online time, an offline time, an online duration and a province;
acquiring the number of mobile phone numbers owned by a user corresponding to each mobile phone number according to the user identification in all the fixed network data;
counting the online times of each mobile phone number in the day according to all the fixed network data to obtain the corresponding first online times;
calculating the first average online time corresponding to each mobile phone number according to the following formula:
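$$\mathrm{T\_TimeAvg} = \frac{\sum_{i=1}^{\mathrm{T\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)}{\mathrm{T\_DaysOnline}}$$
wherein T_TimeAvg is the first average online time, $\mathrm{Time\_online}_i$ is the i-th online time of the mobile phone number, $\mathrm{Time\_offline}_i$ is the i-th offline time of the mobile phone number, and T_DaysOnline is the first online times corresponding to the mobile phone number;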
and acquiring the IP provincial crossing times corresponding to each mobile phone number according to the IP addresses and the provinces in which the IP addresses are located in all the fixed network data.
Preferably, any one TEE node is a bank-side TEE node;
before aligning the operator side feature data and the bank side feature data according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm, the method further comprises the following steps:
acquiring bank APP data in one day, wherein the bank APP data comprises a mobile phone number, a user identifier, a registration date, online time, offline time, account balance and overdue times of a credit card;
counting the online times of each mobile phone number in the day according to the bank APP data to obtain the corresponding second online times;
calculating the second average online time corresponding to each mobile phone number according to the following formula:
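$$\mathrm{B\_TimeAvg} = \frac{\sum_{i=1}^{\mathrm{B\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)}{\mathrm{B\_DaysOnline}}$$
wherein B_TimeAvg is the second average online time, $\mathrm{Time\_online}_i$ is the i-th online time of the mobile phone number, $\mathrm{Time\_offline}_i$ is the i-th offline time of the mobile phone number, and B_DaysOnline is the second online times corresponding to the mobile phone number;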
calculating the registration days corresponding to each mobile phone number according to the following formula:
RegisterDays=DateToday-RegisterDate
wherein DateToday is the current date and RegisterDate is the registration date.
Preferably, the establishing a decision tree model based on the sample data set specifically includes:
traversing all the features in the sample data set, and calculating the information gain of the traversed features according to the following formula:
g(D,A)=H(D)-H(D|A)
wherein
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
is the empirical entropy of the sample data set D, $|D|$ represents the number of samples in the sample data set D, K is the number of categories, and $|C_k|$ represents the number of samples in D whose category is k;
$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$
is the empirical conditional entropy of the sample data set D given the traversed feature A, wherein D is divided into n subsets $D_1, D_2, \ldots, D_n$ according to the values of the feature A, and $|D_i|$ is the number of samples in the subset $D_i$;
and dividing the sample data set by using the feature with the maximum information gain, and repeating the above process until all samples in the sample data set are divided or the maximum training times is reached, to obtain the decision tree model.
In a second aspect, the present invention provides a TEE node, comprising:
the data set acquisition module is used for acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data;
the model establishing module is connected with the data set acquisition module and used for establishing a decision tree model based on the sample data set;
and the identification module is connected with the model establishing module and used for receiving the user data to be identified and inputting the user data to be identified into the decision tree model to obtain the identification result of the user data to be identified.
In a third aspect, the present invention provides a TEE node, including a memory and a processor, where the memory stores a computer program, and the processor is configured to run the computer program to implement the black product user identification method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the black product user identification method according to the first aspect.
According to the black product user identification method, the TEE node and the computer readable storage medium provided by the invention, a sample data set is acquired, wherein the samples in the sample data set comprise black product user samples and normal user samples and each sample comprises aligned operator side characteristic data and bank side characteristic data; a decision tree model is established based on the sample data set; and user data to be identified is received and input into the decision tree model to obtain the identification result of the user data to be identified. Because the method is based on the trusted execution environment, the operator side characteristic data and the bank side characteristic data can be jointly modeled and analyzed while the confidentiality and integrity of the data are ensured. This expands the training characteristic data and improves the accuracy and coverage rate of black product user identification, thereby solving the problem that existing black product user identification methods rely only on the internal features of the operator for modeling analysis, so that their results inevitably suffer from inaccurate identification or a low identification rate.
Drawings
FIG. 1 is a flow chart of the black product user identification method in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the alignment of the operator side characteristic data and the bank side characteristic data in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a TEE node in embodiment 2 of the present invention;
FIG. 4 is a schematic structural diagram of a TEE node in embodiment 3 of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description will be made with reference to the accompanying drawings.
It is to be understood that the specific embodiments and figures described herein are merely illustrative of the invention and are not limiting of the invention.
It is to be understood that the embodiments and features of the embodiments can be combined with each other without conflict.
It is to be understood that, for the convenience of description, only parts related to the present invention are shown in the drawings of the present invention, and parts not related to the present invention are not shown in the drawings.
It should be understood that each unit and module related in the embodiments of the present invention may correspond to only one physical structure, may also be composed of multiple physical structures, or multiple units and modules may also be integrated into one physical structure.
It will be understood that, without conflict, the functions, steps, etc. noted in the flowchart and block diagrams of the present invention may occur in an order different from that noted in the figures.
It is to be understood that the flowchart and block diagrams of the present invention illustrate the architecture, functionality, and operation of possible implementations of systems, apparatus, devices and methods according to various embodiments of the present invention. Each block in the flowchart or block diagrams may represent a unit, module, segment, code, which comprises executable instructions for implementing the specified function(s). Furthermore, each block or combination of blocks in the block diagrams and flowchart illustrations can be implemented by a hardware-based system that performs the specified functions or by a combination of hardware and computer instructions.
It is to be understood that the units and modules involved in the embodiments of the present invention may be implemented by software, and may also be implemented by hardware, for example, the units and modules may be located in a processor.
Summary of the application
At present, a relatively common black product user identification method performs modeling analysis on the features available inside an operator. Because malicious users are dispersed, latent and complex, data from a single source can hardly meet the needs of black product user identification. The existing approach relies only on the internal features of an operator for modeling analysis, so inaccurate identification or a low identification rate is inevitable.
In view of the above technical problems, the present application provides a black product user identification method, a TEE (Trusted Execution Environment) node, and a computer readable storage medium, in which the feature data of the operator side and the feature data of the bank side are jointly modeled and analyzed in a Trusted Execution Environment. Expanding the training feature data improves the accuracy and coverage rate with which the final model identifies black product users, which better helps the industry identify such users, purifies the network environment and avoids property losses for enterprise customers. The confidentiality and integrity of the operator side feature data and the bank side feature data are also ensured while the model is being established.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Example 1:
This embodiment provides a black product user identification method, which is applied to any one TEE node in a trusted execution environment (TEE) cluster. As shown in fig. 1, the method includes:
Step S102: acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data.
It should be noted that the trusted execution environment is a secure area within the CPU. It runs in a separate environment and in parallel with the operating system. The CPU ensures that the confidentiality and integrity of the code and data in the TEE are protected. By using both hardware and software to protect data and code, TEE is more secure than operating systems. Trusted applications running in the TEE can access the full functionality of the device main processor and memory, while hardware isolation protects these components from user-installed applications running in the main operating system. Code and data running in the TEE are confidential and non-tamperable. The method for identifying the black product user can be applied to any TEE node in a TEE cluster, the TEE cluster comprises an operator side TEE node, a bank side TEE node and other TEE nodes, and the method is preferably applied to the operator side TEE node or the bank side TEE node in the TEE cluster.
Optionally, the operator-side feature data includes a mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, first online times, a first average online time and IP provincial crossing times; the bank side characteristic data comprises the mobile phone number, second online times, a second average online time, registration days, an account balance and credit card overdue times;
the obtaining of the sample data set specifically comprises:
aligning the characteristic data of the operator side and the characteristic data of the bank side according to the mobile phone number by adopting a sample alignment mode based on an RSA algorithm to obtain a sample data set;
wherein, the operator side characteristic data and the bank side characteristic data after aligning specifically include: the mobile phone number, the number of the mobile phone numbers corresponding to the user, the first online times, the first average online time, the IP province crossing times, the second online times, the second average online time, the registration days, the account balance and the overdue times of the credit card.
In this embodiment, the operator side feature data is obtained by the operator side TEE node, the bank side feature data is obtained by the bank side TEE node, and both the operator side feature data and the bank side feature data after alignment can be obtained by the sample alignment mode based on the RSA algorithm.
Specifically, the operator deploys a TEE node locally, and the operator-side TEE node stores the original operator-side data, that is, user fixed network data. The operator-side TEE node obtains all the fixed network data of one day from the operator side. This data includes the fixed network data of black product users and the fixed network data of normal users, and may include mobile phone numbers, user identifications, IP addresses, online times, offline times, online durations, provinces and the like. The operator-side TEE node preprocesses some of these indexes and converts them into discrete features to obtain the corresponding operator side characteristic data:
(1) Number of mobile phone numbers owned by the user (T_PhoneCount): the user identification may be an obfuscated (desensitized) identity card number. One identity card number can be used to register at most 10 mobile phone numbers nationwide, and the number of mobile phone numbers owned by the user corresponding to each mobile phone number can be obtained according to the user identifications in all the fixed network data;
(2) First online times (T_DaysOnline): the number of times the user goes online through the corresponding mobile phone number. Counting the online times of each mobile phone number in one day according to all the fixed network data gives the first online times corresponding to each mobile phone number.
(3) First average online time (T_TimeAvg): specifically, the first average online time corresponding to each mobile phone number, in minutes, can be calculated according to the following formula:
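$$\mathrm{T\_TimeAvg} = \frac{\sum_{i=1}^{\mathrm{T\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)}{\mathrm{T\_DaysOnline}}$$
wherein $\mathrm{Time\_online}_i$ is the i-th online time of the mobile phone number, $\mathrm{Time\_offline}_i$ is the i-th offline time of the mobile phone number, and T_DaysOnline is the first online times corresponding to the mobile phone number;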
(4) IP provincial crossing times (T_CrossCount): the IP provincial crossing times corresponding to each mobile phone number can be obtained according to the IP addresses and the provinces in all the fixed network data, that is, by counting the provinces in which the IP addresses of each mobile phone number appear within one day. Generally, the IP of a fixed network user changes each time the user goes offline and online again (switching the router once), and the province of the changed IP is not fixed: it may be the local province or another province. A normal user goes offline and online (switches the router) only a few times a day, whereas the mobile phone number of a black product user generally changes its IP continuously, because some apps limit the number of logins per IP, in order to carry out black product activities such as wool pulling on a larger scale. The IP provincial crossing times is therefore taken as a feature.
The finally obtained operator side characteristic data comprises the operator-side mobile phone number T_PhoneNumber, the number T_PhoneCount of mobile phone numbers owned by the user corresponding to the mobile phone number, the first online times T_DaysOnline, the first average online time T_TimeAvg and the IP provincial crossing times T_CrossCount.
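The operator-side preprocessing described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the record layout, the sample values and the function name `operator_features` are hypothetical, and the sketch assumes that one province crossing is counted whenever two consecutive sessions of the same number fall in different provinces, which is only one possible reading of the description.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical one-day fixed network records:
# (phone number, user id, IP address, online time, offline time, province)
RECORDS = [
    ("13800000001", "u1", "10.0.0.1", "2021-09-01 08:00", "2021-09-01 08:30", "Beijing"),
    ("13800000001", "u1", "10.0.1.7", "2021-09-01 12:00", "2021-09-01 13:30", "Hebei"),
    ("13800000002", "u1", "10.0.0.9", "2021-09-01 09:00", "2021-09-01 09:10", "Beijing"),
]


def operator_features(records):
    """Derive T_PhoneCount, T_DaysOnline, T_TimeAvg and T_CrossCount per number."""
    phones_per_user = defaultdict(set)
    user_of = {}
    sessions = defaultdict(list)
    for phone, user, _ip, t_on, t_off, province in records:
        phones_per_user[user].add(phone)
        user_of[phone] = user
        start = datetime.strptime(t_on, FMT)
        end = datetime.strptime(t_off, FMT)
        minutes = (end - start).total_seconds() / 60
        sessions[phone].append((start, minutes, province))

    features = {}
    for phone, items in sessions.items():
        items.sort(key=lambda item: item[0])  # chronological order
        provinces = [province for _, _, province in items]
        # Assumption: a crossing is a province change between consecutive sessions.
        crossings = sum(1 for prev, cur in zip(provinces, provinces[1:]) if prev != cur)
        features[phone] = {
            "T_PhoneCount": len(phones_per_user[user_of[phone]]),
            "T_DaysOnline": len(items),
            "T_TimeAvg": sum(minutes for _, minutes, _ in items) / len(items),
            "T_CrossCount": crossings,
        }
    return features


FEATURES = operator_features(RECORDS)
```

With the sample records, the first number has two sessions of 30 and 90 minutes in two different provinces, so its average online time is 60 minutes and its crossing count is 1.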
Specifically, the bank deploys a TEE node locally, and the bank-side TEE node stores the original bank-side data. The original bank-side data is the data of users in the bank APP, namely bank APP data, which specifically comprises a mobile phone number, a user identifier, a registration date, online time, offline time, account balance and credit card overdue times. The bank-side TEE node preprocesses some of these indexes and converts them into discrete features to obtain the corresponding bank side characteristic data:
1) Second online times (B_DaysOnline): the number of times the user logs in to the bank APP through the corresponding mobile phone number. Counting the online times of each mobile phone number in one day according to the bank APP data gives the corresponding second online times.
2) Second average online time (B_TimeAvg): the average online time of the corresponding mobile phone number in the bank APP. Specifically, the second average online time corresponding to each mobile phone number, in minutes, can be calculated according to the following formula:
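$$\mathrm{B\_TimeAvg} = \frac{\sum_{i=1}^{\mathrm{B\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)}{\mathrm{B\_DaysOnline}}$$
wherein $\mathrm{Time\_online}_i$ is the i-th online time of the mobile phone number, $\mathrm{Time\_offline}_i$ is the i-th offline time of the mobile phone number, and B_DaysOnline is the second online times corresponding to the mobile phone number.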
3) Registration days (B_RegisterDays): measured in days and calculated as follows:
RegisterDays=DateToday-RegisterDate
wherein DateToday is the current date and RegisterDate is the registration date.
The finally obtained bank side characteristic data comprises the bank-side mobile phone number B_PhoneNumber, the second online times B_DaysOnline, the second average online time B_TimeAvg, the registration days B_RegisterDays, the account balance B_AcountBalance and the credit card overdue times B_ODTimes, wherein the account balance B_AcountBalance is the balance held by the mobile phone number in the bank APP or on the corresponding bank card, and the credit card overdue times B_ODTimes is the counted number of overdue payments on the credit card (if any) corresponding to the mobile phone number.
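As a small sketch of the bank-side preprocessing, the following Python computes the registration days and the second average online time for one mobile phone number. The function names, the input layout and the sample values are assumptions made for illustration; only the two formulas and the output field names come from the description.

```python
from datetime import date, datetime

FMT = "%Y-%m-%d %H:%M"


def register_days(register_date, today):
    """RegisterDays = DateToday - RegisterDate, in days."""
    return (today - register_date).days


def average_online_minutes(sessions):
    """Mean of (offline time - online time) over all sessions, in minutes."""
    total = sum(
        (datetime.strptime(t_off, FMT) - datetime.strptime(t_on, FMT)).total_seconds() / 60
        for t_on, t_off in sessions
    )
    return total / len(sessions)


def bank_features(phone, register_date, sessions, balance, overdue_times, today):
    """Assemble the bank side characteristic data for one mobile phone number."""
    return {
        "B_PhoneNumber": phone,
        "B_DaysOnline": len(sessions),
        "B_TimeAvg": average_online_minutes(sessions),
        "B_RegisterDays": register_days(register_date, today),
        "B_AcountBalance": balance,
        "B_ODTimes": overdue_times,
    }


# Hypothetical bank APP data for one number on 2021-09-01.
SAMPLE = bank_features(
    phone="13800000001",
    register_date=date(2021, 1, 1),
    sessions=[("2021-09-01 10:00", "2021-09-01 10:05"),
              ("2021-09-01 20:00", "2021-09-01 20:15")],
    balance=1250.0,
    overdue_times=0,
    today=date(2021, 9, 1),
)
```

Subtracting two `date` objects yields a `timedelta`, whose `.days` attribute gives the registration days directly.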
Specifically, because the feature data of the operator side and the feature data of the bank side are not completely consistent, sample alignment is performed using the mobile phone number as the key. A sample alignment mode based on an RSA algorithm can be adopted, so that the users common to the operator side and the bank side can be determined without either side disclosing its data to the other.
Suppose that the operator side is A and its characteristic data is $X_A = \{a_1, a_2, \ldots, a_m\}$, and suppose the bank side is B and its characteristic data is $X_B = \{b_1, b_2, \ldots, b_n\}$. The sample alignment mode based on the RSA algorithm may include the following steps:
(a) B generates a public key (n, e) and a private key (n, d) through the RSA algorithm, and sends the public key (n, e) to A.
(b) A encrypts its characteristic data $X_A$. For each element a in $X_A$, A generates a corresponding random number r. The random number r is encrypted with the public key (n, e) to obtain $r^e$, a is substituted into a hash function H(x) (the hash of x) to obtain H(a), and the product of the two is recorded as $Y_A$, $Y_A = r^e H(a)$. A sends the encrypted characteristic data $Y_A$ to B, while retaining at A a mapping table between $Y_A$ and each element a in $X_A$.
(c) After B receives $Y_A$, it is difficult for B to derive $X_A$ from $Y_A$, because the encryption uses a hash function and the added random number r. B operates on A's encrypted characteristic data $Y_A$ with the private key (n, d), and the result is recorded as $Z_A = (Y_A)^d = (r^e H(a))^d = r \cdot (H(a))^d$. This step uses a corollary of Fermat's theorem and Euler's theorem: $r^{ed} \equiv r \pmod{n}$.
At the same time, B encrypts its own characteristic data $X_B$ to obtain $Y_B$, $Y_B = H(H(b)^d)$, and likewise retains a mapping table between $Y_B$ and each element b in $X_B$.
After the above steps are completed, B sends $Z_A$ and $Y_B$ together to A.
(d) After A receives $Y_B$, A likewise cannot derive $X_B$ from it. A operates on the received $Z_A$, and the result is recorded as
$$D_A = H\left(\frac{Z_A}{r}\right) = H\left(\frac{r \cdot (H(a))^d}{r}\right) = H\left((H(a))^d\right)$$
At this point, $D_A$ and $Y_B$ have been obtained by the same operations on the data, namely raising to the power d and then hashing. Therefore, the elements a corresponding to the entries of $D_A$ for which $D_A = Y_B$ are the sample alignment result of A, that is, the data common to A and B. The operator side A sends the corresponding $Y_B$ values to the bank side B; after receiving them, B picks out the same part, and the elements b corresponding to the same part are the alignment result of B. As shown in fig. 2, the aligned operator side characteristic data and bank side characteristic data finally comprise: the mobile phone number PhoneNumber, the number T_PhoneCount of mobile phone numbers owned by the user corresponding to the mobile phone number, the first online times T_DaysOnline, the first average online time T_TimeAvg, the IP provincial crossing times T_CrossCount, the second online times B_DaysOnline, the second average online time B_TimeAvg, the registration days B_RegisterDays, the account balance B_AcountBalance and the credit card overdue times B_ODTimes.
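Steps (a) to (d) can be illustrated with the following single-process Python simulation. It is a sketch under stated assumptions, not the patent's implementation: both parties run in one function, SHA-256 stands in for the unspecified hash H, the two hashes are given the hypothetical names `h_int` and `h_out`, and the key is built from two small known primes purely for readability (a real deployment would use a vetted RSA library and keys of at least 2048 bits). Division by r is carried out as multiplication by the modular inverse of r.

```python
import hashlib
import math
import random


def h_int(value, n):
    """Inner hash H: map a phone number to an integer modulo n (assumed SHA-256)."""
    digest = hashlib.sha256(value.encode()).digest()
    return int.from_bytes(digest, "big") % n


def h_out(number):
    """Outer hash applied to H(x)^d mod n (assumed SHA-256)."""
    return hashlib.sha256(str(number).encode()).hexdigest()


def make_keys():
    """Step (a): B generates (n, e) and (n, d). Toy key size, illustration only."""
    p, q = 2**31 - 1, 2**61 - 1          # two Mersenne primes
    n = p * q
    e = 65537
    d = pow(e, -1, (p - 1) * (q - 1))    # modular inverse, Python 3.8+
    return (n, e), (n, d)


def align(operator_phones, bank_phones):
    (n, e), (_, d) = make_keys()

    # Step (b): A blinds each hashed element, Y_A = r^e * H(a) mod n.
    blinded = []
    for a in operator_phones:
        r = random.randrange(2, n)
        while math.gcd(r, n) != 1:       # r must be invertible modulo n
            r = random.randrange(2, n)
        blinded.append((a, r, (pow(r, e, n) * h_int(a, n)) % n))

    # Step (c): B computes Z_A = Y_A^d mod n and Y_B = H(H(b)^d mod n).
    z_a = [pow(y, d, n) for _, _, y in blinded]
    y_b = {h_out(pow(h_int(b, n), d, n)) for b in bank_phones}

    # Step (d): A removes r, hashes, and keeps the elements found in Y_B.
    common = []
    for (a, r, _), z in zip(blinded, z_a):
        d_a = h_out((z * pow(r, -1, n)) % n)
        if d_a in y_b:
            common.append(a)
    return common


COMMON = align(
    ["13800000001", "13800000002", "13800000003"],
    ["13800000002", "13800000003", "13900000009"],
)
```

In this simulation B never sees an unblinded H(a), and A only sees hashed values of B's elements, so each side learns the intersection without receiving the other side's raw list.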
Step S104: establishing a decision tree model based on the sample data set.
In the present embodiment, the sample data set is $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ = (PhoneNumber, T_PhoneCount, …, B_ODTimes), i.e. the 10 features mentioned above, and $y_i \in \{0, 1\}$, where 0 denotes a normal user and 1 denotes a black product user. Establishing the decision tree model based on the sample data set D specifically comprises the following steps:
traversing all features in the sample data set, and calculating information gain of the traversed features according to the following formula:
g(D,A)=H(D)-H(D|A)
wherein
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
is the empirical entropy of the sample data set D; K is the number of values that $y_i$ can take, and k is a general representation of each of these values, i.e. each value in {0, 1}; $|D|$ represents the sample size of the sample data set D, i.e. the number of samples in the data set D; and $|C_k|$ represents the number of samples in D whose category is k;
$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$
is the empirical conditional entropy of the sample data set D given the traversed feature A. H(D | A) is calculated for each feature A, where D is divided into n subsets $D_1, D_2, \ldots, D_n$ according to the values of the feature A, and $|D_i|$ is the sample size of the subset $D_i$, i.e. its number of samples.
The sample data set is divided by using the feature with the maximum information gain, and the above process is repeated until all samples in the sample data set are divided or the maximum training times is reached, to obtain the decision tree model.
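The information gain computation and the repeated splitting can be sketched in Python as follows. This is an ID3-style illustration under assumptions: the features are taken to be already discretized, a hypothetical `max_depth` parameter stands in for the maximum training times, and the four toy samples with two bucketed features are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(D) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def conditional_entropy(rows, labels, feature):
    """Empirical conditional entropy H(D|A) for the given feature A."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature], []).append(label)
    return sum(len(s) / len(labels) * entropy(s) for s in subsets.values())


def information_gain(rows, labels, feature):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(rows, labels, feature)


def build_tree(rows, labels, features, max_depth=3):
    """Split on the feature with the largest gain until pure or depth runs out."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "children": {}, "default": majority}
    remaining = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, max_depth - 1
        )
    return node


def predict(tree, row):
    """Walk the tree; unseen feature values fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical discretized samples: label 1 = black product user, 0 = normal user.
ROWS = [
    {"T_CrossCount": "high", "B_RegisterDays": "new"},
    {"T_CrossCount": "high", "B_RegisterDays": "old"},
    {"T_CrossCount": "low", "B_RegisterDays": "new"},
    {"T_CrossCount": "low", "B_RegisterDays": "old"},
]
LABELS = [1, 1, 0, 0]
TREE = build_tree(ROWS, LABELS, ["T_CrossCount", "B_RegisterDays"])
```

In this toy set the label is fully determined by `T_CrossCount`, so its information gain equals H(D) = 1 bit while `B_RegisterDays` has zero gain, and the tree splits once on `T_CrossCount`.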
After the final model is obtained, the TEE node outputs the model, and the data of the computation process and the encryption and decryption data are destroyed immediately within the secure region of the TEE node, thereby ensuring the security of the data throughout the process.
Step S106: receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.
In this embodiment, the TEE node may deploy the established decision tree model to a server and provide a corresponding interface for users to call. Specifically, a user may input the user data to be identified (including the mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, the first online count, the first average online duration, the IP cross-province count, the second online count, the second average online duration, the registration days, the account balance and the credit card overdue count) into the decision tree model to obtain the corresponding identification result.
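Identification with a trained tree amounts to walking from the root to a leaf. The sketch below assumes a nested-dictionary tree whose internal nodes split on discretized feature values; the tree shown is hand-written for illustration and is not a model produced by this method, and the function name `identify` and the fallback label are likewise assumptions.

```python
def identify(tree, sample, default=0):
    """Walk the tree for one sample; return 1 (black product user) or 0 (normal user)."""
    node = tree
    while isinstance(node, dict):
        value = sample.get(node["feature"])
        # Fall back to the default label when the value was never seen in training.
        node = node["children"].get(value, default)
    return node


# Hypothetical tree, written by hand for the example.
example_tree = {
    "feature": "T_CrossCount",
    "children": {
        "low": 0,
        "high": {
            "feature": "B_ODTimes",
            "children": {"few": 0, "many": 1},
        },
    },
}

result = identify(example_tree, {"T_CrossCount": "high", "B_ODTimes": "many"})
```

An interface exposed by the server could wrap a call of this kind, taking the ten features of the user to be identified as its input.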
According to the black product user identification method provided by this embodiment of the invention, a sample data set is acquired, in which the samples include black product user samples and normal user samples and each sample includes aligned operator-side feature data and bank-side feature data; a decision tree model is established based on the sample data set; and user data to be identified is received and input into the decision tree model to obtain its identification result. Because the method is based on a trusted execution environment, the operator-side feature data and the bank-side feature data can be jointly modeled and analyzed while the confidentiality and integrity of the data are ensured. This expands the training feature data and improves the accuracy and coverage of black product user identification, and it solves the problem that existing black product user identification methods rely only on the operator's internal features for modeling and analysis, so that their identification results are inevitably inaccurate or their identification rate is low.
Embodiment 2:
As shown in Fig. 3, this embodiment provides a TEE node, including:
a data set acquisition module 12, configured to acquire a sample data set, where samples in the sample data set include black product user samples and normal user samples, and each sample includes aligned operator-side feature data and bank-side feature data;
a model establishing module 14, connected to the data set acquisition module 12 and configured to establish a decision tree model based on the sample data set;
and an identification module 16, connected to the model establishing module 14 and configured to receive user data to be identified and input the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.
Optionally, the operator-side feature data includes a mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, a first online count, a first average online duration and an IP cross-province count; the bank-side feature data includes the mobile phone number, a second online count, a second average online duration, registration days, an account balance and a credit card overdue count;
the data set acquisition module 12 is specifically configured to align the operator-side feature data and the bank-side feature data according to the mobile phone number by using a sample alignment method based on the RSA algorithm, to obtain the sample data set;
where the aligned operator-side feature data and bank-side feature data specifically include: the mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, the first online count, the first average online duration, the IP cross-province count, the second online count, the second average online duration, the registration days, the account balance and the credit card overdue count.
Optionally, the TEE cluster includes at least an operator-side TEE node and a bank-side TEE node, and said any TEE node is the operator-side TEE node or the bank-side TEE node.
Optionally, said any TEE node is the operator-side TEE node, and the operator-side TEE node may further include:
a fixed network data acquisition module, configured to acquire all fixed network data within one day, where the fixed network data includes a mobile phone number, a user identifier, an IP address, an online time, an offline time, an online duration and a province;
a mobile phone number count acquisition module, configured to obtain, according to the user identifiers in all the fixed network data, the number of mobile phone numbers owned by the user corresponding to each mobile phone number;
a first online count acquisition module, configured to count the number of times each mobile phone number goes online within the day according to all the fixed network data, to obtain the corresponding first online count;
a first average online duration acquisition module, configured to calculate the first average online duration corresponding to each mobile phone number according to the following formula:
$$\mathrm{T\_TimeAvg} = \frac{1}{\mathrm{T\_DaysOnline}} \sum_{i=1}^{\mathrm{T\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)$$
where Time_online_i is the i-th online time of the mobile phone number, Time_offline_i is the i-th offline time of the mobile phone number, and T_DaysOnline is the first online count corresponding to the mobile phone number;
and an IP cross-province count acquisition module, configured to obtain the IP cross-province count corresponding to each mobile phone number according to the IP addresses in all the fixed network data and the provinces in which they are located.
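The operator-side statistics above can be sketched as a single pass over the day's session records. This is an illustrative sketch under assumptions: each record is a dictionary with the hypothetical keys `PhoneNumber`, `UserId`, `Time_online` and `Time_offline`, durations are expressed in seconds, and the IP cross-province count is omitted because the text does not define how it is counted.

```python
from collections import defaultdict
from datetime import datetime


def operator_features(records):
    """Compute T_PhoneCount, T_DaysOnline and T_TimeAvg per mobile phone number."""
    durations = defaultdict(list)        # phone number -> session lengths in seconds
    phones_by_user = defaultdict(set)    # user identifier -> phone numbers owned
    user_of = {}                         # phone number -> user identifier
    for rec in records:
        phone = rec["PhoneNumber"]
        delta = rec["Time_offline"] - rec["Time_online"]
        durations[phone].append(delta.total_seconds())
        phones_by_user[rec["UserId"]].add(phone)
        user_of[phone] = rec["UserId"]

    features = {}
    for phone, lengths in durations.items():
        days_online = len(lengths)       # first online count
        features[phone] = {
            "T_PhoneCount": len(phones_by_user[user_of[phone]]),
            "T_DaysOnline": days_online,
            "T_TimeAvg": sum(lengths) / days_online,
        }
    return features


# Invented sample records for one day.
records = [
    {"PhoneNumber": "A", "UserId": "u1",
     "Time_online": datetime(2021, 9, 29, 8, 0), "Time_offline": datetime(2021, 9, 29, 8, 10)},
    {"PhoneNumber": "A", "UserId": "u1",
     "Time_online": datetime(2021, 9, 29, 9, 0), "Time_offline": datetime(2021, 9, 29, 9, 30)},
    {"PhoneNumber": "B", "UserId": "u1",
     "Time_online": datetime(2021, 9, 29, 10, 0), "Time_offline": datetime(2021, 9, 29, 11, 0)},
]
features = operator_features(records)
```

The same averaging applies on the bank side with B_DaysOnline and B_TimeAvg in place of the operator-side names.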
Optionally, said any TEE node is the bank-side TEE node, and the bank-side TEE node may further include:
a bank APP data acquisition module, configured to acquire bank APP data within one day, where the bank APP data includes a mobile phone number, a user identifier, a registration date, an online time, an offline time, an account balance and a credit card overdue count;
a second online count acquisition module, configured to count the number of times each mobile phone number goes online within the day according to the bank APP data, to obtain the corresponding second online count;
a second average online duration acquisition module, configured to calculate the second average online duration corresponding to each mobile phone number according to the following formula:
$$\mathrm{B\_TimeAvg} = \frac{1}{\mathrm{B\_DaysOnline}} \sum_{i=1}^{\mathrm{B\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)$$
where Time_online_i is the i-th online time of the mobile phone number, Time_offline_i is the i-th offline time of the mobile phone number, and B_DaysOnline is the second online count corresponding to the mobile phone number;
and a registration days acquisition module, configured to calculate the registration days corresponding to each mobile phone number according to the following formula:
RegisterDays=DateToday-RegisterDate
where DateToday is the current date and RegisterDate is the registration date.
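The registration days formula is a plain date difference. A minimal sketch, assuming both dates are calendar dates and the result is counted in whole days (the function name is hypothetical):

```python
from datetime import date


def register_days(register_date: date, date_today: date) -> int:
    """RegisterDays = DateToday - RegisterDate, in whole days."""
    return (date_today - register_date).days


# Example: an account registered on 2021-01-01, evaluated on 2021-09-29.
example_days = register_days(date(2021, 1, 1), date(2021, 9, 29))
```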
Optionally, the model establishing module 14 is specifically configured to:
traverse all features in the sample data set, and calculate the information gain of each traversed feature according to the following formula:
g(D,A)=H(D)-H(D|A)
where
$$H(D) = -\sum_{k} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
is the empirical entropy of the sample data set D, |D| is the number of samples in the sample data set D, and |C_k| is the number of samples in the data set whose category is k;
$$H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$
is the empirical conditional entropy of the sample data set D given the traversed feature A, where D can be divided into n subsets D_1, D_2, …, D_n according to feature A, and |D_i| is the number of samples in the subset D_i;
and split on the feature with the largest information gain, repeating the above process until all samples in the sample data set have been partitioned or the maximum number of training iterations is reached, to obtain the decision tree model.
Embodiment 3:
Referring to Fig. 4, this embodiment provides a TEE node including a memory 22 and a processor 24, where the memory 22 stores a computer program and the processor 24 is configured to run the computer program to execute the black product user identification method of Embodiment 1.
The memory 22 is connected to the processor 24. The memory 22 may be a flash memory, a read-only memory or another type of memory, and the processor 24 may be a central processing unit or a microcontroller.
Embodiment 4:
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the black product user identification method of Embodiment 1.
The computer-readable storage media include volatile or nonvolatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, computer program modules or other data. Computer-readable storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other Memory technology, CD-ROM (Compact disk Read-Only Memory), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
With the TEE node and the computer-readable storage medium provided by Embodiments 2 to 4, a sample data set is acquired, in which the samples include black product user samples and normal user samples and each sample includes aligned operator-side feature data and bank-side feature data; a decision tree model is established based on the sample data set; and user data to be identified is received and input into the decision tree model to obtain its identification result. Because the approach is based on a trusted execution environment, the operator-side feature data and the bank-side feature data can be jointly modeled and analyzed while the confidentiality and integrity of the data are ensured. This expands the training feature data and improves the accuracy and coverage of black product user identification, and it solves the problem that existing black product user identification methods rely only on the operator's internal features for modeling and analysis, so that their identification results are inevitably inaccurate or their identification rate is low.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (10)

1. A black product user identification method, applied to any TEE node in a trusted execution environment (TEE) cluster, the method comprising:
acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data;
establishing a decision tree model based on the sample data set;
and receiving user data to be identified, and inputting the user data to be identified into the decision tree model to obtain an identification result of the user data to be identified.
2. The black product user identification method according to claim 1, wherein the operator-side feature data includes a mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, a first online count, a first average online duration and an IP cross-province count; the bank-side feature data includes the mobile phone number, a second online count, a second average online duration, registration days, an account balance and a credit card overdue count;
the acquiring of the sample data set specifically includes:
aligning the operator-side feature data and the bank-side feature data according to the mobile phone number by using a sample alignment method based on the RSA algorithm, to obtain the sample data set;
wherein the aligned operator-side feature data and bank-side feature data specifically include: the mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, the first online count, the first average online duration, the IP cross-province count, the second online count, the second average online duration, the registration days, the account balance and the credit card overdue count.
3. The black product user identification method of claim 2, wherein the TEE cluster at least comprises an operator-side TEE node and a bank-side TEE node, and any one of the TEE nodes is the operator-side TEE node or the bank-side TEE node.
4. The black product user identification method according to claim 3, wherein said any TEE node is the operator-side TEE node;
before aligning the operator-side feature data and the bank-side feature data according to the mobile phone number by using the sample alignment method based on the RSA algorithm, the method further comprises:
acquiring all fixed network data within one day, wherein the fixed network data includes a mobile phone number, a user identifier, an IP address, an online time, an offline time, an online duration and a province;
obtaining, according to the user identifiers in all the fixed network data, the number of mobile phone numbers owned by the user corresponding to each mobile phone number;
counting the number of times each mobile phone number goes online within the day according to all the fixed network data, to obtain the corresponding first online count;
calculating the first average online duration corresponding to each mobile phone number according to the following formula:
$$\mathrm{T\_TimeAvg} = \frac{1}{\mathrm{T\_DaysOnline}} \sum_{i=1}^{\mathrm{T\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)$$
wherein Time_online_i is the i-th online time of the mobile phone number, Time_offline_i is the i-th offline time of the mobile phone number, and T_DaysOnline is the first online count corresponding to the mobile phone number;
and obtaining the IP cross-province count corresponding to each mobile phone number according to the IP addresses in all the fixed network data and the provinces in which they are located.
5. The black product user identification method according to claim 3, wherein said any TEE node is the bank-side TEE node;
before aligning the operator-side feature data and the bank-side feature data according to the mobile phone number by using the sample alignment method based on the RSA algorithm, the method further comprises:
acquiring bank APP data within one day, wherein the bank APP data includes a mobile phone number, a user identifier, a registration date, an online time, an offline time, an account balance and a credit card overdue count;
counting the number of times each mobile phone number goes online within the day according to the bank APP data, to obtain the corresponding second online count;
calculating the second average online duration corresponding to each mobile phone number according to the following formula:
$$\mathrm{B\_TimeAvg} = \frac{1}{\mathrm{B\_DaysOnline}} \sum_{i=1}^{\mathrm{B\_DaysOnline}} \left(\mathrm{Time\_offline}_i - \mathrm{Time\_online}_i\right)$$
wherein Time_online_i is the i-th online time of the mobile phone number, Time_offline_i is the i-th offline time of the mobile phone number, and B_DaysOnline is the second online count corresponding to the mobile phone number;
calculating the registration days corresponding to each mobile phone number according to the following formula:
RegisterDays=DateToday-RegisterDate
wherein DateToday is the current date and RegisterDate is the registration date.
6. The black product user identification method according to claim 2, wherein the establishing of a decision tree model based on the sample data set specifically comprises:
traversing all the features in the sample data set, and calculating the information gain of each traversed feature according to the following formula:
g(D,A)=H(D)-H(D|A)
wherein
$$H(D) = -\sum_{k} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
is the empirical entropy of the sample data set D, |D| is the number of samples in the sample data set D, and |C_k| is the number of samples in the data set whose category is k;
$$H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$
is the empirical conditional entropy of the sample data set D given the traversed feature A, where D can be divided into n subsets D_1, D_2, …, D_n according to feature A, and |D_i| is the number of samples in the subset D_i;
and splitting on the feature with the largest information gain, and repeating the above process until all samples in the sample data set have been partitioned or the maximum number of training iterations is reached, to obtain the decision tree model.
7. A TEE node, comprising:
the data set acquisition module is used for acquiring a sample data set, wherein samples in the sample data set comprise black product user samples and normal user samples, and each sample comprises aligned operator side characteristic data and aligned bank side characteristic data;
the model establishing module is connected with the data set acquisition module and used for establishing a decision tree model based on the sample data set;
and the identification module is connected with the model establishing module and used for receiving the user data to be identified and inputting the user data to be identified into the decision tree model to obtain the identification result of the user data to be identified.
8. The TEE node of claim 7, wherein the operator-side feature data includes a mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, a first online count, a first average online duration and an IP cross-province count; the bank-side feature data includes the mobile phone number, a second online count, a second average online duration, registration days, an account balance and a credit card overdue count;
the data set acquisition module is specifically configured to align the operator-side feature data and the bank-side feature data according to the mobile phone number by using a sample alignment method based on the RSA algorithm, to obtain the sample data set;
wherein the aligned operator-side feature data and bank-side feature data specifically include: the mobile phone number, the number of mobile phone numbers owned by the user corresponding to the mobile phone number, the first online count, the first average online duration, the IP cross-province count, the second online count, the second average online duration, the registration days, the account balance and the credit card overdue count.
9. A TEE node comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to implement the black product user identification method of any one of claims 1 to 6.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the black product user identification method according to any one of claims 1 to 6.
CN202111153184.1A 2021-09-29 2021-09-29 Black product user identification method, TEE node and computer readable storage medium Pending CN113837303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111153184.1A CN113837303A (en) 2021-09-29 2021-09-29 Black product user identification method, TEE node and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113837303A true CN113837303A (en) 2021-12-24

Family

ID=78967512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111153184.1A Pending CN113837303A (en) 2021-09-29 2021-09-29 Black product user identification method, TEE node and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113837303A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination