WO2022021977A1 - Underground industry account detection method and apparatus, computer device, and medium - Google Patents



Publication number
WO2022021977A1
Authority
WO
WIPO (PCT)
Prior art keywords
account
field data
data document
word
weight
Application number
PCT/CN2021/090947
Inventor
孙家棣
马宁
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to an artificial intelligence-based method, device, computer equipment, and computer-readable storage medium for detecting fraudulent accounts.
  • The inventor recognized that business risk identification currently needs to crack down on illegal activity. In particular, it must identify and combat fake accounts operated by the black production (organized fraud) industry.
  • The industry currently relies mainly on expert rules of thumb to identify and combat such fake accounts.
  • Expert rules cover a narrow and simple identification surface. They identify and target specific patterns precisely, but their logic is simple, so fraudsters can easily recognize and bypass them.
  • The purpose of this application is to provide an artificial intelligence-based method, device, computer equipment, and computer-readable storage medium for detecting fraudulent accounts.
  • An artificial intelligence-based black production account detection method, including:
  • when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtaining the account attribute data set of those accounts, the mobile phone number being derived from the account database of the target subject;
  • determining black production account clusters, and obtaining the black production account groups associated with the target subject.
  • An artificial intelligence-based black production account detection device, including:
  • an acquisition module configured to acquire the account attribute data set of the accounts when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, the mobile phone number being derived from the account database of the target subject;
  • a construction module configured to use the attribute field data in the account attribute data set as connection edges and the mobile phone numbers as vertices, to construct an account detection graph of the target subject;
  • a clustering module configured to perform graph clustering on the accounts in the account detection graph based on the attribute field data of the connection edges, to obtain a plurality of account clusters;
  • a generation module configured to use the attribute field data of each account cluster to generate the first field data document of that cluster, and to obtain the second field data document of the whitelist accounts corresponding to the target subject;
  • a calculation module configured to calculate the weight of each word in the first field data document according to the first field data document and the second field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document;
  • a determination module configured to determine black production account clusters based on the weight of each word, and to obtain the black production account groups associated with the target subject.
  • A computer device including a memory and a processor. The memory is configured to store an artificial intelligence-based black production account detection program executable by the processor. The processor is configured to execute the program to perform the following processing:
    – when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtaining the account attribute data set of those accounts, the mobile phone number being derived from the account database of the target subject;
    – using the attribute field data in the account attribute data set as connection edges and the mobile phone numbers as vertices, to construct an account detection graph of the target subject;
    – performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connection edges, to obtain a plurality of account clusters;
    – using the attribute field data of each account cluster to generate the first field data document of that cluster, and obtaining the second field data document of the whitelist accounts corresponding to the target subject;
    – calculating the weight of each word in the first field data document according to the first and second field data documents, the weight indicating the importance of the word in the first field data document relative to the second field data document;
    – determining black production account clusters based on the weight of each word, to obtain the black production account groups associated with the target subject.
  • A computer-readable storage medium storing computer-readable instructions, namely an artificial intelligence-based black production account detection program. When executed by a processor, the program implements the following processing:
    – when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtaining the account attribute data set of those accounts, the mobile phone number being derived from the account database of the target subject;
    – using the attribute field data in the account attribute data set as connection edges and the mobile phone numbers as vertices, to construct the account detection graph of the target subject;
    – performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connection edges, to obtain a plurality of account clusters;
    – using the attribute field data of each account cluster to generate the first field data document of that cluster, and obtaining the second field data document of the whitelist accounts corresponding to the target subject;
    – calculating the weight of each word in the first field data document according to the first and second field data documents, the weight indicating the importance of the word in the first field data document relative to the second field data document;
    – determining black production account clusters based on the weight of each word, to obtain the black production account groups associated with the target subject.
  • The above artificial intelligence-based black production account detection method, device, computer equipment, and computer-readable storage medium work in stages. First, they obtain the account attribute data set of accounts whose mobile phone number, derived from the account database of the target subject, is bound to more than a predetermined number of accounts.
  • This preliminary screening of the target subject's accounts excludes accounts whose mobile phone numbers are bound to no more than the predetermined number of accounts. The result is the account attribute data set to be examined, which narrows the detection range and improves detection reliability.
  • Next, the attribute field data in the account attribute data set is used as connection edges and the mobile phone numbers as vertices, to construct the account detection graph of the target subject. Graph clustering on the attribute field data of the connection edges then yields multiple account clusters. Because the graph is built on the mobile phone numbers of associated accounts and clustered on attribute field data, the clustering reliably identifies account gangs. Then, the attribute field data of each account cluster is used to generate that cluster's first field data document, and the second field data document of the whitelist accounts corresponding to the target subject is obtained. Working with data documents makes data analysis convenient, and the normal second field data document serves as a comparison baseline.
  • FIG. 1 schematically shows a flow chart of a method for detecting a fraudulent account based on artificial intelligence.
  • FIG. 2 schematically shows an example diagram of an application scenario of an artificial intelligence-based black production account detection method.
  • FIG. 3 schematically shows a flow chart of a method for acquiring an account attribute data set of an account.
  • FIG. 4 schematically shows a block diagram of an artificial intelligence-based black production account detection device.
  • FIG. 5 schematically shows an example block diagram of a computer device for implementing the above-mentioned artificial intelligence-based black production account detection method.
  • FIG. 6 schematically shows a computer-readable storage medium for implementing the above-mentioned artificial intelligence-based black production account detection method.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments can be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
  • the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • numerous specific details are provided in order to give a thorough understanding of the embodiments of the present application.
  • those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of the specific details, or other methods, components, devices, steps, etc. may be employed.
  • well-known solutions have not been shown or described in detail to avoid obscuring aspects of the application.
  • the artificial intelligence-based black production account detection method can be run on a server, a server cluster or a cloud server, etc.
  • the method of the present application can also be executed on other platforms according to requirements, which is not particularly limited in this exemplary embodiment.
  • the artificial intelligence-based black production account detection method may include the following steps:
  • Step S110: When it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtain the account attribute data set of those accounts, the mobile phone number being derived from the account database of the target subject;
  • Step S120: Use the attribute field data in the account attribute data set as connection edges and the mobile phone numbers as vertices, to construct an account detection graph of the target subject;
  • Step S130: Perform graph clustering on the accounts in the account detection graph based on the attribute field data of the connection edges, to obtain a plurality of account clusters;
  • Step S140: Use the attribute field data of each account cluster to generate the first field data document of that cluster, and obtain the second field data document of the whitelist accounts corresponding to the target subject;
  • Step S150: According to the first field data document and the second field data document, calculate the weight of each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document;
  • Step S160: Determine black production account clusters based on the weight of each word, and obtain the black production account groups associated with the target subject.
  • When the number of accounts bound to a mobile phone number exceeds the predetermined number, the account attribute data set of those accounts is obtained. This preliminary screening excludes accounts whose mobile phone numbers are bound to no more than the predetermined number of accounts. It yields the account attribute data set to be examined, which narrows the detection range and improves detection reliability.
  • The attribute field data in the account attribute data set is used as connection edges and the mobile phone numbers as vertices, to construct the account detection graph of the target subject. Graph clustering on this graph then yields multiple account clusters. Because the graph takes the mobile phone numbers of associated accounts as vertices and is clustered on attribute field data, the clustering reliably identifies account gangs.
  • The weight of each word in the first field data document is then calculated, the weight indicating the importance of the word in the first field data document relative to the second field data document. The black production account clusters are determined based on these weights. This relative importance makes it possible to reliably determine whether an account cluster is a black production account gang.
  • step S110 when it is determined that the number of accounts bound to the mobile phone number exceeds a predetermined number, an account attribute data set of the account is obtained, and the mobile phone number is derived from the account database of the target subject.
  • The server 210 may obtain the account attribute data sets of the accounts associated with the target subject from the server 220. Then, when the server 210 determines that the number of accounts bound to a user's mobile phone number exceeds the predetermined number, it acquires the account attribute data sets of all accounts corresponding to each such mobile phone number.
  • the server 210 and the server 220 may be various terminal devices with an instruction processing function and a data storage function, such as a computer and a mobile phone, which are not specially limited herein.
  • In one example, the server 210 and the server 220 are node servers in a blockchain. Owing to the immutability and security of data on the blockchain, the server 210 can safely and reliably obtain the account attribute data set of the accounts associated with the target subject from the server 220.
  • The account attribute data set of each account includes field data of account-related attribute fields. Examples are the mobile phone number, the device, the network environment, and login credentials such as the account password and the login device id.
  • the target subject can be any enterprise or platform.
  • The predetermined number can be set according to the actual situation. It is the preset standard number of accounts associated with one mobile phone number. If the number of accounts associated with a mobile phone number exceeds this threshold, black production is suspected. For example, the predetermined number may be 5.
  • account attribute data such as the network environment, device parameters, and registration passwords collected by the application can be used.
  • Online black production operators may disguise dimensions such as the network environment, device parameters, and registration passwords. They cannot, however, bypass the limit on the number of accounts that may be registered and bound to the same user's mobile phone number.
  • For example, suppose the target entity applies its limit on accounts bound to a user's mobile phone number on a monthly basis, taking effect at the end of each month. The account attribute data sets can then first be obtained for mobile phone numbers whose number of accounts bound within one month exceeds the predetermined number set by the target entity.
  • Obtaining the account attribute data set in this way preliminarily screens all accounts associated with the target subject. Accounts whose mobile phone numbers are bound to no more than the predetermined number of accounts are excluded. The remaining account attribute data sets to be examined narrow the detection range and improve detection accuracy.
  • acquiring the account attribute data set of the account includes:
  • Step S310: Obtain the business association condition between the target entity and the mobile phone number. The business association condition indicates the threshold on the number of accounts that can be bound to the mobile phone number in the target business, and the target business originates from the target entity;
  • Step S320: When the number of accounts bound to the mobile phone number exceeds the number threshold, acquire the account attribute data set of those accounts.
  • The business association condition indicates the threshold on the number of accounts that can be bound to a user's mobile phone number in the target business. This is the threshold set for a particular business activity held by the target entity. Tailoring the threshold to each target business enables accurate monitoring of suspected black production accounts across different businesses.
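The screening in steps S110, S310 and S320 can be sketched roughly as follows. This is a minimal illustration, not the application's implementation. The record keys (`account_id`, `phone`, `business`) and the per-business threshold map are hypothetical, and the default of 5 simply mirrors the example predetermined number given above. Because the text says the count must *exceed* the threshold, the comparison is strictly greater-than.

```python
from collections import defaultdict


def screen_accounts(accounts, business_thresholds, default_threshold=5):
    """Keep accounts whose phone number is bound to more accounts than allowed.

    Hypothetical data model:
      accounts: iterable of dicts with at least 'phone' and 'business' keys.
      business_thresholds: dict business -> maximum allowed bound accounts.
    """
    by_key = defaultdict(list)
    for acc in accounts:
        # The limit is set per target business, so count per (business, phone).
        by_key[(acc["business"], acc["phone"])].append(acc)

    suspicious = []
    for (business, _phone), bound in by_key.items():
        threshold = business_thresholds.get(business, default_threshold)
        if len(bound) > threshold:  # "exceeds" the threshold: strictly greater
            suspicious.extend(bound)
    return suspicious


# Usage: phone "p1" has 6 accounts (flagged), phone "p2" has 3 (ignored).
accounts = [{"account_id": i, "phone": "p1", "business": "promo"} for i in range(6)]
accounts += [{"account_id": 100 + i, "phone": "p2", "business": "promo"} for i in range(3)]
print(len(screen_accounts(accounts, {"promo": 5})))
```

The dictionary lookup with a fallback is a design choice. Businesses without a configured condition fall back to the general predetermined number rather than being skipped.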
  • step S120 the attribute field data in the account attribute data set is used as a connection edge, and the mobile phone number is used as a vertex to construct an account detection graph of the target subject.
  • In the detection graph, the mobile phone numbers associated with the accounts serve as vertices. The fields serve as connection edges joining accounts that are associated through their field data. The resulting detection graph can capture the various associations among the acquired accounts.
  • Using the attribute field data in the account attribute data set as connection edges and the mobile phone numbers as vertices, to construct the account detection graph of the target subject, includes:
  • the fingerprint-type fields include at least the login device ID, the login password, and the boot time of the login device;
  • the category-type fields include at least the login device model, the system version, the total storage space of the device, the login network address, and the physical address of the wireless network card;
  • combinations of a first predetermined number of fingerprint-type fields, and combinations of a second predetermined number of category-type fields, in the account attribute data set are used as connection edges. The mobile phone numbers corresponding to the combined field data are used as vertices to construct the account detection graph.
  • For fingerprint-type fields, any combination of the first predetermined number of fields can serve as a connection edge of the detection graph. Category-type fields, in contrast, must be combined in groups of the second predetermined number before they can serve together as a connection edge.
  • In one example, the first predetermined number is 2, and the second predetermined number is between 3 and 5 inclusive.
  • A single fingerprint-type field can be used as a connection edge for filtering. Combining two such fields into one edge, however, effectively avoids accidental collisions.
  • For example, a fingerprint field value altered by a black production operator may happen to coincide with that of a normal account.
  • Combining two fields into one connection edge reduces the probability of such collisions and of wrongly flagging normal accounts.
  • Likewise, combining several category-type fields filters the data more accurately.
  • On iOS, the fingerprint-type variables are the login device identifier (id), the login password, and the boot time (boottime) of the login device.
  • the correspondence between single field data and the number of mobile phone numbers is as follows (a1-a3):
  • The category-type variables include the login device model, the system version, the total device storage space, the login network address (ip), and the physical address of the wireless network card (wifimac).
  • The correspondence between single-field data and the number of mobile phone numbers is as follows (b1-b2):
    (b1) The ratio of device models to mobile phone numbers is 1:28470.36, and there are usually about 70 models in total.
    (b2) The ratio of total device storage space values to mobile phone numbers is 1:134.34.
    Combining fields therefore effectively reduces the number of mobile phone numbers that share a given edge.
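The edge construction of step S120 can be sketched as follows, as a rough illustration under stated assumptions. The field keys (`device_id`, `password`, `boot_time`, `model`, `os_version`, `storage_total`, `ip`, `wifi_mac`) are hypothetical names for the attributes listed above. `fp_k=2` follows the stated first predetermined number, and `cat_k=3` picks the low end of the 3-to-5 range. Two phone numbers are linked when their records agree on every field of some combination.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical field keys; the application names the attributes, not their keys.
FINGERPRINT_FIELDS = ("device_id", "password", "boot_time")
CATEGORY_FIELDS = ("model", "os_version", "storage_total", "ip", "wifi_mac")


def build_edges(records, fp_k=2, cat_k=3):
    """Return undirected edges (phone_a, phone_b) with phone_a < phone_b.

    records: list of dicts, each with a 'phone' key plus attribute fields.
    """
    field_combos = list(combinations(FINGERPRINT_FIELDS, fp_k))
    field_combos += list(combinations(CATEGORY_FIELDS, cat_k))

    edges = set()
    for fields in field_combos:
        buckets = defaultdict(set)  # combined field values -> phone numbers
        for rec in records:
            values = tuple(rec.get(f) for f in fields)
            if None in values:
                continue  # skip records missing any field of this combination
            buckets[values].add(rec["phone"])
        for phones in buckets.values():
            ordered = sorted(phones)
            # Star-link every phone to the smallest one in the bucket: this gives
            # the same connected components as a clique, with far fewer edges.
            for other in ordered[1:]:
                edges.add((ordered[0], other))
    return edges
```

Linking each bucket in a star rather than a clique is an implementation choice, not something the application states. It keeps the edge count linear in bucket size while leaving the connected components unchanged. Large buckets are likely for category combinations such as popular device models.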
  • Step S130: Perform graph clustering on the accounts in the account detection graph based on the attribute field data of its connection edges, to obtain a plurality of account clusters.
  • The account detection graph can be clustered with an existing graph clustering method to obtain account clusters.
  • Constructing the account detection graph builds a relationship network of accounts. Clustering the accounts on the attribute field data then yields clusters of similar accounts.
  • Performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connection edges, to obtain a plurality of account clusters, includes:
  • performing graph clustering on the account detection graph with the Connected Components algorithm to obtain multiple account groups;
  • obtaining, from the multiple account groups, those that include at least a predetermined number of mobile phone numbers and are associated with the same login network address, to form a first account group combination;
  • obtaining those account groups that include at least a predetermined number of mobile phone numbers and are associated with the physical address of the same wireless network card, to form a second account group combination;
  • determining the first account group combination and the second account group combination as the account clusters.
  • With the mobile phone numbers as vertices and the connection edges defined in the preceding steps, graph clustering is computed with the Connected Components algorithm to obtain multiple node clusters.
  • The Connected Components algorithm labels each connected component in the graph (here, each account group) with an ID. The ID used is that of the vertex with the smallest sequence number in the component. A graph G is called connected if there is a path between any two of its vertices (mobile phone numbers), and disconnected otherwise. A maximal connected subgraph is called a connected component.
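The Connected Components labelling described above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration: the application names the algorithm but not an implementation, and using union-find rather than a graph library is an assumption made to keep the example self-contained. Roots are always attached under the smaller ID, so each component ends up labelled with its smallest vertex, matching the description.

```python
def connected_components(vertices, edges):
    """Label every vertex with the smallest vertex ID in its connected component.

    All edge endpoints must appear in `vertices`.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue
        # Attach the larger root under the smaller one, so every root stays
        # the minimum vertex of its component.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    return {v: find(v) for v in vertices}


def group_by_label(labels):
    """Invert the labelling into component ID -> set of vertices (account groups)."""
    groups = {}
    for vertex, label in labels.items():
        groups.setdefault(label, set()).add(vertex)
    return groups
```

For example, with vertices `p1` to `p4` and edges `p2-p3` and `p3-p4`, `p1` forms its own group and the other three phones share the label `p2`.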
  • A second round of graph clustering is then performed, with the group numbers (identification ids) from the first clustering result as vertices. First, from the multiple account groups, obtain those that include at least the predetermined number of mobile phone numbers and are associated with the same login network address. These form the first account group combination; for example, account groups that include 3 or more mobile phone numbers and share a login network address.
  • Then, obtain the account groups that include at least the predetermined number of mobile phone numbers and are associated with the physical address of the same wireless network card. These form the second account group combination; for example, account groups with 3 or more accounts associated with the physical address of the same wireless network card.
  • The second round of graph clustering mainly counters rapid-redial ("second dial") dynamic IPs, that is, frequently changing login network addresses and wireless network card physical addresses. It merges small groups that should belong to the same gang.
  • Black production operators disguise the ip of their mobile phone numbers by switching the ip or wifimac. This can split one gang into several small first-round groups.
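The second-round clustering can be sketched as follows. Only first-round groups with at least `min_phones` phone numbers take part, and eligible groups that share a login ip or a wifimac are merged. This is a rough illustration under several assumptions. The record keys `ip` and `wifi_mac` are hypothetical, and `min_phones=3` follows the example above. Dropping groups below the threshold is one reading of the text, which does not say what happens to them.

```python
from collections import defaultdict


def merge_groups(groups, records, min_phones=3):
    """Second-round clustering over first-round account groups.

    groups: dict group_id -> set of phone numbers (first-round components).
    records: list of dicts with 'phone' and optional 'ip' / 'wifi_mac' keys.
    Returns the merged account clusters as sets of group IDs.
    """
    by_phone = defaultdict(list)
    for rec in records:
        by_phone[rec["phone"]].append(rec)

    # Assumption: only groups with at least min_phones phone numbers take part.
    eligible = [g for g, phones in groups.items() if len(phones) >= min_phones]
    parent = {g: g for g in eligible}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for field in ("ip", "wifi_mac"):
        first_owner = {}  # field value -> first eligible group seen using it
        for g in eligible:
            values = {r[field] for p in groups[g] for r in by_phone[p] if r.get(field)}
            for value in values:
                if value in first_owner:
                    parent[find(g)] = find(first_owner[value])  # merge the two groups
                else:
                    first_owner[value] = g

    clusters = defaultdict(set)
    for g in eligible:
        clusters[find(g)].add(g)
    return sorted(clusters.values(), key=min)
```

Group IDs act as vertices here, as in the description. Sharing a single ip or wifimac value is treated as an edge between two groups.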
  • Step S140: Using the attribute field data of each account cluster, generate the first field data document of that cluster, and obtain the second field data document of the whitelist accounts corresponding to the target subject.
  • The first field data document and the second field data document may be text documents or tables.
  • The whitelist accounts corresponding to the target subject can be the account attribute data sets of internal users of the organization corresponding to the target subject.
  • For example, the account-related data of an organization's employees can be treated as non-black-production data.
  • The second field data document of the whitelist accounts can be generated from the attribute field data of those accounts.
  • The normal second field data document serves as the baseline against which each first field data document is compared, ensuring the accuracy of black production account detection.
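Generating a field data document can be sketched as flattening each record of a cluster into "words". This is a minimal illustration only, assuming each word is a hypothetical `field=value` string (the application does not define a tokenization). The same function would also build the second field data document from the whitelist accounts' records.

```python
def build_field_document(records, fields):
    """Flatten attribute field data into a list of 'field=value' words.

    Prefixing each value with its field name keeps equal values from
    different fields (e.g. a model string and a version string) apart.
    """
    words = []
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value is not None:
                words.append(f"{field}={value}")
    return words


# Usage: one document per account cluster, plus one for the whitelist.
cluster_records = [{"model": "m1", "ip": "1.1.1.1"}, {"model": "m1"}]
print(build_field_document(cluster_records, ["model", "ip"]))
```

Keeping duplicates, rather than returning a set, preserves the word counts that the frequency-based weighting in step S150 relies on.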
  • Step S150: According to the first field data document and the second field data document, calculate the weight of each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document.
  • For each account cluster, this weight is calculated for every word in the cluster's first field data document. The weights reveal "unique" words, that is, attribute field data peculiar to that cluster. If an account gang contains such group-unique attribute field data, it is highly likely to be simulator parameters modified by black production operators.
  • calculating the weight of each word in the first field data document according to the first field data document and the second field data document including:
  • the product of the first frequency and the second frequency is used as the weight of each of the words.
  • The first frequency is how often each word appears in the first field data document; it gives the word's importance within the document under examination. The second frequency is how each word appears across the first field data documents and the second field data document; it gives the word's global importance.
  • Using the product of the first frequency and the second frequency as the weight of each word therefore indicates, from the perspective of the global data set, the importance of each word in the first field data document relative to the second field data document.
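The frequency-product weight reads like TF-IDF, which the text names below. In the following minimal sketch, the first frequency is the term frequency within a cluster's document, and the second frequency is an inverse document frequency over all cluster documents plus the whitelist document. The application does not give an exact formula. The smoothed IDF `ln((1 + N) / (1 + df)) + 1` is an assumption borrowed from scikit-learn's default `TfidfVectorizer` smoothing, and documents are assumed to be lists of words.

```python
import math
from collections import Counter


def tfidf_weights(cluster_docs, whitelist_doc):
    """Weight every word of every cluster document against the whole corpus.

    cluster_docs: dict cluster_id -> list of words (first field data documents).
    whitelist_doc: list of words (the second field data document).
    Returns dict cluster_id -> dict word -> weight.
    """
    corpus = list(cluster_docs.values()) + [whitelist_doc]
    n_docs = len(corpus)

    # Document frequency: in how many documents each word occurs at least once.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))

    weights = {}
    for cluster_id, doc in cluster_docs.items():
        counts = Counter(doc)
        total = len(doc)
        weights[cluster_id] = {
            # first frequency (TF) x second frequency (smoothed IDF)
            word: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        }
    return weights
```

Words that also occur in the whitelist document get a higher document frequency and therefore a lower weight. This reflects the intended effect that normal values are discounted relative to cluster-specific ones.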
  • calculating the weight of each word in the first field data document according to the first field data document and the second field data document including:
  • calculating the weight of each word with the TF-IDF algorithm, which can identify black production gang accounts accurately and efficiently.
  • Words with large TF-IDF weights in a gang's accounts are words "unique" to that gang. Such words are highly likely to be simulator parameters.
  • detection resources can be saved.
  • Sorting by TF-IDF weight surfaces more black production gang accounts. Experiments have shown that, for the same number of inspections, sorting by this criterion finds more black production gang accounts.
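Sorting clusters by TF-IDF weight, so that the most suspicious ones are inspected first, can be sketched as below. The application does not say how per-word weights are aggregated into one sort key per cluster. Using the largest word weight is an assumption chosen here because a single highly "unique" word is the signal described above. The input is assumed to be a dict of cluster ID to per-word weights.

```python
def rank_clusters(weights):
    """Order cluster IDs by their largest word weight, highest first.

    weights: dict cluster_id -> dict word -> TF-IDF weight.
    Clusters with no words sort last (their key defaults to 0.0).
    """
    return sorted(
        weights,
        key=lambda cid: max(weights[cid].values(), default=0.0),
        reverse=True,
    )


print(rank_clusters({"a": {"x": 0.2}, "b": {"y": 0.9, "z": 0.1}, "c": {}}))
```

With a fixed inspection budget, reviewers would take clusters from the front of this list.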
  • Step S160: Determine black production account clusters based on the weight of each word, and obtain the black production account groups associated with the target subject.
  • Determining the black production account clusters based on the weight of each word includes:
  • determining the account cluster corresponding to the black production data document as a black production account gang.
  • The predetermined weight can be set according to the actual situation. A word whose weight exceeds the predetermined weight indicates that the data in the first field data document containing it is abnormal. That document is therefore determined to be a black production data document, and its corresponding account cluster is determined to be a black production account gang.
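The predetermined-weight rule can be sketched as follows. A cluster is flagged as soon as any word in its first field data document is weighted above the threshold. This is a minimal illustration assuming the weights are held as a dict of cluster ID to per-word weights. The threshold value is left to the caller, as the text says it is set according to the actual situation.

```python
def flag_by_max_weight(weights, predetermined_weight):
    """Return IDs of clusters whose document contains a word weighted above the threshold.

    weights: dict cluster_id -> dict word -> weight.
    """
    return {
        cid
        for cid, word_weights in weights.items()
        if any(w > predetermined_weight for w in word_weights.values())
    }


print(flag_by_max_weight({"a": {"x": 0.2}, "b": {"y": 0.9}}, 0.5))
```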
  • In another embodiment, determining the black production account clusters based on the weight of each word includes:
  • determining the account cluster corresponding to the black production data document as a black production account gang.
  • The weights of all words can be considered together, so that the abnormality of each account cluster is assessed globally from its first field data document. A first field data document whose average word weight exceeds the predetermined average is determined to be a black production data document. This allows black production account gangs to be detected reliably from a global perspective.
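The average-weight rule can be sketched in the same way, again assuming a dict of cluster ID to per-word weights. Averaging over distinct words, rather than over word occurrences, is an assumption, since the application does not specify. A cluster with an empty document has no average and is never flagged.

```python
def flag_by_mean_weight(weights, predetermined_average):
    """Return IDs of clusters whose mean word weight exceeds the predetermined average.

    weights: dict cluster_id -> dict word -> weight (one entry per distinct word).
    """
    flagged = set()
    for cid, word_weights in weights.items():
        if not word_weights:
            continue  # an empty document has no mean weight
        mean = sum(word_weights.values()) / len(word_weights)
        if mean > predetermined_average:
            flagged.add(cid)
    return flagged


print(flag_by_mean_weight({"a": {"x": 0.2, "y": 0.4}, "b": {"z": 0.9, "w": 0.1}}, 0.35))
```

Compared with the single-word threshold, this rule reacts to documents that are abnormal overall rather than to one outlier value.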
  • the present application also provides an artificial intelligence-based black production account detection device.
  • The artificial intelligence-based black production account detection device may include an acquisition module 410, a construction module 420, a clustering module 430, a generation module 440, a calculation module 450, and a determination module 460, where:
  • the acquisition module 410 can be configured to acquire the account attribute data set of the accounts when it is determined that the number of accounts bound to a user's mobile phone number exceeds a predetermined number, the user being associated with the target subject;
  • the construction module 420 can be configured to use the attribute field data in the account attribute data set as connection edges and the mobile phone numbers as vertices, to construct an account detection graph of the target subject;
  • the clustering module 430 can be configured to perform graph clustering on the accounts in the account detection graph based on the attribute field data of the connection edges, to obtain a plurality of account clusters;
  • the generation module 440 can be configured to use the attribute field data of each account cluster to generate the first field data document of that cluster, and to obtain the second field data document of the whitelist accounts corresponding to the target subject;
  • the calculation module 450 can be configured to calculate the weight of each word in the first field data document according to the first field data document and the second field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document;
  • the determination module 460 can be configured to determine black production account clusters based on the weight of each word, and to obtain the black production account groups associated with the target subject.
  • In one embodiment, the acquisition module is further configured such that:
  • the business association condition indicates the threshold number of accounts that may be bound to a mobile phone number in the target business, the target business originating from the target subject;
  • the account attribute data set of the accounts is obtained.
  • In one embodiment, the clustering module is further configured to:
  • perform graph clustering on the account detection graph using the Connected Component algorithm, to obtain multiple account groups;
  • select the account groups that contain at least a predetermined number of mobile phone numbers and are associated with the same login network address, to obtain a first account group combination;
  • select the account groups that contain at least a predetermined number of mobile phone numbers and are associated with the physical (MAC) address of the same wireless network card, to obtain a second account group combination; and
  • determine the first account group combination and the second account group combination as the account clusters.
  • In one embodiment, the calculation module is further configured to:
  • use the product of the first frequency and the second frequency as the weight of each word.
  • In another embodiment, the calculation module is further configured to:
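  • In one embodiment, the determination module is further configured to:
  • determine a first field data document containing a word whose weight exceeds a predetermined weight as an underground industry data document, and determine the account cluster corresponding to that data document as an underground industry account group.
  • In another embodiment, the determination module is further configured to:
  • determine a first field data document whose average word weight exceeds a predetermined average as an underground industry data document, and determine the account cluster corresponding to that data document as an underground industry account group.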
  • A computer device is also provided, which performs all or part of the steps of any of the above artificial intelligence-based underground industry account detection methods.
  • The computer device includes: at least one processor; and
  • a memory storing instructions executable by the at least one processor. When the at least one processor executes these instructions, it performs the artificial intelligence-based underground industry account detection method illustrated in any of the above exemplary embodiments.
  • Aspects of the present application may be implemented as a system, method, or program product. Accordingly, they may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software aspects. All of these may be referred to collectively herein as a "circuit", "module", or "system".
  • a computer device 500 according to this embodiment of the present application is described below with reference to FIG. 5 .
  • the computer device 500 shown in FIG. 5 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • computer device 500 takes the form of a general-purpose computing device.
  • Components of the computer device 500 may include, but are not limited to, the above-mentioned at least one processing unit 510 , the above-mentioned at least one storage unit 520 , and a bus 530 connecting different system components (including the storage unit 520 and the processing unit 510 ).
  • The storage unit stores program code that the processing unit 510 can execute. Executing this code causes the processing unit 510 to perform the steps of the various exemplary embodiments of the present application described in the "Exemplary Method" section above.
  • For example, the processing unit 510 may execute the steps shown in FIG. 1. Step S110: upon determining that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtain the account attribute data set of the accounts, the mobile phone number being derived from the account database of the target subject;
  • Step S120: use the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices to construct an account detection graph of the target subject;
  • Step S130: perform graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
  • Step S140: use the attribute field data of each account cluster to generate that cluster's first field data document, and obtain the second field data document of the whitelist accounts corresponding to the target subject;
  • Step S150: calculate the weight of each word in the first field data document from the first field data document and the second field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document;
  • Step S160: determine underground industry account clusters based on the weight of each word, to obtain the underground industry account group associated with the target subject.
  • the storage unit 520 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 5201 and/or a cache storage unit 5202 , and may further include a read only storage unit (ROM) 5203 .
  • The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. These include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • The bus 530 may represent one or more of several types of bus structures. These include a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
  • The computer device 500 may also communicate with:
    • one or more external devices 700 (e.g., a keyboard, a pointing device, or a Bluetooth device);
    • one or more devices that enable a user to interact with the computer device 500; and/or
    • any device (e.g., a router or modem) that enables the computer device 500 to communicate with one or more other computer devices.
  • Such communication may take place through an input/output (I/O) interface 550. The computer device 500 may also include a display unit 540 coupled to the I/O interface 550. It may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560.
  • network adapter 560 communicates with other modules of computer device 500 via bus 530 .
  • other hardware and/or software modules may be used in conjunction with computer device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
  • The exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solutions according to the embodiments of the present application may be embodied as a software product. The product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network. It includes instructions that cause a computer device (such as a personal computer, server, terminal device, or network device) to perform the method according to the embodiments of the present application.
  • A computer-readable storage medium is also provided. It stores a program product capable of implementing the above method of this specification, and it may be non-volatile or volatile.
  • Various aspects of the present application may also be implemented as a program product including program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present application described in the "Exemplary Method" section above.
  • A program product 600 for implementing the above method according to an embodiment of the present application is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer.
  • the program product of the present application is not limited thereto, and in this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer readable signal medium may include a propagated data signal in baseband or as part of a carrier wave with readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a readable signal medium can also be any readable medium, other than a readable storage medium, that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for carrying out the operations of the present application may be written in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages.
  • The program code may execute in any of the following ways:
    • entirely on the user's computer device;
    • partly on the user's computer device, as a stand-alone software package;
    • partly on the user's computer device and partly on a remote computer device; or
    • entirely on a remote computer device or server.
  • A remote computer device may be connected to the user's computer device through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computer device (for example, through the Internet using an Internet service provider).
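The weighting and decision rules of steps S150 and S160 described above can be illustrated with a small sketch. The text confirms two things: the weight is TF-IDF-based, and it is the product of a "first frequency" and a "second frequency". Everything else is assumed here:
- Each field data document is a list of words.
- The first frequency is the term frequency of a word in a cluster's first field data document.
- The second frequency is an unsmoothed inverse document frequency, computed over that document plus the whitelist (second field data) documents.

The function names, the `ip:`/`model:` word format, and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def word_weights(cluster_doc, whitelist_docs):
    """TF-IDF-style weight of every word in one cluster's first field data document.

    Assumed definitions (hypothetical, not taken from the patent text):
      first frequency  = count(word in cluster_doc) / len(cluster_doc)
      second frequency = log(N / df), where the N documents are cluster_doc plus
                         the whitelist (second field data) documents and df is the
                         number of those documents that contain the word.
    A word that also appears in every whitelist document gets weight 0.
    """
    counts = Counter(cluster_doc)
    corpus = [set(cluster_doc)] + [set(doc) for doc in whitelist_docs]
    weights = {}
    for word, count in counts.items():
        tf = count / len(cluster_doc)
        df = sum(1 for doc in corpus if word in doc)  # >= 1: cluster_doc contains the word
        weights[word] = tf * math.log(len(corpus) / df)  # product of the two frequencies
    return weights


def flag_by_max_weight(weights, predetermined_weight):
    """First rule: some word's weight exceeds the predetermined weight."""
    return any(w > predetermined_weight for w in weights.values())


def flag_by_mean_weight(weights, predetermined_average):
    """Second rule: the mean weight over distinct words exceeds the predetermined average."""
    if not weights:
        return False
    return sum(weights.values()) / len(weights) > predetermined_average


def rank_clusters(cluster_docs, whitelist_docs):
    """Order cluster ids by their highest word weight, most suspicious first."""
    scores = {
        cid: max(word_weights(doc, whitelist_docs).values(), default=0.0)
        for cid, doc in cluster_docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


whitelist = [["ip:10.0.0.1", "model:phoneA", "city:sz"]]
clusters = {
    "c1": ["ip:1.2.3.4", "ip:1.2.3.4", "ip:1.2.3.4", "model:phoneA"],
    "c2": ["ip:10.0.0.1", "model:phoneA", "city:sz", "city:sz"],
}
flagged = [cid for cid, doc in clusters.items()
           if flag_by_max_weight(word_weights(doc, whitelist), 0.3)]
print(flagged)                             # ['c1']
print(rank_clusters(clusters, whitelist))  # ['c1', 'c2']
```

With this particular IDF, a word that occurs in every whitelist document scores 0. As a result, only attribute values that distinguish a cluster from normal accounts can push it over the predetermined weight. Smoothed IDF variants would give shared words a small non-zero weight instead.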

Abstract

An artificial intelligence-based underground industry account detection method and a related apparatus. The method comprises:
- upon determining that the number of accounts bound to a mobile phone number of a user exceeds a predetermined number, obtaining the account attribute data set of the accounts, wherein the user is associated with a target subject (S110);
- using attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices to construct an account detection graph of the target subject (S120);
- performing graph clustering on the accounts in the account detection graph to obtain a plurality of account clusters (S130);
- using the attribute field data of each account cluster to generate a first field data document, and obtaining a second field data document of a whitelist account corresponding to the target subject (S140);
- calculating the weight of each word in the first field data document (S150); and
- determining an underground industry account cluster according to the weight of each word (S160).

These steps effectively improve the accuracy of underground industry account detection. The solution further relates to the field of blockchains: the account attribute data set can be stored in a blockchain.

Description

Underground industry account detection method, apparatus, computer device, and medium
This application claims priority to Chinese patent application No. CN 202010763020.X, entitled "Artificial intelligence-based underground industry account detection method and related apparatus" and filed with the China Patent Office on July 31, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence. In particular, it relates to an artificial intelligence-based underground industry account detection method and apparatus, a computer device, and a computer-readable storage medium.
Background
As Internet applications become ever more widespread, ordinary users enjoy the convenience of the Internet but are also exposed to the dangers posed by the online underground industry. The online underground industry now operates at scale and along organized chains, its forms are increasingly diverse, and cheating is increasingly hard to prevent. The underground industry is profit-driven: wherever there is demand, there is a market.
With advances in technology, underground industry attacks have become a problem that major companies take very seriously, and these companies face such attacks constantly. However the underground industry monetizes its activity, it first needs to register a large number of fake accounts so that it can attack at volume.
The inventors have realized that business risk identification currently needs to crack down on underground industry activity by identifying and taking down fake accounts. At present, the industry mainly identifies and combats fake accounts through expert rules of thumb. These rules cover a single, narrow identification surface and are aimed mainly at targeted, precise identification and takedown. Because their logic is relatively simple, underground industry actors can easily recognize and bypass them.
Summary
In the field of artificial intelligence, and to solve the above technical problems, the present application aims to provide an artificial intelligence-based underground industry account detection method and apparatus, a computer device, and a computer-readable storage medium.
In a first aspect, an artificial intelligence-based underground industry account detection method is provided, including:
upon determining that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtaining the account attribute data set of the accounts, the mobile phone number being derived from the account database of a target subject;
using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices to construct an account detection graph of the target subject;
performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
using the attribute field data of each account cluster to generate that cluster's first field data document, and obtaining the second field data document of the whitelist accounts corresponding to the target subject;
calculating the weight of each word in the first field data document from the first field data document and the second field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
determining underground industry account clusters based on the weight of each word, to obtain the underground industry account group associated with the target subject.
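Graph construction and clustering (the third and fourth operations listed above) can be sketched with a union-find structure, one common way to compute connected components. This is an illustration under several assumptions, not the patented implementation:
- Each account record is a dict with a `phone` key and two hypothetical attribute fields: `login_ip` (login network address) and `wifi_mac` (physical address of the wireless network card).
- Two phone numbers are joined by an edge whenever their accounts share a value of either field.
- Following the clustering-module embodiment described earlier, the first and second account group combinations are taken to be the components in which a single login address, or a single wireless-card address, links at least `min_phones` phone numbers.

```python
from collections import defaultdict

LOGIN_IP = "login_ip"  # hypothetical field name: login network address
WIFI_MAC = "wifi_mac"  # hypothetical field name: wireless network card MAC address


class UnionFind:
    """Disjoint-set forest used to find connected components."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def connected_groups(records):
    """Phone numbers are vertices; a shared attribute value connects them."""
    uf = UnionFind()
    phones_by_value = defaultdict(set)  # (field, value) -> phone numbers sharing it
    for rec in records:
        uf.find(rec["phone"])  # register every phone number, even isolated ones
        for field in (LOGIN_IP, WIFI_MAC):
            if rec.get(field):
                phones_by_value[(field, rec[field])].add(rec["phone"])
    for phones in phones_by_value.values():
        first, *rest = sorted(phones)
        for other in rest:
            uf.union(first, other)
    groups = defaultdict(set)
    for phone in list(uf.parent):
        groups[uf.find(phone)].add(phone)
    return list(groups.values()), phones_by_value


def account_clusters(records, min_phones):
    groups, phones_by_value = connected_groups(records)

    def shares(group, field):
        # One value of `field` links at least min_phones phone numbers of the group.
        return any(
            f == field and len(phones & group) >= min_phones
            for (f, _), phones in phones_by_value.items()
        )

    first = [g for g in groups if len(g) >= min_phones and shares(g, LOGIN_IP)]
    second = [g for g in groups if len(g) >= min_phones and shares(g, WIFI_MAC)]
    merged = []
    for g in first + second:
        if g not in merged:  # a group may qualify for both combinations
            merged.append(g)
    return merged


records = [
    {"phone": "p1", "login_ip": "1.1.1.1", "wifi_mac": "m1"},
    {"phone": "p2", "login_ip": "1.1.1.1", "wifi_mac": "m2"},
    {"phone": "p3", "login_ip": "1.1.1.1", "wifi_mac": "m3"},
    {"phone": "p4", "login_ip": "2.2.2.2", "wifi_mac": "m9"},
    {"phone": "p5", "login_ip": "3.3.3.3", "wifi_mac": "m9"},
    {"phone": "p6", "login_ip": "4.4.4.4", "wifi_mac": "m6"},
]
print(sorted(sorted(g) for g in account_clusters(records, min_phones=2)))
# [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Linking the phone numbers that share a value to one representative (a star) yields the same components as joining every pair, with far fewer union operations.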
In a second aspect, an artificial intelligence-based underground industry account detection apparatus is provided, including:
an acquisition module, configured to obtain the account attribute data set of the accounts upon determining that the number of accounts bound to a mobile phone number exceeds a predetermined number, the mobile phone number being derived from the account database of a target subject;
a construction module, configured to use the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices to construct an account detection graph of the target subject;
a clustering module, configured to perform graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
a generation module, configured to use the attribute field data of each account cluster to generate that cluster's first field data document, and to obtain the second field data document of the whitelist accounts corresponding to the target subject;
a calculation module, configured to calculate the weight of each word in the first field data document from the first field data document and the second field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
a determination module, configured to determine underground industry account clusters based on the weight of each word, to obtain the underground industry account group associated with the target subject.
In a third aspect, a computer device is provided, including a memory and a processor. The memory stores a program for artificial intelligence-based underground industry account detection, and the processor is configured to execute that program to perform the following processing:
- upon determining that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtaining the account attribute data set of the accounts, the mobile phone number being derived from the account database of a target subject;
- using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices to construct an account detection graph of the target subject;
- performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
- using the attribute field data of each account cluster to generate that cluster's first field data document, and obtaining the second field data document of the whitelist accounts corresponding to the target subject;
- calculating the weight of each word in the first field data document from the first and second field data documents, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
- determining underground industry account clusters based on the weight of each word, to obtain the underground industry account group associated with the target subject.
In a fourth aspect, a computer-readable storage medium storing computer-readable instructions is provided. It stores a program for artificial intelligence-based underground industry account detection which, when executed by a processor, implements the following processing:
- upon determining that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtaining the account attribute data set of the accounts, the mobile phone number being derived from the account database of a target subject;
- using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices to construct an account detection graph of the target subject;
- performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
- using the attribute field data of each account cluster to generate that cluster's first field data document, and obtaining the second field data document of the whitelist accounts corresponding to the target subject;
- calculating the weight of each word in the first field data document from the first and second field data documents, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
- determining underground industry account clusters based on the weight of each word, to obtain the underground industry account group associated with the target subject.
The above artificial intelligence-based underground industry account detection method, apparatus, computer device, and computer-readable storage medium work in four stages.

First, upon determining that the number of accounts bound to a mobile phone number from the target subject's account database exceeds a predetermined number, the account attribute data set of those accounts is obtained. This preliminary screening excludes accounts whose mobile phone numbers are bound to fewer accounts than the predetermined number. The result is the account attribute data set to be examined, which narrows the detection scope while improving detection reliability.

Next, an account detection graph of the target subject is constructed, with the attribute field data as connecting edges and the mobile phone numbers as vertices. The accounts in the graph are then clustered on the attribute field data of the connecting edges to obtain a plurality of account clusters. Because the vertices are the associated accounts' mobile phone numbers, this clustering reliably groups accounts into gangs.

Then, the attribute field data of each account cluster is used to generate that cluster's first field data document, and the second field data document of the whitelist accounts corresponding to the target subject is obtained. Representing the data as documents makes it easy to analyze. Using the normal second field data document as a control for the first field data documents helps ensure detection accuracy.

Finally, a weight is calculated for each word in the first field data document from the two documents, and underground industry account clusters are determined from these weights. Each weight indicates the importance of a word in the first field data document relative to the second field data document. This relative importance makes it possible to determine accurately whether an account cluster is an underground industry account gang.
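The screening step and the document-generation step summarized above can be sketched as follows. The record layout (`account`, `phone`, `login_ip`, `device` keys), the `field:value` word format, and the function names are hypothetical. The source does not specify how attribute field data is turned into words, so flattening each field value into one word is only one plausible choice.

```python
from collections import defaultdict


def prefilter(records, predetermined_number):
    """Screening sketch: keep only the accounts whose phone number is bound to
    more than `predetermined_number` distinct accounts."""
    accounts_by_phone = defaultdict(set)
    for rec in records:
        accounts_by_phone[rec["phone"]].add(rec["account"])
    return [rec for rec in records
            if len(accounts_by_phone[rec["phone"]]) > predetermined_number]


def field_document(records, fields):
    """Document-generation sketch: flatten the attribute field data of some
    accounts into a bag of 'field:value' words; missing or empty fields are skipped."""
    return [f"{field}:{rec[field]}" for rec in records for field in fields if rec.get(field)]


def cluster_document(records, cluster_phones, fields):
    """First field data document for one account cluster (a set of phone numbers)."""
    return field_document([rec for rec in records if rec["phone"] in cluster_phones], fields)


records = [
    {"account": "a1", "phone": "p1", "login_ip": "1.1.1.1", "device": "d1"},
    {"account": "a2", "phone": "p1", "login_ip": "1.1.1.1", "device": "d1"},
    {"account": "a3", "phone": "p1", "login_ip": "1.1.1.1"},
    {"account": "a4", "phone": "p2", "login_ip": "9.9.9.9", "device": "d2"},
]
FIELDS = ("login_ip", "device")
suspects = prefilter(records, 2)             # p1 is bound to 3 accounts, p2 to only 1
print([rec["account"] for rec in suspects])  # ['a1', 'a2', 'a3']
print(cluster_document(suspects, {"p1"}, FIELDS))
```

The same `field_document` helper could build the whitelist (second field data) document from whitelist account records, so both documents share one vocabulary.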
It should be understood that the foregoing general description and the following detailed description are merely exemplary and do not limit the present application.
Brief Description of the Drawings
FIG. 1 schematically shows a flowchart of an artificial intelligence-based underground industry account detection method.
FIG. 2 schematically shows an example application scenario of an artificial intelligence-based underground industry account detection method.
FIG. 3 schematically shows a flowchart of a method for obtaining the account attribute data set of accounts.
FIG. 4 schematically shows a block diagram of an artificial intelligence-based underground industry account detection apparatus.
FIG. 5 schematically shows an example block diagram of a computer device for implementing the above artificial intelligence-based underground industry account detection method.
FIG. 6 schematically shows a computer-readable storage medium for implementing the above artificial intelligence-based underground industry account detection method.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be embodied in various forms and should not be construed as limited to the examples set forth herein. Rather, these embodiments are provided so that this application will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of these specific details, or with other methods, components, apparatuses, steps, and so on. In other instances, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the application.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated descriptions are omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides an artificial intelligence-based underground industry account detection method. The method may run on a server, a server cluster, a cloud server, or the like. Those skilled in the art may also run the method of the present application on other platforms as needed, and this exemplary embodiment imposes no particular limitation in this respect. Referring to FIG. 1, the method may include the following steps:
Step S110: when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, obtain the account attribute data set of those accounts, the mobile phone number coming from the account database of a target entity;
Step S120: construct an account detection graph of the target entity, using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices;
Step S130: perform graph clustering on the accounts in the account detection graph based on the attribute field data of its connecting edges, to obtain a plurality of account clusters;
Step S140: generate a first field data document for each account cluster from that cluster's attribute field data, and obtain a second field data document for the whitelist accounts of the target entity;
Step S150: calculate, from the first field data document and the second field data document, the weight of each word in the first field data document, the weight indicating how important the word is in the first field data document relative to the second field data document;
Step S160: determine underground-industry account clusters based on the weights of the words, thereby obtaining the group of underground-industry accounts associated with the target entity.
In the above method, first, when it is determined that the number of accounts bound to a mobile phone number from the target entity's account database exceeds a predetermined number, the account attribute data set of those accounts is obtained. This preliminary screening excludes phone numbers bound to fewer accounts than the predetermined number and yields the account attribute data set to be examined, which narrows the detection scope while improving detection reliability.
Next, the account detection graph of the target entity is constructed with the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices, and the accounts in the graph are clustered based on the attribute field data of the connecting edges to obtain a plurality of account clusters. Building a graph whose vertices are the phone numbers of associated accounts and then clustering it on attribute field data groups the accounts into rings reliably.
Then, a first field data document is generated for each account cluster from its attribute field data, and a second field data document is obtained for the whitelist accounts of the target entity. Representing the data as documents facilitates analysis, and the second field data document, built from known-normal accounts, serves as a control for the first field data documents, which helps keep the detection of underground-industry accounts accurate.
Finally, the weight of each word in a first field data document is calculated from the first and second field data documents, and underground-industry account clusters are determined from these weights. Because the weight indicates how important a word is in the first field data document relative to the second field data document, it can be used to reliably judge whether an account cluster is an underground-industry account ring.
Each step of the above method in this example embodiment is explained in detail below with reference to the accompanying drawings.
In step S110, when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, the account attribute data set of those accounts is obtained, the mobile phone number coming from the account database of the target entity.
In this example embodiment, referring to FIG. 2, the server 210 may obtain from the server 220 the account attribute data set of the accounts associated with the target entity. The server 210 may then determine which users' mobile phone numbers are bound to more than the predetermined number of accounts and obtain the account attribute data sets of all accounts corresponding to those phone numbers. The server 210 and the server 220 may be any terminal devices with instruction processing and data storage functions, such as computers or mobile phones, without particular limitation here.
In this example embodiment, the server 210 and the server 220 are node servers in a blockchain. Because data in a blockchain is immutable and secure, the server 210 can obtain the account attribute data set of the accounts associated with the target entity from the server 220 safely and reliably.
The account attribute data set of each account contains the field data of the account's attribute fields, which may relate to the mobile phone number, device, network environment, login password, and so on, for example the account password, the mobile phone number, and the login device ID. The target entity may be any enterprise, platform, or the like.
When the number of accounts bound to a user's mobile phone number exceeds the predetermined number, the accounts bound to that number are suspected of underground-industry activity. The predetermined number is a preset standard number of accounts per mobile phone number and can be set according to the actual situation, for example 5; a phone number associated with more accounts than this threshold is treated as suspicious.
In one example, account attribute data such as the network environment, device parameters, and registration password can be collected by the client application. Underground-industry operators may disguise dimensions such as the network environment, device parameters, and registration password, but they cannot bypass the target entity's requirement on the number of accounts registered and bound to the same user's mobile phone number. For example, if the target entity's requirement on the accounts registered and bound to a user's phone number takes effect at the end of each month, the account attribute data sets can first be retrieved for phone numbers that, within one month, were bound to no fewer accounts than the predetermined number set by the target entity.
Obtaining the account attribute data set only for mobile phone numbers bound to more than the predetermined number of accounts preliminarily screens all accounts associated with the target entity: phone numbers bound to fewer accounts are excluded and only the remaining account attribute data is examined, which narrows the detection scope while improving detection accuracy.
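The screening in step S110 can be sketched as below. This is a minimal illustration, not the applicant's implementation: the `(phone, account_id, bind_date)` record layout, the function names, and the half-open monthly window are assumptions. Step S110 says "exceeds" while the monthly example says "no fewer than", so the comparison is exposed as a hypothetical `inclusive` flag.

```python
from collections import defaultdict
from datetime import date


def suspicious_phones(bindings, threshold, start, end, inclusive=False):
    """Return phone numbers bound to too many distinct accounts in [start, end).

    bindings: iterable of (phone, account_id, bind_date) tuples (hypothetical schema).
    threshold: the predetermined number of accounts per phone number, e.g. 5.
    inclusive: False means "exceeds" (step S110); True means "no fewer than".
    """
    accounts_per_phone = defaultdict(set)
    for phone, account_id, bind_date in bindings:
        if start <= bind_date < end:  # keep only bindings inside the window
            accounts_per_phone[phone].add(account_id)
    return {
        phone
        for phone, accounts in accounts_per_phone.items()
        if (len(accounts) >= threshold if inclusive else len(accounts) > threshold)
    }


def attribute_dataset(accounts, phones):
    """Keep only the account attribute records whose phone number was flagged."""
    return [acc for acc in accounts if acc["phone"] in phones]
```

With a threshold of 5 and a March window, a phone number bound to six accounts in March is flagged, while one bound to five is flagged only when `inclusive=True`.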
In one embodiment, referring to FIG. 3, obtaining the account attribute data set of the accounts when it is determined that the number of accounts bound to a mobile phone number exceeds the predetermined number includes:
Step S310: obtaining a business association condition between the target entity and the mobile phone number, the business association condition indicating a threshold on the number of accounts that the mobile phone number may bind in a target business, the target business originating from the target entity;
Step S320: when the number of accounts bound to the mobile phone number exceeds the number threshold, obtaining the account attribute data set of those accounts.
The business association condition indicates the threshold on the number of accounts that a user's mobile phone number may bind in the target business, that is, the threshold set for a particular business activity run by the target entity. Because the threshold is matched to the target business, suspected underground-industry accounts can be monitored precisely for each business.
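Steps S310 and S320 can be sketched as a per-business lookup. The `(phone, business_id, account_id)` layout, the `thresholds` mapping, and the fallback `default_threshold=5` are illustrative assumptions rather than details from the text.

```python
from collections import defaultdict


def flag_by_business(bindings, thresholds, default_threshold=5):
    """Flag phones whose bound-account count exceeds the threshold of each business.

    bindings: iterable of (phone, business_id, account_id) tuples (hypothetical schema).
    thresholds: dict business_id -> number threshold from the business association condition.
    default_threshold: assumed fallback for businesses without an explicit condition.
    Returns dict business_id -> set of flagged phone numbers.
    """
    accounts = defaultdict(set)
    for phone, business_id, account_id in bindings:
        accounts[(phone, business_id)].add(account_id)
    flagged = defaultdict(set)
    for (phone, business_id), accs in accounts.items():
        limit = thresholds.get(business_id, default_threshold)
        if len(accs) > limit:  # step S320: exceeds the number threshold
            flagged[business_id].add(phone)
    return dict(flagged)
```

Counting per `(phone, business)` pair keeps a phone number that is legitimate in one activity from being flagged because of its bindings in another.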
In step S120, the account detection graph of the target entity is constructed with the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices.
In this example embodiment, the detection graph uses the mobile phone numbers associated with the accounts as vertices, and accounts whose field data are related are connected by edges labeled with those fields. The resulting graph can therefore capture the various associations among the obtained accounts.
In one embodiment, constructing the account detection graph of the target entity with the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices includes:
obtaining fingerprint-type fields and category-type fields from the account attribute data set, the fingerprint-type fields including at least the login device identifier, the login password, and the boot time of the login device, and the category-type fields including at least the login device model, the system version, the total storage space of the device, the login network address, and the physical address of the wireless network card;
using, as connecting edges, the field data combinations in the account attribute data set formed by combinations of a first predetermined number of the fingerprint-type fields and by combinations of a second predetermined number of the category-type fields, and using the mobile phone numbers corresponding to each field data combination as vertices to construct the account detection graph.
Fields are thus divided into two types: fingerprint-type fields and category-type fields. For fingerprint-type fields, the field data of any first predetermined number of fields taken together can serve as a connecting edge of the detection graph, whereas category-type fields require the field data of a second predetermined number of fields taken together to form a connecting edge.
In one embodiment, the first predetermined number is 2, and the second predetermined number is at least 3 and at most 5.
A single fingerprint-type field can be used on its own as a connecting edge for screening, but combining two of them into one edge effectively avoids false positives caused by collisions. For example, if an underground-industry operator modifies the field data of one fingerprint-type field and it happens to coincide with that of a normal account, requiring two fields to match together reduces the probability of such a collision wrongly linking the accounts. Likewise, combining several category-type fields screens the data more precisely.
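The edge construction described above can be sketched as follows. The field names are hypothetical stand-ins for the fields listed in the text, and `k_fp=2`, `k_cat=3` follow the embodiment (2, and 3 to 5). Instead of connecting every pair of phone numbers that share a combination, the sketch chains them in sorted order, which is a simplification that keeps them in the same connected component while producing fewer edges.

```python
from collections import defaultdict
from itertools import combinations

FINGERPRINT_FIELDS = ("device_id", "password", "boottime")  # hypothetical names
CATEGORY_FIELDS = ("model", "os_version", "storage", "ip", "wifimac")  # hypothetical names


def build_edges(accounts, k_fp=2, k_cat=3):
    """Build labeled edges between phone numbers that share a field-value combination.

    accounts: list of dicts with a "phone" key plus attribute fields.
    k_fp / k_cat: the first and second predetermined numbers.
    Returns a set of (phone_a, phone_b, label) with phone_a < phone_b, where
    label = (field names, field values) of the shared combination.
    """
    field_sets = list(combinations(FINGERPRINT_FIELDS, k_fp)) + list(
        combinations(CATEGORY_FIELDS, k_cat)
    )
    groups = defaultdict(set)  # (field names, field values) -> phone numbers
    for acc in accounts:
        for fields in field_sets:
            if all(acc.get(f) is not None for f in fields):  # skip missing fields
                key = (fields, tuple(acc[f] for f in fields))
                groups[key].add(acc["phone"])
    edges = set()
    for label, phones in groups.items():
        ordered = sorted(phones)
        # Chaining neighbours keeps all sharers connected without building a clique.
        for a, b in zip(ordered, ordered[1:]):
            edges.add((a, b, label))
    return edges
```

Two accounts that share only a device ID produce no edge here, which mirrors the collision argument above: a single coincidental fingerprint value is not enough to link them.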
For example, (a) on iOS, the fingerprint-type variables are the login device ID, the login password, and the boot time (boottime) of the login device. The correspondence between single-field data and the number of mobile phone numbers is as follows (a1 to a3):
(a1) Device IDs to mobile phone numbers: 1:1.06. (a2) Login passwords to mobile phone numbers: 1:1.51. (a3) Boot times to mobile phone numbers: 1:1.18.
When the fields are combined in pairs, as in (a4) to (a6), the combined field data corresponds to the mobile phone numbers almost one to one: (a4) device ID plus boottime to mobile phone numbers: 1:1.04. (a5) Login password plus boottime to mobile phone numbers: 1:1.01. (a6) Device ID plus login password to mobile phone numbers: 1:1.02.
Category-type variables include, for example, the login device model, the system version, the total storage space of the device, the login network address (ip), and the physical address of the wireless network card (wifimac); in general, every variable other than the fingerprint-type variables above can be treated as a category-type variable. The correspondence between single-field data and the number of mobile phone numbers is as follows (b1 to b2): (b1) device models to mobile phone numbers: 1:28470.36, with typically about 70 models in total. (b2) Total-storage values to mobile phone numbers: 1:134.34. Combining several category-type fields effectively reduces the number of mobile phone numbers that share one value combination.
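The ratios above can be reproduced with a short calculation. The text does not define exactly how they were computed, so the sketch assumes one plausible reading: the average number of distinct phone numbers sharing one value (or value combination) of the chosen fields. The field names are hypothetical.

```python
from collections import defaultdict


def phones_per_value(accounts, fields):
    """Average number of distinct phone numbers per distinct value combination of `fields`.

    accounts: list of dicts with a "phone" key plus the named fields.
    A result close to 1.0 means the combination nearly identifies a phone number.
    """
    phones = defaultdict(set)
    for acc in accounts:
        key = tuple(acc[f] for f in fields)
        phones[key].add(acc["phone"])
    return sum(len(p) for p in phones.values()) / len(phones)
```

Under this reading, adding a field to the combination can only split value groups, so the ratio moves toward 1, which is the effect reported for (a4) to (a6).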
In step S130, graph clustering is performed on the accounts in the account detection graph based on the attribute field data of its connecting edges, to obtain a plurality of account clusters.
In this example embodiment, an existing graph clustering method can be applied to the account detection graph to obtain the account clusters. The account detection graph forms a relationship network among the accounts, and clustering it on the attribute field data groups similar accounts into account clusters.
In one embodiment, performing graph clustering on the accounts in the account detection graph based on the attribute field data of its connecting edges to obtain a plurality of account clusters includes:
performing graph clustering on the account detection graph with the Connected Components algorithm, based on the attribute field data of the connecting edges, to obtain a plurality of account groups;
obtaining, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers and are associated with the same login network address, to obtain a first account group combination;
obtaining, from the plurality of account groups, the account groups that contain at least the predetermined number of mobile phone numbers and are associated with the same physical address of a wireless network card, to obtain a second account group combination;
determining the first account group combination and the second account group combination as the account clusters.
With the mobile phone numbers as vertices and the connecting edges defined in the steps above, graph clustering is computed with the Connected Components algorithm to obtain a plurality of vertex clusters.
The Connected Components algorithm labels each connected component of the graph (each account group) with an identifier, using the identifier of the vertex with the smallest sequence number in the component as the identifier of the component. A graph G is called connected if there is a path between any two of its vertices (mobile phone numbers); otherwise it is disconnected, and each of its maximal connected subgraphs is called a connected component.
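The labeling rule above (each component takes the smallest vertex ID it contains) can be sketched with union-find. This is a generic Connected Components sketch, not necessarily the implementation used in the embodiment; vertices are assumed to be comparable sequence numbers assigned to phone numbers, and every edge endpoint must appear in `vertices`.

```python
def connected_components(vertices, edges):
    """Label every vertex with the smallest vertex ID in its connected component.

    vertices: iterable of comparable IDs (e.g. integer sequence numbers of phone numbers).
    edges: iterable of (u, v) pairs; any edge labels are ignored here.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Attach the larger root under the smaller one, so each root is
            # always the minimum vertex of its component.
            if rv < ru:
                ru, rv = rv, ru
            parent[rv] = ru
    return {v: find(v) for v in parent}
```

For vertices 1 to 6 with edges (3, 1), (1, 2), and (5, 4), the components are labeled 1, 4, and 6, and vertex 6 forms a singleton group.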
Next, a second round of graph clustering is performed with the group numbers (identifiers) from the first round as vertices. First, the account groups that contain at least a predetermined number of mobile phone numbers and are associated with the same login network address are obtained from the plurality of account groups to form the first account group combination; for example, account groups containing at least 3 mobile phone numbers and associated with the same login network address. Then, the account groups that contain at least the predetermined number of mobile phone numbers and are associated with the same physical address of a wireless network card are obtained to form the second account group combination; for example, account groups containing at least 3 mobile phone numbers and associated with the same wireless network card physical address. The second round of clustering mainly counters rapidly re-dialed dynamic IPs (changing login network addresses and wireless network card physical addresses) and merges small groups that actually belong to the same ring.
For example, if account groups A, B, and C are connected through an IP, and groups A, D, and E are also connected, then A, B, C, D, and E together form one ring account cluster.
Underground-industry operators disguise their IP by switching the IP or wifimac every few mobile phone numbers. As a result, the first round of graph clustering yields groups that contain relatively few mobile phone numbers but share the same IP or wifimac. Using these group IDs as vertices and the IP or wifimac as connecting edges achieves the second round of clustering.
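The second round can be sketched as another union-find pass in which first-round group IDs are the vertices and a shared IP or wifimac is the edge. The group record layout is hypothetical, IP and wifimac links are merged in a single pass, and the "at least 3 phone numbers" condition is applied to each merged cluster; these are readings of the text, not confirmed details.

```python
from collections import defaultdict


def second_round(groups, min_phones=3):
    """Merge first-round groups that share an IP or wifimac.

    groups: dict group_id -> {"phones": set, "ips": set, "wifimacs": set} (hypothetical schema).
    min_phones: the predetermined number (3 in the example).
    Returns a sorted list of merged clusters, each a sorted list of group IDs.
    """
    parent = {g: g for g in groups}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    owners = defaultdict(list)  # ("ip" | "wifimac", value) -> group IDs
    for gid, info in groups.items():
        for ip in info["ips"]:
            owners[("ip", ip)].append(gid)
        for mac in info["wifimacs"]:
            owners[("wifimac", mac)].append(gid)
    for gids in owners.values():
        for other in gids[1:]:  # link every sharer to the first group seen
            parent[find(other)] = find(gids[0])

    clusters = defaultdict(list)
    for gid in groups:
        clusters[find(gid)].append(gid)
    result = []
    for members in clusters.values():
        phones = set().union(*(groups[g]["phones"] for g in members))
        if len(phones) >= min_phones:
            result.append(sorted(members))
    return sorted(result)
```

Applied to the A to E example, with A, B, and C sharing an IP and A, D, and E sharing a wifimac, all five groups end up in one cluster, while an unrelated single-phone group is dropped.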
In step S140, a first field data document is generated for each account cluster from its attribute field data, and a second field data document is obtained for the whitelist accounts of the target entity.
In this example embodiment, the first and second field data documents may be text documents, tables, or the like.
The whitelist accounts of the target entity, that is, the accounts of whitelisted users, may be the account attribute data of the target entity's internal users; for example, account-related data of an organization's employees can be treated as non-underground-industry data.
The second field data document for the whitelist accounts of the target entity can be generated from the attribute field data of those whitelist accounts.
Generating the first and second field data documents facilitates document-based data analysis, and the known-normal second field data document serves as a control for the first field data documents, which helps keep the detection of underground-industry accounts accurate.
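Document generation can be sketched as turning each account's attribute values into word tokens. The text does not specify tokenization, so the sketch assumes each word is `field=value`, which keeps equal values in different fields distinct; field names and function names are hypothetical.

```python
def field_document(accounts, fields):
    """Turn a set of accounts into one field data document (a list of word tokens).

    Each word is "field=value"; missing values are skipped.
    """
    return [f"{f}={acc[f]}" for acc in accounts for f in fields if acc.get(f) is not None]


def build_documents(clusters, whitelist_accounts, fields):
    """Return one first field data document per cluster and one second document for the whitelist."""
    first_docs = {cid: field_document(accs, fields) for cid, accs in clusters.items()}
    second_doc = field_document(whitelist_accounts, fields)
    return first_docs, second_doc
```

Keeping repeated words (rather than a set) preserves the counts that the term-frequency part of the weighting in step S150 needs.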
In step S150, the weight of each word in a first field data document is calculated from the first and second field data documents, the weight indicating how important the word is in the first field data document relative to the second field data document.
In this example embodiment, computing this weight for each word reveals which words (that is, attribute field data values) are "unique" to each account cluster's first field data document. Attribute field data that is unique to one account ring is most likely an emulator parameter modified by the underground-industry operators.
In one embodiment, calculating the weight of each word in the first field data document from the first and second field data documents includes:
for each word in the first field data document, calculating a first frequency with which the word occurs in the first field data document;
for each word in the first field data document, calculating a second frequency of the word's occurrence across both the first field data documents and the second field data document;
taking the product of the first frequency and the second frequency as the weight of the word.
The first frequency, measured within the first field data document under examination, reflects the word's importance in that document. The second frequency, measured across both the first and second field data documents, reflects the word's global importance. Their product, used as the word's weight, therefore indicates from the perspective of the whole data set how important the word is in the first field data document relative to the second field data document.
In one embodiment, calculating the weight of each word in the first field data document from the first and second field data documents includes:
calculating the weight of each word with the formulas tf-idf(t,d) = tf(t,d) × idf(t) and idf(t) = log(N/e), where tf-idf(t,d) is the weight, t is the word, d is the first field data document, tf(t,d) is the number of times the word occurs in the first field data document, idf(t) is the inverse document frequency of the word, N is the total number of first and second field data documents, and e is the number of those documents in which the word occurs.
With the formulas tf-idf(t,d) = tf(t,d) × idf(t) and idf(t) = log(N/e), the TF-IDF algorithm can accurately and efficiently identify the emulator parameters of underground-industry ring accounts (underground-industry account clusters). A word with a large TF-IDF weight in a ring's accounts is "unique" to that ring and is most likely an emulator parameter. This also saves detection resources: experiments show that, for the same number of checks, ranking by TF-IDF weight surfaces more underground-industry ring accounts.
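The TF-IDF formula above can be sketched directly. Documents are lists of words (for example, `field=value` tokens), and the function name is illustrative. The text does not state the logarithm base, so the natural log is assumed. The whitelist data may be passed as one or several second documents.

```python
import math
from collections import Counter


def tfidf_weights(first_docs, second_docs):
    """TF-IDF weight of each word in each first field data document.

    first_docs: dict cluster_id -> list of words (one document per account cluster).
    second_docs: list of whitelist documents (each a list of words).
    tf(t, d) is the raw count of t in d; idf(t) = log(N / e), where N counts all
    first and second documents and e counts the documents containing t.
    """
    all_docs = list(first_docs.values()) + list(second_docs)
    n = len(all_docs)
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc))  # e: each document counts once per word
    weights = {}
    for cid, doc in first_docs.items():
        tf = Counter(doc)
        weights[cid] = {t: c * math.log(n / doc_freq[t]) for t, c in tf.items()}
    return weights
```

A word that also appears in the whitelist document and in every cluster gets idf(t) = log(1) = 0, so only values concentrated in few clusters, such as a tampered emulator parameter, receive large weights.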
In step S160, underground-industry account clusters are determined based on the weights of the words, thereby obtaining the group of underground-industry accounts associated with the target entity.
The importance of each word in the first field data document relative to the second field data document makes it possible to reliably judge whether an account cluster is an underground-industry account ring.
In one embodiment, determining underground-industry account clusters based on the weights of the words includes:
determining each first field data document that contains a word whose weight exceeds a predetermined weight as an underground-industry data document;
determining the account cluster corresponding to each underground-industry data document as an underground-industry account ring.
The predetermined weight can be set according to the actual situation. A word whose weight exceeds the predetermined weight indicates anomalous data in the first field data document it comes from; that document is therefore determined to be an underground-industry data document, and the account cluster corresponding to it is determined to be an underground-industry account ring.
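The maximum-weight rule can be sketched as follows, assuming `weights` maps each cluster ID to its per-word weights (for example, TF-IDF weights). Returning the offending words, not just the cluster IDs, is an illustrative addition that lets an analyst see which values, likely emulator parameters, triggered the flag.

```python
def flag_by_max_weight(weights, threshold):
    """Clusters whose document contains at least one word weighted above `threshold`.

    weights: dict cluster_id -> {word: weight}.
    Returns dict cluster_id -> sorted list of the words whose weight exceeds the threshold.
    """
    flagged = {}
    for cid, w in weights.items():
        words = sorted(t for t, v in w.items() if v > threshold)
        if words:
            flagged[cid] = words
    return flagged
```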
In one embodiment, determining underground-industry account clusters based on the weights of the words includes:
calculating the average weight of the words in each first field data document;
determining each first field data document whose average weight exceeds a predetermined average as an underground-industry data document;
determining the account cluster corresponding to each underground-industry data document as an underground-industry account ring.
Averaging the weights of all words in each first field data document takes every word into account, so the anomaly of an account cluster is judged from its first field data document as a whole. Determining the first field data documents whose average weight exceeds the predetermined average as underground-industry data documents therefore detects underground-industry account rings reliably from a global view.
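The average-weight rule differs from the maximum-weight rule only in the aggregate. The sketch assumes the average is taken over the distinct words of each document and that clusters with empty documents are skipped; both are assumptions.

```python
def flag_by_mean_weight(weights, threshold):
    """Clusters whose average word weight exceeds `threshold`.

    weights: dict cluster_id -> {word: weight}; empty documents are skipped.
    """
    return {
        cid
        for cid, w in weights.items()
        if w and sum(w.values()) / len(w) > threshold
    }
```

Compared with the maximum-weight rule, averaging is less sensitive to one accidental high-weight value but can miss a ring whose anomaly is confined to a single field.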
The present application also provides an artificial intelligence-based fraudulent account detection apparatus. As shown in FIG. 4, the apparatus may include an acquisition module 410, a construction module 420, a clustering module 430, a generation module 440, a calculation module 450, and a determination module 460, wherein:
the acquisition module 410 may be configured to acquire the account attribute data set of the accounts bound to a user's mobile phone number when it is determined that the number of those accounts exceeds a predetermined number, the user being associated with the target entity;
the construction module 420 may be configured to construct an account detection graph of the target entity, using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices;
the clustering module 430 may be configured to perform graph clustering on the accounts in the account detection graph based on the attribute field data of its connecting edges, to obtain a plurality of account clusters;
the generation module 440 may be configured to generate a first field data document for each account cluster from that cluster's attribute field data, and to acquire a second field data document of the whitelisted accounts corresponding to the target entity;
the calculation module 450 may be configured to calculate, from the first field data document and the second field data document, the weight of each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
the determination module 460 may be configured to determine fraudulent account clusters based on the weight of each word, to obtain the fraudulent account group associated with the target entity.
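One possible reading of the graph-construction step is sketched below. It is hypothetical, because the patent does not specify the edge rule at this level of detail. It assumes each account record is a dict with a `phone` key plus attribute fields, using the hypothetical names `ip` and `mac`. Two phone-number vertices are joined by an edge labelled `(field, value)` whenever accounts bound to them share the same value of that field.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (phone_a, phone_b, attribute field, shared value)
Edge = Tuple[str, str, str, str]


def build_detection_graph(
    accounts: List[Dict[str, str]], fields: List[str]
) -> Tuple[Set[str], List[Edge]]:
    """Vertices are phone numbers; edges carry the shared attribute field data."""
    vertices = {acc["phone"] for acc in accounts}
    edges: List[Edge] = []
    for field in fields:
        # Group phone numbers by the value they share for this field.
        phones_by_value: Dict[str, Set[str]] = defaultdict(set)
        for acc in accounts:
            value = acc.get(field)
            if value:
                phones_by_value[value].add(acc["phone"])
        # Connect every pair of distinct phone numbers that share the value.
        for value, phones in phones_by_value.items():
            for a, b in combinations(sorted(phones), 2):
                edges.append((a, b, field, value))
    return vertices, edges


if __name__ == "__main__":
    accounts = [
        {"phone": "p1", "ip": "1.1.1.1", "mac": "m1"},
        {"phone": "p2", "ip": "1.1.1.1", "mac": "m2"},
        {"phone": "p3", "ip": "2.2.2.2", "mac": "m2"},
    ]
    vertices, edges = build_detection_graph(accounts, ["ip", "mac"])
    print(sorted(vertices))  # ['p1', 'p2', 'p3']
    print(edges)  # [('p1', 'p2', 'ip', '1.1.1.1'), ('p2', 'p3', 'mac', 'm2')]
```

Pairwise edges grow quadratically for a very common value, such as a shared carrier-grade NAT address. A real system might therefore cap or skip such values; this sketch does neither.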
In one embodiment, the acquisition module is further configured to:
acquire the business association condition between the target entity and the mobile phone number, the business association condition indicating the threshold number of accounts that the mobile phone number may be bound to in a target business, the target business originating from the target entity; and
when the number of accounts bound to the mobile phone number exceeds the threshold, acquire the account attribute data set of those accounts.
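The threshold check can be sketched as a simple count over phone-to-account bindings. In this minimal example, the `(phone, account_id)` pair format and the function names are hypothetical, and `max_accounts` stands in for the threshold taken from the business association condition.

```python
from collections import Counter
from typing import List, Set, Tuple

# (mobile phone number, account ID) pairs, e.g. read from an account database.
Binding = Tuple[str, str]


def phones_over_threshold(bindings: List[Binding], max_accounts: int) -> Set[str]:
    """Phone numbers bound to more accounts than the business allows."""
    counts = Counter(phone for phone, _ in bindings)
    return {phone for phone, n in counts.items() if n > max_accounts}


def accounts_to_inspect(bindings: List[Binding], max_accounts: int) -> List[str]:
    """Account IDs whose account attribute data set should be fetched next."""
    flagged = phones_over_threshold(bindings, max_accounts)
    return [account for phone, account in bindings if phone in flagged]


if __name__ == "__main__":
    bindings = [("p1", "a1"), ("p1", "a2"), ("p1", "a3"), ("p2", "a4")]
    print(accounts_to_inspect(bindings, max_accounts=2))  # ['a1', 'a2', 'a3']
```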
In one embodiment, the clustering module is further configured to:
perform graph clustering on the account detection graph with a Connected Component algorithm, based on the attribute field data of the connecting edges, to obtain a plurality of account groups;
select, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers and are associated with the same login network address, to obtain a first account group combination;
select, from the plurality of account groups, the account groups that contain at least the predetermined number of mobile phone numbers and are associated with the same wireless network card physical (MAC) address, to obtain a second account group combination; and
determine the first account group combination and the second account group combination as the account clusters.
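The clustering steps can be sketched as follows. This is an assumption-laden illustration, not the applicant's code. Connected components are computed with union-find, which is one standard way to implement a connected-components algorithm. A component then joins the first (or second) combination if at least `min_phones` of its phone numbers share one value of a hypothetical `ip` (or `mac`) edge field. Edges use the same `(phone_a, phone_b, field, value)` tuples as a graph built by linking phones that share attribute values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (phone_a, phone_b, attribute field, shared value)
Edge = Tuple[str, str, str, str]


def connected_components(vertices: Iterable[str], edges: List[Edge]) -> List[Set[str]]:
    """Union-find over the edges; returns one set of phone numbers per component."""
    parent: Dict[str, str] = {v: v for v in vertices}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _, _ in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups: Dict[str, Set[str]] = defaultdict(set)
    for v in parent:
        groups[find(v)].add(v)
    return list(groups.values())


def shares_value(component: Set[str], edges: List[Edge], field: str, min_phones: int) -> bool:
    """True if at least min_phones phones in the component share one value of field."""
    phones_by_value: Dict[str, Set[str]] = defaultdict(set)
    for a, b, f, value in edges:
        if f == field and a in component:  # b is then in the same component
            phones_by_value[value].update((a, b))
    return any(len(p) >= min_phones for p in phones_by_value.values())


def select_clusters(
    vertices: Iterable[str], edges: List[Edge], min_phones: int,
    ip_field: str = "ip", mac_field: str = "mac",
) -> List[Set[str]]:
    components = connected_components(vertices, edges)
    # First combination: groups tied together by one login network address.
    first = [c for c in components if shares_value(c, edges, ip_field, min_phones)]
    # Second combination: groups tied together by one wireless card MAC address.
    second = [c for c in components if shares_value(c, edges, mac_field, min_phones)]
    # Account clusters are the union of both combinations, without duplicates.
    clusters = list(first)
    for c in second:
        if c not in clusters:
            clusters.append(c)
    return clusters


if __name__ == "__main__":
    vertices = ["p1", "p2", "p3", "p4", "p5"]
    edges = [
        ("p1", "p2", "ip", "1.1.1.1"), ("p1", "p3", "ip", "1.1.1.1"),
        ("p2", "p3", "ip", "1.1.1.1"), ("p4", "p5", "mac", "m9"),
    ]
    for cluster in select_clusters(vertices, edges, min_phones=3):
        print(sorted(cluster))  # ['p1', 'p2', 'p3']
```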
In one embodiment, the calculation module is further configured to:
for each word in the first field data document, calculate a first frequency with which the word occurs in the first field data document;
for each word in the first field data document, calculate a second frequency with which the word occurs in both the first field data document and the second field data document; and
use the product of the first frequency and the second frequency as the weight of the word.
In one embodiment, the calculation module is further configured to:
calculate the weight of each word based on the formulas tf-idf(t,d) = tf(t,d) × idf(t) and idf(t) = log(N/e), where tf-idf(t,d) is the weight, t is the word, d is the first field data document, tf(t,d) is the number of times the word occurs in the first field data document, idf(t) is the inverse document frequency of the word, N is the total number of first and second field data documents, and e is the number of first and second field data documents in which the word occurs.
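The formula above can be sketched directly. In this minimal sketch, `field_document` is a hypothetical helper that turns attribute field data into "field=value" words; the patent does not state how words are formed. The natural logarithm is assumed because the log base is unspecified. `corpus` should hold all first field data documents plus the second (whitelist) documents, so that e ≥ 1 for every word.

```python
import math
from collections import Counter
from typing import Dict, List


def field_document(accounts: List[Dict[str, str]], fields: List[str]) -> List[str]:
    """Hypothetical helper: turn attribute field data into 'field=value' words."""
    return [f"{f}={acc[f]}" for acc in accounts for f in fields if acc.get(f)]


def tfidf_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """tf-idf(t, d) = tf(t, d) * log(N / e) for every word t in doc.

    tf is the raw count of t in doc, N = len(corpus), and e is the number of
    corpus documents containing t. corpus must include doc itself.
    """
    n_docs = len(corpus)
    doc_sets = [set(d) for d in corpus]
    weights = {}
    for term, tf in Counter(doc).items():
        e = sum(1 for s in doc_sets if term in s)  # documents containing term
        weights[term] = tf * math.log(n_docs / e)
    return weights


if __name__ == "__main__":
    cluster = [{"ip": "1.1.1.1", "os": "android"}, {"ip": "1.1.1.1"}]
    first = field_document(cluster, ["ip", "os"])
    whitelist = [["os=android", "ip=9.9.9.9"], ["os=ios", "ip=8.8.8.8"]]
    weights = tfidf_weights(first, [first] + whitelist)
    print({k: round(v, 3) for k, v in weights.items()})
    # {'ip=1.1.1.1': 2.197, 'os=android': 0.405}
```

The shared login address scores highly because it is frequent in the cluster and absent from whitelisted accounts. A word that occurs in every document gets weight log(1) = 0.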
In one embodiment, the determination module is further configured to:
determine the first field data document from which a word whose weight is higher than the predetermined weight comes as a fraud data document; and
determine the account cluster corresponding to the fraud data document as a fraudulent account gang.
In one embodiment, the determination module is further configured to:
calculate the average weight of the words in each first field data document;
determine each first field data document whose average weight is higher than a predetermined average value as a fraud data document; and
determine the account cluster corresponding to the fraud data document as a fraudulent account gang.
It should be noted that although several modules or units of the apparatus for performing actions are mentioned in the detailed description above, this division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
In addition, although the steps of the methods of the present application are depicted in the figures in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
According to a third aspect of the present application, a computer device is also provided, which performs all or some of the steps of any of the artificial intelligence-based fraudulent account detection methods described above. The computer device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the artificial intelligence-based fraudulent account detection method shown in any of the exemplary embodiments above.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, a method, or a program product. Accordingly, the various aspects of the present application may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".
A computer device 500 according to this embodiment of the present application is described below with reference to FIG. 5. The computer device 500 shown in FIG. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 5, the computer device 500 takes the form of a general-purpose computing device. Components of the computer device 500 may include, but are not limited to, the at least one processing unit 510, the at least one storage unit 520, and a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510).
The storage unit stores program code that can be executed by the processing unit 510, causing the processing unit 510 to perform the steps of the various exemplary embodiments of the present application described in the "Methods of Embodiments" section above. For example, the processing unit 510 may perform the steps shown in FIG. 1: step S110, when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, acquiring the account attribute data set of those accounts, the mobile phone number coming from the account database of the target entity; step S120, constructing an account detection graph of the target entity, using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices; step S130, performing graph clustering on the accounts in the account detection graph based on the attribute field data of its connecting edges, to obtain a plurality of account clusters; step S140, generating a first field data document for each account cluster from that cluster's attribute field data, and acquiring a second field data document of the whitelisted accounts corresponding to the target entity; step S150, calculating, from the first field data document and the second field data document, the weight of each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and step S160, determining fraudulent account clusters based on the weight of each word, to obtain the fraudulent account group associated with the target entity.
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 5201 and/or a cache memory 5202, and may further include a read-only memory (ROM) 5203.
The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. Such program modules 5205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The computer device 500 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the computer device 500, and/or with any device (e.g., a router or modem) that enables the computer device 500 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 550, and a display unit 540 may be connected to the I/O interface 550. The computer device 500 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with the other modules of the computer device 500 via the bus 530. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 500, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solutions according to the embodiments of the present application may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to perform the method according to the embodiments of the present application.
According to a fourth aspect of the present application, a computer-readable storage medium is also provided, on which a program product capable of implementing the methods described in this specification is stored; the computer-readable storage medium may be non-volatile or volatile. In some possible implementations, the various aspects of the present application may also be implemented as a program product including program code which, when the program product runs on a terminal device, causes the terminal device to perform the steps of the various exemplary embodiments of the present application described in the "Exemplary Methods" section of this specification.
Referring to FIG. 6, a program product 600 for implementing the above method according to an embodiment of the present application is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present application is not limited to this; in this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific (non-exhaustive) examples of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++, as well as conventional procedural languages such as "C" or similar languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above figures are only schematic illustrations of the processing included in the methods according to the exemplary embodiments of the present application and are not intended to be limiting. It will be readily understood that the processing shown in the figures does not indicate or limit the chronological order of these processes. It will also be readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.
It should be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (20)

  1. An artificial intelligence-based fraudulent account detection method, comprising:
    when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, acquiring an account attribute data set of the accounts, the mobile phone number coming from an account database of a target entity;
    constructing an account detection graph of the target entity, using attribute field data in the account attribute data set as connecting edges and the mobile phone number as a vertex;
    performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges in the account detection graph, to obtain a plurality of account clusters;
    generating a first field data document for each account cluster from the attribute field data of that account cluster, and acquiring a second field data document of whitelisted accounts corresponding to the target entity;
    calculating, from the first field data document and the second field data document, a weight of each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
    determining fraudulent account clusters based on the weight of each word, to obtain a fraudulent account group associated with the target entity.
  2. The method according to claim 1, wherein acquiring the account attribute data set of the accounts when it is determined that the number of accounts bound to the mobile phone number exceeds the predetermined number comprises:
    acquiring a business association condition between the target entity and the mobile phone number, the business association condition indicating a threshold number of accounts that the mobile phone number may be bound to in a target business, the target business originating from the target entity; and
    when the number of accounts bound to the mobile phone number exceeds the threshold, acquiring the account attribute data set of the accounts.
  3. The method according to claim 1, wherein performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain the plurality of account clusters, comprises:
    performing graph clustering on the account detection graph with a Connected Component algorithm, based on the attribute field data of the connecting edges, to obtain a plurality of account groups;
    selecting, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers and are associated with the same login network address, to obtain a first account group combination;
    selecting, from the plurality of account groups, the account groups that contain at least the predetermined number of mobile phone numbers and are associated with the same wireless network card physical (MAC) address, to obtain a second account group combination; and
    determining the first account group combination and the second account group combination as the account clusters.
  4. The method according to claim 1, wherein calculating the weight of each word in the first field data document from the first field data document and the second field data document comprises:
    for each word in the first field data document, calculating a first frequency with which the word occurs in the first field data document;
    for each word in the first field data document, calculating a second frequency with which the word occurs in both the first field data document and the second field data document; and
    using the product of the first frequency and the second frequency as the weight of the word.
  5. The method according to claim 1, wherein calculating the weight of each word in the first field data document from the first field data document and the second field data document comprises:
    calculating the weight of each word based on the formulas tf-idf(t,d) = tf(t,d) × idf(t) and idf(t) = log(N/e), where tf-idf(t,d) is the weight, t is the word, d is the first field data document, tf(t,d) is the number of times the word occurs in the first field data document, idf(t) is the inverse document frequency of the word, N is the total number of first and second field data documents, and e is the number of first and second field data documents in which the word occurs.
  6. The method according to claim 1, wherein determining the fraudulent account clusters based on the weight of each word comprises:
    determining the first field data document from which a word whose weight is higher than a predetermined weight comes as a fraud data document; and
    determining the account cluster corresponding to the fraud data document as a fraudulent account gang.
  7. The method according to claim 1, wherein determining the fraudulent account clusters based on the weight of each word comprises:
    calculating the average weight of the words in each first field data document;
    determining each first field data document whose average weight is higher than a predetermined average value as a fraud data document; and
    determining the account cluster corresponding to the fraud data document as a fraudulent account gang.
  8. An artificial intelligence-based fraudulent account detection apparatus, comprising:
    an acquisition module, configured to acquire an account attribute data set of the accounts bound to a mobile phone number when it is determined that the number of those accounts exceeds a predetermined number, the mobile phone number coming from an account database of a target entity;
    a construction module, configured to construct an account detection graph of the target entity, using attribute field data in the account attribute data set as connecting edges and the mobile phone number as a vertex;
    a clustering module, configured to perform graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges in the account detection graph, to obtain a plurality of account clusters;
    a generation module, configured to generate a first field data document for each account cluster from the attribute field data of that account cluster, and to acquire a second field data document of whitelisted accounts corresponding to the target entity;
    a calculation module, configured to calculate, from the first field data document and the second field data document, a weight of each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
    a determination module, configured to determine fraudulent account clusters based on the weight of each word, to obtain a fraudulent account group associated with the target entity.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, acquiring an account attribute data set of those accounts, the mobile phone number being derived from an account database of a target subject;
    constructing an account detection graph of the target subject, using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices;
    performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
    generating, from the attribute field data of each account cluster, a first field data document for that account cluster, and obtaining a second field data document of the whitelisted accounts corresponding to the target subject;
    calculating, according to the first field data document and the second field data document, a weight for each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
    determining underground-industry account clusters based on the weight of each word, thereby obtaining the underground-industry account group associated with the target subject.
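The graph-construction step can be illustrated with a minimal Python sketch. It follows one possible reading of the claim: two phone-number vertices are joined by an edge whenever accounts bound to them share a value in some attribute field, and the edge records that (field, value) pair as its attribute field data. The record schema and the field names `login_ip` and `wifi_mac` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from itertools import combinations


def build_detection_graph(records, fields=("login_ip", "wifi_mac")):
    """Build an account detection graph (one possible reading of the claim).

    records: list of dicts, each with a "phone" key plus attribute fields
    (hypothetical schema; "login_ip" and "wifi_mac" are illustrative names).
    Vertices are phone numbers. Two phones are joined when their accounts
    share a (field, value) pair, and the edge keeps every shared pair as its
    attribute field data.
    Returns (vertices, edges) with edges: frozenset({p, q}) -> set of (field, value).
    """
    vertices = {r["phone"] for r in records}

    # Group phone numbers by each attribute value they were seen with.
    phones_by_value = defaultdict(set)
    for r in records:
        for field in fields:
            value = r.get(field)
            if value is not None:
                phones_by_value[(field, value)].add(r["phone"])

    # Every pair of phones sharing a value gets an edge labelled with that value.
    edges = defaultdict(set)
    for label, phones in phones_by_value.items():
        for p, q in combinations(sorted(phones), 2):
            edges[frozenset((p, q))].add(label)
    return vertices, dict(edges)
```

Grouping phone numbers by shared value first avoids comparing every pair of records. A very common value, such as a carrier NAT address, would still produce a dense clique, so in practice some filtering of such values would probably be needed.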
  10. The computer device according to claim 9, wherein acquiring the account attribute data set of the accounts when it is determined that the number of accounts bound to the mobile phone number exceeds the predetermined number comprises:
    obtaining a business association condition between the target subject and the mobile phone number, the business association condition indicating a threshold on the number of accounts that may be bound to the mobile phone number in a target business, the target business originating from the target subject; and
    when the number of accounts bound to the mobile phone number exceeds the threshold, acquiring the account attribute data set of those accounts.
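The threshold check of claim 10 amounts to counting the distinct accounts bound to each phone number. The sketch below assumes that bindings arrive as (phone, account_id) pairs and that account attributes can be looked up in a plain dict. Both input formats and the function names are hypothetical.

```python
from collections import defaultdict


def phones_over_threshold(bindings, threshold):
    """Return {phone: sorted account ids} for phones bound to more than `threshold` accounts.

    bindings: iterable of (phone, account_id) pairs (hypothetical input format).
    """
    accounts_by_phone = defaultdict(set)
    for phone, account_id in bindings:
        accounts_by_phone[phone].add(account_id)  # a set ignores duplicate bindings
    return {
        phone: sorted(accounts)
        for phone, accounts in accounts_by_phone.items()
        if len(accounts) > threshold
    }


def attribute_data_sets(flagged, attributes):
    """Collect the attribute record of every account bound to a flagged phone.

    attributes: {account_id: dict of attribute fields} (hypothetical store).
    """
    return {
        phone: [attributes[acc] for acc in accounts if acc in attributes]
        for phone, accounts in flagged.items()
    }
```

The threshold is a parameter here because the claim ties it to a per-business condition set by the target subject rather than to a single fixed number.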
  11. The computer device according to claim 9, wherein performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters, comprises:
    performing graph clustering on the account detection graph with a Connected Component algorithm based on the attribute field data of the connecting edges, to obtain a plurality of account groups;
    selecting, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers associated with the same login network address, to obtain a first account group combination;
    selecting, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers associated with the same physical (MAC) address of a wireless network card, to obtain a second account group combination; and
    determining the first account group combination and the second account group combination as the account clusters.
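Claim 11 names a Connected Component algorithm but does not fix an implementation. The sketch below uses union-find, which is one standard way to compute connected components, and then applies the two selection rules. The per-phone attribute map and the field names `login_ip` and `wifi_mac` are assumptions made for illustration.

```python
from collections import Counter


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets.

    Implemented with union-find (disjoint-set union); `edges` is an iterable of
    (u, v) pairs whose endpoints are all in `vertices`.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    components = {}
    for v in vertices:
        components.setdefault(find(v), set()).add(v)
    return list(components.values())


def select_account_clusters(components, phone_attrs, min_phones):
    """Keep components in which at least `min_phones` phone numbers share one
    login IP (first combination) or one Wi-Fi MAC (second combination).

    phone_attrs maps phone -> {"login_ip": ..., "wifi_mac": ...}; this layout
    and both field names are hypothetical.
    """
    def shares_value(component, field):
        counts = Counter(
            phone_attrs[p][field]
            for p in component
            if phone_attrs.get(p, {}).get(field) is not None
        )
        return any(n >= min_phones for n in counts.values())

    first = [c for c in components if shares_value(c, "login_ip")]
    second = [c for c in components if shares_value(c, "wifi_mac")]

    # Union of both combinations; a component meeting both rules is kept once.
    clusters = []
    for c in first + second:
        if c not in clusters:
            clusters.append(c)
    return clusters
```

Union-find runs in near-linear time in the number of edges, which matters when the account database is large.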
  12. The computer device according to claim 9, wherein calculating the weight of each word in the first field data document according to the first field data document and the second field data document comprises:
    for each word in the first field data document, calculating a first frequency with which the word occurs in the first field data document;
    for each word in the first field data document, calculating a second frequency with which the word occurs in both the first field data document and the second field data document; and
    using the product of the first frequency and the second frequency as the weight of the word.
  13. The computer device according to claim 9, wherein calculating the weight of each word in the first field data document according to the first field data document and the second field data document comprises:
    calculating the weight of each word based on the formulas tf-idf(t,d) = tf(t,d) × idf(t) and idf(t) = log(N/e), where tf-idf(t,d) is the weight, t is the word, d is the first field data document, tf(t,d) is the number of times the word occurs in the first field data document, idf(t) is the inverse document frequency of the word, N is the total number of first field data documents and second field data documents, and e is the number of those documents in which the word occurs.
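The tf-idf formula of claim 13 can be written directly in Python. This sketch assumes that each field data document is already a list of tokens. It uses the natural logarithm because the claim does not state a base. The function name and the input layout are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(cluster_docs, whitelist_docs):
    """Weight every word of every first field data document with
    tf-idf(t, d) = tf(t, d) * log(N / e).

    cluster_docs: {cluster_id: [tokens]}  (first field data documents)
    whitelist_docs: list of token lists   (second field data documents)
    tf(t, d) is the raw count of t in d; N is the total number of documents of
    both kinds; e is how many of them contain t. Natural log is an assumption.
    """
    all_docs = list(cluster_docs.values()) + list(whitelist_docs)
    n_docs = len(all_docs)

    # e for every word: number of documents containing it at least once.
    doc_freq = Counter()
    for tokens in all_docs:
        doc_freq.update(set(tokens))

    weights = {}
    for cid, tokens in cluster_docs.items():
        tf = Counter(tokens)
        weights[cid] = {
            t: count * math.log(n_docs / doc_freq[t]) for t, count in tf.items()
        }
    return weights
```

A word that also appears in every whitelist document gets e = N and therefore a weight of zero. This is how the formula expresses importance relative to the second field data document.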
  14. The computer device according to claim 9, wherein determining the underground-industry account clusters based on the weight of each word comprises:
    determining, as an underground-industry data document, each first field data document containing a word whose weight is higher than a predetermined weight value; and
    determining the account cluster corresponding to the underground-industry data document as an underground-industry account gang.
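Claim 14 flags a cluster as soon as one of its words exceeds the weight threshold. Here is a minimal sketch, assuming that weights are held in a nested dict mapping each cluster id to its per-word weights. The function name is hypothetical.

```python
def clusters_over_word_threshold(weights, min_weight):
    """Return the ids of clusters whose first field data document contains at
    least one word with a weight strictly higher than `min_weight`.

    weights: {cluster_id: {word: weight}} (hypothetical layout).
    """
    return sorted(
        cid
        for cid, word_weights in weights.items()
        if any(w > min_weight for w in word_weights.values())
    )
```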
  15. The computer device according to claim 9, wherein determining the underground-industry account clusters based on the weight of each word comprises:
    calculating the average weight of the words in each first field data document;
    determining, as an underground-industry data document, each first field data document whose average weight is higher than a predetermined average value; and
    determining the account cluster corresponding to the underground-industry data document as an underground-industry account gang.
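Claim 15 replaces the per-word test with a per-document mean. Assuming that weights are held in a nested dict mapping each cluster id to its per-word weights, a sketch is as follows. The function name is hypothetical.

```python
def clusters_over_mean_threshold(weights, min_mean):
    """Return the ids of clusters whose first field data document has an
    average word weight strictly higher than `min_mean`.

    weights: {cluster_id: {word: weight}} (hypothetical layout); empty
    documents are skipped because their mean is undefined.
    """
    flagged = []
    for cid, word_weights in weights.items():
        if word_weights:
            mean = sum(word_weights.values()) / len(word_weights)
            if mean > min_mean:
                flagged.append(cid)
    return sorted(flagged)
```

Averaging makes the rule less sensitive to a single outlier word than the rule of claim 14. The trade-off is that a strong signal in a large document can be diluted.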
  16. A computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    when it is determined that the number of accounts bound to a mobile phone number exceeds a predetermined number, acquiring an account attribute data set of those accounts, the mobile phone number being derived from an account database of a target subject;
    constructing an account detection graph of the target subject, using the attribute field data in the account attribute data set as connecting edges and the mobile phone numbers as vertices;
    performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters;
    generating, from the attribute field data of each account cluster, a first field data document for that account cluster, and obtaining a second field data document of the whitelisted accounts corresponding to the target subject;
    calculating, according to the first field data document and the second field data document, a weight for each word in the first field data document, the weight indicating the importance of the word in the first field data document relative to the second field data document; and
    determining underground-industry account clusters based on the weight of each word, thereby obtaining the underground-industry account group associated with the target subject.
  17. The computer-readable storage medium according to claim 16, wherein acquiring the account attribute data set of the accounts when it is determined that the number of accounts bound to the mobile phone number exceeds the predetermined number comprises:
    obtaining a business association condition between the target subject and the mobile phone number, the business association condition indicating a threshold on the number of accounts that may be bound to the mobile phone number in a target business, the target business originating from the target subject; and
    when the number of accounts bound to the mobile phone number exceeds the threshold, acquiring the account attribute data set of those accounts.
  18. The computer-readable storage medium according to claim 16, wherein performing graph clustering on the accounts in the account detection graph based on the attribute field data of the connecting edges, to obtain a plurality of account clusters, comprises:
    performing graph clustering on the account detection graph with a Connected Component algorithm based on the attribute field data of the connecting edges, to obtain a plurality of account groups;
    selecting, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers associated with the same login network address, to obtain a first account group combination;
    selecting, from the plurality of account groups, the account groups that contain at least a predetermined number of mobile phone numbers associated with the same physical (MAC) address of a wireless network card, to obtain a second account group combination; and
    determining the first account group combination and the second account group combination as the account clusters.
  19. The computer-readable storage medium according to claim 16, wherein calculating the weight of each word in the first field data document according to the first field data document and the second field data document comprises:
    for each word in the first field data document, calculating a first frequency with which the word occurs in the first field data document;
    for each word in the first field data document, calculating a second frequency with which the word occurs in both the first field data document and the second field data document; and
    using the product of the first frequency and the second frequency as the weight of the word.
  20. The computer-readable storage medium according to claim 16, wherein calculating the weight of each word in the first field data document according to the first field data document and the second field data document comprises:
    calculating the weight of each word based on the formulas tf-idf(t,d) = tf(t,d) × idf(t) and idf(t) = log(N/e), where tf-idf(t,d) is the weight, t is the word, d is the first field data document, tf(t,d) is the number of times the word occurs in the first field data document, idf(t) is the inverse document frequency of the word, N is the total number of first field data documents and second field data documents, and e is the number of those documents in which the word occurs.
PCT/CN2021/090947 2020-07-31 2021-04-29 Underground industry account detection method and apparatus, computer device, and medium WO2022021977A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010763020.X 2020-07-31
CN202010763020.XA CN111931048B (en) 2020-07-31 2020-07-31 Artificial intelligence-based black product account detection method and related device

Publications (1)

Publication Number Publication Date
WO2022021977A1 (en) 2022-02-03

Family

ID=73315956

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090947 WO2022021977A1 (en) 2020-07-31 2021-04-29 Underground industry account detection method and apparatus, computer device, and medium

Country Status (2)

Country Link
CN (1) CN111931048B (en)
WO (1) WO2022021977A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931048B (en) * 2020-07-31 2022-07-08 平安科技(深圳)有限公司 Artificial intelligence-based black product account detection method and related device
CN113312560B (en) * 2021-06-16 2023-07-25 百度在线网络技术(北京)有限公司 Group detection method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920947A (en) * 2018-05-08 2018-11-30 北京奇艺世纪科技有限公司 A kind of method for detecting abnormality and device based on the modeling of log figure
CN109660513A (en) * 2018-11-13 2019-04-19 微梦创科网络科技(中国)有限公司 A kind of method and device based on Storm cluster identification problem account
CN109948641A (en) * 2019-01-17 2019-06-28 阿里巴巴集团控股有限公司 Anomaly groups recognition methods and device
US20190318359A1 (en) * 2018-04-17 2019-10-17 Mastercard International Incorporated Method and system for fraud prevention via blockchain
CN110620770A (en) * 2019-09-19 2019-12-27 微梦创科网络科技(中国)有限公司 Method and device for analyzing network black product account number
CN111931048A (en) * 2020-07-31 2020-11-13 平安科技(深圳)有限公司 Artificial intelligence-based black product account detection method and related device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2305912A1 (en) * 2000-04-17 2001-10-17 Oxford Properties Group Inc. Internet pager service dispatch
CN106372977B (en) * 2015-07-23 2019-06-07 阿里巴巴集团控股有限公司 A kind of processing method and equipment of virtual account
RU2635275C1 (en) * 2016-07-29 2017-11-09 Акционерное общество "Лаборатория Касперского" System and method of identifying user's suspicious activity in user's interaction with various banking services
CN107798541B (en) * 2016-08-31 2021-12-07 南京星云数字技术有限公司 Monitoring method and system for online service
CN107657062A (en) * 2017-10-25 2018-02-02 医渡云(北京)技术有限公司 Similar case search method and device, storage medium, electronic equipment
CN109102301A (en) * 2018-08-20 2018-12-28 阿里巴巴集团控股有限公司 A kind of payment air control method and system
CN109525595B (en) * 2018-12-25 2021-04-16 广州方硅信息技术有限公司 Black product account identification method and equipment based on time flow characteristics

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114785546A (en) * 2022-03-15 2022-07-22 上海聚水潭网络科技有限公司 IP tracing method and system based on service log and IP information
CN114785546B (en) * 2022-03-15 2024-04-26 上海聚水潭网络科技有限公司 IP tracing method and system based on business log and IP information
CN116846596A (en) * 2023-05-31 2023-10-03 北京数美时代科技有限公司 Identification method, system, medium and equipment of malicious account
CN116846596B (en) * 2023-05-31 2024-01-30 北京数美时代科技有限公司 Identification method, system, medium and equipment of malicious account

Also Published As

Publication number Publication date
CN111931048A (en) 2020-11-13
CN111931048B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
WO2022021977A1 (en) Underground industry account detection method and apparatus, computer device, and medium
CN110958220B (en) Network space security threat detection method and system based on heterogeneous graph embedding
US20200389495A1 (en) Secure policy-controlled processing and auditing on regulated data sets
US20200244760A1 (en) Apparatus, method and article to facilitate automatic detection and removal of fraudulent user information in a network environment
US10079842B1 (en) Transparent volume based intrusion detection
US7631362B2 (en) Method and system for adaptive identity analysis, behavioral comparison, compliance, and application protection using usage information
US9323928B2 (en) System and method for non-signature based detection of malicious processes
US10728264B2 (en) Characterizing behavior anomaly analysis performance based on threat intelligence
US20190028371A1 (en) Identifying multiple devices belonging to a single user
US20210021644A1 (en) Advanced cybersecurity threat mitigation using software supply chain analysis
US20150172303A1 (en) Malware Detection and Identification
CN110383278A (en) The system and method for calculating event for detecting malice
US20190073483A1 (en) Identifying sensitive data writes to data stores
CN106295349A (en) Risk Identification Method, identification device and the anti-Ore-controlling Role that account is stolen
US10805327B1 (en) Spatial cosine similarity based anomaly detection
US20210136120A1 (en) Universal computing asset registry
CN108932426A (en) It goes beyond one's commission leak detection method and device
US11019494B2 (en) System and method for determining dangerousness of devices for a banking service
CN111931047B (en) Artificial intelligence-based black product account detection method and related device
CN112784281A (en) Safety assessment method, device, equipment and storage medium for industrial internet
US20230104176A1 (en) Using a Machine Learning System to Process a Corpus of Documents Associated With a User to Determine a User-Specific and/or Process-Specific Consequence Index
CN114760106A (en) Network attack determination method, system, electronic device and storage medium
CN110955890B (en) Method and device for detecting malicious batch access behaviors and computer storage medium
US8402545B1 (en) Systems and methods for identifying unique malware variants
CN113037689A (en) Log-based virus discovery method and device, computing equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849086

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21849086

Country of ref document: EP

Kind code of ref document: A1