CN115115369A - Data processing method, device, equipment and storage medium - Google Patents


Publication number
CN115115369A
Authority
CN
China
Prior art keywords
account
target
abnormal
accounts
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110288455.8A
Other languages
Chinese (zh)
Inventor
陈萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110288455.8A
Publication of CN115115369A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4014 Identity check for transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/382 Payment protocols; Details thereof insuring higher security of transaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud

Abstract

An embodiment of the invention discloses a data processing method, apparatus, device and storage medium. The method comprises: acquiring a target account set, and determining reference accounts from the target account set according to the registration data of each account in the target account set, wherein the target account set comprises at least one account and the registration data of each account, and the reference accounts comprise abnormal accounts and associated accounts that have an association relationship with the abnormal accounts; acquiring a proportion feature of the reference accounts, wherein the proportion feature comprises at least one of: an abnormal proportion of the abnormal accounts and an associated proportion of the associated accounts; and, when the proportion feature meets a target processing condition, processing the target abnormal accounts in the target account set using a processing rule associated with the target processing condition, thereby improving the effectiveness with which the data processing device processes abnormal accounts.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
With the continued development of internet technology, online payment has become a popular way to pay. Online payment generally requires registering an online account, and as online payment has spread, fraud based on online payment has continued to emerge. At present, to curb such fraud, an abnormal account involved in fraud is generally handled based on feedback from users. The handling of abnormal accounts is therefore after the fact, which makes it less effective. How to handle abnormal accounts effectively has thus become a current research focus.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, data processing equipment and a storage medium, which can improve the effectiveness of the data processing equipment in processing an abnormal account.
In one aspect, an embodiment of the present invention provides a data processing method, including:
acquiring a target account set, and determining reference accounts from the target account set according to the registration data of each account in the target account set, wherein the target account set comprises at least one account and the registration data of each account, and the reference accounts comprise abnormal accounts and associated accounts that have an association relationship with the abnormal accounts;
acquiring a proportion feature of the reference accounts, wherein the proportion feature comprises at least one of: an abnormal proportion of the abnormal accounts and an associated proportion of the associated accounts; and
when the proportion feature meets a target processing condition, processing the target abnormal accounts in the target account set using a processing rule associated with the target processing condition.
In another aspect, an embodiment of the present invention provides a data processing apparatus, including:
an acquiring unit, configured to acquire a target account set, wherein the target account set comprises at least one account and the registration data of each account;
a determining unit, configured to determine reference accounts from the target account set according to the registration data of each account in the target account set, wherein the reference accounts comprise abnormal accounts and associated accounts that have an association relationship with the abnormal accounts;
the acquiring unit being further configured to acquire a proportion feature of the reference accounts, wherein the proportion feature comprises at least one of: an abnormal proportion of the abnormal accounts and an associated proportion of the associated accounts; and
a processing unit, configured to process the target abnormal accounts in the target account set using a processing rule associated with a target processing condition when the proportion feature meets the target processing condition.
In another aspect, an embodiment of the present invention provides a data processing apparatus, including a processor, an input apparatus, an output apparatus, and a memory, where the processor, the input apparatus, the output apparatus, and the memory are connected to each other, where the memory is used to store a computer program that supports the data processing apparatus to execute the foregoing method, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the following steps:
acquiring a target account set, and determining reference accounts from the target account set according to the registration data of each account in the target account set, wherein the target account set comprises at least one account and the registration data of each account, and the reference accounts comprise abnormal accounts and associated accounts that have an association relationship with the abnormal accounts;
acquiring a proportion feature of the reference accounts, wherein the proportion feature comprises at least one of: an abnormal proportion of the abnormal accounts and an associated proportion of the associated accounts; and
when the proportion feature meets a target processing condition, processing the target abnormal accounts in the target account set using a processing rule associated with the target processing condition.
In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium, in which program instructions are stored, and when the program instructions are executed by a processor, the program instructions are used to execute the data processing method according to the first aspect.
In the embodiment of the present invention, after acquiring a target account set, the data processing device may determine reference accounts from the target account set according to the registration data of each account in the set, where the reference accounts include abnormal accounts and associated accounts that have an association relationship with the abnormal accounts. Having determined the abnormal accounts and the associated accounts, the data processing device may further determine the abnormal proportion of the abnormal accounts in the target account set and the associated proportion of the associated accounts in the target account set, and may then process the target abnormal accounts in the target account set based on the abnormal proportion and the associated proportion. Because the data processing device processes abnormal accounts on the basis of its division of accounts into account sets, it can process abnormal accounts in clusters, which improves the efficiency with which it processes the target abnormal accounts.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1a is a schematic diagram of a data processing system according to an embodiment of the present invention;
FIG. 1b is a schematic diagram of a processing system provided by an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of a data processing method provided by an embodiment of the invention;
FIG. 3 is a diagram illustrating exception data handling according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of a data processing method provided by an embodiment of the invention;
FIG. 5 is a diagram illustrating a trained scoring model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of processing exception data according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a data processing apparatus provided by an embodiment of the present invention;
FIG. 8 is a schematic block diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
An embodiment of the invention provides a data processing method in which the proportion feature of the reference accounts in a target account set is obtained, it is determined whether that proportion feature meets a target processing condition, and the target abnormal accounts in the target account set are processed when it does. In one embodiment, the reference accounts in the target account set include abnormal accounts and associated accounts that have an association relationship with the abnormal accounts. With this method, the target abnormal accounts in the target account set can therefore be processed when abnormal accounts or associated accounts exist in the target account set. Because the data processing device processes the target abnormal accounts based on the proportion features of accounts of different dimensions in the target account set, the processing is both multi-dimensional and accurate, which helps the data processing device manage the target abnormal accounts in the target account set more efficiently.
In one embodiment, an abnormal account is an account that exhibits abnormal behavior, where the abnormal behavior includes one or more of the following: the exchange (replacement) frequency of electronic resources of the account is greater than or equal to a frequency threshold; and the registered object corresponding to the account attracts electronic resources to the account by publishing illegal information. An associated account is an account that has an association relationship with an account already determined to be abnormal, for example an account that frequently exchanges electronic resources with the abnormal account, or an account that shares association information, such as legal person information or bank card information, with the abnormal account. An associated account will not necessarily be determined to be abnormal later; it may also be determined to be a normal account. The target abnormal accounts described in the embodiments of the invention therefore include the accounts already determined to be abnormal and those associated accounts of the abnormal accounts that are subsequently determined to be abnormal. Because the data processing device processes only the target abnormal accounts, processing errors can be effectively avoided and the accuracy of account processing is ensured.
The data processing device can be a terminal device, and the terminal device can be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart sound box, a smart watch and the like; alternatively, the data processing device may be a server, which may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
In one embodiment, the exchange (replacement) frequency of electronic resources refers to the frequency with which electronic resources are transferred into the corresponding account and the frequency with which they are transferred out of it. If the exchange frequency of an account is greater than or equal to a frequency threshold, the account is at risk of money laundering, that is, the account is an abnormal account. The frequency threshold set for an account may be a fixed value. Alternatively, different thresholds may be set according to the type of work in which the registered user of the account is engaged. For example, if the registered user of an account is a government organization, the account involves few resource exchanges, so its threshold may be a smaller value a; if the registered user is a commercial operator, the account involves many resource exchanges, so its threshold should be a larger value b, that is, b > a. In addition, the behavior of publishing illegal information by the registered object of an account, as referred to in the embodiments of the invention, includes any one or more of the following: behavior related to religion, behavior related to false advertising, and the like.
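The category-dependent threshold described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the category names, the numeric thresholds, the `Account` fields, and the choice to sum inbound and outbound transfers are all assumptions made for the example. The source requires only that the threshold b for a commercial operator exceed the threshold a for a government organization.

```python
from dataclasses import dataclass

# Hypothetical thresholds (transfers per observation window). Only the
# ordering b > a comes from the source; the numbers are illustrative.
THRESHOLD_BY_CATEGORY = {
    "government": 5,   # value a: few resource exchanges expected
    "merchant": 200,   # value b: many resource exchanges expected
}
DEFAULT_THRESHOLD = 50  # hypothetical fixed threshold for other categories


@dataclass
class Account:
    account_id: str
    category: str
    transfers_in: int   # transfers into the account in the window
    transfers_out: int  # transfers out of the account in the window


def exchange_frequency(account: Account) -> int:
    # Assumption: inbound and outbound transfers are simply added together.
    return account.transfers_in + account.transfers_out


def exceeds_frequency_threshold(account: Account) -> bool:
    """Return True when the exchange frequency is >= the category threshold."""
    threshold = THRESHOLD_BY_CATEGORY.get(account.category, DEFAULT_THRESHOLD)
    return exchange_frequency(account) >= threshold
```

With these illustrative numbers, the same five transfers flag a government account but not a merchant account.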
The above determination method for the abnormal behavior is only an exemplary description, and is not exhaustive, so there are many ways for the data processing device to determine whether an account has the abnormal behavior, that is, there are many ways for the data processing device to determine whether an account is an abnormal account or an associated account.
In one embodiment, the data processing method may be applied to the data processing system shown in FIG. 1a. The data processing system includes a data processing device 10 and a terminal device 11. The terminal device 11 is configured to send an account registration request to the data processing device 10, and the data processing device 10 allocates an account to the corresponding terminal device based on that request and manages the registered account. In one embodiment, when requesting account registration, the terminal device 11 sends registration data to the data processing device 10. The registration data includes the account name of the account to be registered, information about the registering legal person, information about the bank card associated with the registered account, and the like. After the data processing device 10 allocates an account to the corresponding terminal device 11 based on the registration data obtained from it, the data processing device 10 performs abnormal-risk management on the account of the terminal device 11, so as to prevent abnormal accounts from causing economic losses to other, normal accounts.
With the data processing method provided by the embodiment of the invention, when performing abnormal-risk management on accounts, the data processing device 10 can determine the abnormal accounts and the associated accounts from the registration data of each account in the target account set, and can then determine the abnormal proportion of the abnormal accounts in the target account set and the associated proportion of the associated accounts in the target account set. Based on the abnormal proportion and the associated proportion, it can determine whether the condition for processing abnormal accounts in the target account set is met. When it determines that the target processing condition is met, the data processing device determines the target abnormal accounts from the target account set and processes them using the processing rule associated with the target processing condition. The data processing device 10 may be a node on a blockchain network, so that its data can be uploaded to the blockchain network for storage.
In one embodiment, when the data processing device 10 determines, from the abnormal proportion of the abnormal accounts and the associated proportion of the associated accounts, whether to process the target abnormal accounts in the target account set, it may determine different processing rules for different abnormal proportions and different associated proportions, and then process the target abnormal accounts in the target account set according to the corresponding processing rule. The processing rules executed by the data processing device 10 for a target abnormal account include blocking the account, freezing the electronic resources in the account, and the like. Processing the target abnormal accounts of an account set in this way allows multiple target abnormal accounts to be processed at the same time. Moreover, because the data processing device processes the target abnormal accounts according to differences in the abnormal proportion and in the associated proportion, it processes them along different dimensions, which improves the accuracy of the processing.
In one embodiment, the data processing device 10 may call a processing system for abnormal accounts to perform the above steps. The processing system deployed in the data processing device 10 is shown in FIG. 1b and includes a quasi-real-time decision engine and an online real-time decision engine. The quasi-real-time decision engine includes a set filtering module and a feature scoring module, and the online real-time decision engine includes a processing module. The set filtering module determines target accounts that may be abnormal from a large number of accounts, based on the basic information of the accounts and the relevance (or similarity) between different accounts. The feature scoring module scores the accounts obtained by filtering. The processing module automatically processes an account when complaint information is received for the account and its score reaches a threshold.
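The trigger condition of the processing module can be written as a one-line predicate. The sketch below is only an illustration of that condition: the function name, the score scale, and the default threshold of 0.8 are hypothetical and do not come from the source.

```python
def should_auto_process(score: float, has_complaint: bool,
                        score_threshold: float = 0.8) -> bool:
    """Process an account automatically only when both conditions hold:
    complaint information has been received AND the score from the
    feature scoring module reaches the (hypothetical) threshold."""
    return has_complaint and score >= score_threshold
```

Requiring both signals means that a high score alone, without any complaint, does not trigger automatic processing in this sketch.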
Referring to fig. 2, a schematic flow chart of a data processing method according to an embodiment of the present invention is shown, where the method may be specifically executed by the data processing apparatus, and as shown in fig. 2, the method may include:
S201, acquiring a target account set, and determining a reference account from the target account set according to the registration data of each account in the target account set; the target account set comprises at least one account and registration data of each account; the reference account comprises an abnormal account and an associated account which is in an association relationship with the abnormal account.
In one embodiment, acquiring a target account set means screening out, from a large number of known (registered) accounts, a set of target accounts that may be abnormal. The target account set is obtained by dividing the accounts registered with the data processing device according to their registration data, such as the account name. When acquiring the target account set from the registration data, the data processing device can use a gradient word segmentation clustering algorithm, such as the G-Word2Vec-KNN algorithm. The G-Word2Vec-KNN algorithm first clusters accounts into major classes based on vectors and then, within each major class, segments words using the Word2Vec algorithm (a natural language processing algorithm) and clusters minor classes using the K-nearest neighbor (KNN) algorithm. This improves the clustering efficiency for newly added merchants and reduces the computing resources required.
In an embodiment, the data processing device may also process an account based on the account audit information included in the registration data. One account may have multiple investigation records and complaint records, and according to each investigation or complaint result the data processing device may determine that the account is an abnormal account, set it as an account to be observed, or restore it as a trusted account. The scale and state of the full set of abnormal accounts determined by the data processing device each day are therefore dynamic. When the data processing device cleans the registration data, it may retain only the latest investigation processing state of each account together with the corresponding audit result, and then determine from the retained audit result whether the account is an abnormal account.
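Keeping only the most recent audit result per account can be sketched as below. The record layout (account ID, date, result) and the result labels "abnormal", "observe" and "trusted" are assumptions chosen to mirror the three outcomes named above; the source does not specify a data format.

```python
from datetime import date

# Hypothetical audit records: (account_id, audit_date, result).
AuditRecord = tuple[str, date, str]


def latest_audit_results(records: list[AuditRecord]) -> dict[str, str]:
    """Keep only the most recent audit result for each account."""
    latest: dict[str, tuple[date, str]] = {}
    for account_id, audit_date, result in records:
        current = latest.get(account_id)
        if current is None or audit_date > current[0]:
            latest[account_id] = (audit_date, result)
    return {account_id: result for account_id, (_, result) in latest.items()}


def abnormal_accounts(records: list[AuditRecord]) -> set[str]:
    """Accounts whose latest retained audit result is 'abnormal'."""
    return {a for a, r in latest_audit_results(records).items() if r == "abnormal"}
```

An account that was once marked abnormal but was later restored as trusted is therefore not counted as abnormal, which matches the dynamic state described above.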
When the data processing device acquires the target account set based on the registration data, it may first preprocess the registration data. Preprocessing may include cleaning the registration data to reduce its volume, so that the data processing device only has to process the cleaned registration data. This reduces the processing load on the data processing device and improves its data processing efficiency. In one embodiment, when cleaning the registration data, the data processing device may remove registration data whose account name is missing or garbled, registration data of trusted accounts that have been endorsed, and registration data of accounts judged to be trusted by a trusted model. This cleaning reduces the data processing load on the data processing device while preserving the accuracy of the target account set it acquires.
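The three cleaning rules above can be sketched as a single filter. Everything concrete here is an assumption for illustration: the dictionary keys, the test used to detect a garbled name (a Unicode replacement character, or no letters or digits at all), and the `model_trusted` callable that stands in for the trusted model.

```python
from typing import Callable, Iterable


def is_garbled(name: str) -> bool:
    # Assumption: a name is "garbled" if it contains the Unicode replacement
    # character or has no letter/digit characters at all.
    return "\ufffd" in name or not any(ch.isalnum() for ch in name)


def clean_registrations(
    registrations: Iterable[dict],
    endorsed_ids: set[str],
    model_trusted: Callable[[dict], bool],
) -> list[dict]:
    """Drop records with missing or garbled names, endorsed trusted accounts,
    and accounts that the (hypothetical) trusted model judges trustworthy."""
    cleaned = []
    for reg in registrations:
        name = (reg.get("account_name") or "").strip()
        if not name or is_garbled(name):
            continue
        if reg.get("account_id") in endorsed_ids:
            continue
        if model_trusted(reg):
            continue
        cleaned.append(reg)
    return cleaned
```

Passing the trusted model in as a callable keeps the filter independent of how that model is built.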
After cleaning the registration data, the data processing device can obtain a target account set from the cleaned registration data. Account sets that may be abnormal and that were registered with the data processing device generally show significant similarity; for example, the account names may all follow the structure "xxx city xx prefecture xx type xx (shop)". When obtaining the target account set from the registration data, the data processing device may therefore first divide the accounts into several major classes based on the account names in the registration data of each account. Specifically, when dividing accounts by account name, the data processing device may first perform word segmentation on the account names in the registration data and screen out accounts belonging to the same city, such as Putian or Yong'an. After dividing the accounts into several major classes, the data processing device can further subdivide the accounts of each major class into several minor classes according to the account names in the registration data, where each minor class obtained in this way is one account set.
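One possible reading of the first, coarse division is a grouping of account names by the city they mention. The sketch below illustrates that reading only; the English name pattern "<city> City ...", the regular expression, and the "unknown" bucket are assumptions, and real account names would need proper word segmentation rather than a regular expression.

```python
import re
from collections import defaultdict

# Assumption: names follow an English rendering of the pattern
# "<city> City <district> <type> <shop>".
CITY_PATTERN = re.compile(r"^\s*(.+?)\s+City\b")


def split_by_city(account_names: list[str]) -> dict[str, list[str]]:
    """Group account names into major classes keyed by the city in the name."""
    classes: dict[str, list[str]] = defaultdict(list)
    for name in account_names:
        match = CITY_PATTERN.match(name)
        city = match.group(1) if match else "unknown"
        classes[city].append(name)
    return dict(classes)
```

Each resulting group would then be subdivided further into minor classes by name similarity.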
When dividing the accounts into several account classes according to the account names in the registration data, the data processing device may use the word2vec algorithm. The data processing device may first perform word segmentation on the account names in the registration data, and then remove stop words from the account names using a stop-word preprocessing technique. Removing stop words improves the effect of classifying accounts by account name and reduces the amount of account-name data to be processed, which in turn reduces the data processing load on the data processing device. After removing the stop words, the data processing device can determine the size of the corpus and the size of the context word window, and then feed the set of text words of the account names, with stop words removed, into the word2vec model, so that it can obtain the vector representation of each account name from the output of the word2vec model.
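The preprocessing steps around the word2vec model can be sketched without training a model. In the example below, the stop-word list, the whitespace-style tokenizer, and the idea of averaging per-word vectors into one account-name vector are all assumptions; the source says only that a vector representation of each account name is obtained from the model output. The `embeddings` table is a hypothetical stand-in for the vectors a trained word2vec model would supply.

```python
import re

# Hypothetical stop words for English renderings of account names.
STOP_WORDS = {"city", "district", "county", "shop", "store", "the", "of"}


def tokenize(account_name: str) -> list[str]:
    """Segment a name into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"\w+", account_name.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def name_vector(tokens: list[str], embeddings: dict[str, list[float]],
                dim: int) -> list[float]:
    """Average the vectors of known tokens (an assumed pooling choice).

    Returns a zero vector when no token has an embedding.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

The token lists produced by `tokenize` are the kind of input a word2vec implementation would be trained on, with the context window size chosen separately.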
After dividing the accounts of the registration data into several major account classes, the data processing device can further refine each major class into one or more minor classes using the KNN algorithm, and use each minor class as a target account set. In a specific implementation, when classifying account names with the K-nearest neighbor algorithm, the data processing device may calculate the vector distance between any two vector representations from the vector representation of each account name. The distance between two vector representations indicates the degree of similarity between the corresponding account names: the smaller the distance, the more similar the two account names. The data processing device may then use the K-nearest neighbor algorithm, with reference to the vector representation of each account name, to cluster accounts with highly similar names into one minor class, so that the accounts in any account set it obtains all have similar account names. Having divided the registered accounts in this way, the data processing device may take any one of the resulting minor classes as the target account set and, after acquiring the target account set, determine the reference accounts from it according to the registration data of each account it contains.
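The role of vector distance in forming minor classes can be illustrated with a small sketch. Note that this is not a K-nearest neighbor implementation: a greedy distance-threshold grouping is swapped in as a simple stand-in, because the source does not describe how KNN is applied to produce clusters. The Euclidean metric, the `max_distance` parameter, and the use of the first member as the group's seed are all assumptions.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def group_by_distance(vectors: dict[str, list[float]],
                      max_distance: float) -> list[list[str]]:
    """Greedy grouping: an account joins the first group whose seed vector
    lies within max_distance; otherwise it starts a new group."""
    groups: list[tuple[list[float], list[str]]] = []
    for account_id, vec in vectors.items():
        for seed, members in groups:
            if euclidean(seed, vec) <= max_distance:
                members.append(account_id)
                break
        else:
            groups.append((vec, [account_id]))
    return [members for _, members in groups]
```

Smaller distances mean more similar account names, so accounts that land in the same group here play the role of one minor class, that is, one candidate target account set.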
When determining the reference accounts from the target account set according to the registration data of each account, the data processing device may use both the abnormal accounts in the target account set and the associated accounts that have an association relationship with those abnormal accounts as reference accounts. To determine the abnormal accounts, the data processing device may check each account in the target account set against the related data of that account, such as its chat data, electronic resource exchange (replacement) records, logistics data, and the like. For example, if the chat data of an account is abnormal, the account is an abnormal account; likewise, if the electronic resource exchange record of an account is abnormal, the account is an abnormal account. Chat data is abnormal if it includes illegal data, fraud data, or the like, and an electronic resource exchange record is abnormal if it is a forged record or the like. To determine the associated accounts, the data processing device may take the accounts in the target account set that share legal person information, identity information, payment card information, or the like with an abnormal account as associated accounts. The abnormal accounts and associated accounts determined in the target account set are then used as the reference accounts.
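Finding associated accounts by shared registration attributes can be sketched as follows. The field names `legal_person`, `id_number` and `bank_card` are hypothetical stand-ins for the legal person information, identity information and payment card information mentioned above, and the set of abnormal account IDs is assumed to have been determined already.

```python
ASSOCIATION_KEYS = ("legal_person", "id_number", "bank_card")  # hypothetical fields


def find_associated(accounts: dict[str, dict], abnormal_ids: set[str],
                    keys: tuple[str, ...] = ASSOCIATION_KEYS) -> set[str]:
    """Return non-abnormal accounts sharing any key value with an abnormal one."""
    # Collect, per field, the values used by abnormal accounts.
    abnormal_values = {
        key: {accounts[a].get(key) for a in abnormal_ids if accounts[a].get(key)}
        for key in keys
    }
    associated = set()
    for account_id, data in accounts.items():
        if account_id in abnormal_ids:
            continue
        if any(data.get(key) in abnormal_values[key] for key in keys):
            associated.add(account_id)
    return associated


def reference_accounts(accounts: dict[str, dict], abnormal_ids: set[str]) -> set[str]:
    """Reference accounts = abnormal accounts plus their associated accounts."""
    return set(abnormal_ids) | find_associated(accounts, abnormal_ids)
```

Missing or empty field values are ignored, so two accounts that both lack a bank card are not treated as associated.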
After the data processing device determines the target account set and determines the reference account from it, the device may proceed to step S202 to determine the proportion characteristics of the reference account in the target account set, based on the number of reference accounts and the total number of accounts included in the target account set.
S202, acquiring the proportion characteristics of the reference account, wherein the proportion characteristics comprise at least one of the following characteristics: the abnormal proportion of the abnormal account and the associated proportion of the associated account.
After the data processing device determines the reference account from the target account set, it may obtain the proportion characteristics of the reference account from the ratio between the number of reference accounts in the target account set and the total number of accounts in the target account set, and then process the target abnormal account in the target account set based on the proportion characteristics. In one embodiment, because the reference accounts include both abnormal accounts and associated accounts, determining the proportion characteristics includes determining the abnormal proportion of the abnormal accounts and determining the associated proportion of the associated accounts. The abnormal proportion is the ratio of the number of abnormal accounts in the target account set to the total number of accounts in the target account set; the associated proportion is the ratio of the number of associated accounts in the target account set to the total number of accounts in the target account set. After determining the abnormal proportion and the associated proportion (also called the black associated proportion), the data processing device may process the target abnormal account in the target account set according to these two proportions, that is, proceed to step S203.
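As a small worked example of the two ratios, the following Python sketch computes both proportions for one target account set. The function name and the dictionary keys are illustrative choices, not terms from the description.

```python
def proportion_characteristics(total_accounts, abnormal_count, associated_count):
    """Abnormal and associated proportions for a single target account set."""
    if total_accounts <= 0:
        raise ValueError("the target account set must contain at least one account")
    return {
        "abnormal_proportion": abnormal_count / total_accounts,
        "associated_proportion": associated_count / total_accounts,
    }


# A set of 20 accounts with 5 confirmed abnormal and 3 associated accounts.
features = proportion_characteristics(20, 5, 3)
```

For this set the abnormal proportion is 5/20 and the associated proportion is 3/20.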
In one embodiment, the proportion characteristics obtained by the data processing device reflect the share of reference accounts in the target account set. Because the reference accounts include accounts already confirmed as abnormal and associated accounts that have a higher probability of later being confirmed as abnormal, a higher share of reference accounts indicates that the target account set contains more abnormal accounts. An abnormal label can therefore be added to each account set whose proportion characteristics are high, so that the data processing device can subsequently process the target abnormal accounts in all account sets carrying the abnormal label. This realizes group processing of abnormal accounts and improves the efficiency of processing them. The accounts registered in the data processing device are divided according to their registration data, first into large classes by account name and then into small classes, and the accounts in each small class are processed according to the proportion characteristics of the reference accounts in that class. Compared with evaluating every account individually, this reduces the amount of data the data processing device must handle and improves the efficiency with which it processes the target abnormal accounts.
S203, when the proportion characteristics meet the target processing condition, processing the target abnormal account in the target account set by adopting the processing rule associated with the target processing condition.
In one embodiment, an abnormal account determined by the data processing device from the target account set is an account that has already been confirmed as abnormal, whereas an associated account is an account whose status has not yet been confirmed and that may later be determined to be either abnormal or normal. When the data processing device determines, according to the abnormal proportion and the associated proportion, that the target processing condition is met, the target abnormal accounts determined from the target account set include the accounts already confirmed as abnormal and the accounts subsequently confirmed as abnormal; that is, the target abnormal accounts include all the abnormal accounts and some of the associated accounts. After determining that the proportion characteristics meet the target processing condition and determining the target abnormal accounts, the data processing device processes them using the processing rule associated with the target processing condition. For example, the data processing device may ban the target abnormal account, or deduct part of the electronic resources from the target abnormal account as a penalty.
When the data processing device processes the target abnormal account in the target account set according to the proportion characteristics, if it determines that abnormal accounts exist in the target account set, that is, that the abnormal proportion is nonzero, it may process the abnormal accounts determined in the target account set. If it determines that only associated accounts exist in the target account set, it processes the target abnormal account based on the associated proportion. In one embodiment, in this second case the data processing device may call a scoring model to score the degree of abnormality of each associated account using the characteristic data of that account, obtain an abnormality degree score for each associated account, determine the target abnormal accounts according to these scores, and process them. In one embodiment, the scoring model may be a trained xgboost model (a scoring model based on feature data).
The xgboost model is trained using artificial intelligence (AI) technology. AI is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision making. AI is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The AI infrastructure generally includes technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
The training process of the xgboost model mainly involves machine learning (ML), a branch of artificial intelligence. Machine learning is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize its existing knowledge structure so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way of giving computers intelligence, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. In one embodiment, the data processing device may first clean the registration data, then clean the account names included in the cleaned registration data, classify the accounts into large classes, and further cluster each large class into small classes using a KNN algorithm. A corresponding label can then be added to each small class (that is, each target account set) according to the proportion characteristics of the reference accounts in it, thereby implementing batch processing of abnormal accounts.
In the embodiment of the present invention, after acquiring a target account set, the data processing device may determine a reference account from the target account set according to the registration data of each account in the set, where the reference account includes an abnormal account and an associated account having an association relationship with the abnormal account. After determining the abnormal account and the associated account, the data processing device may further determine the abnormal proportion of the abnormal account in the target account set and the associated proportion of the associated account in the target account set, and may then process the target abnormal account in the target account set based on these two proportions. Because the abnormal accounts are processed on the basis of the division into account sets, the data processing device can process abnormal accounts in groups, which improves the efficiency of processing the target abnormal account.
Referring to fig. 4, which is a schematic flowchart of a data processing method according to an embodiment of the present invention, the method may be specifically executed by the data processing apparatus, and as shown in fig. 4, the method may include:
S401, receiving a registration request of a new account, and acquiring registration data of the new account.
S402, dividing the new account according to the registration data of the new account, and taking an account set into which the new account is divided as a target account set; the set of target accounts includes at least one account, and registration data for each account.
In step S401 and step S402, the data processing device takes the account set into which the new account is divided as the target account set. In a specific implementation, the data processing device may obtain the registration data of the new account when it receives the registration request of the new account, and may then divide the new account according to that registration data. To do so, the data processing device may first obtain the account name of the new account from the registration data, then obtain the similar account set in which accounts with similar names are located (for example, the large class determined using the word2vec algorithm model), and determine from the similar account set the account set into which the new account is divided. In one embodiment, the similar account set is divided into one or more similar account subsets. When determining the account set into which the new account is divided, the data processing device may first obtain the vector representation of the account name of each similar account in each similar account subset. From these vector representations it may then determine the target vector representation closest to the vector representation of the new account's name, and use the similar account subset containing the account that corresponds to the target vector representation as the account set into which the new account is divided.
In one embodiment, the data processing device may use a KNN algorithm together with the account name of the new account to determine the account set into which the new account is divided. That is, according to the vector representation of the new account's name, the data processing device may determine the k similar vectors whose distance from that vector representation is less than or equal to a preset distance threshold, and may then determine the account set into which the new account is divided by majority voting. When the data processing device uses the KNN algorithm for this purpose, the following steps may be executed:
(1) calculating the distance between the vector representation corresponding to the account name of the new account and the vector representation corresponding to the account name of each known account;
(2) sorting the known accounts in order of increasing distance;
(3) selecting, according to the sorting, the K known accounts whose name vector representations are closest to the vector representation corresponding to the account name of the new account, where K is a positive integer greater than or equal to 1;
(4) determining the account set to which each of the K known accounts belongs and the number of occurrences of each account set;
(5) taking the account set that occurs most often among the K known accounts as the account set into which the new account is divided.
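The KNN steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: Euclidean distance is used as the vector distance, the name vectors are given as plain tuples, and the function name `assign_account_set` is hypothetical. The description does not fix any of these choices.

```python
import math
from collections import Counter


def assign_account_set(new_vector, known_accounts, k=3):
    """known_accounts: list of (name_vector, account_set_label) pairs."""
    if k < 1:
        raise ValueError("K must be a positive integer")
    # Compute distances and sort the known accounts by increasing distance.
    ranked = sorted(known_accounts, key=lambda item: math.dist(new_vector, item[0]))
    # Keep the K nearest known accounts.
    nearest = ranked[:k]
    # Count how often each account set occurs and take the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


known = [
    ((0.0, 0.0), "set_A"),
    ((0.0, 1.0), "set_A"),
    ((5.0, 5.0), "set_B"),
    ((5.0, 6.0), "set_B"),
    ((6.0, 5.0), "set_B"),
]
chosen = assign_account_set((0.2, 0.1), known, k=3)
```

Here the three nearest neighbours of `(0.2, 0.1)` are two accounts from `set_A` and one from `set_B`, so the majority vote selects `set_A`.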
In one embodiment, after the data processing device selects the K known accounts whose name vector representations are closest to the vector representation corresponding to the account name of the new account, it may instead determine the account set corresponding to each of the K known accounts and randomly assign the new account to one of those account sets, which then serves as the account set into which the new account is divided. Alternatively, the data processing device may determine, among the K known accounts, the vector representation closest to the vector representation of the new account's name as the target vector representation, and use the account set containing the known account that corresponds to the target vector representation as the account set into which the new account is divided.
When a new user registers, the data processing device can call the set filtering module to determine the target account set. The data processing device can thus identify whether the new account is abnormal at the registration stage and can restrict and penalize a newly registered abnormal account at that stage, rather than dealing with the account only after it has been used for cheating. This effectively protects the electronic resources in other accounts. After determining the target account set, the data processing device may determine the reference account and its proportion characteristics from the target account set according to the registration data of each account in the set, that is, it may proceed to step S403. It can then process the target abnormal account in the target account set where the new account is located based on the proportion characteristics of the reference account. Because abnormal accounts are handled at the registration stage, a newly registered account is prevented from being used for abnormal behavior after registration and from causing economic loss to other users, which effectively protects the electronic resources in user accounts.
S403, determining a reference account from the target account set according to the registration data of each account in the target account set; the reference account comprises an abnormal account and an associated account which is in an association relationship with the abnormal account.
S404, acquiring the proportion characteristics of the reference account, wherein the proportion characteristics comprise at least one of the following characteristics: the abnormal proportion of the abnormal account and the associated proportion of the associated account.
In steps S403 and S404, after the data processing device determines the target account set, it may determine the abnormal accounts and the associated accounts from the target account set according to the registration data of each account in the set, and may then determine the abnormal proportion corresponding to the abnormal accounts and the associated proportion corresponding to the associated accounts. The associated proportion may also be referred to as the black associated proportion. An associated account is an account that shares one or more of the account name, legal person information, identity information, and payment card information with an account determined to be abnormal. When a new account is registered in the data processing device, the device may classify the new account into the corresponding large class according to its account name and then assign it to a small class (that is, a set) using the KNN model trained for that large class. It then calculates the share of problematic accounts in the target account set where the new account is located: the abnormal proportion of the known abnormal accounts and the associated proportion of the accounts associated with the known abnormal accounts. Based on these two proportions, the data processing device may mark the target account set where the new account is located and determine the label corresponding to the target account set. If the new account (for example, a merchant account) does not belong to any existing large class, training is performed for a new class and the account is incorporated into the large classes accordingly.
Therefore, the data processing equipment completes the filtering of the abnormal accounts, and further can screen out the account set with high suspicion degree (or high abnormality degree).
After the data processing device determines the abnormal proportion corresponding to the abnormal account and the associated proportion corresponding to the associated account, the data processing device may process the target abnormal accounts in the target account set by using the corresponding processing rule based on the abnormal proportion and the associated proportion, that is, the data processing device may switch to execute step S405.
S405, when the proportion characteristics meet the target processing condition, processing the target abnormal account in the target account set by adopting the processing rule associated with the target processing condition.
After the data processing device determines the abnormal proportion corresponding to the abnormal account and the associated proportion corresponding to the associated account, it may determine, based on the abnormal proportion and/or the associated proportion, whether the proportion characteristics of the reference account in the target account set satisfy the target processing condition. In one embodiment, the determination is based on the abnormal proportion: if the abnormal proportion is greater than a first proportion threshold, the data processing device determines that the proportion characteristics satisfy the target processing condition and processes the target abnormal account in the target account set using the processing rule associated with the target processing condition. Specifically, the data processing device may first remove the normal accounts from the target account set according to the registration data of each account in the set. A normal account is an account whose registration data names an endorsed registration object, or whose registration data is credible characteristic data. The data processing device may then take the accounts remaining in the target account set after the normal accounts are removed as the target abnormal accounts and process them.
In another embodiment, the data processing device may determine whether the proportion characteristics satisfy the target processing condition based on the associated proportion. If the associated proportion corresponding to the associated account is greater than a second proportion threshold, the data processing device determines that the proportion characteristics satisfy the target processing condition and obtains the abnormality degree score of each account in the target account set. It then takes the accounts in the target account set whose abnormality degree score is greater than or equal to a score threshold as the target abnormal accounts and processes them using the processing rule associated with the target processing condition. To determine the associated accounts that have an association relationship with an abnormal account, the data processing device may obtain the legal person information, the identity information of the registration object, and the payment card information of the abnormal account from its registration data. It may then take, as associated accounts, the accounts in the target account set that have the same legal person information, the same identity information, or the same payment card information as the abnormal account.
It should be noted that, the processing rule adopted by the data processing device when the abnormal proportion is greater than the first proportion threshold value, and the processing rule adopted by the data processing device when the association proportion is determined to be greater than the second proportion threshold value may be the same processing rule or different processing rules, and are not limited in the embodiment of the present invention.
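The two embodiments above can be summarized in a short Python sketch. Several points are assumptions made only for illustration: the threshold values, the `trusted` flag standing in for "endorsed registration object or credible characteristic data", the `score` field standing in for the abnormality degree score, and the choice to test the abnormal proportion before the associated proportion. The description leaves all of these open.

```python
def select_target_abnormal(accounts, abnormal_proportion, associated_proportion,
                           first_threshold=0.5, second_threshold=0.3,
                           score_threshold=0.8):
    """Return the ids of the target abnormal accounts in one target account set.

    accounts: list of dicts with keys "id", "trusted" and "score" (all hypothetical).
    """
    if abnormal_proportion > first_threshold:
        # First embodiment: drop normal (trusted) accounts, keep the rest.
        return [a["id"] for a in accounts if not a["trusted"]]
    if associated_proportion > second_threshold:
        # Second embodiment: keep accounts whose score reaches the score threshold.
        return [a["id"] for a in accounts if a["score"] >= score_threshold]
    # Neither condition is met, so no account is selected.
    return []
```

Both branches here return the selected accounts to a single caller, which corresponds to applying the same processing rule in both cases; a variant with different rules per branch would return the branch identity as well.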
In one embodiment, to obtain the trained scoring model, the data processing device may obtain the abnormal label added to the target account set in the above steps and determine the audit label corresponding to each account in the target account set, where the audit label indicates whether the account is an abnormal account. The data processing device may also obtain account feature data of each account in the target account set, where the account feature data includes one or more of the following: order data, name data, complaint data, and information flow data. The data processing device may then train the initial scoring model according to the account feature data of each account, the abnormal label corresponding to each account, and the audit label added to each account, so as to obtain the trained scoring model. The trained scoring model may be an xgboost model, a catboost model, or a lightGBM model. The process of training the scoring model is described in detail below with reference to fig. 5. Specifically, the data processing device may use the feature data of each account in a target account set to which the abnormal label has been added as sample data, where the abnormal label added to each account in such a set has a label value of 1. The audit label added to each account is determined from the result of manually auditing that account: if the manual audit determines that the account is abnormal, the account receives an abnormal label with a label value of 1; if the manual audit determines that the account is normal, the account receives a normal label with a label value of 0. The feature data of each account includes one or more of order data, name data, complaint data, and information flow data.
After obtaining the sample data, that is, the feature data of each account in the target account set, the data processing device may first preprocess it. Specifically, the data processing device may clean the data by deleting any feature for which more than 80% of the values are missing, and may fill in the missing values of features that are missing only a small number of values. A missing value may be filled with 0, with the mean, with the variance, or the like, according to the business meaning of the feature. Categorical indicators are converted into numerical variables, for example by mapping male and female to 0 and 1. Continuous variables are normalized using the Z-score (a normalization method), so that the normalized value is (original value - mean) / standard deviation. After cleaning, the feature data may be preliminarily screened according to the business meaning of each feature, and the features that keep the scoring model stable are then selected according to the Population Stability Index (PSI). The PSI is an index used to screen feature variables and to evaluate model stability. It is calculated from the actual distribution and the expected distribution, which correspond to the test set and the training set respectively. The PSI calculation formula is formula 1:
$$\mathrm{PSI} = \sum_{i} \left(A_i - E_i\right) \ln\!\left(\frac{A_i}{E_i}\right)$$
where $A_i$ is the actual distribution and $E_i$ is the expected distribution. When the PSI is less than 0.25, the corresponding feature data can ensure the stability of the scoring model. In one embodiment, the training data in the training set and the test data in the test set may be split along the time dimension, for example in a ratio of 7:3 or 6:4.
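To make the stability check concrete, the following Python sketch evaluates the standard PSI definition, the sum over bins of (A_i - E_i) * ln(A_i / E_i), for one feature. The per-bin proportions are assumed to be precomputed, and the small `eps` floor that guards against empty bins is an implementation detail added here, not something stated in the description.

```python
import math


def population_stability_index(actual, expected, eps=1e-6):
    """PSI between two binned distributions given as per-bin proportions."""
    if len(actual) != len(expected):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for a, e in zip(actual, expected):
        a = max(a, eps)  # floor to avoid log(0) and division by zero
        e = max(e, eps)
        total += (a - e) * math.log(a / e)
    return total


# Expected (training set) versus actual (test set) share of accounts per bin.
expected_bins = [0.5, 0.5]
actual_bins = [0.6, 0.4]
is_stable = population_stability_index(actual_bins, expected_bins) < 0.25
```

With identical distributions every term is zero, so the PSI is 0; the mild shift in the example gives a small positive value that stays under the 0.25 limit.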
After the data processing device obtains the training set by splitting, it can use the training data to train the model underlying the scoring model. In a specific implementation, the data processing device may input each item of training data in the training set into that model, such as the xgboost model, and adjust the parameters of the model; the trained xgboost model is obtained by continuously adjusting and optimizing the model parameters. When training the xgboost model, the data processing device may use a greedy algorithm that, starting from the root node, recursively selects the optimal feature for the tree structure, splits the training data on that feature, calculates the information gain of each candidate split, and splits the node using the split with the largest information gain, until a preset stop condition (such as the maximum depth of the tree) is reached. The max_depth parameter controls the depth of the trees in the scoring model; the larger it is, the more easily the model overfits. The lambda parameter is the L2 regularization weight that controls model complexity; the larger it is, the less prone the model is to overfitting. The data processing device may set a stop condition for the parameter adjustment process, for example stopping when the maximum tree depth is reached, and thereby determine the optimal parameters.
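The greedy, gain-driven split selection described above can be illustrated with a simplified pure-Python sketch. It uses entropy-based information gain on binary labels, following the wording of the description; the actual xgboost library scores splits with a gradient-based gain that includes the lambda regularization term, so this is not xgboost's real objective. All function names are hypothetical.

```python
import math


def entropy(labels):
    """Binary entropy of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best split, or None."""
    base, n, best = entropy(labels), len(rows), None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=2):
    """Greedily grow a tree until max_depth or until no split has positive gain."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return {"leaf": sum(labels) / len(labels)}
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }
```

The `max_depth` argument plays the same role as the max_depth parameter mentioned in the text: raising it lets the tree fit the training data more closely, at the cost of a higher overfitting risk.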
After it stops adjusting the model parameters, the data processing device also calls the trained model to score each item of sample feature data in the test set, based on the distribution of the data in the test set, and obtains an abnormality degree score for each sample. If the abnormality degree scores produced by the trained model agree with the scores obtained by manually scoring the samples, the trained model can be put into use. The data processing device can monitor the stability of the model according to the PSI value, feed the result back to the service in time, and retrain the model when it becomes unstable.
After the data processing device calls the trained scoring model and obtains a first score for each account in the target account set, it may also obtain evaluation information for each account in the target account set and determine a second score for each account according to the evaluation information. The data processing device may then determine the abnormality degree score of each account in the target account set according to the first score and the second score, and process the target abnormal account in the target account set based on the abnormality degree score. In one embodiment, the data processing device calls a feature scoring module to score the accounts in the target account set. The feature scoring module is deployed in the real-time system TSSD, so the data processing device is not bound by the computing resource limits of the online decision engine. It can therefore use more account variables and more complex models with comparatively simple policy rules, and the trained model is more accurate, so abnormal accounts can be processed more precisely.
In an embodiment, when processing a target abnormal account, the data processing device may call the processing module in the online real-time decision engine shown in fig. 1b. After the data processing device calls the feature scoring module to obtain the abnormality degree score of each account in the target account set, the feature scoring module writes each account's score back to the processing module of the real-time decision engine. When an account carries out a virtual-resource transaction, or a user files a complaint against the account and complaint information is generated, the real-time system processes the account according to its existing abnormality degree score and the complaint information, so that the account's transaction is intercepted in real time or the account's electronic resources are frozen. Specifically, as shown in fig. 6, the data processing device mines abnormal accounts in advance and processes an abnormal account before it starts operating on a large scale. Using the high-precision abnormality degree score that the scoring model determines for the account, it intercepts or releases the account's transactions according to the score threshold and how suspicious the transaction features are, and it disposes of accounts in several ways by combining the model score with user complaints. The data processing device thus processes abnormal accounts along multiple dimensions, a substantial improvement over processing an account in a single way.
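One way the written-back score could be combined with transaction suspicion and complaints is sketched below. The decision rule, the function name `dispose` and the default threshold are all hypothetical; the source states only that interception, release and freezing are chosen from the score threshold, the transaction features and the complaint information.

```python
def dispose(score, suspicious_transaction, has_complaint, score_threshold=0.8):
    """Return 'freeze', 'intercept' or 'release' for one account event.

    Hypothetical rule: accounts below the score threshold are released;
    above it, a complaint leads to freezing and a suspicious transaction
    leads to interception.
    """
    if score < score_threshold:
        return "release"
    if has_complaint:
        return "freeze"
    if suspicious_transaction:
        return "intercept"
    return "release"
```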
In the embodiment of the present invention, when receiving a registration request for a new account, the data processing device may obtain the registration data of the new account, divide the new account based on that registration data, and take the account set into which the new account is divided as the target account set. Based on the registration data of each account in the target account set, the data processing device determines the abnormal accounts, the abnormal proportion corresponding to the abnormal accounts, the associated accounts that have an association relationship with the abnormal accounts, and the associated proportion corresponding to the associated accounts, and may then process the target abnormal accounts in the target account set according to the abnormal proportion and the associated proportion. Practice has shown that the data processing method provided in the embodiment of the present invention improves the efficiency of identifying and processing abnormal accounts: during a period in which abnormal accounts are prevalent, 90% of the abnormal accounts can be processed within one day, with a processing accuracy of 100%. Because the data processing device performs association processing on abnormal accounts at account registration, abnormal registrations can be effectively prevented and large-scale recurrence of abnormal accounts avoided.
Based on the description of the above data processing method embodiment, an embodiment of the present invention further provides a data processing apparatus, which may be a computer program (including program code) running in the data processing device. The data processing apparatus may be used to execute the data processing method shown in fig. 2 and fig. 4. Referring to fig. 7, the data processing apparatus includes: an obtaining unit 701, a determining unit 702 and a processing unit 703.
An obtaining unit 701, configured to obtain a target account set; the target account set comprises at least one account and registration data of each account;
a determining unit 702, configured to determine, according to registration data of each account in the target account set, a reference account from the target account set; the reference accounts comprise an abnormal account and an associated account which has an association relation with the abnormal account;
the obtaining unit 701 is further configured to obtain a proportion feature of the reference account, where the proportion feature includes at least one of: the abnormal proportion of the abnormal account and the associated proportion of the associated account;
a processing unit 703, configured to, when the proportion feature satisfies a target processing condition, process a target abnormal account in the target account set by using a processing rule associated with the target processing condition.
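The proportion feature used by these units can be sketched as two ratios over the target account set. This is a minimal sketch, assuming each proportion is the share of the set's accounts that are abnormal or associated; the function name `proportion_features` and the dictionary keys are hypothetical.

```python
def proportion_features(account_ids, abnormal_ids, associated_ids):
    """Share of abnormal and of associated accounts within one account set."""
    members = set(account_ids)
    total = len(members)
    if total == 0:
        return {"abnormal": 0.0, "associated": 0.0}
    return {
        "abnormal": len(members & set(abnormal_ids)) / total,
        "associated": len(members & set(associated_ids)) / total,
    }
```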
In one embodiment, if the proportion feature includes the abnormal proportion of the abnormal account, the processing unit 703 is specifically configured to:
when the abnormal proportion is larger than a first proportion threshold, determining that a target processing condition is met, and processing a target abnormal account in the target account set by adopting a processing rule associated with the target processing condition.
In an embodiment, the processing unit 703 is specifically configured to:
deleting normal accounts in the target account set according to the registration data of all the accounts in the target account set, wherein the normal accounts comprise: the registration object included in the registration data of the corresponding account is an endorsed object, or the registration data of the corresponding account is credible characteristic data;
and taking the accounts that remain in the target account set after the normal accounts are deleted as target abnormal accounts, and processing the target abnormal accounts by adopting processing rules associated with the target processing conditions.
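The abnormal-proportion branch can be sketched as a threshold check followed by filtering out normal accounts. This is an illustration only: the field names `endorsed` and `trusted` stand in for the "endorsed object" and "credible characteristic data" tests, whose concrete form the source does not give, and the function names are hypothetical.

```python
def is_normal(registration):
    """Hypothetical test: endorsed registrant or trusted registration data."""
    return bool(registration.get("endorsed") or registration.get("trusted"))


def target_abnormal_accounts(accounts, abnormal_proportion, first_threshold):
    """Return the ids left after removing normal accounts, if the set qualifies.

    accounts maps account id -> registration data (a dict).
    """
    if abnormal_proportion <= first_threshold:
        return []  # target processing condition not met
    return [acc_id for acc_id, reg in accounts.items() if not is_normal(reg)]
```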
In one embodiment, if the proportion feature includes the associated proportion of the associated account, the processing unit 703 is specifically configured to:
when the associated proportion is larger than a second proportion threshold, determining that a target processing condition is met, and acquiring the abnormal degree score of each account in the target account set;
and taking the accounts in the target account set whose abnormal degree scores are larger than or equal to the score threshold as target abnormal accounts, and processing the target abnormal accounts by adopting processing rules associated with the target processing conditions.
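The associated-proportion branch can be sketched in the same way: a threshold check on the proportion, then selection by score. The function name `select_by_score` and its parameters are hypothetical; the comparison operators follow the text (strictly greater for the proportion, greater than or equal for the score).

```python
def select_by_score(scores, associated_proportion,
                    second_threshold, score_threshold):
    """Return ids whose abnormal degree score reaches the score threshold.

    scores maps account id -> abnormal degree score. An empty list is
    returned when the associated proportion does not exceed its threshold.
    """
    if associated_proportion <= second_threshold:
        return []  # target processing condition not met
    return sorted(acc for acc, s in scores.items() if s >= score_threshold)
```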
In an embodiment, the processing unit 703 is specifically configured to:
acquiring legal person information, identity information of a registered object and payment card information of an abnormal account from registration data of the abnormal account;
and taking, from the target account set, an account having the same legal person information, an account having the same identity information, or an account having the same payment card information as an associated account that has an association relationship with the abnormal account.
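The association rule above can be sketched as a match on any one of three registration fields. This is a minimal sketch; the field names in `ASSOCIATION_KEYS` and the function name `associated_accounts` are hypothetical, and missing fields are treated as non-matching.

```python
ASSOCIATION_KEYS = ("legal_person", "identity", "payment_card")


def associated_accounts(accounts, abnormal_ids):
    """Find accounts sharing any association field with an abnormal account.

    accounts maps account id -> registration data (a dict).
    """
    abnormal = [accounts[a] for a in abnormal_ids if a in accounts]
    found = set()
    for acc_id, reg in accounts.items():
        if acc_id in abnormal_ids:
            continue  # abnormal accounts are not their own associates
        for bad in abnormal:
            if any(reg.get(k) is not None and reg.get(k) == bad.get(k)
                   for k in ASSOCIATION_KEYS):
                found.add(acc_id)
                break
    return found
```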
In an embodiment, the obtaining unit 701 is specifically configured to:
calling a trained scoring model, and determining a first score of each account in the target account set by combining the registration information of each account in the target account set;
obtaining evaluation information of each account in the target account set, and determining a second score of each account in the target account set according to the evaluation information;
and determining the abnormal degree score of each account in the target account set according to the first score and the second score.
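The source does not state how the first score and the second score are combined. One simple possibility, shown purely as an assumption, is a weighted average; the function name `abnormality_score` and the weight `model_weight` are hypothetical.

```python
def abnormality_score(first_score, second_score, model_weight=0.7):
    """Blend the model score and the evaluation score (assumed weighted mean)."""
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must be between 0 and 1")
    return model_weight * first_score + (1.0 - model_weight) * second_score
```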
In an embodiment, the obtaining unit 701 is specifically configured to:
adding abnormal labels to the accounts included in the target account set, and determining auditing labels corresponding to the accounts included in the target account set, wherein the auditing labels are used for indicating whether the accounts included in the target account set are really abnormal accounts or not;
obtaining account characteristic data of each account included in the target account set, the account characteristic data including one or more of: order data, name data, complaint data and information flow data;
and training the initial scoring model according to the account characteristic data of each account, the abnormal label corresponding to each account and the audit label added to each account to obtain the trained scoring model.
In an embodiment, the obtaining unit 701 is specifically configured to:
receiving a registration request of a new account, and acquiring registration data of the new account;
and dividing the new account according to the registration data of the new account, and taking an account set into which the new account is divided as a target account set.
In an embodiment, the obtaining unit 701 is specifically configured to:
acquiring the account name of the new account from the registration data of the new account, and acquiring a similar account of which the similarity with the account name meets a similarity threshold;
and acquiring a similar account set where the similar account is located, and determining an account set into which the new account is divided according to the similar account set.
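The name-similarity step can be sketched with the Python standard library. The source does not name a similarity measure; `difflib.SequenceMatcher` is used here only as a stand-in, and the function name `similar_accounts` and the default threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similar_accounts(new_name, existing_names, threshold=0.8):
    """Return existing account names whose similarity meets the threshold."""
    return [name for name in existing_names
            if SequenceMatcher(None, new_name, name).ratio() >= threshold]
```

The account sets containing the returned names would then be the candidates for the set into which the new account is divided.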
In one embodiment, the set of similar accounts is divided into one or more subsets of similar accounts; the obtaining unit 701 is specifically configured to:
acquiring vector representation of account names corresponding to all similar accounts in each similar account subset;
determining a target vector representation closest to the account name corresponding vector representation of the new account from the vector representations of the account names corresponding to each similar account;
and taking the similar account subset containing the account that corresponds to the target vector representation as the account set into which the new account is divided.
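The subset assignment above can be sketched as a nearest-neighbour lookup over name vectors. This is a minimal sketch, assuming Euclidean distance as the meaning of "closest" (the source does not specify the metric) and assuming the name vectors have already been computed; `assign_subset` is a hypothetical name.

```python
import math


def assign_subset(new_vector, subsets):
    """Return the id of the subset holding the vector closest to new_vector.

    subsets maps subset id -> {account id: name vector}. Returns None when
    there are no vectors to compare against.
    """
    best_subset, best_dist = None, math.inf
    for subset_id, members in subsets.items():
        for vector in members.values():
            dist = math.dist(new_vector, vector)
            if dist < best_dist:
                best_subset, best_dist = subset_id, dist
    return best_subset
```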
In this embodiment of the present invention, after the obtaining unit 701 obtains the target account set, the determining unit 702 may determine a reference account from the target account set according to the registration data of each account in the target account set. The reference accounts determined by the determining unit 702 include an abnormal account and an associated account that has an association relationship with the abnormal account. After the abnormal account and the associated account are determined, the abnormal proportion of the abnormal account in the target account set and the associated proportion of the associated account in the target account set may be determined, and the processing unit 703 may then process the target abnormal account in the target account set based on the abnormal proportion and the associated proportion. Because abnormal accounts are processed on the basis of the division into account sets, the data processing device can process abnormal accounts in clusters, which improves the efficiency with which it processes target abnormal accounts.
Fig. 8 is a schematic block diagram of a data processing apparatus according to an embodiment of the present invention. The data processing apparatus in the present embodiment as shown in fig. 8 may include: one or more processors 801; one or more input devices 802, one or more output devices 803, and memory 804. The processor 801, the input device 802, the output device 803, and the memory 804 described above are connected by a bus 805. The memory 804 is used to store a computer program comprising program instructions, and the processor 801 is used to execute the program instructions stored by the memory 804.
The memory 804 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 804 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the memory 804 may also include a combination of the above types of memory.
The processor 801 may be a central processing unit (CPU). The processor 801 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a field-programmable gate array (FPGA), generic array logic (GAL), or the like. The processor 801 may also be a combination of the above structures.
In the embodiment of the present invention, the memory 804 is used for storing a computer program, the computer program includes program instructions, and the processor 801 is used for executing the program instructions stored in the memory 804, so as to implement the steps of the corresponding methods as described above in fig. 2 and fig. 4.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
acquiring a target account set, and determining a reference account from the target account set according to the registration data of each account in the target account set; the target account set comprises at least one account and registration data of each account; the reference accounts comprise abnormal accounts and associated accounts which have an association relation with the abnormal accounts;
acquiring proportion characteristics of the reference account, wherein the proportion characteristics comprise at least one of the following characteristics: the abnormal proportion of the abnormal account and the associated proportion of the associated account;
and when the proportion characteristics meet target processing conditions, processing the target abnormal account in the target account set by adopting a processing rule associated with the target processing conditions.
In one embodiment, if the proportion feature includes the abnormal proportion of the abnormal account, the processor 801 is configured to call the program instructions for performing:
when the abnormal proportion is larger than a first proportion threshold, determining that a target processing condition is met, and processing a target abnormal account in the target account set by adopting a processing rule associated with the target processing condition.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
deleting normal accounts in the target account set according to the registration data of all the accounts in the target account set, wherein the normal accounts comprise: the registration object included in the registration data of the corresponding account is an endorsed object, or the registration data of the corresponding account is credible characteristic data;
and taking the accounts that remain in the target account set after the normal accounts are deleted as target abnormal accounts, and processing the target abnormal accounts by adopting processing rules associated with the target processing conditions.
In one embodiment, if the proportion feature includes the associated proportion of the associated account, the processor 801 is configured to call the program instructions for performing:
when the associated proportion is larger than a second proportion threshold, determining that a target processing condition is met, and acquiring the abnormal degree score of each account in the target account set;
and taking the accounts in the target account set whose abnormal degree scores are larger than or equal to the score threshold as target abnormal accounts, and processing the target abnormal accounts by adopting processing rules associated with the target processing conditions.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
acquiring legal person information, identity information of a registered object and payment card information of an abnormal account from registration data of the abnormal account;
and taking, from the target account set, an account having the same legal person information, an account having the same identity information, or an account having the same payment card information as an associated account that has an association relationship with the abnormal account.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
calling a trained scoring model, and determining a first score of each account in the target account set by combining the registration information of each account in the target account set;
obtaining evaluation information of each account in the target account set, and determining a second score of each account in the target account set according to the evaluation information;
and determining the abnormal degree score of each account in the target account set according to the first score and the second score.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
adding an abnormal label to each account included in the target account set, and determining an audit label corresponding to each account included in the target account set, wherein the audit label is used for indicating whether each account included in the target account set is really an abnormal account;
obtaining account characteristic data of each account included in the target account set, the account characteristic data including one or more of: order data, name data, complaint data and information flow data;
and training the initial scoring model according to the account characteristic data of each account, the abnormal label corresponding to each account and the audit label added to each account to obtain the trained scoring model.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
receiving a registration request of a new account, and acquiring registration data of the new account;
and dividing the new account according to the registration data of the new account, and taking an account set into which the new account is divided as a target account set.
In one embodiment, the processor 801 is configured to call the program instructions for performing:
acquiring an account name of the new account from the registration data of the new account, and acquiring a similar account of which the similarity with the account name meets a similarity threshold;
and acquiring a similar account set where the similar account is located, and determining an account set into which the new account is divided according to the similar account set.
In one embodiment, the set of similar accounts is divided into one or more subsets of similar accounts; the processor 801 is configured to call the program instructions for performing:
acquiring vector representation of account names corresponding to all similar accounts in each similar account subset;
determining a target vector representation closest to the account name corresponding vector representation of the new account from the vector representations of the account names corresponding to each similar account;
and taking the similar account subset containing the account that corresponds to the target vector representation as the account set into which the new account is divided.
Embodiments of the present invention provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method embodiments as shown in fig. 2 or fig. 4. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A data processing method, comprising:
acquiring a target account set, and determining a reference account from the target account set according to the registration data of each account in the target account set; the target account set comprises at least one account and registration data of each account; the reference accounts comprise abnormal accounts and associated accounts which have an association relation with the abnormal accounts;
acquiring a proportion characteristic of the reference account, wherein the proportion characteristic comprises at least one of the following characteristics: the abnormal proportion of the abnormal account and the associated proportion of the associated account;
and when the proportion characteristics meet target processing conditions, processing target abnormal accounts in the target account set by adopting processing rules associated with the target processing conditions.
2. The method of claim 1, wherein the proportion feature comprises the abnormal proportion of the abnormal account; and the processing, when the proportion feature meets a target processing condition, of a target abnormal account in the target account set by adopting a processing rule associated with the target processing condition comprises:
when the abnormal proportion is larger than a first proportion threshold, determining that a target processing condition is met, and processing a target abnormal account in the target account set by adopting a processing rule associated with the target processing condition.
3. The method of claim 2, wherein the processing a target abnormal account in the target account set by adopting a processing rule associated with the target processing condition comprises:
deleting normal accounts in the target account set according to the registration data of all the accounts in the target account set, wherein the normal accounts comprise: the registration object included in the registration data of the corresponding account is an endorsed object, or the registration data of the corresponding account is credible characteristic data;
and taking the accounts that remain in the target account set after the normal accounts are deleted as target abnormal accounts, and processing the target abnormal accounts by adopting the processing rules associated with the target processing conditions.
4. The method of claim 1, wherein the proportion feature comprises the associated proportion of the associated account; and the processing, when the proportion feature meets a target processing condition, of a target abnormal account in the target account set by adopting a processing rule associated with the target processing condition comprises:
when the associated proportion is larger than a second proportion threshold, determining that a target processing condition is met, and acquiring the abnormal degree score of each account in the target account set;
and taking the accounts in the target account set whose abnormal degree scores are larger than or equal to the score threshold as target abnormal accounts, and processing the target abnormal accounts by adopting processing rules associated with the target processing conditions.
5. The method of claim 4, wherein determining the associated accounts from the set of target accounts that have an association relationship with an anomalous account comprises:
acquiring legal person information, identity information of a registered object and payment card information of an abnormal account from registration data of the abnormal account;
and taking the account which is the same as the legal person information, the account which is the same as the identity information or the account which is the same as the payment card information in the target account set as an associated account which has an association relation with an abnormal account.
6. The method of claim 4, wherein obtaining the abnormal degree score of each account in the target account set comprises:
calling a trained scoring model, and determining a first score of each account in the target account set by combining the registration information of each account in the target account set;
obtaining evaluation information of each account in the target account set, and determining a second score of each account in the target account set according to the evaluation information;
and determining the abnormal degree score of each account in the target account set according to the first score and the second score.
7. The method of claim 6, wherein obtaining the trained scoring model comprises:
adding an abnormal label to each account included in the target account set, and determining an audit label corresponding to each account included in the target account set, wherein the audit label is used for indicating whether each account included in the target account set is really an abnormal account;
obtaining account characteristic data of each account included in the target account set, the account characteristic data including one or more of: order data, name data, complaint data and information flow data;
and training the initial scoring model according to the account characteristic data of each account, the abnormal label corresponding to each account and the audit label added to each account to obtain the trained scoring model.
8. The method of claim 1, wherein the obtaining a set of target accounts comprises:
receiving a registration request of a new account, and acquiring registration data of the new account;
and dividing the new account according to the registration data of the new account, and taking an account set into which the new account is divided as a target account set.
9. The method of claim 8, wherein the partitioning the new account according to the registration data of the new account comprises:
acquiring the account name of the new account from the registration data of the new account, and acquiring a similar account of which the similarity with the account name meets a similarity threshold;
and acquiring a similar account set where the similar account is located, and determining an account set into which the new account is divided according to the similar account set.
10. The method of claim 9, wherein the set of similar accounts is divided into one or more subsets of similar accounts; the determining, according to the similar account set, an account set into which the new account is divided includes:
acquiring vector representation of account names corresponding to all similar accounts in each similar account subset;
determining a target vector representation closest to the account name corresponding vector representation of the new account from the vector representations of the account names corresponding to each similar account;
and taking the similar account subset containing the account that corresponds to the target vector representation as the account set into which the new account is divided.
11. A data processing apparatus, comprising:
an obtaining unit 701, configured to obtain a target account set; the target account set comprises at least one account and registration data of each account;
a determining unit 702, configured to determine, according to registration data of each account in the target account set, a reference account from the target account set; the reference accounts comprise an abnormal account and an associated account which has an association relation with the abnormal account;
the obtaining unit 701 is further configured to obtain a proportion feature of the reference account, where the proportion feature includes at least one of the following: the abnormal proportion of the abnormal account and the associated proportion of the associated account;
a processing unit 703, configured to, when the proportion feature satisfies a target processing condition, process a target abnormal account in the target account set by using a processing rule associated with the target processing condition.
12. A data processing apparatus comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform a method according to any of claims 1 to 10.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 10.
CN202110288455.8A 2021-03-17 2021-03-17 Data processing method, device, equipment and storage medium Pending CN115115369A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110288455.8A CN115115369A (en) 2021-03-17 2021-03-17 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110288455.8A CN115115369A (en) 2021-03-17 2021-03-17 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115115369A true CN115115369A (en) 2022-09-27

Family

ID=83323718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110288455.8A Pending CN115115369A (en) 2021-03-17 2021-03-17 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115115369A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116644372A (en) * 2023-07-24 2023-08-25 北京芯盾时代科技有限公司 Account type determining method and device, electronic equipment and storage medium
CN116644372B (en) * 2023-07-24 2023-11-03 北京芯盾时代科技有限公司 Account type determining method and device, electronic equipment and storage medium
CN117421254A (en) * 2023-12-19 2024-01-19 杭银消费金融股份有限公司 Automatic test method and system for reconciliation business
CN117421254B (en) * 2023-12-19 2024-03-22 杭银消费金融股份有限公司 Automatic test method and system for reconciliation business

Similar Documents

Publication Publication Date Title
CN112613501A (en) Information auditing classification model construction method and information auditing method
CN111597348B (en) User image drawing method, device, computer equipment and storage medium
CN111371767B (en) Malicious account identification method, malicious account identification device, medium and electronic device
CN113011973B (en) Method and equipment for financial transaction supervision model based on intelligent contract data lake
CN105574544A (en) Data processing method and device
CN110310114A (en) Object classification method, device, server and storage medium
CN115115369A (en) Data processing method, device, equipment and storage medium
CN115577152B (en) Online book borrowing management system based on data analysis
CN111259952A (en) Abnormal user identification method and device, computer equipment and storage medium
CN113139876A (en) Risk model training method and device, computer equipment and readable storage medium
CN116865994A (en) Network data security prediction method based on big data
CN116633589A (en) Malicious account detection method, device and storage medium in social network
CN116263906A (en) Method, device and storage medium for determining post address
CN112991079B (en) Multi-card co-occurrence medical treatment fraud detection method, system, cloud end and medium
CN113240259B (en) Rule policy group generation method and system and electronic equipment
CN112632219B (en) Method and device for intercepting junk short messages
CN113450011A (en) Task allocation method and device
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium
CN112307133A (en) Security protection method and device, computer equipment and storage medium
CN110610378A (en) Product demand analysis method and device, computer equipment and storage medium
CN112131607B (en) Resource data processing method and device, computer equipment and storage medium
CN113987309B (en) Personal privacy data identification method and device, computer equipment and storage medium
CN113723522B (en) Abnormal user identification method and device, electronic equipment and storage medium
CN115205025A (en) Risk account identification method and device, computer equipment and storage medium
CN114663121A (en) Method and device for detecting abnormal traffic of advertisement

Legal Events

Date Code Title Description
PB01 Publication