WO2023045691A1 - Object recognition method, apparatus, electronic device, and storage medium - Google Patents

Object recognition method, apparatus, electronic device, and storage medium

Info

Publication number
WO2023045691A1
WO2023045691A1 PCT/CN2022/114765 CN2022114765W WO2023045691A1 WO 2023045691 A1 WO2023045691 A1 WO 2023045691A1 CN 2022114765 W CN2022114765 W CN 2022114765W WO 2023045691 A1 WO2023045691 A1 WO 2023045691A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
sample
objects
identified
data
Prior art date
Application number
PCT/CN2022/114765
Other languages
English (en)
French (fr)
Inventor
熊小瑀
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023045691A1
Priority to US18/195,868 (US20230281479A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists

Definitions

  • This application relates to technical fields such as mobile payment, payment security, big data, vehicle-mounted terminals, and artificial intelligence. Specifically, this application relates to an object recognition method, device, electronic equipment, and storage medium.
  • An embodiment of the present application provides an object recognition method, the method comprising:
  • for each object to be recognized, predicting a first label of the object through an object recognition model based on the related object data of the object, where the first label characterizes the object type to which an object belongs among multiple object types;
  • obtaining a reference data set, where the reference data set includes related object data and second labels of a plurality of first sample objects with annotation labels, the annotation label of a first sample object represents the real object type to which that sample object belongs among the multiple object types, and the second label characterizes the probability that an object belongs to each of the multiple object types;
  • determining the recognition result of the object to be recognized according to the second label of the object to be recognized.
  • An embodiment of the present application provides an object recognition device, which includes:
  • the first prediction module is used to obtain the related object data of at least one object to be identified and, for each object to be identified, to obtain the first label of the object through prediction by an object recognition model based on the related object data of the object, where the first label characterizes the object type to which an object belongs among multiple object types;
  • the reference data set acquisition module is used to obtain a reference data set, where the reference data set includes related object data and second labels of a plurality of first sample objects with annotation labels, the annotation label of a first sample object represents the real object type to which the first sample object belongs among the multiple object types, and the second label characterizes the probability that an object belongs to each of the multiple object types;
  • the second prediction module is used to determine the first association relationship between the at least one object to be identified and each of the plurality of first sample objects according to the related object data of each object to be identified and each first sample object, and to determine the second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship;
  • the recognition result determination module is used to determine the recognition result of each object to be recognized according to the second label of each object to be recognized.
  • An embodiment of the present application also provides an electronic device. The electronic device includes a memory, a processor, and a computer program stored in the memory, and the processor executes the computer program to implement the steps of the method provided in the embodiments of the present application.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method provided in the embodiments of the present application are implemented.
  • An embodiment of the present application also provides a computer program product including a computer program; when the computer program is executed by a processor, the method provided in the embodiments of the present application is implemented.
  • An embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the method provided in the above embodiments of the present application.
  • FIG. 1 is a schematic flow chart of an object recognition method provided in an embodiment of the present application.
  • FIGS. 2a to 2d are schematic diagrams of objects of several object types provided in examples of the present application.
  • FIG. 3 is a schematic structural diagram of an object recognition system provided by an embodiment of the present application.
  • FIG. 4 is a schematic flow chart of an object recognition method provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the principle of a training method for an object recognition model provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the principle of label propagation provided in the example of the present application.
  • FIGS. 7a to 7c are schematic diagrams of several different examples of label propagation provided in the example of this application.
  • FIG. 8 is a schematic structural diagram of an object recognition device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device applicable to an embodiment of the present application.
  • As used herein, "connection" or "coupling" may include wireless connection or wireless coupling.
  • The term "and/or" used herein indicates at least one of the items defined by the term, including all or any unit and all combinations of one or more associated listed items; for example, "A and/or B" can be implemented as "A", as "B", or as "A and B".
  • This application addresses problems in existing methods for identifying objects of target types (such as risky objects, that is, objects/users with fraudulent behavior, or users who make profits through illegal means or means that violate social morality). To better meet the demand for risk identification, it proposes a method for identifying such objects, that is, objects with a risk of fraud (the transaction risk of the black industry illegally obtaining user assets by means of inducement, false information, and the like).
  • The identification of risky users is often based on other users' loss reports, the user's own transaction behavior, and the like.
  • User risk labels: labels for users with fraudulent behavior.
  • Black industry (also called the illegal industry or malicious industry): the industry that makes profits through illegal means or means that violate social morality.
  • The black industry often conducts fraud in batches during the same period. With identification that relies on other users' loss reports, by the time a risky user is marked, the merchants active in the same period may already have completed the entire fraud process and a large number of losses will have been reported. The fraud cannot be prevented in advance, which greatly hampers the control of illegal funds.
  • This application provides a new object identification method, based on which a risk user relationship network can be created. This not only helps build a user risk system and better clarifies the life cycle of the black industry, but also provides a new path for identifying fraud risks in advance.
  • The object recognition method provided by the embodiment of the present application can better meet the requirements of timeliness and coverage of object recognition.
  • This method can be applied to the processing of big data and can, for example, be implemented based on cloud technology.
  • The data computation involved in the embodiment of the present application can use cloud computing.
  • For example, cloud computing can be used for steps such as training the object recognition model and determining the label of an object based on label propagation.
  • Big data refers to a collection of data that cannot be captured, managed, and processed by conventional software tools within a certain period of time. It is a massive, high-growth, and diverse information asset that requires new processing models to provide stronger decision-making power, insight discovery, and process optimization capabilities. With the advent of the cloud era, big data has attracted more and more attention, and it requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on that are applied on the basis of the cloud computing business model. It can form a resource pool that is used on demand, which is flexible and convenient. Cloud computing technology will become an important support.
  • The solutions provided in the embodiments of the present application can also be implemented based on artificial intelligence (AI) technology. For example, the first risk label of an object can be predicted through the trained risk identification model, and the reference data set can be obtained by machine learning based on a loss function.
  • Artificial intelligence technology is a comprehensive subject that involves a wide range of fields, including both hardware-level technology and software-level technology. Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes several major directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • The data involved in the embodiments of the present application can be stored using cloud storage or blockchain-based storage, which can effectively protect the security of the data.
  • Blockchain refers to a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a series of data blocks associated with each other using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
  • Fig. 1 shows a schematic flow chart of an object recognition method provided by an embodiment of the present application.
  • The method can be executed by any electronic device.
  • The method can also be executed by a server.
  • The server can be a cloud server, a physical server, or a server cluster. The method can be implemented as an application program or as a plug-in or function module of an existing application program; for example, it can serve as a new function module of a transaction (such as mobile payment) application program.
  • By executing the method, the server of the application program can identify the label of the object to be identified and determine whether the object is a target type object, such as whether it is a risk object or a non-risk object, and, when it is a risk object, the risk type it belongs to (the object type, that is, what kind of fraudulent behavior the object engages in).
  • The method can also be executed by a terminal device, which can recognize the label of the object to be recognized by executing the method and obtain the recognition result.
  • The terminal device includes a user terminal, and the user terminal includes but is not limited to a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle terminal, and the like.
  • The method may be executed by a server.
  • The object recognition method provided by the embodiment of the present application may include the following steps S110-S160.
  • Step S110: Obtain related object data of at least one object to be identified.
  • The objects in this embodiment of the application may include, but are not limited to, users, merchants, and so on.
  • An object may be characterized by its object identifier. The form of the object identifier is not limited in this embodiment of the application, as long as it is information that can uniquely represent an object; for example, it may include, but is not limited to, the object's contact information, the object's account ID, and so on. The object's account ID may be the object's social account, such as the object's account in an application (for example, the user's registered account name or nickname in the application).
  • An account of an object may be used to represent the object.
  • The related object data of an object includes the interaction data of the object. The related object data may be the object's interactive behavior data (also called social behavior data), which refers to data related to the social interaction of the object and may specifically include data related to the object's interaction with other objects.
  • The related object data may be the social behavior data of the object obtained with the object's authorization.
  • An object's social behavior data may include the object's social/interaction information and transaction information.
  • The social information may reflect the social degree of the object, for example the object's social activity, such as the number of friends of the object, the number of other objects following the object, or the number of objects that forward a piece of information published by the object.
  • The criteria for determining friends are not limited in the embodiments of this application. For example, two objects that follow each other can be considered friends.
  • The transaction information of an object refers to the information related to transactions between the object and other objects.
  • The transaction information may include, but is not limited to, payment behavior information and transfer information (including payments/transfers the object makes to other objects, and payments/transfers other objects make to the object).
  • The transaction information of an object can specifically include, but is not limited to, the transaction time, the transaction initiator and recipient (for example, when A transfers money to B, A is the initiator and B is the recipient), the transaction amount, and the transaction type (whether it is a transfer, a red envelope, or another form).
  • Step S120: For each object to be recognized, based on the related object data of the object, obtain the first label of the object through prediction by the object recognition model, where the first label of an object represents the object type to which the object belongs among multiple object types.
  • The object type may also be referred to as a risk type, and refers to a type of fraudulent behavior of an object.
  • The first label may also be referred to as a first risk label, which characterizes the risk type of the object predicted from the object's related object data.
  • The object recognition model (also called the risk recognition model) is a neural network model pre-trained on a training data set.
  • The input of the model is the related object data of the object, or preprocessed data derived from it, and the output of the model is the object type corresponding to the related object data. For example, the related object data can be preprocessed according to preset requirements into data in a fixed format, such as a vector in a specified data format, which is then input to the model, and the object type of the object is obtained through model prediction.
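  • The preprocessing step above can be illustrated with a minimal sketch that turns transaction records and a friend count into a fixed-length vector. The record fields, the feature choices, and the function name are hypothetical; the application does not specify the vector layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transaction:
    """One transaction record; the field names are illustrative assumptions."""
    initiator: str   # object that sends the money
    recipient: str   # object that receives the money
    amount: float    # transaction amount
    kind: str        # e.g. "transfer" or "red_envelope"
    timestamp: int   # transaction time, e.g. seconds since the epoch


def to_feature_vector(object_id: str,
                      transactions: List[Transaction],
                      friend_count: int) -> List[float]:
    """Convert related object data into a fixed-length vector (hypothetical features)."""
    sent = [t for t in transactions if t.initiator == object_id]
    received = [t for t in transactions if t.recipient == object_id]
    counterparties = {t.recipient for t in sent} | {t.initiator for t in received}
    return [
        float(friend_count),                      # social activity
        float(len(sent)),                         # number of outgoing transactions
        float(len(received)),                     # number of incoming transactions
        sum((t.amount for t in sent), 0.0),       # total amount sent
        sum((t.amount for t in received), 0.0),   # total amount received
        float(len(counterparties)),               # distinct transaction partners
    ]


records = [
    Transaction("A", "B", 50.0, "transfer", 1_700_000_000),
    Transaction("C", "A", 20.0, "red_envelope", 1_700_000_100),
    Transaction("A", "C", 5.0, "transfer", 1_700_000_200),
]
features = to_feature_vector("A", records, friend_count=12)
print(features)
```

  • Every object maps to a vector of the same length, which is what allows the vectors to be fed to a single classification model.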
  • The object recognition model can be a classification model, and the classification model can be a multi-class model in which each of the multiple object types corresponds to one category. The model predicts the category to which the social behavior data corresponds, and the object type represented by that category is the object type of the object to which the social behavior data belongs.
  • The embodiment of the present application does not limit the form of the data output by the model. For example, the output can be a category identifier, or a one-dimensional vector whose number of elements (that is, numbers) equals the total number of types in the multiple object types, with each element corresponding to one type. The value of each element can be 0 or 1; for example, exactly one element has the value 1 and the others are 0, and the type corresponding to the element whose value is 1 is the predicted type of the object, that is, the first label.
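  • As a minimal sketch of the one-dimensional 0/1 output described above, the following converts per-type scores into a first label. The type names and the assumption that the model exposes one score per type are illustrative only.

```python
from typing import List, Sequence

# Hypothetical object types: two target (risk) types and one non-target type.
OBJECT_TYPES = ["type_A", "type_B", "no_risk"]


def to_first_label(scores: Sequence[float]) -> List[int]:
    """Return a 0/1 vector with a single 1 at the highest-scoring type."""
    if len(scores) != len(OBJECT_TYPES):
        raise ValueError("expected one score per object type")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == best else 0 for i in range(len(scores))]


def label_to_type(label: Sequence[int]) -> str:
    """Map a 0/1 label vector back to the name of the predicted type."""
    return OBJECT_TYPES[list(label).index(1)]


first_label = to_first_label([0.2, 0.7, 0.1])
print(first_label, label_to_type(first_label))
```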
  • The multiple object types may include multiple target types and one non-target type. Each target type corresponds to a type of fraudulent behavior, that is, a risk type, and the non-target type corresponds to the absence of fraudulent behavior, that is, a non-risk type.
  • In other words, no risk can also be treated as a risk type.
  • If the risk type predicted by the model is no risk, the initial identification result is that the object is not a risk object.
  • For example, the object recognition model can be a three-class model, which predicts whether an object is of type A, of type B, or of the risk-free type (that is, the non-target type).
  • The specific training method of the object recognition model is not limited in this embodiment of the present application.
  • The training end conditions of the model can also be configured according to application requirements.
  • The object recognition model can be obtained by training in the following manner:
  • obtaining a first training data set, where the first training data set includes related object data of a plurality of second sample objects with annotation labels and related object data of a plurality of unlabeled third sample objects, and the real object types of the plurality of second sample objects include each of the multiple object types;
  • predicting the object type of each third sample object through the first classification model, and determining the annotation label of the third sample object according to the predicted object type;
  • continuing to train the first classification model until the second training end condition is satisfied, to obtain the object recognition model.
  • Objects of different object types exhibit different interactive behavior characteristics (social behavior characteristics).
  • Therefore, model training uses training data for each object type: for each object type, the training data set contains related object data of multiple sample objects of that type, so that through training the model can learn the social behavior characteristics of each object type from the related object data of sample objects of different object types.
  • Model training is carried out by means of semi-supervised learning, that is, the training data set contains both sample data with annotation labels and sample data without labels.
  • The model is first trained iteratively on the labeled sample data until it meets a certain performance requirement, that is, the first training end condition, which can be configured according to actual needs; for example, the prediction accuracy of the model is greater than a set value.
  • The model can then be used to predict the object type corresponding to the unlabeled sample data: the related object data of each third sample object is input into the first classification model that satisfies the first training end condition to obtain the first label of that third sample object, and this label is used as the annotation label (that is, a pseudo-label) of the third sample object. Training then continues on the sample data with annotation labels together with the sample data with pseudo-labels. When the model satisfies the second training end condition, training can be ended to obtain an object recognition model that meets the application requirements, through which the first label of an object to be recognized can be preliminarily predicted.
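  • The pseudo-labeling flow above can be sketched in a few lines. This is a toy illustration only: it swaps the neural network for a nearest-centroid classifier and replaces the two configurable training end conditions with a single fit per stage. All data values and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(samples: List[Tuple[Vector, int]]) -> Dict[int, List[float]]:
    """'Train' a toy classifier: one mean feature vector per object type."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], x: Vector) -> int:
    """Predict the object type whose centroid is closest to x."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Labeled "second sample objects" and unlabeled "third sample objects".
labeled = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
unlabeled = [[0.1, 0.2], [1.2, 0.9], [0.8, 0.8]]

# Stage 1: train on labeled data only (stands in for the first end condition).
first_model = fit_centroids(labeled)

# Stage 2: predict pseudo-labels for the unlabeled samples.
pseudo_labeled = [(x, predict(first_model, x)) for x in unlabeled]

# Stage 3: continue training on labeled plus pseudo-labeled data.
final_model = fit_centroids(labeled + pseudo_labeled)
print([y for _, y in pseudo_labeled])
```

  • In practice the first stage would stop only once a validation metric passes the configured threshold, so that the pseudo-labels are reliable enough to train on.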
  • Step S130: Obtain a reference data set, which includes related object data and second labels of a plurality of first sample objects with annotation labels.
  • The annotation label of a first sample object represents the real object type to which the object belongs among the multiple object types, and the second label of an object represents the probability that the object belongs to each of the multiple object types.
  • For example, with 5 object types, the annotation label of an object can be expressed as [1, 0, 0, 0, 0] and the second label as [p1, p2, p3, p4, p5], where p1 to p5 are the probabilities that the object is each of the 5 object types and sum to 1; the annotation label indicates that the real object type of the object is the one, among the 5 object types, corresponding to the element whose value is 1.
  • The reference data set can be understood as a real sample data set. It includes data for multiple objects of known risk types: their related object data, annotation labels, and second labels, where the second label is the probability distribution of the sample object over the multiple object types.
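  • One possible in-memory shape for a reference data set entry is sketched below, together with the two consistency checks implied by the text: the annotation label contains exactly one 1, and the second label sums to 1. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ReferenceSample:
    """One first sample object in the reference data set (illustrative layout)."""
    object_id: str
    related_data: List[float]    # preprocessed related object data
    annotation_label: List[int]  # 0/1 vector marking the real object type
    second_label: List[float]    # probability of each object type

    def validate(self) -> None:
        if len(self.annotation_label) != len(self.second_label):
            raise ValueError("labels must cover the same object types")
        if set(self.annotation_label) - {0, 1} or sum(self.annotation_label) != 1:
            raise ValueError("annotation label must contain exactly one 1")
        if any(p < 0.0 or p > 1.0 for p in self.second_label):
            raise ValueError("probabilities must lie in [0, 1]")
        if not math.isclose(sum(self.second_label), 1.0, abs_tol=1e-9):
            raise ValueError("second label must sum to 1")

    def true_type_index(self) -> int:
        """Index of the real object type according to the annotation label."""
        return self.annotation_label.index(1)


sample = ReferenceSample("u1", [12.0, 2.0], [1, 0, 0, 0, 0],
                         [0.7, 0.1, 0.1, 0.05, 0.05])
sample.validate()
print(sample.true_type_index())
```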
  • Carrying out fraudulent behavior often involves many different links, and may involve multiple different risk users (that is, risky users/objects).
  • Risk users may also act on different links of the same fraud, and there are subtle connections, such as social information and transaction behavior, between different risk users. One type of risk user is therefore likely to be related to risk users of the same or different types, and users of different risk types also spread to and affect one another. Accordingly, in the embodiment of this application, the annotation label and the second label reflect a user from two different levels: the annotation label reflects the user's own object type, and the second label reflects the possibility that the user belongs to each object type once the associations between users are considered. That is, the second label is a risk label that takes the mutual influence between users into account.
  • The embodiment of the present application does not limit how the reference data set is acquired.
  • Step S140: According to the related object data of each object to be identified and each first sample object, determine a first association relationship between the at least one object to be identified and each of the plurality of first sample objects.
  • Step S150: Determine the second label of each object to be recognized according to the first label of each object to be recognized, the annotation label and the second label of each first sample object, and the first association relationship.
  • Step S160: For each object to be recognized, determine the recognition result of the object to be recognized according to the second label of the object to be recognized.
  • The first association relationship between the at least one object to be identified and each of the plurality of first sample objects includes the association relationships among the objects to be identified and the association relationships between the objects to be identified and the first sample objects.
  • The association relationship may also be called a social association relationship or an interaction association relationship.
  • The social association relationship between two objects can be determined according to the related object data of the two objects.
  • The embodiment of this application does not limit the granularity of the association relationship.
  • The association relationship between objects may simply indicate whether or not an association exists between them, or it may be further subdivided into different types of association relationships.
  • Related object data may be of multiple different types, and for each type of related object data it may be determined whether an association relationship corresponding to that type exists between objects.
  • For example, the related object data of an object may include various types of data, such as the object's transfer information, red envelope information (sending or receiving red envelopes), and entity information corresponding to the object. Entity information refers to the entity information used when the object performs social behaviors, for example, the object's contact information or transaction account numbers (such as a bank card number or a virtual resource account number).
  • According to the transfer information of each object, it can be determined whether an association relationship corresponding to this type of behavior data exists between objects, and according to the red envelope information of each object, it can be determined whether the corresponding association relationship exists between objects. That is, one type of behavior data can correspond to one type of association relationship.
  • Alternatively, the types of association relationships need not be distinguished, and whether an association relationship exists between objects can be determined from the various types of related object data together; for example, if any type of related object data of two objects indicates an association between them, it can be determined that an association relationship exists between the two objects.
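  • The two granularities described above, one association relationship per data type or a single untyped relationship, can be sketched as follows. The `(object_a, object_b, data_type)` tuple layout and the type names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# (object_a, object_b, data_type): a hypothetical flattened view of related object data.
Interaction = Tuple[str, str, str]


def typed_associations(interactions: Iterable[Interaction]) -> Dict[str, Set[FrozenSet[str]]]:
    """One undirected edge set per data type (transfer, red envelope, ...)."""
    edges: Dict[str, Set[FrozenSet[str]]] = defaultdict(set)
    for a, b, kind in interactions:
        if a != b:  # ignore self-interactions
            edges[kind].add(frozenset((a, b)))
    return dict(edges)


def any_association(interactions: Iterable[Interaction]) -> Set[FrozenSet[str]]:
    """Untyped view: two objects are associated if any data type links them."""
    merged: Set[FrozenSet[str]] = set()
    for pairs in typed_associations(interactions).values():
        merged |= pairs
    return merged


interactions = [
    ("A", "B", "transfer"),
    ("B", "C", "red_envelope"),
    ("A", "B", "red_envelope"),
    ("C", "D", "shared_entity"),  # e.g. the same bank card number
]
typed = typed_associations(interactions)
untyped = any_association(interactions)
print(sorted(typed), len(untyped))
```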
  • For example, if object A is a risk object, such as an object with fraudulent behavior, and an ordinary object B (an object without risk) has interacted with object A (for example, a payment has occurred between the two), then object B may also become an object with potential risk; that is, risk propagates through the interactions between objects.
  • The solution provided by the embodiment of the present application further considers the relationships between objects when determining the recognition result of the object to be recognized, so that the accuracy and comprehensiveness of object recognition can be improved.
  • When identifying an unknown object to be identified, the object recognition method provided by the embodiment of the present application considers both the social behavior data of the object and the social association relationships between the object and other objects.
  • Social behavior data reflects the social characteristics between an object and other objects. The social characteristics of risky objects usually differ from those of objects without risk, and the social characteristics of objects belonging to different risk types also usually differ; the risk type of an object can therefore be preliminarily assessed, that is, its first risk label can be predicted, based on the social behavior data of the object to be identified.
  • Then, each object's own risk label (namely the first risk label of each object to be identified, and the annotation label and second risk label of each first sample object) is integrated with the interactions between objects to determine a more accurate second risk label for the object to be identified, and the risk assessment result of the object is obtained from this label.
  • The method provided by the embodiment of the present application can automatically recognize the object to be recognized based on the reference data set and the related object data of the object, without relying on loss reports from other objects. It can therefore better meet the timeliness requirements of practical applications and can predict risky objects in advance, that is, pre-identify them, so that corresponding preventive measures can be taken based on the identification results. For example, if an object is identified as a risk object, other objects can be given a risk reminder when they conduct transactions with it so that they do not fall into fraud traps; the risk object can also be controlled accordingly, or the identified risk object can be manually followed up and verified, so as to prevent attacks in advance. Furthermore, when performing risk assessment, the method can use the association relationships between objects to assess objects more comprehensively and can effectively expand the coverage of risk object assessment.
  • the recognition result of the object can be determined based on the label.
  • The identification result may include whether the object is a risk object, that is, whether it belongs to a target type, and, when the object is a risk object, which object type or types it belongs to. Alternatively, the second label may be used directly as the recognition result of the object, from which the probability of the object belonging to each object type can be obtained.
  • The object type whose probability in the second label is greater than or equal to a set threshold can be determined as the object type of the object to be identified, or the object type with the maximum probability can be determined as the object type of the object to be identified. If the object type with the highest probability is non-risky, the object can currently be considered a non-risky object, that is, a non-target type. Of course, follow-up judgment can continue to be performed on non-risky objects.
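  • The two decision rules described above (threshold and maximum probability) can be sketched as follows. This is only an illustration under assumptions: the second label is represented as a mapping from object type to probability, and the type names, the `NON_RISK` constant and the example values are hypothetical and do not come from the application.

```python
from typing import Dict, List

NON_RISK = "non_risk"  # hypothetical name for the non-target type


def types_above_threshold(second_label: Dict[str, float], threshold: float) -> List[str]:
    """Return every object type whose probability is >= threshold."""
    return [t for t, p in second_label.items() if p >= threshold]


def most_likely_type(second_label: Dict[str, float]) -> str:
    """Return the object type with the maximum probability."""
    return max(second_label, key=lambda t: second_label[t])


def is_risk_object(second_label: Dict[str, float]) -> bool:
    """Treat the object as risky unless its most likely type is non-risk."""
    return most_likely_type(second_label) != NON_RISK


# Illustrative second label; the type names and values are made up.
example_label = {
    "drainage": 0.55,
    "account_maintenance": 0.30,
    "fund_transfer": 0.05,
    NON_RISK: 0.10,
}
```

  • With a threshold of 0.3 the first rule can return more than one type for the same object, whereas the maximum-probability rule always returns exactly one.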
  • Determining the second label of each object to be identified according to the first label of each object to be identified, the annotation label and the second label of each first sample object, and the first association relationship can include:
  • the updated second labels of the objects having the first association relationship with the object to be identified are fused to obtain the second label of the object to be identified.
  • The second label of the object to be identified may be obtained by means of label propagation. Since objects with association relationships affect each other, if an object is a risk object, its risk type, that is, its label, may also be propagated to the objects associated with it; in other words, the other objects that have an association relationship with a risk object have a relatively high probability of being risk objects themselves.
  • Each object has its own label (the first label of the object to be identified, the annotation label and the second label of the sample object). Based on the association relationships between the objects, label propagation is performed at least once; then, for the object to be identified, its second label can be obtained by fusing the labels of the objects (including sample objects and objects to be identified) associated with it.
  • The label propagation algorithm is a graph-based semi-supervised learning method that propagates label information along behavior paths, based on the information transferability of the knowledge graph.
  • the basic idea is to use the label information of marked nodes to predict the label information of unmarked nodes, and the labels of nodes are passed to other nodes according to the similarity between nodes.
  • The label propagation algorithm is optimized here. For the object to be identified, its first label is first predicted based on its related object data. On this basis, risk labels are propagated between objects according to the association relationships between them, that is, the risk label of an object can be propagated to the other objects associated with it. The number of label propagation rounds can be configured according to application requirements.
  • each label propagation includes the following operations:
  • The updated second label of the object is fused with the annotation label of the object to obtain the updated fifth label of the object, and the fifth label of the object is used as the object's second label in the next label propagation.
  • The second label of an object can be updated according to the second labels of the objects that have an association relationship with it. For example, the second labels of the associated objects can be fused (for example, added and then normalized) to obtain an updated label, and the updated label is then fused with the label of the object type to which the object itself belongs (for example, its first risk label or annotation label) to obtain the fused label of the object, which is the fifth label after this round of label propagation. Then, for each object to be identified, the second label of the object is obtained by fusing the fifth labels of the objects associated with it.
  • If label propagation is performed multiple times, the above operation can be performed again based on the second label of each object (including the objects to be identified and the first sample objects) obtained in the previous round, and the second label of the object to be identified obtained in the last round of propagation serves as the final second label.
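  • A minimal sketch of the propagation round described above, under stated assumptions: labels are probability vectors, "fusing" is taken to be element-wise addition followed by normalization (the text offers this only as one example), and an object without associated objects keeps its current label. The function names and the toy data are hypothetical.

```python
from typing import Dict, List

Label = List[float]


def fuse(labels: List[Label]) -> Label:
    """Fuse labels by element-wise addition followed by normalization."""
    summed = [sum(values) for values in zip(*labels)]
    total = sum(summed)
    return [x / total for x in summed] if total > 0 else summed


def propagate_once(second: Dict[str, Label], own: Dict[str, Label],
                   neighbors: Dict[str, List[str]]) -> Dict[str, Label]:
    """One round: fuse neighbors' second labels, then fuse with the own label."""
    fifth = {}
    for obj, current in second.items():
        nbr_labels = [second[n] for n in neighbors.get(obj, [])]
        updated = fuse(nbr_labels) if nbr_labels else current
        fifth[obj] = fuse([updated, own[obj]])  # used as second label next round
    return fifth


def identify(second: Dict[str, Label], own: Dict[str, Label],
             neighbors: Dict[str, List[str]], targets: List[str],
             rounds: int) -> Dict[str, Label]:
    """Run `rounds` propagations, then fuse neighbors' labels for each target."""
    for _ in range(rounds):
        second = propagate_once(second, own, neighbors)
    result = {}
    for t in targets:
        nbr_labels = [second[n] for n in neighbors.get(t, [])]
        result[t] = fuse(nbr_labels) if nbr_labels else second[t]
    return result


# Toy data: vectors are [risk, non-risk]; "u" is the object to be identified.
own = {"u": [0.5, 0.5], "s1": [1.0, 0.0], "s2": [0.0, 1.0]}     # first / annotation labels
second = {"u": [0.5, 0.5], "s1": [0.9, 0.1], "s2": [0.2, 0.8]}  # initial second labels
neighbors = {"u": ["s1"], "s1": ["u"]}
result = identify(second, own, neighbors, targets=["u"], rounds=1)
```

  • In this toy graph the only object associated with "u" is the risky sample "s1", so the second label of "u" is pulled toward the risk type.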
  • the related object data includes at least one type of related object data
  • the first association relationship includes the type of association relationship corresponding to each type of related object data
  • the above-mentioned determination of the second label of each object to be identified according to the first label of each object to be identified, the annotation label and the second label of each first sample object, and the first association relationship includes:
  • The association relationship corresponding to each type of related object data can be determined separately, so as to measure in a more fine-grained manner whether an object is associated with other objects in each kind of social behavior, and to characterize the social relationships of an object more accurately and comprehensively.
  • which type or types are specifically included in the above-mentioned specified types can be configured according to requirements, and the embodiment of the present application does not make a limitation.
  • The related object data can include multiple types of behavior data, and the specified type can be one or more of these types.
  • the embodiment of the present application does not limit the specific division method of the types of related object data, and the division rules of each data type can be set according to actual requirements and application scenarios.
  • Each type of association relationship has its own corresponding weight, so that association relationships with different influence capabilities play different roles in the assessment of risk objects, which further improves the accuracy of object identification.
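  • As a rough illustration of per-type weights, the sketch below fuses the labels of associated objects while scaling each label by the weight of the association type through which it arrives. The weight values, the type names and the use of a normalized weighted sum are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per association type; real values would be configured.
RELATION_WEIGHTS: Dict[str, float] = {
    "red_envelope": 0.25,
    "transfer": 0.5,
    "entity": 0.25,
}


def fuse_weighted(neighbor_labels: List[Tuple[str, List[float]]],
                  weights: Dict[str, float]) -> List[float]:
    """Weighted sum of neighbor labels by association type, then normalization."""
    dim = len(neighbor_labels[0][1])
    summed = [0.0] * dim
    for relation, label in neighbor_labels:
        w = weights[relation]
        for k, x in enumerate(label):
            summed[k] += w * x
    total = sum(summed)
    return [x / total for x in summed] if total > 0 else summed


# One neighbor reached through a transfer, one through a shared entity.
fused = fuse_weighted(
    [("transfer", [1.0, 0.0]), ("entity", [0.0, 1.0])],
    RELATION_WEIGHTS,
)
```

  • Because the transfer association carries twice the weight of the entity association here, the fused label leans toward the label of the transfer neighbor.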
  • the method may also include:
  • the above-mentioned determination of the second label of each object to be identified according to the first label of each object to be identified, the annotation label and the second label of each first sample object, and the first association relationship includes:
  • the influence of an object refers to the size of the object's ability to influence other objects, which represents the social ability of the object from one level.
  • the relevant object data includes transfer information
  • For example, a user who transfers money to more than 30 accounts and a user who transfers money to 2 accounts obviously differ significantly in influence.
  • The labels of objects with different influence differ in how likely they are to affect other objects. Therefore, in order to evaluate the second label of the object to be identified more accurately, the embodiment of the present application further considers the influence of each object.
  • The influence of each object may be used to weight its label. For example, if label propagation is performed once, then for each object among the objects to be identified and the first sample objects, the influence of the object can be used to weight its second label (for an object to be identified, this is its initial second label, that is, its first risk label), and label propagation is then performed based on the weighted labels. If label propagation is performed multiple times, the second label obtained from the previous propagation may be weighted before each round of label propagation.
  • the related object data of an object includes at least one type of related object data
  • the first association relationship includes the type of association relationship corresponding to each type of related object data
  • the object and the influence of each of the plurality of first sample objects include the influence of each object corresponding to each type of association relationship.
  • The influence corresponding to each type of related object data can be determined per type, so as to measure the influence of an object in each kind of social behavior in a more fine-grained manner, and to characterize the influence of an object more accurately and comprehensively.
  • the final influence of the object can be obtained by fusing the influences corresponding to each type of the object, for example, the influences corresponding to each type can be multiplied.
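  • The influence weighting can be sketched as below. The text only states that per-type influences can be multiplied and that the result weights the label; how a per-type influence is computed is not specified, so the rule used here (one plus the number of distinct counterparties under that type) is purely an assumption, as are all names.

```python
from math import prod
from typing import Dict, List, Set


def influence_by_type(counterparties: Dict[str, Set[str]]) -> Dict[str, float]:
    """Assumed rule: influence under a type = 1 + number of distinct counterparties."""
    return {relation: 1.0 + len(peers) for relation, peers in counterparties.items()}


def overall_influence(per_type: Dict[str, float]) -> float:
    """Fuse the per-type influences by multiplication."""
    return prod(per_type.values())


def weight_label(label: List[float], influence: float) -> List[float]:
    """Scale a label by the object's influence before a propagation round."""
    return [influence * x for x in label]


# An account that transferred to three accounts and sent one red envelope.
per_type = influence_by_type({"transfer": {"a", "b", "c"}, "red_envelope": {"a"}})
influence = overall_influence(per_type)
weighted = weight_label([0.5, 0.5], influence)
```

  • Since fused labels are normalized again after propagation, only the relative influence of the associated objects matters, not its absolute scale.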
  • the method may also include:
  • According to the first label of each object to be identified and the annotation label of each first sample object, determine the proportion of the number of objects of each object type among the at least one object to be identified and the plurality of first sample objects, where the proportion is the ratio of the number of objects of each object type to the total number of the at least one object to be identified and the plurality of first sample objects;
  • the above-mentioned determination of the second label of each object to be identified according to the first label of each object to be identified, the annotation label and the second label of each first sample object, and the first association relationship includes:
  • the second label of each object to be identified is determined according to the weighted first label of each object to be identified, the weighted labeled label and the second label of each first sample object, and the first association relationship.
  • Each object has its own corresponding object type, given by the first label of an object to be identified or the annotation label of a first sample object. Since the numbers of objects under different object types usually differ in magnitude, the larger the number of objects belonging to a certain object type, the more likely it is that the label of that object type is propagated to the object to be identified.
  • The proportion of the number of objects of each object type is therefore further considered and used to weight the object labels of the corresponding object type (the first label of an object to be identified, the annotation label of a first sample object), so that the influence of an object label is positively correlated with the number of objects of the corresponding object type. This is more in line with the actual situation and allows the second label of the object to be identified to be estimated more accurately.
  • The labels of the objects to be identified and of the first sample objects of each object type can be weighted according to the proportion of the number of objects of that object type.
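  • The proportion-based weighting can be illustrated as follows. The sketch assumes that each object has already been assigned a single object type (from its first label or annotation label) and that weighting means multiplying the label by the proportion of that type; the function names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def type_proportions(object_types: Dict[str, str]) -> Dict[str, float]:
    """Share of each object type among all objects (targets plus samples)."""
    counts = Counter(object_types.values())
    total = len(object_types)
    return {t: c / total for t, c in counts.items()}


def weight_labels(labels: Dict[str, List[float]],
                  object_types: Dict[str, str]) -> Dict[str, List[float]]:
    """Scale each object's label by the proportion of its own object type."""
    props = type_proportions(object_types)
    return {obj: [props[object_types[obj]] * x for x in label]
            for obj, label in labels.items()}


# Toy data: three drainage objects and one non-risk object.
object_types = {"u": "drainage", "s1": "drainage", "s2": "non_risk", "s3": "drainage"}
labels = {"u": [1.0, 0.0], "s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [1.0, 0.0]}
proportions = type_proportions(object_types)
weighted_labels = weight_labels(labels, object_types)
```

  • Labels of the more populous type end up with larger weights, matching the positive correlation described in the text.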
  • the reference data set can be obtained in the following ways:
  • the second training data set includes related object data of a plurality of first sample objects with annotation labels;
  • the updated fourth label of each first sample object is obtained by performing label propagation among a plurality of first sample objects; and for each A first sample object, according to the second association relationship, by fusing the fourth labels of the first sample objects that have an association relationship with the first sample object, a new third label of the first sample object is obtained.
  • Some embodiments of the present application start from a large number of sample objects with annotation labels and take into account the mutual influence between objects (that is, the association relationships between objects and the annotation labels of the sample objects). The labels of the objects are updated by label propagation between objects until the preset conditions are met, and the final updated label of each object obtained from the propagation result is used as the second label of the sample object. Since this label starts from the annotation labels of known objects, it integrates the influence of label propagation between different objects.
  • the method further includes:
  • new data includes related object data of at least one fourth sample object with an annotation label
  • each fourth sample object in the newly added data as a newly added first sample object in the second training data set to update the second training data set;
  • the updated fourth label of each first sample object is obtained by performing label propagation among multiple first sample objects ,include:
  • The training data set can be updated by adding new sample data, that is, newly added data, after each label propagation. This increases the amount of sample data and incorporates the association relationships between more objects, so that the learned risk labels of the sample objects generalize better.
  • the labeling of each sample object in the above-mentioned newly added data is obtained through the following methods:
  • the first label of the fourth sample object is obtained through object recognition model prediction, and the fourth sample object The first label of the object is used as the labeled label of the fourth sample object.
  • the newly added data may be related object data of sample objects manually marked, or social behavior data of risk objects reported by the object.
  • the label of the newly added data may be the first label predicted by the trained object recognition model, and this label may be used as the label.
  • the method may also include:
  • meeting the preset conditions includes that the value of the loss function meets the set conditions;
  • the loss function includes a first loss function and a second loss function.
  • the value of the first loss function represents the difference between the labeled label of each first sample object and the new third label
  • the value of the second loss function characterizes the difference between the new third labels of each pair of similar objects.
  • the first loss function can be used to constrain the updated label of a sample object to be as close as possible to its annotation label
  • the second loss function can be used to constrain the updated labels of similar sample objects to be as similar as possible
  • adopting this scheme can make the label propagation learning have good accuracy and generalization ability, and better meet the application requirements.
  • Whether two objects are similar can be determined according to specific types of related object data of the objects; for example, if the similarity between the specific types of related object data of two objects is greater than a set value, the two objects can be considered a pair of similar objects.
  • the embodiment of the present application does not limit the specific type or types of features, which can be configured according to actual needs, for example, it can be the transfer data of the object.
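  • A possible shape for the two-part loss and the similar-pair test is sketched below. The text does not give the distance or similarity measures, so squared Euclidean distance, cosine similarity and an unweighted sum of the two terms are all assumptions, and every name is hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similar_pairs(features: Dict[str, Vector], threshold: float) -> List[Tuple[str, str]]:
    """Pairs of objects whose feature similarity is greater than the set value."""
    return [(i, j) for i, j in combinations(sorted(features), 2)
            if cosine(features[i], features[j]) > threshold]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_loss(annotated: Dict[str, Vector], third: Dict[str, Vector],
               pairs: List[Tuple[str, str]]) -> float:
    """First term: updated vs. annotation label; second term: similar pairs."""
    first = sum(sq_dist(annotated[i], third[i]) for i in annotated)
    second = sum(sq_dist(third[i], third[j]) for i, j in pairs)
    return first + second


features = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}  # e.g. transfer data
annotated = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
third = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}
pairs = similar_pairs(features, threshold=0.9)
loss = total_loss(annotated, third, pairs)
```

  • Here object "b" has drifted away from both its annotation label and its similar partner "a", so it contributes to both terms of the loss.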
  • The related object data of the object to be identified and the association relationships between the object and other objects are considered at the same time. The related object data of an object reflects the characteristics of the object, and objects of different object types usually have different characteristics, so the object type can be preliminarily evaluated based on the related object data of the object to be identified. However, the association relationships between an object and other objects also affect the object. Therefore, the method of the embodiment of the present application further considers the association relationships between objects and the label of each object itself (for example, the first label of the object to be identified).
  • In this way, the interaction between objects can be integrated, so as to obtain a more accurate recognition result.
  • With the method of the present application, there is no need to rely on complaints and damage reports from objects, and early, preventive identification of objects can be realized, which better meets timeliness requirements, especially those in the field of risk identification.
  • the object recognition method provided by the embodiment of the present application also proposes to build a user risk system (user identification system) through the construction and dissemination of user (ie object) tags, which can then be applied to identify fraud risks in advance, that is, it can identify risky users and their risk types.
  • the method provided by this application can be applied in the field of mobile payment.
  • the risk identification of commercial fraud and social fraud is often separated in related technologies.
  • Risky users/merchants which can be called risky users
  • the main tasks include but are not limited to social drainage, account maintenance, transaction guidance, fund transfer (that is, multiple target types, risk types of objects), etc.
  • Risky users can be identified from different scenarios, and the label propagation algorithm is then used to propagate their risk labels and construct the user risk system. Applying the user risk system to the identification of fraud risks provides a new path for uncovering suspicious black-industry activity.
  • Drainage: As shown in Figure 2a, drainage is the main means by which the illegal industry finds fraud targets. Risky accounts usually use large Internet platforms to publish a variety of attractive information, that is, inducing messages, and spread these messages to ordinary users. Once users are attracted and ask for details, the accounts start to carry out fraud with well-designed scams and scripts. Such accounts are often dedicated to "phishing", and once the fraud succeeds, the account is canceled. Therefore, their social information (that is, the related object data corresponding to the account) differs significantly from that of normal social accounts.
  • Account maintenance: As shown in Figure 2b, account maintenance often occurs at the initial stage of risk merchant registration. In order to create a false impression that the merchant is operating well, to reserve funds for later fund transfers, or to avoid risk control supervision, the illegal industry often makes multiple payments to the merchant in advance. These transactions are often completed by a single account, as a few large-amount payments or many small-amount payments whose transaction vouchers cannot be verified. In some scenarios, these transactions may also be completed by multiple accounts.
  • Fund transfer: including money laundering (an act of legalizing illegal gains).
  • The funds withdrawn will flow into other risk merchants or other risk accounts. At the same time, when a risk merchant is penalized, the illegal industry may recover the funds reserved in the account maintenance stage through refunds so that the funds are not frozen. As shown in Figure 2d, a risk merchant returns funds to the corresponding accounts (the risk accounts shown in the figure) in the form of refunds, and these accounts can then transfer the funds to other accounts/merchants (the ellipsis and arrow in the figure indicate that the account/merchant can transfer funds further), to achieve the transfer of illegal proceeds.
  • FIG. 3 shows a schematic structural diagram of an object recognition system applicable to the embodiment of the present application
  • FIG. 4 shows a schematic flowchart of an implementation of an object recognition method in this scenario.
  • The system can include a server 10 and a plurality of terminal devices (only terminal device 21 and terminal device 22 are shown in the figure). The terminal devices can communicate with the server 10 through the network, and the sample object library 11 on the server 10 side stores related object data of a large number of first sample objects with annotation labels, that is, related object data of sample users; in other words, the sample object library 11 stores the reference data set.
  • the terminal device 21 and the terminal device 22 may be terminal devices of the object A to be identified and the object B to be identified.
  • the server 10 may be an application server having a mobile payment function and an application program with an interaction function between users, and the user of the terminal device, that is, the object, can interact through the application program, such as sending information to each other, adding friends, etc., Transactions and mobile payments can also be made through the app.
  • the server 10 can acquire the user-related information of the user, and implement the risk identification of the user by executing the method provided in the embodiment of the present application.
  • the implementation process of the method may include the following steps S1 to S5.
  • Step S1: Obtain an object recognition model by training on a training data set.
  • black industry also called: black industry/illegal industry/malicious industry
  • risk accounts depict black industry users, i.e. risk users
  • risk accounts show different characteristics.
  • model training can be performed separately according to different types of risk accounts (that is, different object types).
  • the training of the model can be completed by the server 10 or by other electronic devices, and the server 10 can predict the risk type of the object by invoking the trained object recognition model.
  • the training steps of the model executed by the training device 30 are taken as an example for illustration.
  • Model grouping that is, the type division of objects, that is, risk accounts are divided into risk accounts of various risk types.
  • risk accounts different types of risk users (that is, risk accounts) are grouped according to the life cycle of the illegal industry.
  • The risk account responsible for fund transfer needs to close the loop between the inflow and outflow of funds. It therefore has characteristics similar to those of an account-maintenance risk account, but behaves differently in different time windows: account-maintenance risk accounts usually appear at an early stage. The time window can thus be used to distinguish these two types of risk users for model training.
  • Drainage-type risk accounts and transaction-guidance risk accounts are also trained separately.
  • the training data set also includes non-risk accounts during model training, that is, users of non-target types.
  • This step can be completed manually or by an electronic device according to a set division rule.
  • the accounts can be grouped according to risk types and marked, and the classification model can be trained based on the related object data of these marked accounts to obtain the object recognition model.
  • the risk account and the normal account that is, the account without risk, that is, the sample object without risk
  • a risk type that is, with a label
  • the relevant object data of these accounts that is, the second sample object
  • the interaction information between the account and other accounts such as social information, payment behavior information, etc.
  • payment behavior information refers to interaction information related to payment/transaction, which may include payment from this account to other accounts, or payment from other accounts to this account, etc.
  • Social information refers to interactive information other than payment behavior information, for example, friend information/friend degree, activity degree, etc. of the account.
  • risk accounts basically induce users to conduct transactions by means of chatting and posting virtual information.
  • the related object data of risk accounts will be significantly different from that of normal social accounts, and the related object data of different types of risk accounts The object data will also show different characteristics. Therefore, the model can be trained by using the related object data of the marked risk account and the normal account as the sample data of the training model.
  • sample data may also include social behavior data of multiple accounts of unknown risk types (corresponding to the third sample object mentioned above).
  • Model training: use the above sample data for model training, and when the training meets certain conditions, use the model (that is, the first classification model mentioned above) to label the accounts of unknown risk types, thereby obtaining labels for these accounts, that is, pseudo-labels.
  • the input of the model is the related object data of the account or the related object data after preprocessing, and the output of the model is the predicted risk type of the account, that is, the first label.
  • Model checking: train on the pseudo-labeled samples and the labeled samples together, and when the model achieves the expected effect, stop the training and obtain the object recognition model.
  • FIG. 5 shows a schematic diagram of the principle of a model training method in some embodiments provided by the embodiment of the present application.
  • the related object data of the risk account and the related object data of the normal account (the label of which indicates no risk)
  • the unlabeled sample represents the related object data of the risk account of the above unknown risk type
  • the machine learning model is the object recognition model to be trained , it can be seen from the figure that the labeled samples include sample data of various risk types (category 1, category 2, ... shown in the figure).
  • When training the model, labeled samples are first used for repeated training until the first training end condition is met (for example, one or more preset training indicators meet certain conditions), yielding the first classification model. This model is then used to predict the labels of the unlabeled samples: the related object data of an unlabeled sample is input into the model to obtain the predicted first label, and the first label is used as the pseudo-label of the unlabeled sample, giving a pseudo-labeled sample. Afterwards, the model continues to be iteratively trained on the labeled sample data and the pseudo-labeled sample data until its effect meets expectations, for example until the loss function of the model converges, and the trained object recognition model is obtained.
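  • The pseudo-label training flow can be sketched with a deliberately simple stand-in classifier. The application does not name a model type, so the nearest-centroid classifier below is an assumption chosen only to keep the example self-contained; it is fitted in a single pass, so the repeated training and end conditions described in the text are not modeled. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[float], str]  # (related object data as features, label)


def fit_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """Stand-in 'training': compute the mean feature vector of each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], x: List[float]) -> str:
    """Predict the label whose centroid is closest to x."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def self_train(labeled: List[Sample], unlabeled: List[List[float]]):
    first_model = fit_centroids(labeled)                        # labeled samples only
    pseudo = [(x, predict(first_model, x)) for x in unlabeled]  # pseudo-labels
    final_model = fit_centroids(labeled + pseudo)               # train on both sets
    return final_model, pseudo


labeled = [([0.0, 0.0], "non_risk"), ([1.0, 1.0], "drainage")]
unlabeled = [[0.9, 0.8], [0.1, 0.2]]
model, pseudo = self_train(labeled, unlabeled)
```

  • In practice the pseudo-labeling step would typically keep only confident predictions; that filtering is left out here because the text does not describe it.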
  • Step S2: Construct a reference data set based on label propagation.
  • this step can be performed by the server 10 or by other electronic devices, and the constructed reference data set is provided to the server 10 for use.
  • the construction of the reference data set is also completed by the training device 30 .
  • the method of user identification through semi-supervised learning helps to solve the problem of timeliness in discovering user risks.
  • In the risk identification model, in order to ensure the accuracy of model training, different types of risk users are labeled separately from each other, which limits the expansion of the risk user system.
  • In addition, in the process of using illegal accounts for operations, the behavioral characteristics of the black industry continue to mutate. Identifying user risk with models alone is therefore not conducive to the long-term operation of the user risk system. Based on this, in this step, user risk labels can be diffused based on the information transferability of the knowledge graph.
  • risk accounts play different roles in the entire life cycle of black industry, and based on the characteristics of user social interaction and payment behavior, different types of users can be marked with the help of semi-supervised learning.
  • marked users that is, users with labels
  • the user's risk label can be spread based on the relationship between users, such as entity association, capital flow (such as transfer, red envelope, etc.).
  • Each node in Figure 6 represents a user. The figure shows users of three known risk types, namely users of the first target type (such as account-maintenance risk users), users of the second target type (such as drainage risk users) and users of the third target type (such as transaction-guidance risk users), as well as some users of unknown risk type (unknown users). Association relationships may exist between users (they can be determined according to the users' social behavior data), and risk labels can be transferred between associated users. As shown in Figure 6, users of known risk types pass their risk labels to the unknown users associated with them, and labels are also passed between associated users of known risk types.
  • Figure 7a to Figure 7c schematically show several examples of risk label propagation. Figure 7a is an example of one-way risk label propagation: if there has been a fund transfer (such as a transfer transaction) between a user of target type A (such as an account-maintenance risk user) and an unknown user, the risk label of the risk user (the target type A label, such as the account-maintenance label) is passed on to the unknown user.
  • Figure 7b is an example of circular propagation of multi-type risk labels. If a user of target type A and a user of target type B (such as a risky user of fund transfer category) have transferred funds, the risk labels of the two will be transmitted to each other. At the same time, it is also possible for the two to have tag transfers with unknown users.
  • Figure 7c is an example of multi-source risk label propagation.
  • the risk label of an unknown user may be obtained through more than one path, and risk users of different risk types (users of the A target type and B target type shown in the figure) may be both Associated with the same unknown user, the tag information of these risky users will also be passed on to the unknown user.
  • Label propagation can be carried out according to the association relationship between users through multiple rounds of iterations.
  • the relationship can be divided into various types of relationship.
  • The association relationships of objects can be divided into three types: resource gift association, such as red envelope association; resource transfer association, such as transfer association; and entity association. Red envelope association and transfer association are both defined according to the flow of resources or funds: if two users (that is, accounts) have sent or received red envelopes between them, a red envelope association is considered to exist between them, and if there has been a transfer (including a payment transfer or other transfer methods) between two users, a transfer association is considered to exist between them.
  • Entity association means that if two users are associated with the same entity (for example, both have used the same contact information), it is considered that there is an entity association between the two users.
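The three association types described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_associations`, the record layouts, and the type keys `"red_envelope"`, `"transfer"` and `"entity"` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_associations(red_envelopes, transfers, entity_usage):
    """Return {association_type: set of unordered account pairs}.

    red_envelopes / transfers: iterables of (account, account) records.
    entity_usage: iterable of (account, entity) records, e.g. an account
    and a piece of contact information it has used.
    All names here are hypothetical.
    """
    relations = defaultdict(set)
    for sender, receiver in red_envelopes:
        relations["red_envelope"].add(frozenset((sender, receiver)))
    for payer, payee in transfers:
        relations["transfer"].add(frozenset((payer, payee)))
    # Two accounts are entity-associated if they used the same entity.
    accounts_by_entity = defaultdict(set)
    for account, entity in entity_usage:
        accounts_by_entity[entity].add(account)
    for accounts in accounts_by_entity.values():
        for a, b in combinations(sorted(accounts), 2):
            relations["entity"].add(frozenset((a, b)))
    return dict(relations)


rels = build_associations(
    red_envelopes=[("u1", "u2")],
    transfers=[("u2", "u3")],
    entity_usage=[("u1", "phone-1"), ("u3", "phone-1"), ("u4", "phone-2")],
)
```

Here `u1` and `u3` become entity-associated because both used `phone-1`, while `u4` shares no entity with anyone and therefore appears in no pair.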
  • Loss calculation: Loss(n) is calculated based on the result set p(n).
  • f(0) represents the annotation label of each first sample object in the initialization stage.
  • f(n) represents the updated label of each first sample object obtained after n rounds of label propagation.
  • The user association relationship R is the second association relationship.
  • Result summary refers to the step in which, for each sample object, the updated labels of the objects associated with that object are fused to obtain the fused risk label p(n) of the object. The next round of label propagation is based on the fused label of each sample object and the association relationships between the sample objects. This repeats until the loss function meets the set condition, for example reaches its minimum, that is, its value no longer decreases. The iteration is then complete, and the fused label of each sample object corresponding to the minimum value of the loss function is used as the second label of that sample object.
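The stopping rule described above (iterate until the loss no longer decreases, then keep the fused labels from the best round) can be sketched as a small driver loop. The callables `propagate`, `fuse` and `loss`, and the `max_rounds` safety cap, are hypothetical placeholders; the source does not prescribe this interface.

```python
def propagate_until_converged(initial_labels, propagate, fuse, loss, max_rounds=100):
    """Repeat propagate -> fuse -> loss until the loss stops decreasing.

    Returns the fused labels of the round with the smallest loss, together
    with that loss. The three callables are assumptions for illustration.
    """
    labels = initial_labels
    best_fused, best_loss = None, float("inf")
    for _ in range(max_rounds):
        updated = propagate(labels)   # f(n): labels after this round
        fused = fuse(updated)         # p(n): fused labels per object
        current = loss(fused)         # Loss(n)
        if current >= best_loss:      # loss no longer decreases: stop
            break
        best_fused, best_loss = fused, current
        labels = fused                # the next round starts from p(n)
    return best_fused, best_loss


# Toy usage with a scalar "label" so the behaviour is easy to follow.
result, final_loss = propagate_until_converged(
    4.0,
    propagate=lambda x: x / 2,
    fuse=lambda x: x,
    loss=lambda p: abs(p - 0.25),
)
```

In the toy run the value is halved each round and the loop stops as soon as halving again would move it away from 0.25.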
  • Set I represents the set of all tagged users, that is, the set of first sample objects.
  • S represents the set of all similar user pairs in set I, that is, the set of similar object pairs.
  • y_i is the annotation label of the i-th user/account; its counterpart is the predicted label of the i-th user produced by the label propagation algorithm (i.e., the above-mentioned fused label).
  • With 4 object types (that is, risk types), y_i and the predicted label can both be one-dimensional vectors with 4 element values in total.
  • In y_i, the element corresponding to the user's label has the value 1, and the other 3 elements are all 0.
  • The importance coefficient (indexed by i) represents the importance of the i-th risk tag, that is, the importance of the i-th tagged user.
  • The importance of a user can be determined from the user's related data, and the specific calculation method is not limited. For example, in fund transfers, the larger the amount of funds a risk user transfers, the stronger the validity of the risk information can be considered to be, and the greater the importance of that user.
  • w_{a,b} represents the similarity between two users a and b (any pair of similar objects). In some embodiments it can be represented by the degree of overlap of their fund-related accounts, that is, the size of the intersection of their fund transfer accounts (the number of fund transfers between these two users) divided by the size of the union of their fund transfer accounts (the total number of fund transfers between these two users and all users). This gives priority to user pairs whose fund accounts overlap heavily: the higher the account overlap between two users, the more likely the two users are to share the same risk type.
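One way to read the overlap measure above is as an intersection-over-union ratio on the two users' sets of fund-transfer counterparties. The sketch below assumes that reading; the function name and the choice to return 0.0 for two empty sets are illustrative assumptions.

```python
def account_overlap(counterparties_a, counterparties_b):
    """Intersection size divided by union size of two counterparty sets.

    Assumption: each user is represented by the set of accounts it has
    exchanged funds with. Returns 0.0 when both sets are empty.
    """
    a, b = set(counterparties_a), set(counterparties_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


similarity = account_overlap({"x", "y", "z"}, {"y", "z", "w"})
```

Two of the four distinct counterparties are shared in this example, so the similarity is one half.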
  • The impact factor of association type r (indexed by r) is the weight of each type of association relationship. Different association types differ in their degree of influence: the number of users with entity associations is small, and the amount limits for red envelopes and transfers differ greatly. The impact factor is therefore used to adjust the combined weight of the different association types.
  • The value of the impact factor of each association type can be set according to requirements or experience. For example, the factor of the entity association type can take a larger value, and the factor of the transfer association can be greater than the factor of the red envelope association.
  • P_r represents the influence matrix of association type r (that is, the influence of each object under each type of association relationship). A user who transfers to more than 30 accounts and a user who transfers to 2 accounts obviously differ significantly in influence.
  • The influence matrix describes each user's influence weight; for example, the number of accounts associated with the user is standardized to obtain the user's influence weight.
  • P_r can be expressed as a vector with N element values, for example with N rows and 1 column. The element value of each row represents the degree of influence of one user under the association relationship of this type, that is, the influence of the user in the corresponding type of social behavior.
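The text says only that the number of associated accounts is "standardized" and leaves the method open. A minimal sketch, assuming standardization means dividing by the largest count (other choices such as dividing by the sum would be equally compatible with the text):

```python
def influence_weights(associated_counts):
    """Map each user to an influence weight in [0, 1] for one association type.

    associated_counts: {user: number of associated accounts of this type}.
    Dividing by the maximum count is an assumption, not the source's method.
    """
    largest = max(associated_counts.values(), default=0)
    if largest == 0:
        return {user: 0.0 for user in associated_counts}
    return {user: count / largest for user, count in associated_counts.items()}


weights = influence_weights({"a": 30, "b": 2, "c": 0})
```

With these counts, the user who transfers to 30 accounts receives weight 1.0 and the user who transfers to 2 accounts receives a much smaller weight, matching the contrast drawn in the text.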
  • Q_r represents the path of label propagation.
  • The matrix Q_r has N × N dimensions. If account i transfers money to 10 accounts, then in the row of Q_r corresponding to account i, the values in the columns of those 10 accounts are all 0.1 and the other columns are all 0. An element with the value 0 indicates that the corresponding account has no association relationship with account i; an element with a non-zero value indicates that the corresponding account is associated with account i, and the value represents the magnitude of the association, which is the value used to represent the association relationship during calculation.
  • If association type r is an entity association and account i has entity associations with 5 accounts, the corresponding values are 0.2, and all others are 0.
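The row structure described above (10 associated accounts give ten entries of 0.1, 5 give five entries of 0.2) corresponds to a row-normalized adjacency matrix. A minimal sketch using nested lists; the function name and input layout are assumptions.

```python
def propagation_matrix(accounts, associated):
    """Build an N x N matrix whose row i spreads weight 1/d over the d
    accounts associated with account i (for one association type).

    accounts: ordered list of account ids.
    associated: {account: iterable of associated accounts}.
    """
    index = {acc: i for i, acc in enumerate(accounts)}
    n = len(accounts)
    q = [[0.0] * n for _ in range(n)]
    for acc, partners in associated.items():
        # Ignore unknown accounts and self-links; de-duplicate partners.
        valid = [p for p in set(partners) if p in index and p != acc]
        for p in valid:
            q[index[acc]][index[p]] = 1.0 / len(valid)
    return q


targets = [f"t{k}" for k in range(10)]
Q = propagation_matrix(["i"] + targets, {"i": targets})
```

Row 0 of `Q` (account `i`) then holds 0.1 in each of the ten target columns and 0 in its own column; the rows of accounts with no listed associations stay all zero.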
  • f(n) represents the result of the n-th round of label propagation. The result of round n+1 is obtained by propagating the result of round n together with the newly added tagged users, that is, the newly added data.
  • If the number of users in set I is N and, after the result of this round is obtained, M sample objects are newly added, then the number of users in set I in the next round of label propagation is N+M.
  • W_y represents the weight of the risk type (that is, the proportion of the sample objects of each risk type in set I). Since the number of accounts differs by orders of magnitude across risk types, the weight is needed for standardization.
  • y represents the labeled user matrix, that is, the annotation label of each sample object in set I.
  • A normalized weight can be calculated for each risk type based on the number of tagged users of that type. For example, with 4 risk types whose numbers of tagged users are a1, a2, a3 and a4 respectively, the weight of the i-th risk type can be expressed in terms of these counts.
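Reading W_y as the proportion of sample objects per risk type, as described above, one consistent way to write the weight is the following. This is an assumption: the source's own expression is not reproduced in this text, and a different normalization would also fit the wording.

```latex
W_i = \frac{a_i}{a_1 + a_2 + a_3 + a_4}, \qquad i \in \{1, 2, 3, 4\}
```

Under this reading the four weights sum to 1.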
  • Y is the annotation label matrix of all users in set I. Assuming that N users have annotation labels in the first round of label propagation and that there are 4 risk types, Y can be a matrix with N rows and 4 columns, each row serving as the label of one user: one element in each row has the value 1 and the other three are 0.
  • The risk type corresponding to the element with value 1 is the real object type of the sample object. If the number of users with labels in the second round of label propagation is N+m, Y can be a matrix with N+m rows and 4 columns.
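The one-hot layout of Y, and the way it grows when newly labeled users are added, can be illustrated as below. The risk type names are placeholders invented for the example.

```python
RISK_TYPES = ["type_a", "type_b", "type_c", "type_d"]  # hypothetical names


def one_hot(risk_type):
    """Row vector with 1.0 at the position of the given risk type."""
    return [1.0 if t == risk_type else 0.0 for t in RISK_TYPES]


def label_matrix(annotated_types):
    """One row per labeled user, one column per risk type."""
    return [one_hot(t) for t in annotated_types]


Y = label_matrix(["type_a", "type_c"])      # N = 2 rows, 4 columns
Y_next = Y + label_matrix(["type_d"])       # N + m = 3 rows after new data
```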
  • The labels of each user in set I can thus be continuously updated through multiple iterations.
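The propagation formula itself is not reproduced in this text, so the following is only one plausible reading of how the quantities defined above could combine in a single round: each user's previous label is scaled by that user's influence, spread along the propagation paths, weighted by the association-type factor, and added to the class-weighted annotation labels. Every name and the exact way the terms are combined are assumptions.

```python
def propagate_once(f, Y, type_factor, influence, paths, class_weight):
    """One hypothetical propagation round.

    f, Y: N x k lists (previous labels and annotation labels).
    type_factor: {association_type: factor}.
    influence: {association_type: list of N influence weights (P_r)}.
    paths: {association_type: N x N propagation matrix (Q_r)}.
    class_weight: list of k risk-type weights (W_y).
    """
    n, k = len(f), len(f[0])
    # Labeled-data term: each risk-type column of Y scaled by its weight.
    nxt = [[class_weight[c] * Y[i][c] for c in range(k)] for i in range(n)]
    for r, factor in type_factor.items():
        P, Q = influence[r], paths[r]
        for i in range(n):
            for j in range(n):
                if Q[i][j] == 0.0:
                    continue  # users i and j are not associated under type r
                scale = factor * P[j] * Q[i][j]
                for c in range(k):
                    nxt[i][c] += scale * f[j][c]
    return nxt


f0 = [[1.0, 0.0], [0.0, 1.0]]
f1 = propagate_once(
    f0, f0,
    type_factor={"transfer": 0.5},
    influence={"transfer": [1.0, 1.0]},
    paths={"transfer": [[0.0, 1.0], [1.0, 0.0]]},
    class_weight=[0.5, 0.5],
)
```

In the two-user example each user keeps half of its own annotation label and receives half of the other user's label, so both rows become `[0.5, 0.5]`.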
  • The normalization operator in this expression is a standardization function, such as a softmax function.
  • a represents a user that has an association relationship with user x. As the expression shows, the second risk label of user x is obtained by fusing and normalizing the updated labels of all users associated with user x. All accounts associated with a user are its associated users, namely the users corresponding to the non-zero values in that user's row of the matrix Q_r.
  • Each iteration yields a corresponding result f(n).
  • f(n) can be a matrix with N rows and 4 columns (or 4 rows and N columns); the four values in the i-th row (referred to as the user vector for short) respectively represent the probabilities that the i-th user belongs to the four risk types.
  • After f(n) is obtained, for the i-th user, the user vectors of the associated users are summed and standardized to obtain the prediction vector of the i-th user, which is used to calculate the loss function for this iteration. For example, if user i has 3 associated users, the user vectors of these 3 users are added together and then standardized.
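The fusion step above (sum the user vectors of the associated users, then standardize) can be sketched with softmax as the standardization function, which the text names as one option. The function names are illustrative, and associated users are taken to be the non-zero entries of a single propagation matrix row.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fused_label(i, f, Q):
    """Sum the user vectors of all users associated with user i
    (non-zero entries in row i of Q), then normalize with softmax."""
    k = len(f[0])
    summed = [0.0] * k
    for j, weight in enumerate(Q[i]):
        if weight != 0.0:
            for c in range(k):
                summed[c] += f[j][c]
    return softmax(summed)


f = [
    [0.0, 0.0, 0.0, 1.0],  # user 0 (its own vector is not used below)
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
Q = [[0.0, 1 / 3, 1 / 3, 1 / 3]] + [[0.0] * 4 for _ in range(3)]
p0 = fused_label(0, f, Q)
```

User 0 has three associated users; two of them point to the first risk type, so the fused vector puts the highest probability on that type.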
  • The user vector of each user obtained when the loss no longer decreases is used as the final risk label (i.e., the second label) of the labeled users. These second labels of the sample objects in the data set are what is referenced when predicting the recognition result of an object to be recognized. Assuming that there are 5,000 labeled users in the last round of iteration, the user vectors p(n) of 5,000 users are finally obtained.
  • The annotation labels, second labels, and related object data of these 5,000 users can be used as the reference data set.
  • Step S3: The server 10 acquires the related object data of the users to be identified, that is, user-related data.
  • Step S4: The server 10 invokes the object recognition model to predict the first label of each user to be identified.
  • The related object data of each user to be identified is input into the object recognition model, and the initial risk label of each user to be identified, that is, the first label, is obtained through model prediction.
  • Step S5: The server 10 determines the second label of each user to be identified based on the reference data set.
  • The server 10 predicts the final risk label of each user to be identified, that is, the second label, based on the reference data set and the related object data of the user to be identified, and determines the identification result of the user according to the final risk label.
  • This step can include:
  • a. Determine multiple types of association relationships between each user to be identified and other users (including other users to be identified and sample objects), including but not limited to the above-mentioned entity association, resource gift association such as red envelope association, and resource transfer association such as transfer association.
  • b. According to the following label propagation formula and the first risk label of each user to be identified obtained in step S32, obtain the second label of each user to be identified through at least one round of label propagation:
  • The number of nodes (i.e., the number of users) in the user relationship network is M+N.
  • The influence factor of association type r can be preset for each type of association relationship according to actual needs or experimental values, and can be the same as the factor used in the preceding iteration stage.
  • For the influence matrix P_r: for each of the M+N users, the influence (also called influence weight) of the user under each type of association relationship can be determined from the association relationships of that type between the user and other users. Similarly, the propagation path Q_r of the user in label propagation can be determined from the association relationships between each user and other users.
  • For each association type, an influence matrix P_r is obtained containing M+N values, which are the respective influence weights of the M+N users.
  • Q_r is an (N+M) × (N+M) matrix.
  • W_y is the weight of the risk type, and its value is the same as in the iteration stage.
  • Y in the application stage is the initial risk labels of the N+M users. For a user to be identified, the initial risk label is the first label predicted by the object recognition model; for a sample user, it is the annotation label of the sample user.
  • The initial f(n) consists of the second labels of the N sample users (i.e., those from the last round of iteration) and the first labels of the M users to be identified.
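Assembling the initial state for the application stage can be sketched as stacking the N sample users' second labels on top of the M first labels. Representing each first label as a one-hot row, and passing it as a class index, are assumptions made for the example.

```python
def initial_state(sample_second_labels, first_label_indices, k):
    """Stack N probability rows (second labels of sample users) and
    M one-hot rows (first labels of users to be identified).

    first_label_indices: predicted risk-type index for each user to be
    identified (hypothetical encoding). k: number of risk types.
    """
    rows = [list(vector) for vector in sample_second_labels]
    for idx in first_label_indices:
        rows.append([1.0 if c == idx else 0.0 for c in range(k)])
    return rows


state = initial_state([[0.7, 0.1, 0.1, 0.1]], [2], k=4)
```

The result has N+M rows and k columns, matching the shape stated for the application stage.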
  • f(n+1) is an (N+M) × k matrix, where k is the number of risk types, for example 4. If only one round of label propagation is carried out, the final result vector of each user to be identified, that is, its second label, can be calculated from f(n+1). The vector includes k probability values, and the risk type corresponding to the maximum probability value, or to a probability value exceeding a threshold, can be determined as the risk type of the user to be identified.
  • If multiple rounds of label propagation are carried out, the result vector of each user (including the users to be identified and the sample users) obtained by the first propagation is used as the initial value of f(n) for the next propagation, and the labels are updated again based on the label propagation formula. This is repeated until the number of propagations reaches the set number (that is, the maximum number of propagations set in advance), and the last result vector obtained for a user to be identified is used as that user's second label.
  • Within one round, the result vectors of the users are calculated one by one, and the order of calculation is not limited. However, once a user's result vector has been calculated, it is not recalculated merely because the result vectors of its associated users subsequently change.
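The two decision rules mentioned above (take the risk type with the maximum probability, or every type whose probability exceeds a threshold) can be sketched as follows. The function name, the type names and the convention that `threshold=None` selects the maximum rule are assumptions.

```python
def risk_types(result_vector, type_names, threshold=None):
    """Turn a second label (k probabilities) into recognized risk types.

    threshold=None: return the single type with the highest probability.
    Otherwise: return every type whose probability exceeds the threshold.
    """
    if threshold is None:
        best = max(range(len(result_vector)), key=result_vector.__getitem__)
        return [type_names[best]]
    return [name for name, p in zip(type_names, result_vector) if p > threshold]


names = ["type_a", "type_b", "type_c", "type_d"]  # hypothetical names
top = risk_types([0.1, 0.6, 0.2, 0.1], names)
above = risk_types([0.1, 0.6, 0.2, 0.1], names, threshold=0.15)
```

With the threshold rule a user can be assigned more than one risk type, or none if no probability is high enough.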
  • The method provided in the embodiments of this application is the first to decompose the life cycle of the illegal industry and to build models that identify and label the different types of risk accounts. Then, based on the associations between risk users of different types, it innovatively uses a tag propagation algorithm based on user association relationships to propagate user risk tags and improve the risk user system. The method not only builds user portraits for the different risk types but also ensures the long-term operation and maintenance of risk user tags, so that it can be better applied in strategies that crack down on risky users and provides a new idea for identifying risky users in advance.
  • The solution provided by the embodiments of the present application has at least the following advantages:
  • Risk identification of users can be realized at any stage through similarity analysis, that is, correlation analysis, of risk users, without relying only on lagging information such as customer complaints.
  • Users under different risk types can be used for pre-identification of, and strategies against, fraudulent transactions in different scenarios. This suits different fraud scenarios and countermeasures better, and can improve both the timeliness of fraud identification strategies and the accuracy of identifying fraudulent behavior.
  • Risk label propagation is carried out using the information associations between users, which expands the coverage of risk users.
  • The constructed user risk system can describe the risk attributes of all users who have transactions (such as mobile payment), and it also has many applications in pre-identifying fraud risks. For example:
  • Merchants with fraud risks can be identified in advance from users' previous payment behavior at merchants. Merchants with frequent user transactions can be pre-identified, so that merchants who may later conduct fraudulent transactions are identified during the merchant account maintenance stage and penalized.
  • The model can not only predict various types of risk accounts, but can also identify account types such as secondary (alt) accounts and zombie accounts.
  • the embodiment of the present application also provides an object recognition device.
  • the object recognition device 100 may include a first prediction module 110, a reference data set acquisition module 120 , a second prediction module 130 and a recognition result determination module 140 .
  • the first prediction module 110 is configured to obtain the related object data of at least one object to be identified; for each object to be identified, the first label of the object to be identified is obtained through object recognition model prediction based on the related object data of the object to be identified, The first label characterizes the object type to which an object belongs among multiple object types;
  • the reference data set acquisition module 120 is configured to acquire a reference data set, the reference data set includes related object data and second labels of a plurality of first sample objects with annotation labels, the annotation label of a first sample object represents the real object type to which the first sample object belongs among the multiple object types, and the second label characterizes the probability that an object belongs to each object type among the multiple object types;
  • the second prediction module 130 is configured to determine, according to the related object data of each object to be identified and each first sample object, the first association relationship among the at least one object to be identified and the plurality of first sample objects, and to determine the second label of each object to be identified according to the first label of each object to be identified, the annotation label and the second label of each first sample object, and the first association relationship;
  • the recognition result determination module 140 is configured to determine the recognition result of each object to be recognized according to the second label of each object to be recognized.
  • the second prediction module can be specifically used for:
  • the second prediction module may perform the following operations during each label propagation:
  • the second label of the object is updated; for each object, the updated second label of the object is fused with the annotation label of the object to obtain a fifth label of the object, and the fifth label of the object is used as the second label of the object in the next label propagation.
  • the related object data includes at least one type of related object data, and the first association relationship includes the type of association relationship corresponding to each type of related object data; the second prediction module can be used to:
  • the second prediction module may be used to: for each object among the at least one object to be identified and the plurality of first sample objects, determine the influence of the object according to the related object data of the object; and determine the second label of each object to be identified according to the first label of each object to be identified, the annotation label and the second label of each first sample object, the influence of each object to be identified and each first sample object, and the first association relationship.
  • the related object data includes at least one type of related object data, the first association relationship includes the type of association relationship corresponding to each type of related object data, and the influence of each object among the at least one object to be identified and the plurality of first sample objects includes the influence of the object under each type of association relationship.
  • the second prediction module may be used to: determine, according to the first label of each object to be identified and the annotation label of each first sample object, the proportion of the number of objects of each object type among the at least one object to be identified and the plurality of first sample objects; using the proportion of each object type as a weight, weight the first labels of the corresponding object type among the at least one object to be identified and the annotation labels of the corresponding object type among the plurality of first sample objects; and determine the second label of each object to be identified according to the weighted first label of each object to be identified, the weighted annotation label and the second label of each first sample object, and the first association relationship.
  • the object recognition model is obtained by the model training module by performing the following operations:
  • the first training data set includes related object data of a plurality of second sample objects with annotation labels and related object data of a plurality of unlabeled third sample objects, and the real object types of the plurality of second sample objects include each of the multiple object types
  • the initial classification model is trained until the first training end condition is satisfied, and the first classification model is obtained; for each third sample object, based on the related object data of the third sample object, the object type of the object is predicted by the first classification model, and the annotation label of the object is determined according to the predicted object type; then, using the related object data of the third sample objects, the first classification model continues to be trained until the second training end condition is satisfied, and the object recognition model is obtained.
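The two-stage training described above (train on labeled samples, label the unlabeled samples with the first model, then continue training) resembles pseudo-labeling. The sketch below illustrates the flow with a deliberately tiny nearest-centroid classifier; the classifier choice, the function names, and the decision to retrain on labeled and pseudo-labeled data together are all assumptions, not the patent's model.

```python
def fit_centroids(features, labels):
    """Average the feature vectors of each class (a toy 'model')."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def train_with_pseudo_labels(labeled_x, labeled_y, unlabeled_x):
    # Stage 1: first classification model from the labeled (second) samples.
    first_model = fit_centroids(labeled_x, labeled_y)
    # Label the unlabeled (third) samples with the first model.
    pseudo_y = [predict(first_model, x) for x in unlabeled_x]
    # Stage 2: continue training with the pseudo-labeled samples included.
    final_model = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return final_model, pseudo_y


model, pseudo = train_with_pseudo_labels(
    [[0.0, 0.0], [10.0, 10.0]], ["a", "b"], [[1.0, 1.0], [9.0, 9.0]]
)
```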
  • the reference data set is obtained by the reference data set acquisition module through the following methods:
  • the second training data set includes related object data of a plurality of first sample objects with annotation labels; according to the related object data of each first sample object, the second association relationship between the first sample objects in the second training data set is determined; the annotation label of each first sample object is used as the initial third label of that first sample object, and the following operations are repeated until the updated third labels of the plurality of first sample objects satisfy the preset condition, at which point the third label of each first sample object is determined as the second label of that first sample object: based on the second association relationship and the annotation label and third label of each first sample object, label propagation is performed among the plurality of first sample objects to obtain the updated fourth label of each first sample object; for each first sample object, according to the second association relationship, the fourth labels of the first sample objects associated with that first sample object are fused to obtain a new third label of the first sample object.
  • the reference data set acquisition module can also be used for:
  • new data is obtained, the new data including related object data of at least one fourth sample object with an annotation label; each fourth sample object in the new data is added to the second training data set as a new first sample object to update the second training data set; according to the related object data of each first sample object in the updated second training data set, the second association relationship between the first sample objects in the updated second training data set is determined, giving the updated second association relationship;
  • when obtaining the updated fourth label of each first sample object, the reference data set acquisition module can be used for:
  • the labels of the fourth sample objects in the newly added data are obtained in the following manner:
  • the reference data set acquisition module is also used to:
  • the loss function includes a first loss function and a second loss function. The value of the first loss function represents the difference between the annotation label and the new third label of each first sample object, and the value of the second loss function represents the difference between the new third labels of the two objects in each pair of similar objects.
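The two-part loss can be sketched as below. The source states only what each part measures, so the use of squared differences, the per-user importance weights, and the per-pair similarity weights are assumptions chosen to match the quantities defined elsewhere in the description.

```python
def propagation_loss(annotation, predicted, importance, similar_pairs):
    """First part: weighted gap between annotation and predicted labels.
    Second part: weighted gap between the predicted labels of similar pairs.

    annotation, predicted: lists of per-user vectors.
    importance: per-user weights.
    similar_pairs: {(a, b): similarity weight} for similar user pairs.
    Squared difference is an assumed distance, not the source's formula.
    """
    def sq_diff(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    first = sum(
        importance[i] * sq_diff(annotation[i], predicted[i])
        for i in range(len(annotation))
    )
    second = sum(
        weight * sq_diff(predicted[a], predicted[b])
        for (a, b), weight in similar_pairs.items()
    )
    return first + second


labels = [[1.0, 0.0], [0.0, 1.0]]
loss_same = propagation_loss(labels, [[1.0, 0.0], [1.0, 0.0]], [1.0, 2.0], {(0, 1): 0.5})
loss_split = propagation_loss(labels, labels, [1.0, 2.0], {(0, 1): 0.5})
```

In the first call the prediction for user 1 contradicts its annotation, so only the first part contributes; in the second call predictions match the annotations but the two similar users disagree, so only the second part contributes.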
  • The device in the embodiments of the present application can execute the method provided in the embodiments of the present application, and its implementation principle is similar. The actions performed by the modules of the device correspond to the steps of the methods in the embodiments of the present application; for a detailed functional description of each module, reference may be made to the description of the corresponding method above, which will not be repeated here.
  • The embodiment of the present application also provides an electronic device, which may include a memory and a processor, wherein a computer program is stored in the memory, and the processor, when running the computer program, can execute the method provided in any optional embodiment of the present application, or the actions performed by the device provided in any optional embodiment of the present application.
  • FIG. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device 4000 includes a processor 4001 and a memory 4003 .
  • the processor 4001 is connected to the memory 4003 , such as through a bus 4002 .
  • the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as sending data and/or receiving data.
  • the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiment of the present application.
  • Processor 4001 can be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor 4001 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of DSP and a microprocessor, and the like.
  • Bus 4002 may include a path for communicating information between the components described above.
  • the bus 4002 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc.
  • the bus 4002 can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 9 , but it does not mean that there is only one bus or one type of bus.
  • Memory 4003 can be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, or an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compressed optical disc, laser disc, compact disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and can be read by a computer, without limitation.
  • the memory 4003 is used to store the computer programs for executing the embodiments of the present application, and the execution is controlled by the processor 4001 .
  • the processor 4001 is configured to execute the computer program stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps and corresponding contents of the aforementioned method embodiments can be realized.
  • The embodiment of the present application also provides a computer program product, including a computer program. When the computer program is executed by a processor, the steps and corresponding content of the aforementioned method embodiments can be realized.
  • the embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method provided in any optional embodiment of the present application.
  • Although arrows indicate the operation steps in the flowcharts of the embodiments of the present application, the execution order of these steps is not limited to the order indicated by the arrows. The steps in each flowchart may be performed in other orders as required.
  • part or all of the steps in each flow chart may include multiple sub-steps or multiple stages based on actual implementation scenarios. Some or all of these sub-steps or stages may be executed at the same time, and each of these sub-steps or stages may also be executed at different times. In scenarios where execution times are different, the execution order of these sub-steps or stages can be flexibly configured according to requirements, which is not limited in this embodiment of the present application.

Abstract

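The embodiments of the present application provide an object recognition method and device, an electronic device, and a storage medium, relating to fields such as financial payment, payment security, big data, cloud technology, blockchain, vehicle-mounted terminals, and artificial intelligence. The method includes: acquiring related object data of objects to be identified; predicting, based on the related object data of each object to be identified, a first label of each object to be identified by means of an object recognition model; acquiring a reference data set that includes related object data and second labels of a plurality of first sample objects with annotation labels; determining, according to the related object data of the objects to be identified and the first sample objects, a first association relationship among the objects to be identified and the first sample objects; and obtaining the recognition result of the objects to be identified according to the first labels of the objects to be identified, the annotation labels and second labels of the first sample objects, and the first association relationship.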

Description

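Object recognition method and device, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202111109153.6, entitled "Object recognition method and device, electronic device, and storage medium" and filed with the China Patent Office on September 22, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to technical fields such as mobile payment, payment security, big data, vehicle-mounted terminals, and artificial intelligence. Specifically, the present application relates to an object recognition method and device, an electronic device, and a storage medium.
Background
With the rapid development of science and technology, online payment, transfers and the like have become very common scenarios in daily life. While technology brings convenience to people's lives, the forms and means of online fraud also emerge one after another. How to effectively prevent and avoid various kinds of commercial fraud and identify users engaged in fraudulent behavior has long been one of the most important problems studied by those skilled in the art.
Summary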
本申请实施例提供了一种对象识别方法,该方法包括:
获取至少一个待识别对象的相关对象数据;
对于每个待识别对象,基于该待识别对象的相关对象数据通过对象识别模型预测得到该待识别对象的第一标签,所述第一标签表征了在多种对象类型中一个对象所属的对象类型;
获取参考数据集,参考数据集中包括带有标注标签的多个第一样本对象的相关对象数据和第二标签,一个第一样本对象的标注标签表征了在多种对象类型中该第一样本对象所属的真实对象类型,所述第二标签表征了一个对象属于多种对象类型中的每种对象类型的概率;
根据每个待识别对象和每个第一样本对象的相关对象数据,确定至少一个待识别对象和多个第一样本对象中各对象之间的第一关联关系;
根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签;
对于每个待识别对象,根据该待识别对象的第二标签,确定出待识别对象的识别结果。
本申请实施例提供了一种对象识别装置,该装置包括:
第一预测模块,用于获取至少一个待识别对象的相关对象数据;对于每个待识别对象,基于该待识别对象的相关对象数据通过对象识别模型预测得到该待识别对象的第一标签,所述第一标签表征了在多种对象类型中一个对象所属的对象类型;
参考数据集获取模块,用于获取参考数据集,参考数据集中包括带有标注标签的多个第一样本对象的相关对象数据和第二标签,一个第一样本对象的标注标签表征了在多种对象类型中该第一样本对象所属的真实对象类型,所述第二标签表征了一个对象属于多种对象类型中的每种对象类型的概率;
第二预测模块,用于根据每个待识别对象和每个第一样本对象的相关对象数据,确定至少一个待识别对象和多个第一样本对象中各对象之间的第一关联关系,根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签;
识别结果确定模块,用于根据每个待识别对象的第二标签,确定每个待识别对象的识别结果。
本申请实施例还提供了一种电子设备,该电子设备包括存储器、处理器及存储在存储器上的计算机程序,处理器执行计算机程序以实现本申请实施例提供的方法的步骤。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现本申请实施例提供的方法的步骤。
本申请实施例还提供了一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时实现本申请实施例提供的方法。
本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述本申请实施例中提供的方法。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对本申请实施例描述中所需要使用的附图作简单地介绍。
图1为本申请实施例提供的一种对象识别方法的流程示意图;
图2a至图2d为本申请示例中提供的几种对象类型的对象的示意图;
图3为本申请实施例提供的一种对象识别系统的结构示意图;
图4为本申请实施例提供的一种对象识别方法的流程示意图;
图5为本申请实施例提供的一种对象识别模型的训练方法的原理示意图;
图6为本申请示例中提供的标签传播的原理示意图;
图7a至图7c为本申请示例提供的几种不同的导致标签传播的示例的示意图;
图8为本申请实施例提供的一种对象识别装置的结构示意图;
图9为本申请实施例适用的一种电子设备的结构示意图。
实施方式
下面结合本申请中的附图描述本申请的实施例。应理解,下面结合附图所阐述的实施方式,是用于解释本申请实施例的技术方案的示例性描述,对本申请实施例的技术方案不构成限制。
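Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used here may also include the plural. It should be further understood that the terms "include" and "comprise" used in the embodiments of this application mean that the corresponding feature may be implemented as the presented feature, information, data, step, operation, element, and/or component, without excluding implementation as other features, information, data, steps, operations, elements, components, and/or combinations thereof supported in the art. When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or the connection may be established through an intermediate element. In addition, "connected" or "coupled" as used here may include a wireless connection or wireless coupling. The term "and/or" indicates at least one of the items it qualifies, including any one unit and all combinations of one or more of the associated listed items; for example, "A and/or B" indicates implementation as "A", as "B", or as "A and B".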
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。为了更好的理解相关技术,首先对本申请涉及的一些技术用语进行介绍:
本申请是针对目标类型的对象(如风险对象,即存在欺诈行为的对象/用户,指利用非法/违背社会道德的手段获利的用户)识别方式中存在的问题、为了更好的满足风险识别需求提出的一种对象(即存在欺诈风险(指黑产通过诱导、虚假信息等手段非法获取用户资产的交易风险)的对象)识别方法。目前,对于风险用户的识别,往往是借助于其他用户报损、用户自身的交易行为等,用户风险标签(对存在欺诈行为的用户进行的标记)之间彼此割裂,在识别欺诈风险时,仅借助单一的用户风险标签与其他用户或商户进行关联风险识别。在以往的实践中,用户只是作为单项风险传导的媒介,用户标签维护成本高、耗时耗力。相关的风险对象识别方式中至少存在以下问题:
1)时效性较差:在黑产(黑色产业/非法产业/恶意产业,指利用非法/违背社会道德的手段获利的行业)的整个生命周期中,黑产往往会在同一时期批量开展欺诈行为。依赖于其他用户报损的识别方式,当一个风险用户被标记时,同时期的商户很可能已完成了整个欺诈流程,出现大批量的报损,不能够提前预防,大大影响了对黑产资金的控制。
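2) Insufficient coverage: most fraud today relies on Internet technology, and registering an account costs almost nothing, so illicit operators usually hold a large pool of accounts in order to carry out fraudulent transactions and move funds faster and more efficiently. Schemes that identify risky users through customer complaints or links to known fraudulent merchants (merchants that engage in fraud) are therefore quite limited and cannot fully cover the accounts used by the illicit industry.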
3)关联性不强。相关的用户风险标签建设往往根据不同的业务场景彼此独立,尽管在用户风险识别的过程中,线索的来源各有不同,但通过大量的实践可以发现,不同的风险用户可能作用于同一场欺诈案例的不同环节,而不同的风险用户之间也存在社交信息、交易行为等微妙的联系,但相关的识别方式无法实现不同业务场景中的关联性识别。
为了解决相关技术中存在的多个问题中的至少一项,以更好的满足风险识别需求,本申请提供了一种新的对象识别方法,基于该方法可以打造风险用户关系网络,不仅有助于构建出用户风险体系,更能明晰黑产的生命周期,为预先识别欺诈风险提供了新路径。
在一些实施例中,本申请实施例提供的对象识别方法,能够更好的满足对象识别的时效性和覆盖率等方面的需求。该方法可以应用于大数据(Big data)的处理,如可以基于云技术(Cloud technology)实现。本申请实施例中所涉及的数据计算可以采用云计算(Cloud computing)的方式。比如,对象识别模型的训练、基于标签传播确定对象的标签等步骤的计算可以采用云计算。
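Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame; it is a massive, fast-growing, and diverse information asset that requires new processing models in order to deliver stronger decision-making, insight discovery, and process optimization. With the arrival of the cloud era, big data has attracted growing attention, and it requires special techniques to process large volumes of data effectively within a tolerable elapsed time. Technologies suited to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems. Cloud technology is the general term for the network, information, integration, management platform, and application technologies built on the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support for this.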
在一些实施例中,本申请实施例提供的方案,还可以基于人工智能(Artificial Intelligence,AI)技术实现,比如,可以通过训练好的风险识别模型预测对象的第一风险标签,还可以采用机器学习的方式基于损失函数获取参考数据集。人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
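In some embodiments, the data involved in the embodiments of this application (such as the related object data of objects) may be stored in cloud storage or in blockchain-based storage, which can effectively protect data security. A blockchain is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. It is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, where each block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.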
下面通过对几个示例性实施方式的描述,对本申请实施例的技术方案以及本申请的技术方案产生的技术效果进行说明。需要指出的是,下述实施方式之间可以相互参考、借鉴或结合,对于不同实施方式中相同的术语、相似的特征以及相似的实施步骤等,不再重复描述。
图1示出了本申请实施例提供的一种对象识别方法的流程示意图,该方法可以由任一电子设备执行,如该方法也可以由服务器执行,该服务器可以是云服务器,也可以是物理服务器或服务器集群,该方法可以实现为一个应用程序或者作为已有应用程序的一个插件或功能模块,比如,可以作为交易类(如移动支付)应用程序的一个新增功能模块,应用程序的服务器可以通过执行本申请实施例的方法,实现对待识别对象的标签的识别,识别出待识别对象是否为目标类型的对象,如是否为非风险对象、以及在该对象是风险对象时其所属的风险类型(对象类型,也就是对象是存在哪种欺诈行为的对象)。该方法也可以由终端设备执行,终端设备可以通过执行该方法,识别出待识别对象的标签,得到识别结果。其中,终端设备包括用户终端,用户终端包括但不限于手机、电脑、智能语音交互设备、智能家电、车载终端等。在一些实施例中,在实际应用中,为了更好的保证对象信息的安全性,该方法可以由服务器执行。
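As shown in FIG. 1, the object recognition method provided by the embodiments of this application may include the following steps S110 to S160.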
步骤S110:获取至少一个待识别对象的相关对象数据。
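In the embodiments of this application, objects may include, but are not limited to, users and merchants. An object can be represented by its object identifier. The form of the identifier is not limited here, as long as the information uniquely represents one object; it may include, but is not limited to, the object's contact information or account identifier. The account identifier may be a social account of the object, such as its account in an application (for example, the user's registered account name or nickname in the application). For convenience, some of the embodiments below use an object's account to denote that object.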
本申请实施例中,一个对象的相关对象数据包括该对象的交互数据,其中,相关对象数据可以是对象的交互行为数据(也可以称为社交行为数据),是指与对象的社交有关的数据,具体可以包括该对象与其他对象的交互行为有关的数据。在实际应用中,具体采用哪些社交行为数据可以根据需求配置。其中,相关对象数据可以是在对象授权的情况下获取到的该对象的社交行为数据。
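In some embodiments, the social behavior data of an object may include the object's social/interaction information and transaction information. Social information reflects how socially engaged the object is. For example, it may include the object's social activity, such as the number of friends the object has, the number of other objects following it, or the number of objects that forward or like a message the object publishes. The criterion for deciding who counts as a friend is not limited here; for example, two objects that follow each other may be treated as friends. The transaction information of an object is information about transactions between this object and other objects, and may include, but is not limited to, payment behavior information and transfer information (covering both payments/transfers from this object to other objects and payments/transfers from other objects to this object). It may specifically include, but is not limited to, the transaction time, the initiator and the recipient (if A transfers money to B, A is the initiator and B the recipient), the transaction amount, and the transaction type (a transfer, a red packet, or another form).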
步骤S120:对于每个待识别对象,基于该对象的相关对象数据,通过对象识别模型预测得到该对象的第一标签,其中,一个对象的第一标签表征了多种对象类型中该对象所属的对象类型。
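Here, an object type may also be called a risk type, meaning the type of fraudulent behavior an object engages in. The first label may also be called the first risk label; it represents the object's risk type as predicted from the object's related object data.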
对象识别模型(也可以称为风险识别模型)是基于训练数据集预先训练好的神经网络模型。该模型的输入为对象的相关对象数据,或者是对相关对象数据进行预处理后的数据,该模型的输出为相关对象数据对应的对象类型,比如,可以将相关对象数据按照预设要求预处理成固定格式的数据,如转换成指定数据格式的向量之后输入至模型,通过模型预测得到对象的对象类型。
本申请实施例中,对象识别模型可以是一个分类模型,该分类模型可以是多分类模型,多种对象类型中的每一种对象类型对应分类模型的一个类别,通过该模型可以预测出社交行为数据对应类别,该类别表征的对象类型即为该社交行为数据所属的对象的对象类型。在实际应用中,对于模型输出的数据形式本申请实施例不做限定,如可以是一个类别的标识,也可以是一个一维的向量,该向量中元素(也就是数)的个数等于上述多个对象类型中总的类型数量,每个元素对应一个类型,各元素的元素值可以是0或1,比如,只有其中一个元素的元素值为1,其他均为0,该取值为1的元素对应的类型则为预测出的对象的类型,也就是上述第一标签。
另外,在实际实施时,上述多种对象类型可以包括多种目标类型和一种非目标类型,每种目标类型对应一种欺诈行为类型即风险类型,非目标类型对应不存在欺诈行为即非风险用户,也就是说,没有风险也可以作为一个风险类型,如果模型预测出的风险类型是没有风险,则该对象的初始识别结果认为该对象不是风险对象。例如,对象的类型存在A类型和B类型两种(即两种目标类型),那对象识别模型可以是一个三分类模型,通过该模型可以预测出一个对象是A类型、还是B类型或者无风险类型(即非目标类型)的对象。
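As a minimal sketch of the one-hot output form described above, assuming the three-class example (two target types plus the non-target type): the class names and both helper functions are hypothetical and only illustrate the data shape, not the model itself.

```python
# Hypothetical class list: two target (risk) types plus the non-target type.
OBJECT_TYPES = ["type_A", "type_B", "no_risk"]


def to_first_label(scores):
    """Turn per-class model scores into a one-hot first label."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return [1 if k == best else 0 for k in range(len(scores))]


def label_to_type(one_hot):
    """Map a one-hot label back to the object type it marks."""
    return OBJECT_TYPES[one_hot.index(1)]
```

For example, scores of `[0.2, 0.7, 0.1]` would yield the first label `[0, 1, 0]`, which marks `type_B`.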
对于对象识别模型的具体训练方式本申请实施例不做限定。模型的上述训练结束条件也可以根据应用需求配置。
本申请的实施例中,对象识别模型可以是通过以下方式训练得到的:
获取第一训练数据集,第一训练数据集包括带有标注标签的多个第二样本对象的相关对象数据、以及多个未标记的第三样本对象的相关对象数据,多个第二样本对象的真实对象类型包括多种对象类型中每种类型;
基于多个第二样本对象的相关对象数据,对初始分类模型进行训练,直至满足第一训练结束条件,得到第一分类模型;
对于每个第三样本对象,基于该第三样本对象的相关对象数据,通过第一分类模型预测得到该第三样本对象的对象类型,根据该对象类型确定该第三样本对象的标注标签;
基于多个第二样本对象的相关对象数据、以及带有标注标签的多个第三样本对象的相关对象数据,对第一分类模型继续训练,直至满足第二训练结束条件,得到对象识别模型。
由于在不同的场景下,对象表现出的交互行为特征(社交行为特征)是不同的。为了保证不同类型的对象在模型学习过程中不会相互干扰而导致判断失误,本申请的一些实施例中,在基于训练数据集训练对象识别模型时,会采用多种不同对象类型的训练数据分别进行模型训练,即对于每种对象类型,训练数据集中都包含该类型的多个样本对象的相关对象数据,通过训练使模型能够从不同对象类型的样本对象的相关对象数据中学习到不同对象类型的对象的社交行为特征。
进一步的,由于带有标注标签的样本数据的获取通常都需要人工参与,样本数据的数量通常比 较受限,考虑于此,本申请的实施例中,借助了半监督学习方式进行模型训练,即训练数据集中同时包含了带有标注标签的样本数据和不带有标注标签的样本数据,在对模型进行训练时,为了保证模型训练的准确性,在训练的第一个阶段采用带有标注标签的样本数据对模型进行迭代训练,使得训练出的模型能够满足一定的性能要求,即满足第一训练结束条件,该条件可以根据实际需求配置,比如模型的预测准确度大于设定值,此时则可以通过模型预测未标注的样本数据对应的对象类型,可以将第三样本对象的相关对象数据输入至满足上述第一训练结束条件的第一分类模型中,得到每个第三样本对象的第一标签,并将该标签作为第三样本对象的标注标签(也就是伪标签),之后则可以基于带有标注标签的样本数据和带有伪标签的样本数据对模型继续进行训练,当模型达到预期效果时,可以结束训练,得到满足应用需求的对象识别模型,通过该模型可以初步预测得到待识别对象的第一标签。
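The two-stage pseudo-label training described above can be sketched as follows. This is only an illustration: a nearest-centroid classifier stands in for the neural network, the training end conditions are omitted, and all function names are hypothetical.

```python
def fit_centroids(features, labels):
    """Stand-in classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def train_with_pseudo_labels(labeled_x, labeled_y, unlabeled_x):
    # Stage 1: train on the labeled (second sample) objects only.
    first_model = fit_centroids(labeled_x, labeled_y)
    # Pseudo-label the unlabeled (third sample) objects with that model.
    pseudo_y = [predict(first_model, x) for x in unlabeled_x]
    # Stage 2: continue training on labeled plus pseudo-labeled data.
    return fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
```

In a real system the first stage would stop only once the first training end condition is met (for example, accuracy above a set value), and the second stage would stop on the second condition.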
步骤S130:获取参考数据集,参考数据集中包括多个带有标注标签的第一样本对象的相关对象数据和第二标签。
其中,所述第一样本对象的标注标签表征了在多种对象类型中该对象所属的真实对象类型,一个对象的第二标签表征了该对象属于多种对象类型中的每种对象类型的概率。
为了便于理解,作为一个示例,假设多种对象类型包括5种类型,一个对象的标注标签可以表示为[1,0,0,0,0],第二标签可以表示为[p1,p2,p3,p4,p5],其中,p1至p5分别表示该对象是5种对象类型中每种类型的概率,5个概率之和等于1,而标注标签则表示了该对象的真实对象类型是5种对象类型中取值为1的元素对应的对象类型。
参考数据集可以理解为真实的样本数据集,其中包含了多个已知风险类型的对象的相关数据,包括相关对象数据、标注标签和第二标签。
本申请实施例中,对于上述每一个第一样本对象而言,其标注标签和第二标签都可以理解为该对象的真实标签,第二标签可以理解成在该样本对象的真实对象类型是标注标签对应的对象类型的情况下,该样本对象属于多种对象类型中的每种对象类型的概率分布情况。
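In practice, carrying out fraud usually involves several different stages and may involve several different risky users (users/objects that carry risk). Over the whole life cycle of a fraud, different risky users may act in different stages of the same scheme, and subtle links such as social information and transaction behavior exist between them. A risky user of one type is therefore quite likely to be associated with risky users of the same or of different types, and risk also propagates between users of different risk types, which influence one another. For this reason, the embodiments of this application use the annotation label and the second label to reflect two different aspects: the user's own object type, and the likelihood that the user belongs to each object type once the associations between this user and other users are taken into account. In other words, the second label is a risk label that accounts for the mutual influence between users. The specific way of obtaining the reference data set is not limited in the embodiments of this application.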
步骤S140:根据每个待识别对象和每个第一样本对象的相关对象数据,确定至少一个待识别对象和多个第一样本对象中各对象之间的第一关联关系。
步骤S150:根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签。
步骤S160:对于每个待识别对象,根据待识别对象的第二标签,确定待识别对象的识别结果。
其中,上述至少一个待识别对象和多个第一样本对象中各对象之间的第一关联关系,包括待识别对象之间的关联关系,以及待识别对象与第一样本对象之间的关联关系。该关联关系也可以称为社交关联关系或交互关联关系。
由于一个对象的相关对象数据中包含该对象与其他对象的交互数据,因此,可以根据两个对象的相关对象数据确定出对象之间的社交关联关系。对于关联关系的划分粒度,本申请实施例不做限定,在一些实施例中,对象之间的关联关系可以包括对象之间有关联关系或者没有关联关系,还可以进一步细分不同类型的关联关系,比如,相关对象数据可以具有多种不同类型,可以根据每种类型的相关对象数据,来确定对象之间是否有该种类型对应的关联关系。
在一些实施例中,一个对象的相关对象数据可以包括该对象的转账信息、红包(发红包或接红包)信息、该对象对应的实体信息等多种不同类型的数据,其中,实体信息是指该对象进行社交行为时所应用到的实体信息,比如,该对象的联系方式、交易账号(如银行卡号、虚拟资源账号等)。可以根据各对象的转账信息,确定对象之间是否具有该类型行为数据对应的关联关系,可以根据各对象的红包信息,确定对象之间是否具有对应的关联关系。也就是说,一种类型的行为数据可以对应一种类型的关联关系。当然,在实际应用中,也可以不对关联关系进行类型划分,可以基于对象的各种类型的相关对象数据,确定对象之间是否有关联关系,比如,两个对象的任一种类型的相关 对象数据表明两个对象之间具有关联关系,则可以确定对象之间具有关联关系。
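A minimal sketch of deriving typed association relationships from interaction records, assuming each record is reduced to a `(relation_type, object_a, object_b)` tuple. The record format, the relation type names, and the symmetric storage are assumptions made for illustration.

```python
from collections import defaultdict


def build_relations(records):
    """records: iterable of (relation_type, object_a, object_b) tuples.

    Returns {relation_type: {object: set of related objects}}.
    Relations are stored symmetrically here, which is an assumption.
    """
    relations = defaultdict(lambda: defaultdict(set))
    for rel_type, a, b in records:
        relations[rel_type][a].add(b)
        relations[rel_type][b].add(a)
    return relations


def are_related(relations, a, b):
    """Untyped view: related if any relation type links the two objects."""
    return any(b in neighbors.get(a, ()) for neighbors in relations.values())
```

The typed dictionary corresponds to one relationship per data type, while `are_related` corresponds to the untyped variant in which any single type is enough to establish an association.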
在实际应用中,由于对象之间的社交关联关系是会对对象的属性信息造成影响的,在风险识别领域,如果一个对象A是风险对象,例如是具有欺诈行为的对象,另一个普通的对象B(不存在风险的对象)如果与对象A具有关联(比如两者之间发生过支付行为),那么对象B也可能会变成具有潜在风险的对象,即风险会由于对象之间的交互信息发生传播,考虑于此,本申请实施例提供的该方案,在确定待识别对象的识别结果时,进一步考虑对象之间的关联关系,从而可以提高对象识别的准确性和全面性。
本申请实施例提供的对象识别方法,在对未知是否存在风险的待识别对象进行识别时,同时考虑了待识别对象自身的社交行为数据和该对象与其他对象之间的社交关联关系,由于社交行为数据反映了该对象与其他对象之间的社交特征,而具有风险的对象的社交特征和不具有风险的对象的社交特征通常是不同的,属于不同风险类型的对象的社交特征通常也是不同的,因此,可以基于待识别对象的社交行为数据来初步评估该对象的风险类型。进一步的,由于一个对象与其他对象之间的社交关系会对该对象产生影响,尤其是具有风险的对象会对与其有关联关系的对象产生影响,因此,进一步考虑对象之间的社交关联关系、以及各对象自身的风险标签(即待识别对象的第一风险标签、第一样本对象的标注标签和第二风险标签),可以在基于待识别对象的社交行为数据预测出的该对象的第一风险标签的基础上,融入对象之间的相互影响,确定出待识别对象的更加准确的第二风险标签,从而基于该标签得到对象的风险评估结果。
另外,由于本申请实施例提供的该方法,可以基于参考数据集和待识别对象的相关对象数据,实现对待识别对象的自动化识别,而无需依赖于其他对象的报损,因此,可以在有需求时即可对对象的进行评估,因此,能够更好的满足实际应用中对于时效性的要求,可以提前预测出具有风险的对象,即可以预先识别,以可以基于识别结果相应预防,比如,识别出一个对象是风险对象,其他对象在与该对象进行交易时,可以进行风险提醒,防止导入欺诈陷阱,还可以对风险对象进行相应的管制,或者还可以通过人工手段对识别出的风险对象进行进一步的跟踪核实,以预先防范打击。再者,在进行风险评估时,本申请实施例的该方法,可以借助对象之间的关联关系,更加全面的实现对对象的风险评估,可有效扩展风险对象评估的覆盖范围。
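After the second label of an object to be identified is obtained, the recognition result of that object can be determined from it. The recognition result may indicate whether the object is a risk object, that is, whether it belongs to a target type, and, if it is, which object type or types it belongs to. Alternatively, the second label itself may serve as the recognition result, since it gives the probability that the object belongs to each object type. In some embodiments, every object type whose probability in the second label is greater than or equal to a set threshold may be determined as an object type of the object, or the object type with the highest probability may be determined as the object type of the object. If the most probable object type is "no risk", the object can be regarded as currently risk-free, that is, of the non-target type; such objects may of course still be tracked and reassessed later.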
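The two decision rules for turning a second label into a recognition result (threshold or maximum probability) might look like this; the function name and the type names used in the example are hypothetical.

```python
def recognition_result(second_label, types, threshold=None):
    """second_label: probabilities per object type, in the same order as types."""
    if threshold is not None:
        # Option 1: every type whose probability reaches the threshold.
        return [t for t, p in zip(types, second_label) if p >= threshold]
    # Option 2: the single most probable type.
    best = max(range(len(types)), key=lambda k: second_label[k])
    return [types[best]]
```

With the threshold rule an object can receive several types at once, which matches the "which type or types" wording; with the maximum rule it always receives exactly one.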
本申请的一些实施例中,上述根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签,可以包括:
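Take the first label of each object to be identified as that object's annotation label and initial second label; then, according to the annotation labels and second labels of the objects to be identified and the first sample objects, and based on the first association relationship, perform label propagation at least once among the objects to be identified and the first sample objects to obtain an updated fifth label for each object to be identified and each first sample object;
For each object to be identified, according to the first association relationship, fuse the updated fifth labels of the objects that have the first association relationship with it to obtain the second label of that object.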
本申请实施例中,可以采用标签传播的方式来得到待识别对象的第二标签。由于具有关联关系的对象之间会相互造成影响,如果一个对象是风险对象,那么该对象的风险类型即标签也是有可能传播给与其具有关联关系的其他对象,也就是说,与该对象具有关联关系的其他对象是风险对象的可能性相对较高。因此,可以在各对象都具有各自的标签(待识别对象的第一标签,样本对象的标注标签和第二标签)的前提下,基于对象之间的关联关系,进行至少一次标签传播,然后,对于待识别对象,可以通过融合与其具有关联关系的各对象(包括样本对象和待识别对象)的标签,得到该对象的第二标签。
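The label propagation algorithm is a graph-based semi-supervised learning method that uses the information-passing property of a knowledge graph to spread label information along behavior paths. Its basic idea is to use the label information of labeled nodes to predict the label information of unlabeled nodes, with a node's label passed to other nodes according to the similarity between nodes. Some embodiments of this application optimize the algorithm: for an object to be identified, its first label is first predicted from its related object data; on that basis, risk labels are propagated between objects according to their association relationships, so that the risk label of one object can spread to the other objects associated with it. The number of propagation rounds can be configured according to application needs.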
其中,每次标签传播包括以下操作:
对于所述待识别对象和所述第一样本对象中的每个对象,根据所述第一关联关系,基于与该对象具有关联关系的各对象的第二标签,对该对象的第二标签进行更新;
对于所述每个对象,将该对象的更新后的第二标签和该对象的标注标签进行融合,得到该对象更新后的第五标签,将该对象的第五标签作为下一次标签传播时该对象的第二标签。
假设标签传播次数为1次,对于上述至少一个待识别对象和多个第一样本对象中的每个对象,可以根据与其具有关联关系的各对象的第二标签实现自身的第二标签的更新,如可以将与其具有关联关系的各对象的第二标签进行融合(如相加后再做标准化处理),得到更新后的标签,再将该更新后的标签与其所属的对象类型的标签(例如,第一风险标签/标注标签)进行融合得到该对象的融合后的标签,也就是此次标签传播更新后的第五标签。然后,对于每个待识别对象,通过将与其具有关联关系的各对象的融合后的第五标签再次进行融合,得到该对象的第二标签。
如果标签传播的次数大于1次,则可以基于上一次得到的各个对象(包括待识别对象和第一样本对象)的第二标签,再次执行上述操作,将最后一次传播得到的待识别对象的第二标签作为最终的第二标签。
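One propagation round and the final fusion step can be sketched as below. The text only says that labels are "fused" (for example, added and then standardized), so using addition followed by sum-to-one normalization for both fusion steps is an assumption, as are all function names.

```python
def normalize(vec):
    """Scale a non-negative vector so its entries sum to 1."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else list(vec)


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def propagate_once(second, annotated, neighbors):
    """second, annotated: {obj: label vector}; neighbors: {obj: set of objs}."""
    fifth = {}
    for obj, own in second.items():
        # Update from the second labels of the associated objects.
        merged = [0.0] * len(own)
        for nb in neighbors.get(obj, ()):
            merged = add(merged, second[nb])
        updated = normalize(merged)
        # Fuse the update with the object's own annotation label.
        fifth[obj] = normalize(add(updated, annotated[obj]))
    return fifth


def final_second_label(obj, fifth, neighbors):
    """Fuse the fifth labels of the objects associated with obj."""
    merged = [0.0] * len(fifth[obj])
    for nb in neighbors.get(obj, ()):
        merged = add(merged, fifth[nb])
    return normalize(merged)
```

For several rounds, the returned `fifth` dictionary would be passed back in as `second` for the next call, and `final_second_label` would be applied only after the last round.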
本申请的实施例中,相关对象数据包括至少一种类型的相关对象数据,第一关联关系包括与每种类型的相关对象数据对应的该类型的关联关系;
相应的,上述根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签,包括:
获取每种类型的关联关系对应的权重;
根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、每种类型的关联关系、以及每种类型的关联关系对应的权重,确定每个待识别对象的第二标签。
在本申请实施例中,可以按照相关对象数据的类型,分别确定每种类型的相关对象数据对应的关联关系,从而更加细粒度的衡量一个对象在各种社交行为中与其他对象的是否具有关联关系,以更加准确全面的表征出一个对象的社交关联关系。其中,上述指定类型具体包括哪个或哪些类型,可以根据需求配置,本申请实施例不做限定,比如,相关对象数据可以包括多种类型的行为数据,指定类型可以是这多种类型中的一种或多种。对于相关对象数据的类型的具体划分方式本申请实施例也不做限定,可以根据实际需求和应用场景设置各数据类型的划分规则。
而在实际应用中,由于不同类型的关联关系的影响程度是不同的,因此,为了更加准确的评估对象之间的关联关系,每种类型的关联关系具有各自对应的权重,从而使得具有不同影响能力的关联关系风险对象评估中起到不同的影响作用,进一步提升了对象识别的准确性。
本申请的实施例中,该方法还可以包括:
对于至少一个待识别对象和多个第一样本对象中的每个对象,根据该对象的相关对象数据,确定该对象的影响力;
相应的,上述根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签,包括:
根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、每个待识别对象和第一样本对象的影响力、以及第一关联关系,确定每个待识别对象的第二标签。
其中,一个对象的影响力是指该对象对其他对象的影响能力的大小,从一个层面上表征了该对象的社交能力。在实际应用中,由于不同的对象的影响力通常是有差异的,比如,相关对象数据包括转账信息,一个向30个以上的账户转账的用户与一个向2个账户转账的用户显然具有显著的影响力差异。而不同的影响力的对象的标签可以对其他对象造成影响的可能性也就不同,因此,为了更加准确的评估出待识别对象的第二标签,本申请实施例还进一步考虑了每个对象的影响力。
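In some embodiments, when the second label of an object to be identified is determined by label propagation, the label of each object can be weighted by that object's influence in every propagation round. For example, with a single round, for each of the objects to be identified and the first sample objects, the object's second label (for an object to be identified, its initial second label, that is, its first label) is weighted by the object's influence, and one round of label propagation is then performed on the weighted labels. With multiple rounds, the second labels obtained in the previous round are weighted before each new round.
In some embodiments, the related object data of an object includes at least one type of related object data, the first association relationship includes, for each type of related object data, the association relationship of that type, and the influence of each of the at least one object to be identified and the plurality of first sample objects includes the influence of that object with respect to each type of association relationship.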
也就是说,在对相关对象数据进行分类处理时,可以按照相关对象数据的类型,分别确定每种类型的相关对象数据对应的影响力,从而更加细粒度的衡量一个对象在各种社交行为中的影响力,以更加准确全面的表征出一个对象的影响力。
在一些实施例中,对于每个对象,可以通过融合该对象对应于各个类型的影响力,得到该对象的最终的影响力,比如,可以将各类型对应的影响力相乘。
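A small sketch of per-type influence and its fusion by multiplication. Scaling the number of related accounts by the largest count of that relation type is only one possible normalization and is assumed here; the function names are hypothetical.

```python
def influence_by_type(neighbors_by_type):
    """neighbors_by_type: {rel_type: {obj: set of related objs}}.

    Influence is the related-account count scaled by the largest count
    for that relation type (one possible normalization, assumed here).
    """
    result = {}
    for rel_type, neighbors in neighbors_by_type.items():
        top = max((len(v) for v in neighbors.values()), default=0)
        result[rel_type] = {
            obj: (len(v) / top if top else 0.0) for obj, v in neighbors.items()
        }
    return result


def overall_influence(per_type, obj):
    """Fuse the per-type influences of obj by multiplying them."""
    value = 1.0
    for scores in per_type.values():
        value *= scores.get(obj, 0.0)
    return value
```

Note that with plain multiplication an object that has no relation of some type ends up with an overall influence of 0; a real implementation might smooth the per-type values to avoid this.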
本申请的一些实施例中,该方法还可以包括:
根据每个待识别对象的第一标签和每个第一样本对象的标注标签,确定在至少一个待识别对象和多个第一样本对象中每种对象类型的对象数量占比,所述对象数量占比包括每种对象类型的对象的数量与所述至少一个待识别对象和多个第一样本对象的总数量的比值;
相应的,上述根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签,包括:
将每种对象类型的对象数量占比作为权重,对至少一个待识别对象中相应对象类型的第一标签进行加权,并对多个第一样本对象中相应对象类型的标注标签进行加权;
根据每个待识别对象的加权后的第一标签、每个第一样本对象的加权后的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签。
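Each of the objects to be identified and the first sample objects has its own object type, given by the first label of an object to be identified or the annotation label of a first sample object. The number of objects usually differs in magnitude across object types, and the larger the number of objects of a given type, the more likely the label of that type is to propagate to an object to be identified. Some embodiments of this application therefore also take into account the share of objects of each object type when determining the second label of an object to be identified, and use that share to weight the labels of the corresponding type (the first labels of objects to be identified and the annotation labels of first sample objects). The influence of an object label is then positively correlated with the number of objects of its type, which matches reality more closely and allows a more accurate estimate of the second label.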
在一些实施例中,在基于标签传播的处理方式中,可以在每次执行标签传播时,都可以按照每种对象类型的对象数量对对应对象类型的待识别对象和第一样本对象的对象标签进行加权。
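The per-type share used as a weight is simply each type's count divided by the total number of objects. A minimal sketch, with a hypothetical function name and type names:

```python
from collections import Counter


def type_weights(labels):
    """labels: iterable of object-type names, one per object.

    Returns each type's share of the total number of objects.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

For instance, with two objects of type `A`, one of type `B`, and one of type `C`, the weights would be 0.5, 0.25, and 0.25.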
本申请的实施例中,参考数据集可以是通过以下方式获取到的:
获取第二训练数据集,第二训练数据集包括带有标注标签的多个第一样本对象的相关对象数据;
根据每个第一样本对象的相关对象数据,确定第二训练数据集各第一样本对象之间的第二关联关系;
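Take the annotation label of each first sample object as the initial third label of that first sample object, and repeat the following operation until the updated third labels of the plurality of first sample objects satisfy a preset condition; the third label of each first sample object at the time the preset condition is satisfied is determined as the second label of that first sample object: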
基于第二关联关系以及各第一样本对象的第三标签,通过在多个第一样本对象之间进行标签传播,得到每个第一样本对象更新后的第四标签;并对于每个第一样本对象,根据第二关联关系,通过融合与该第一样本对象具有关联关系的各第一样本对象的第四标签,得到该第一样本对象新的第三标签。
由前文的描述可知,不同对象之间的标签是会传播的,如果对象之间发生过社交行为,尤其是一些与欺诈行为有关的特定类型的社交行为,比如转账、支付等行为,那么对象的风险标签是很有可能会传播给与其有交互的对象的。为了更好的学习到不同对象的标签之间的传播影响情况,以用于预测待识别对象的第二标签,本申请的一些实施例,基于带有标注标签的大量样本对象,考虑到对象之间的相互影响(即对象之间的关联关系、以及样本对象的标注标签),采用在对象之间进行标签传播的方式,实现对对象的标签的更新,直至在满足预设条件时,基于标签传播的结果,得到每个对象最终更新后的标签,将该标签作为样本对象的第二标签,由于该标签是在已知对象的标注标签的前提下,融合了不同对象之间的标签传播影响的情况下的更新标签,因此,可以基于这些样本对象的标注标签和第二标签,在已经预测得到待识别对象的第一标签(可以理解为待识别对象的初步的标注标签)的情况下,基于待识别对象和这些样本对象之间的关联关系,进行对象之间的标签传播,进一步确定出待识别对象的第二标签。其中,对于每次标签传播的具体操作,可以参考前文中的对应描述,在此不再说明。
本申请的一些实施例中,在每进行一次标签传播后,该方法还包括:
获取新增数据,新增数据包括带有标注标签的至少一个第四样本对象的相关对象数据;
将新增数据中的每个第四样本对象作为所述第二训练数据集中新增的第一样本对象以更新第二训练数据集;
根据更新后的第二训练数据集中每个第一样本对象的相关对象数据,确定更新后的第二训练数据集中各第一样本对象之间的第二关联关系,得到更新后的第二关联关系;
相应的,上述基于第二关联关系以及各第一样本对象的第三标签,通过在多个第一样本对象之间进行标签传播,得到每个第一样本对象更新后的第四标签,包括:
将每个新增的第一样本对象的标注标签作为该第一样本对象的第三标签,基于更新后的第二关联关系以及更新后的各第一样本对象的第三标签,通过在更新后的多个第一样本对象之间进行标签传播,得到更新后的每个第一样本对象的第四标签。
为了提升学习的泛化能力,在学习样本对象之间的标签传播影响时,可以在每进行一次标签传播之后,通过加入新的样本数据即新增数据来更新训练数据集,增加了样本数据的数量,融入了更多对象之间的关联关系,从而使学习得到的样本对象的风险标签的结果更具有通用性。
本申请的一些实施例中,上述新增数据中各样本对象的标注标签是通过以下方式获取到的:
获取至少一个未标注的第四样本对象的相关对象数据;
对于至少一个未标注的第四样本对象中每个第四样本对象,基于该第四样本对象的相关对象数据,通过对象识别模型预测得到该第四样本对象的第一标签,将该第四样本对象的第一标签作为该第四样本对象的标注标签。
在实际应用中,对于新增数据,可以是由人工标注的样本对象的相关对象数据,可以是对象举报的风险对象的社交行为数据。考虑到人工成本及新增数据的数据量,本申请的一些实施例中,新增数据的标注标签可以是通过训练好的对象识别模型预测得到的第一标签,将该标签作为标注标签。
本申请的一些实施例中,该方法还可以包括:
根据多个第一样本对象的相关对象数据,确定多个第一样本对象中的相似对象对;
其中,满足预设条件包括损失函数的值满足设定条件;
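The loss function includes a first loss function and a second loss function. For each round of label propagation, the value of the first loss function represents the difference between the annotation label and the new third label of each first sample object, and the value of the second loss function represents the difference between the new third labels of each similar object pair.
In some embodiments, the first loss function constrains each sample object's updated label to stay as close as possible to its annotation label, and the second loss function constrains the updated labels of similar sample objects to be as similar as possible to each other. With this scheme, label propagation learning achieves good accuracy and generalization and better meets application needs. In some embodiments, when similar object pairs are determined, whether two objects are similar can be decided from a specific type of their related object data; for example, if the similarity between two objects' related object data of that type is greater than a set value, the two objects can be regarded as a similar object pair. Which type or types are used is not limited in the embodiments of this application and can be configured according to actual needs; the objects' transfer data is one example.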
本申请实施例提供的方案,在对待识别对象进行识别时,同时考虑了待识别对象自身的相关对象数据和该对象与其他对象之间的关联关系,由于一个对象的相关对象数据反映了该对象的特征,而不同对象类型的对象的特征通常是不同的,因此,可以基于待识别对象的相关对象数据来初步评估该对象的对象类型。而一个对象与其他对象之间的关联关系会对该对象产生影响,因此,本申请实施例的方法,进一步考虑对象之间的关联关系、以及各对象自身的标签(即待识别对象的第一标签、第一样本对象的标注标签和第二标签),可以在基于待识别对象的相关对象数据预测出的该对象的第一标签的基础上,融入对象之间的相互影响,从而得到更加准确的识别结果。此外,由于本申请的该方法,无需依赖对象的投诉、报损,可以实现对象的提前预防识别,更好的满足了时效性的要求,尤其是风险识别领域对于时效性的要求。
本申请实施例提供的对象识别方法,还提出了通过用户(即对象)标签建设与传播,构建出用户风险体系(用户识别体系),进而可以应用于提前识别欺诈风险,即可以识别出具有风险的用户以及用户的风险类型。
本申请提供的方法可以应用在移动支付领域,在该领域,相关技术中对商业欺诈和社交欺诈的风险识别往往被割裂开来,但通过大量的打击案例中发现,黑产的账号(即具有风险的用户/商户,可以称为风险用户)在商业欺诈与社交欺诈的链路中都扮演着不可忽视的角色。其中主要承担的任 务包含但不限于社交引流,养号、引导交易、资金转移(也就是多种目标类型、对象的风险类型)等。基于本申请实施例提供的方法,可以分别从不同的场景下识别风险用户,然后利用标签传播算法进行风险用户的扩散,构建出用户风险体系,并将用户风险体系应用于欺诈风险的识别,为挖掘可疑黑产提供新路径。
为了更好的理解和说明本申请提供的方案,下面结合移动支付场景对本申请的一种具体可选实施方式进行说明。
为了便于理解,首先对非法产业涉及的多个环节进行介绍,在非法欺诈的整个过程中,往往需要依赖黑产账号(也可以称为非法账号/风险账号,也就是风险用户/商户的账号,代表风险用户)实现引流、养号、引导交易和资金转移等多个环节(每个环节对应一种目标类型),具体表现形式在不同环节有着如下不同的特点:
1)引流:如图2a所示,引流是非法产业寻找欺诈目标的主要手段。风险账号通常借助大型互联网平台发布多种多样的极具吸引力的信息即诱导消息,并将这些消息扩散给一般用户。一旦吸引到用户询问详细信息,则开始采用设计好的骗局和话术实施诈骗。此类账号往往专用于“钓鱼”,一旦诈骗成功即注销账号,因此其社交信息(即账号对应的相关对象数据)与正常的社交账号具有显著差异。
2)养号:如图2b所示,养号行为往往发生在风险商户注册初期,为了营造出商户经营状况良好的假象,或为后期资金转移预留资金,又或是避开风控的监管,非法产业往往会预先在商户上进行多笔支付。这些交易往往由单一账号完成,少笔大额或小额多笔,交易凭证均不可查,有些场景中这些交易也可能是由多个账号完成,即多人养号。
3)引导交易:如图2c所示,引导交易的行为往往发生在某些特定的场景中,黑产在引导用户支付的同时会一同进行向风险商户支付,隐藏在正常用户之中,但交易频率和金额均高于一般用户,即风险账号会通过参与到交易(引导交易)中的方式,引导一般用户也进行交易(被骗交易)。
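4) Fund transfer, including money laundering (turning illegal proceeds into seemingly legal ones): as shown in FIG. 2d, because an illicit operation usually runs several merchants at once, withdrawn funds flow simultaneously to other risky merchants or other risky accounts. When a risky merchant is penalized, the illicit operation may, to keep its funds from being frozen, recover the funds reserved during the account-nurturing stage in the form of refunds. As FIG. 2d shows, a risky merchant refunds money to the corresponding accounts (the risky accounts in the figure), and these accounts can move the funds on by transferring them to other accounts/merchants (the ellipsis and arrows in the figure indicate that those accounts/merchants can transfer the funds further), thereby transferring the illegal proceeds.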
下面结合上述所列举的包括多个环节的欺诈场景,对本申请实施例提供的方法进行说明。
图3示出了本申请实施例所适用的一种对象识别系统的结构示意图,图4示出了该场景下的对象识别方法的实施流程示意图。如图3所示,该系统可以包括服务器10和多个终端设备(图中仅示出了终端设备21和终端设备22),终端设备可以通过网络与服务器10通信,服务器10侧的样本对象库11中存储有大量带有标注标签的第一样本对象的相关对象数据,也就是样本用户的相关对象数据,也就是说样本对象库11中存储有参考数据集。终端设备21和终端设备22可以是待识别对象A和待识别对象B的终端设备。在一些实施例中,服务器10可以是具有移动支付功能和用户间交互功能的应用程序的应用服务器,终端设备的用户即对象可以通过该应用程序进行交互,比如相互发送信息、加好友等等,还可以通过应用程序进行交易、进行移动支付。服务器10在用户授权的情况下,可以获取到用户的用户相关信息,通过执行本申请实施例提供的方法,实现对用户的风险识别。
如图4中所示,该方法的实施流程可以包括如下步骤S1至步骤S5。
步骤S1:基于训练数据集训练得到对象识别模型。
如图2a至图2d所示,黑产(也可以称为:黑色产业/非法产业/恶意产业)在整个生命周期中都有风险账号(代表了黑产用户即风险用户)贯穿其中。而在不同场景下,风险账号表现出不同的特征。为了保证不同类型的黑产用户在模型学习过程中不会相互干扰,导致判别失误,本申请实施例中,可以根据风险账号的不同类型(也就是不同的对象类型),分别进行模型训练。其中,模型的训练可以是由服务器10完成,也可以是由其他电子设备完成,服务器10通过调用训练好的对象识别模型进行对象的风险类型预测。该实施例中以通过训练设备30执行模型的训练步骤为例进行说明。
在本方案中借助半监督学习进行模型训练,具体实现操作流程如下:
1.模型分组:也就是对象的类型划分,即将风险账号划分为多种风险类型的风险账号,首先根据非法产业的生命周期,对不同类型的风险用户(也就是风险账号)进行分组。例如,负责资金转 移的风险账号需要在资金的流入与流出实现闭环,因此,与养号的风险账号特征有所相似,但在不同的时间窗口具有行为差异,即养号的风险账号通常出现在前期。因此可以借助时间窗口,将两类风险用户进行区分,进行模型训练。类似的,引流类型的风险账号、引导支付的风险账号也分别进行模型训练,当然,在模型训练时训练数据集中还包括非风险账号,也就是非目标类型的用户。
该步骤可以由人工完成或者根据设定的划分规则由电子设备完成。通过该步骤,可以按照不同类型的账号的特征不同,将账号按照风险类型进行分组,并进行标记,以基于标记好的这些账号的相关对象数据对进行分类模型进行训练,得到对象识别模型。
2. Sample acquisition: construction of the first training data set (training data set 12 shown in FIG. 3)
该步骤使用已经标记风险类型(即带有标注标签)的风险账号和正常账号(即没有风险的账号,也就是没有风险的样本对象),作为模型学习的目标。将这些账号(即第二样本对象)的相关对象数据(也就是该账号与其他账号的交互信息,如社交信息、支付行为信息等)作为模型识别的特征变量。
例如,支付行为信息指的是与支付/交易有关的交互信息,可以包括该账户向其他账户付款,也可以包括其他账号向该账号付款等。社交信息则是除支付行为信息之外的交互信息,比如,该账号的好友信息/好友度、活跃度等。
在实际场景中,风险账号基本通过聊天、发布虚拟信息等方式来诱导用户进行交易,风险账号的相关对象数据与正常的社交账号的相关对象数据会有显著差异,而不同类型的风险账号的相关对象数据之间也会表现出不同的特征,因此,可以通过已经标记的风险账号和正常账号的相关对象数据作为训练模型的样本数据,对模型进行训练。
其中,样本数据还可以包括多个未知风险类型的账号(对应前文中第三样本对象)的社交行为数据。
3.模型训练:即使用上述样本数据进行模型训练,在训练满足一定条件时,将该模型(即前文中的第一分类模型)用于对未知风险类型的账号进行标记,由此可以得到被标记的未知风险类型的账号,即伪标签。
训练时,模型的输入为账号的相关对象数据或者是经过预处理后的相关对象数据,模型的输出为预测得到的账号的风险类型,也就是第一标签。
4.模型检验:将伪标签与已标记的样本一同训练,当模型达到预期效果时,停止训练,得到对象识别模型。
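FIG. 5 is a schematic diagram of the principle of a model training method provided in some embodiments of this application. As shown in FIG. 5, labeled samples are sample data carrying annotation labels, that is, the related object data of risky accounts with annotation labels and of normal accounts (whose annotation label indicates no risk); unlabeled samples are the related object data of the accounts of unknown risk type mentioned above; and the machine learning model is the object recognition model to be trained. As the figure shows, the labeled samples include sample data of several risk types (category 1, category 2, … in the figure).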
在对模型训练时,首先采用标记样本进行重复训练,直至满足第一训练结束条件(比如,预设的一个或多个训练指标满足一定条件),得到第一分类模型,之后,通过该模型对未标记样本进行标签预测,具体的,可以将未标记样本的相关对象数据输入至模型中,得到预测的第一标签,将该第一标签作为未标记样本的伪标签,得到伪标签样本。之后,基于已标记的样本数据和这些带有伪标签的样本数据对模型继续进行迭代训练,直至模型的效果达到预期,比如模型的损失函数收敛,得到训练好的对象识别模型。
步骤S2:基于标签传播构建参考数据集。
同样的,该步骤可以由服务器10执行,也可以由其他电子设备执行,将构建的参考数据集提供给服务器10使用。该实施例中以参考数据集的构建同样由训练设备30完成为例。
通过半监督学习(即风险识别模型)进行用户识别的方式有助于解决发现用户风险的时效性问题。但在用户风险标签识别的过程中,为了保障模型训练的准确性,不同类型的风险用户是彼此分离进行标注的,这会限制对风险用户体系的扩展。此外,黑产在使用非法账号进行运营的过程中,行为特征也会不断变异。因此,仅借助模型进行用户风险识别的方法,不利于用户风险体系的长期运营。基于此,该步骤中可以基于知识图谱的信息传递性,使得用户风险标签进行扩散。
在前文中描述了风险账号在黑产的整个生命周期中扮演不同角色,而基于用户社交、支付行为等特征的不同,可以借助半监督学习对不同类型的用户进行标记。对于已标记的用户,即带有标注标签的用户,可以基于用户间的关联关系,如实体关联、资金流动(如转账、发红包等)将用户的风险标签传播出去。
如图6所示的示意图中,图6中每个节点代表一个用户,该图中示出了第一目标类型的用户(如养号类的风险用户)、第二目标类型的用户(如引流类的风险用户)和第三目标类型的用户(如引导交易类的风险用户)这三种已知风险类型的用户,以及一些未知风险类型的用户(未知用户),用户之间有可能会存在关联(可以根据用户的社交行为数据确定关联关系),而具有关联关系的用户之间的风险标签是可以传递的,如图6中所示,已知风险类型的用户的风险标签会将其风险标签传递给与其具有关联的未知用户,已知风险类型的具有关联关系的用户之间也会产生标签传递。
图7a至图7c示意性的示出了几种风险标签传播的示例,其中,图7a为单向风险标签传播的示例,A目标类型的用户(如养号类风险用户)与未知用户之间如果发生过资金转移(如转账交易),该风险用户的风险标签(A目标类型标签,如养号标签)会传递该未知用户。图7b为多类型风险标签环形传播的示例,A目标类型的用户与B目标类型的用户(如资金转移类的风险用户)如果进行过资金转移,二者的风险标签会相互进行传递。与此同时,二者也有可能与未知用户之间发生标签传递。这种情况下很可能出现标签传导的闭环,即无穷无尽的传播,此时基于损失函数跳出循环。图7c为多源风险标签传播的示例,一个未知用户的风险标签可能通过不仅一条路径获取,不同风险类型的风险用户(图中所示的A目标类型的用户和B目标类型的用户)可能都与同一个未知用户发生关联,这些风险用户的标签信息也都将传递给未知用户。
可见,具有关联关系的用户之间的标签都是可以通过标签传播相互影响的。因此,为了更加全面、准确的评估一个用户的风险结果,需要考虑这些因素。
标签传播可以通过多轮迭代的方式,按照用户间的关联关系进行标签传播。其中,关联关系可划分为多种类型的关联关系,比如,可以将对象的关联关系划分资源赠送关联关系例如红包关联关系、资源转移关联关系例如转账关联关系和实体关联关系三种类型,红包关联关系和转账关联关系都是按照资源或资金的流动进行的划分,如果两个用户(即账号)之间进行过红包的发送或接收行为,则认为两者之间具有红包关联关系,如果两个用户(即账号)之间进行过转账(包括支付转账或其他转账方式),则认为两者之间具有转账关联关系。实体关联则是如果两个用户都与同一实体(如都使用过同一个联系方式)具有关联,则认为两个之间具有实体关联。
可以理解的是,上述关联关系的说明只是示例,在实际应用中,不同的应用场景中可以根据需求配置不同的划分方式。
标签传播算法的实现流程如下:
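Initialization: y = f(0), ln(f) = Loss(0) (the loss function at initialization)
While Loss decreases:
Label propagation: from the result f(n-1) of round n-1 and the user association relationship R, obtain the result f(n) of round n
Result aggregation: aggregate the round-n result f(n) into p(n)
Loss calculation: compute Loss(n) from the result set p(n)
Output: the result p at which Loss is smallest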
其中,f(0)表示初始化阶段各第一样本对象的标注标签,f(n)表示经过n轮标签传播得到的各第一样本对象更新后的标签,用户关联关系R即为前文中的第二关联关系,结果汇总则是指对于每个样本对象,通过融合与该对象具有关联关系的各对象的更新后的标签,得到该对象对应的融合后的风险标签p(n)的步骤,在下一轮标签传播时,则是基于每个样本对象对应的融合后的标签和样本对象间的关联关系进行,直至损失函数满足设定条件,如达到最小,也就是损失函数的值不再减小时,完成迭代,将损失函数的值最小时所对应的各样本对象的融合后的标签作为各样本对象的第二标签。
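The control flow of this iteration (propagate, aggregate, compute the loss, stop once the loss no longer decreases) can be sketched generically. The propagation, aggregation, and loss steps are passed in as callables because their exact forms are defined elsewhere; `max_rounds` is an added safety cap and not part of the original description.

```python
def run_label_propagation(f0, step, summarize, loss, max_rounds=100):
    """Iterate while the loss keeps decreasing; return the best summary.

    f0        : initial labels f(0) (the annotation labels)
    step      : callable f(n-1) -> f(n), one propagation round
    summarize : callable f(n) -> p(n), fuses neighbors' labels
    loss      : callable p(n) -> float
    """
    f = f0
    best_p = summarize(f0)
    best_loss = loss(best_p)
    for _ in range(max_rounds):
        f = step(f)
        p = summarize(f)
        current = loss(p)
        if current >= best_loss:  # the loss stopped decreasing
            break
        best_p, best_loss = p, current
    return best_p, best_loss
```

The function returns the aggregated result from the round with the smallest loss, which corresponds to the "output" line of the outline above.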
下面结合具体实施流程详细介绍标签算法的具体实现方式,上述提及到的各个参数的含义也会在下面进行解释:
1.标签传播算法中用于判断多轮迭代是否结束的损失函数可以表示如下:
Figure PCTCN2022114765-appb-000001
其中,
Figure PCTCN2022114765-appb-000002
为第一损失函数,
Figure PCTCN2022114765-appb-000003
为第二损失函数,α和β为预设的损失函数权重。
损失函数中各个参数的具体含义如下:
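1) The set I is the set of all users that carry annotation labels, that is, the set of first sample objects; S is the set of all similar user pairs in I, that is, the set of similar object pairs.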
y i是第i个用户/账号的标注标签;
Figure PCTCN2022114765-appb-000004
是通过标签传播算法预测得到的第i个用户的预测标签(即上述融合后的标签)。假设对象类型即风险类型共4个类型,y i
Figure PCTCN2022114765-appb-000005
都可以是一个一维向量,向量共有4个元素值,y i中用户的标签对应的元素值的值为1,其他3个值都是0,
Figure PCTCN2022114765-appb-000006
则是4个概率值,分别代表用户当前次标签传播后用户属于各个风险类型的概率。
2)σ i表示第i个风险标签的重要性程度,即第i个已标注标签用户的重要程度。用户的重要程度可以根据其用户相关数据确定,具体计算方式不做限定。例如在资金转移过程中,当风险用户转移资金的额度更大时,可以认为该风险信息的有效性更强,该用户的重要程度则较大。
w a,b表示两个用户a,b(任一相似对象对)之间的相似性,在一些实施例中,可以采用资金关联账号重合度来表示:
Figure PCTCN2022114765-appb-000007
也就是资金往来账号的交集数(这两个用户之间的资金往来次数)/资金往来账号的并集数(这两个用户与所有用户之间进行过资金往来的总次数),即优先关注资金往来账号重合度高的用户关系对。也就是当两个用户往来的账号重合度越高时,则两个用户的风险类型大概率相同。
3)
Figure PCTCN2022114765-appb-000008
表示第n轮标签传播,账号i的预测用户向量(即预测标签)与该账号标注标签的余弦距离;
Figure PCTCN2022114765-appb-000009
则表示第n轮标签传播,账号a,b的预测用户向量之间的余弦距离。其中,
Figure PCTCN2022114765-appb-000010
表示用户i的预测用户向量,
Figure PCTCN2022114765-appb-000011
分别表示用户a和用户b在第n轮的预测用户向量(也就是下一次传播时用户的第二标签)。
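The formula images are not reproduced in this text, so the exact loss cannot be confirmed. One reading that is consistent with the parameter descriptions above is a weighted sum of cosine distances: α times the sum over labeled users of σ i times the distance between the predicted and annotation labels, plus β times the sum over similar pairs of w a,b times the distance between the two predicted labels. The sketch below implements that assumed form, together with the account-overlap similarity read as a set intersection over a set union; all function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def account_overlap(counterparties_a, counterparties_b):
    """Overlap of two users' fund-counterparty account sets (intersection / union)."""
    a, b = set(counterparties_a), set(counterparties_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagation_loss(pred, annotated, importance, similar_pairs, alpha, beta):
    """Assumed form: alpha * first loss + beta * second loss.

    pred, annotated : {user: label vector}
    importance      : {user: sigma_i}
    similar_pairs   : {(a, b): w_ab}
    """
    first = sum(importance[i] * cosine_distance(pred[i], annotated[i])
                for i in annotated)
    second = sum(w * cosine_distance(pred[a], pred[b])
                 for (a, b), w in similar_pairs.items())
    return alpha * first + beta * second
```

Under this reading, the first term pulls each prediction toward its annotation label and the second term pulls the predictions of similar users toward each other.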
2.标签传播的表达式可以表示为:
Figure PCTCN2022114765-appb-000012
该表达式中各个参数的含义如下:
1)集合R表示用户间关联关系的集合,例如R={红包、转账、实体},关联关系的类型有这三种类型,r表示其中一种关联类型;
2)α r表示关联类型r的影响因子(也就是每种类型的关联关系的权重)。由于不同关联类型的影响程度是不同的,拥有实体关联的用户量少,红包与转账则在资金限制额度上有较大的差异,影响因子则用于调节不同关联类型的组合权重。每种关联类型的影响因子的取值可以根据需求或者经验设置。比如,实体关联类型的因子取值较大,转账关联的因子可以大于红包关联的因子。
3)P r表示关联类型r的影响力矩阵(也就是对象对应于每种类型的关联关系的影响力),一个向30个以上账户转账的用户与一个向2个账户转账的用户显然具有显著的影响力差异。通过影响力矩阵刻画用户的影响力权重,例如,将用户关联的账号数标准化,从而得到该用户的影响力权重。
假设集合I中共有N个用户,P r则可以表示为一个具有N个元素值的向量,比如该向量的行数N,列数为1,每一行的元素值代表一个用户对应于该中类型的关联关系的影响力大小,也就是在对应类型的社交行为中该用户的影响力。
4)Q r表示标签传播的路径。
假设用户关系网络即集合I中有N个节点,则矩阵Q r有N×N维。若账号i向10个账户进行了转账,则Q r中对应账号i行的10个转账账户列的值均为0.1,其他列均为0,值为0的元素对应的账号表示该账号与账号i没有关联关系,值为非0的元素对应的账号表示该账号与账号i有关联关系,且元素的取值表征了关联性的大小,也就是计算时所使用的代表关联关系的数值。
如果关联类型r是实体关联,假设账号i和5个账号有实体关联,对应的取值作为0.2,其他均为0。
5)f(n)表示第n轮标签传播结果,第n+1轮的结果通过第n轮结果的传播以及标记用户标签的加入,也就是新增数据的加入。
比如,在一次标签传播时,集合I中的用户数量为N,在得到该次传播的传播结果之后,如果新增的样本对象的数量为M,则下一轮标签传播时集合I中的用户数量则为N+M。
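6) W y is the weight of each risk type (that is, the share of sample objects of each risk type in the set I). Because the number of accounts differs in magnitude across risk types, the weights are needed for normalization. Y is the matrix of labeled users, that is, the annotation labels of every sample object in the set I.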
也就是说,可以根据每种风险类型的带标签的用户数量,为不同风险类型计算一个归一化的权重,比如,共4种风险类型,每种风险类型的带标注标签的用户数量分为a1、a2,a3、a4,则第i种风险类型的权重可以表示为:
ai/(a1+a2+a3+a4)。
Y也就是集合I中所有用户的标注标签矩阵,假设第一轮标签传播时共有N个带标注标签的用户,共有4种风险类型,则该矩阵可以是一个N行4列的矩阵,每一行为一个用户的标注标签,每一行的元素值有1个取值是1,其他3个为0,取值为1元素值对应的风险类型即是该样本对象的真实对象类型。假设第2轮标签传播时带标注标签的用户为N+m,则Y可以是一个N+m行4列的矩阵。
基于上述标签传播公式,可以通过多轮迭代,对集合I中各用户的标签不断进行更新。
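The propagation formula itself is only available as an image here. A form that is dimensionally consistent with the parameter descriptions above is f(n+1) = Σ r α r · Q r · diag(P r) · f(n) + Y · diag(W y); the order of the factors is an assumption, chosen so that each user's label is weighted by that user's influence before it is passed on, and so that a row of Q r lists the users a given user is associated with. The sketch builds Q r as described (equal shares across the associated accounts) and applies one round of the assumed formula; the function names are hypothetical.

```python
def build_path_matrix(accounts, links):
    """accounts: ordered list of account ids; links: {account: set of linked ids}.

    Row i holds 1/degree for each account linked to accounts[i], else 0.
    """
    index = {acc: k for k, acc in enumerate(accounts)}
    n = len(accounts)
    matrix = [[0.0] * n for _ in range(n)]
    for acc, linked in links.items():
        if acc not in index:
            continue
        targets = [t for t in linked if t in index]
        for t in targets:
            matrix[index[acc]][index[t]] = 1.0 / len(targets)
    return matrix


def propagate(f_n, path, influence, factor, y, type_weight):
    """One round of the assumed form
    f(n+1) = sum_r factor[r] * Q_r * diag(P_r) * f(n) + Y * diag(W_y).

    f_n, y      : N x k lists of rows
    path        : {rel_type: N x N matrix Q_r}
    influence   : {rel_type: length-N list P_r}
    factor      : {rel_type: alpha_r}
    type_weight : length-k list W_y
    """
    n, k = len(f_n), len(type_weight)
    # Start from the weighted annotation labels Y * diag(W_y).
    nxt = [[y[i][c] * type_weight[c] for c in range(k)] for i in range(n)]
    for r, q in path.items():
        for i in range(n):
            for j in range(n):
                if q[i][j]:
                    scale = factor[r] * q[i][j] * influence[r][j]
                    for c in range(k):
                        nxt[i][c] += scale * f_n[j][c]
    return nxt
```

With an account that transferred to 10 others, `build_path_matrix` gives 0.1 in each of the 10 corresponding columns of its row, matching the example in the parameter list.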
假设经过n轮的标签传播,得到传播结果f(n)。对于集合I中一个账号x,通过n轮与其所有关联账户A的标签传播后,其对应的结果向量(预测标签)可以表示如下:
Figure PCTCN2022114765-appb-000013
其中,σ表示标准化函数,如可以选用softmax函数,a表示与用户x具有关联关系的一个用户。由该表达式可以看出,可以通过将与用户x具有关联关系的所有用户的更新后的标签融合并进行归一化处理,得到用户x的第二风险标签。一个用户的所有关联账户即关联用户,为矩阵Q r中该用户对应的行中非零值对应的用户。
具体的,在迭代过程中,每一次迭代得到对应的结果f(n),假设共有N个标签用户,共4种风险类型,向量f(n)可以是一个N行4列(或者4行N列)的矩阵,第i行的4个值(可以简称为用户向量)分别表示第i个用户属于4种风险类型的概率。得到f(n)后,对于第i个用户,根据与其有关联的各个用户的用户向量进行求和然后标准化处理,得到第i个用户的预测向量,也就是计算此次迭代对应的损失函数要用的
Figure PCTCN2022114765-appb-000014
假设用户i有3个关联用户,则将这3个用户的用户向量叠加后在进行标准化处理。
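The aggregation step described above (sum the user vectors of the associated users, then standardize) can be sketched with softmax as the standardization function, which the text names as one option. The function names are hypothetical.

```python
import math


def softmax(vec):
    """Numerically stable softmax."""
    peak = max(vec)
    exps = [math.exp(v - peak) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


def result_vector(user, f_n, linked):
    """Sum the user vectors of the linked users, then standardize.

    f_n    : {user: user vector from the current round}
    linked : {user: set of associated users}
    """
    dims = len(next(iter(f_n.values())))
    summed = [0.0] * dims
    for other in linked.get(user, ()):
        summed = [s + v for s, v in zip(summed, f_n[other])]
    return softmax(summed)
```

Softmax always yields strictly positive entries that sum to 1, so a user with no associated users receives a uniform vector under this choice; a different standardization function would behave differently in that case.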
通过不断的迭代更新,在Loss不再减少时得到的各个用户的用户向量作为这些已标注标签的用户的最终风险标签(即第二标签)。也就是后续应用于预测待识别对象的识别结果时参考数据集中样本对象的第二标签,假设最后一轮迭代共有标签用户5千个,最后会得到5千个用户的用户向量p(n),那这5千个用户的标注标签、第二标签、以及相关对象数据则可以作为参考数据集。
步骤S3:服务器10获取待识别用户的相关对象数据即用户相关数据。
步骤S4:服务器10调用对象识别模型预测待识别用户的第一标签。
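Specifically, the related object data of each user to be identified is input into the object recognition model, and the model predicts each user's initial risk label, that is, the first label: the model's preliminary judgment of which risk type the user belongs to.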
步骤S5:服务器10基于参考数据集确定待识别用户的第二标签。
服务器10基于参考数据集和待识别用户的相关对象数据,预测每个待识别用户的最终风险标签即第二标签,根据最终风险标签确定待识别用户的识别结果。该步骤可以包括:
a.确定每个待识别用户与其他用户(包括其他待识别用户和样本对象)之间的多个类型的关联关系,包括但不限于上述实体关联关系、资源赠送关联关系例如红包关联关系、资源转移关联关系例如转账关联关系等。
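b. Using the label propagation formula below and the first risk label of each user to be identified obtained in step S4, obtain the second label of each user to be identified through at least one round of label propagation: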
Figure PCTCN2022114765-appb-000015
作为一个示例,假设待识别用户的用户数量为M,样本用户的数量为N,在识别阶段,用户关系网络中的节点数量(即用户数量)则为M+N。
此时,对于标签传播公式中的上述各参数,α r表示关联类型r的影响因子,每种类型的关联关系对应的影响因子可以根据实际需求或实验值预先设置,可以与前文迭代阶段的α r相同。
对于影响力矩阵P r,对于M+N个用户中的每个用户,可以根据该用户与其他用户之间的每种类型的关联关系,可以确定出该用户对应于每种类型的关联关系的影响因子(也就是影响力或影响力权重)。同样的,可以根据每个用户与其他用户之间的关联关系,确定出该用户在标签传播中的传播路径Q r
例如,以关系类型r为例,对于M+N个用户而言,可以得到影响力矩阵P r,该矩阵中有M+N个值,表示这M+N个用户各自的影响力权重。Q r为(N+M)×(N+M)维的矩阵。
W y是风险类型的权重,其取值与迭代阶段相同。应用阶段的Y则为N+M个用户的初始风险标签,对于待识别用户,初始风险标签是通过对象识别模型预测得到的第一标签,对于样本用户,则是该样本用户的标注标签。
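In this application stage, for the first round of label propagation, f(n) consists of the second labels of the N sample users (that is, the
Figure PCTCN2022114765-appb-000016
from the last iteration) and the first labels of the M users to be identified.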
根据上述标签传播公式,计算此时的f(n+1),f(n+1)是一个(N+M)×k的矩阵,k表示风险类型的类型数,如4种,如果是只进行一次标签传播,根据f(n+1),通过
Figure PCTCN2022114765-appb-000017
可以计算得到每个待识别用户的最终结果向量。也就是每个待识别用户的第二标签,该向量包括k个概率值,可以将其中最大概率值或者超过阈值的概率值对应的风险类型确定为待识别用户的风险类型。如果是进行多次标签传播,在进行第二次标签传播时,将第一次标签传播得到的各个用户(包括待识别用户和样本用户)的结果向量作为此次传播的f(n)的初始值,基于标签传播公式再次进行标签更新,重复该操作,直至传播次数到达设定次数(即预先设定的传播的最大次数),将最后一次得到的待识别用户的结果向量作为待识别用户的第二标签。
可以理解的是,在实际实施时,为了避免无限循环,在每进行一次标签传播计算该次传播对应的结果向量时,各个用户的结果向量应该是逐个计算的,逐个计算的顺序不做限定,但是对于一个用户,已经计算得到其对应的结果向量之后,不会再因为与其有关联关系的各个用户的结果向量再次变化而再次计算。
另外,在实际应用中,还可以不断收集各种类型的新的风险用户的社交行为数据,即可以不断的更新、扩充训练数据集12,定期或者在更新的数据量达到一定数目时,对风险识别模型再次进行更新训练,以进一步提升模型性能。同样的,对于样本对象库11中的数据也可以进行更新,以扩展样本用户的数据量。
本申请实施例提供的方法,首次基于非法产业的生命周期进行拆解,对不同类型的风险账号各自进行模型识别标注,再基于不同类型风险用户之间的关联,创新性地采用了基于用户关联关系的标签传播算法,实现用户风险标签的传播,完善了风险用户体系,基于该方法,不仅刻画出不同风险类型的用户画像,同时保证了风险用户标签的长期运维,可以更好地应用在风险用户的策略打击中,为提前识别风险用户提供了新思路。与相关技术中的方式相比,本申请实施例提供的方案至少具有以下好处:
1)可以提升风险用户发现的时效性。
对于非法产业进行欺诈行为的各个可能的阶段,任何一个阶段通过对风险用户的相似性分析即关联性分析,都可以实现对用户的风险识别,不必仅仅依赖于客诉等滞后的信息。由此可以借助不同风险类型下的用户在不同场景下进行欺诈交易的预先识别和策略打击,更好地适用于不同的欺诈场景和打击手段,可以提升策略识别欺诈行为的时效性和识别欺诈行为的准确性。
2)增加了用户风险标签的覆盖率。
基于用户之间关联图谱的标签传播算法,借助用户之间的信息关联进行风险标签传播,扩展风险用户的覆盖范围。通过用户风险标签的建设和传播,构建出的用户风险体系可以刻画出发生交易(如移动支付)的所有用户的风险属性,对于预先识别欺诈风险方面也有诸多应用。例如:
1.对于具有引流风险的风险用户,可以通过追踪用户的社交情况,提示其他用户与之交易可能出现的风险情况。比如,当该用户与新加好友发生大额交易时,可进行实时策略打击,阻止用户掉入欺诈陷阱。
2.对于具有养号风险的用户,可以通过用户的前期在商户上的支付行为,预先识别出有欺诈风险的商户。可对用户频繁交易的商户进行预先识别,在商户养号阶段识别出后期可能进行欺诈交易的商户,进行商户处罚。
3.对于具有资金转移、洗钱风险的用户,则可以监测用户资金流向,及时阻止非法的资金流动。比如该类用户进行大批量出资金行为时,可进行实时管控,避免将资金转移。
4、建立用户风险体系的过程中,可能会发现不具有任何属性的用户,其中不乏小号和僵尸号。这些可能是非法产业用于后期作恶的工具,可以为识别欺诈风险提供新的源数据。
例如,在利用标签传播算法预测得到一个账号的各个风险属性的概率/权重均为0,也就是该账号的结果向量
Figure PCTCN2022114765-appb-000018
中各个属性维度的值均为0,则可以认为该账号是小号/僵尸号,这样的账号的社交信息、支付行为信息等可以作为风险识别模型的新增样本,通过训练可以使模型不仅能够对各个类型的风险账号进行预测,还能够识别出小号/僵尸号等类型的账号。
基于与本申请实施例提供的方法相同的原理,本申请实施例还提供了一种对象识别装置,如图8所示,该对象识别装置100可以包括第一预测模块110、参考数据集获取模块120、第二预测模块130和识别结果确定模块140。
第一预测模块110,用于获取至少一个待识别对象的相关对象数据;对于每个待识别对象,基于该待识别对象的相关对象数据通过对象识别模型预测得到该待识别对象的第一标签,所述第一标签表征了在多种对象类型中一个对象所属的对象类型;
参考数据集获取模块120,用于获取参考数据集,参考数据集中包括带有标注标签的多个第一样本对象的相关对象数据和第二标签,一个第一样本对象的标注标签表征了在多种对象类型中该第一样本对象所属的真实对象类型,所述第二标签表征了一个对象属于多种对象类型中的每种对象类型的概率;
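The second prediction module 130 is configured to determine, from the related object data of each object to be identified and each first sample object, the first association relationship among the at least one object to be identified and the plurality of first sample objects, and to determine the second label of each object to be identified from the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship;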
识别结果确定模块140,用于根据每个待识别对象的第二标签,确定每个待识别对象的识别结果。
在一些实施例中,第二预测模块具体可以用于:
将每个待识别对象的第一标签作为待识别对象的标注标签和初始的第二标签,根据每个待识别对象和第一样本对象的标注标签和第二标签,基于第一关联关系,在待识别对象和第一样本对象之间进行至少一次标签传播,得到每个待识别对象和第一样本对象更新后的第五标签;对于每个待识别对象,根据第一关联关系,将与该待识别对象具有第一关联关系的各对象的更新后的第五标签进行融合,得到该待识别对象的第二标签。
在一些实施例中,第二预测模块在进行每次标签传播时可以执行以下操作:
对于待识别对象和第一样本对象中的每个对象,根据第一关联关系,基于与该对象具有关联关系的各对象的第二标签,对该对象的第二标签进行更新;对于每个对象,将该对象的更新后的第二标签和该对象的标注标签进行融合,得到该对象的第五标签,将该对象的第五标签作为下一次标签传播时该对象的第二标签。
在一些实施例中,相关对象数据包括至少一种类型的相关对象数据,第一关联关系包括与每种类型的相关对象数据对应的该类型的关联关系;相应的,第二预测模块可以用于:
获取每种类型的关联关系对应的权重;根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、每种类型的关联关系、以及每种类型的关联关系对应的权重,确定每个待识别对象的第二标签。在一些实施例中,第二预测模块可以用于:对于至少一个待识别对象和多个第一样本对象中的每个对象,根据该对象的相关对象数据,确定该对象的影响力;根据每个待识别对象的第一标签、每个第一样本对象的标注标签和第二标签、每个待识别对象和第一样本对象的影响力、以及第一关联关系,确定每个待识别对象的第二标签。
在一些实施例中,相关对象数据包括至少一种类型的相关对象数据,第一关联关系包括与每种类型的相关对象数据分别对应的该类型的关联关系,至少一个待识别对象和多个第一样本对象中的每个对象的影响力包括每个对象对应于每种类型的关联关系的影响力。
在一些实施例中,第二预测模块可以用于:根据每个待识别对象的第一标签和每个第一样本对象的标注标签,确定在至少一个待识别对象和多个第一样本对象中每种对象类型的对象数量占比;将每种对象类型的对象数量占比作为权重,对至少一个待识别对象中相应对象类型的第一标签进行加权,对多个第一样本对象中相应对象类型的标注标签进行加权;根据每个待识别对象的加权后的第一标签、每个第一样本对象的加权后的标注标签和第二标签、以及第一关联关系,确定每个待识别对象的第二标签。
在一些实施例中,对象识别模型由模型训练模块通过执行以下操作得到:
获取第一训练数据集,第一训练数据集包括带有标注标签的多个第二样本对象的相关对象数据、以及多个未标记的第三样本对象的相关对象数据,多个第二样本对象的真实对象类型包括多种对象类型中每种类型;
基于多个第二样本对象的相关对象数据,对初始分类模型进行训练,直至满足第一训练结束条件,得到第一分类模型;对于每个第三样本对象,基于该第三样本对象的相关对象数据,通过第一分类模型预测得到该对象的对象类型,根据该第三样本对象类型确定该对象的标注标签;基于多个第二样本对象的相关对象数据、以及带有标注标签的多个第三样本对象的相关对象数据,对第一分类模型继续训练,直至满足第二训练结束条件,得到对象识别模型。
在一些实施例中,参考数据集是由参考数据集获取模块通过以下方式获取到的:
获取第二训练数据集,第二训练数据集包括带有标注标签的多个第一样本对象的相关对象数据;根据每个第一样本对象的相关对象数据,确定第二训练数据集各第一样本对象之间的第二关联关系;将每个第一样本对象的标注标签作为该第一样本对象初始的第三标签,重复执行以下操作,直至多个第一样本对象更新后的第三标签满足预设条件,将满足预设条件时的每个第一样本对象的第三标签确定为该第一样本对象的第二标签:基于第二关联关系以及各第一样本对象的标注标签和第三标签,通过在多个第一样本对象之间进行标签传播,得到每个第一样本对象更新后的第四标签;对于每个第一样本对象,根据第二关联关系,通过融合与该第一样本对象具有关联关系的各第一样本对象的第四标签,得到该第一样本对象新的第三标签。
In some embodiments, the reference data set obtaining module may further be configured to:
每进行一次标签传播后,获取新增数据,新增数据包括带有标注标签的至少一个第四样本对象的相关对象数据;将新增数据中的每个第四样本对象作为所述第二训练数据集中新增的第一样本对象,以更新第二训练数据集;根据更新后的第二训练数据集中每个第一样本对象的相关对象数据,确定更新后的第二训练数据集中各第一样本对象之间的第二关联关系,得到更新后的第二关联关系;
参考数据集获取模块在得到每个第一样本对象更新后的第四标签时,可以用于:
将每个新增的第一样本对象的标注标签作为该第一样本对象的第三标签,基于更新后的第二关联关系、以及更新后的各第一样本对象的标注标签和第三标签,通过在更新后的多个第一样本对象之间进行标签传播,得到更新后的每个第一样本对象的第四标签。
在一些实施例中,新增数据中各第四样本对象的标注标签是通过以下方式获取到的:
获取至少一个未标注的第四样本对象的相关对象数据;对于至少一个未标注的第四样本对象中每个第四样本对象,基于该第四样本对象的相关对象数据,通过对象识别模型预测得到该第四样本对象的第一标签,将该第四样本对象的第一标签作为该第四样本对象的标注标签。
在一些实施例中,对于每次标签传播,参考数据集获取模块还用于:
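Determine similar object pairs among the plurality of first sample objects from the related object data of the plurality of first sample objects, where satisfying the preset condition includes the value of the loss function satisfying a set condition;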
损失函数包括第一损失函数和第二损失函数,对于每次标签传播,第一损失函数的值表征了各第一样本对象的标注标签和新的第三标签之间的差异,第二损失函数的值表征了各相似对象对的新的第三标签之间的差异。
本申请实施例的装置可执行本申请实施例所提供的方法,其实现原理相类似,本申请各实施例的装置中的各模块所执行的动作是与本申请各实施例的方法中的步骤相对应的,对于装置的各模块的详细功能描述具体可以参见前文中所示的对应方法中的描述,此处不再赘述。
基于与本申请实施例提供的方法、装置相同的原理,本申请实施例还提供了一种电子设备,该电子设备可以包括存储器和处理器,其中,存储器中存储有计算机程序,处理器在运行该计算机程序时用于执行本申请任一可选实施例提供的方法,或者用于执行本申请任一可选实施例提供的装置所执行的动作。
作为一个可选实施例,图9中示出了本申请实施例的一种电子设备的结构示意图,如图9所示,该电子设备4000包括处理器4001和存储器4003。其中,处理器4001和存储器4003相连,如通过总线4002相连。可选地,电子设备4000还可以包括收发器4004,收发器4004可以用于该电子设备与其他电子设备之间的数据交互,如数据的发送和/或数据的接收等。需要说明的是,实际应用中收发器4004不限于一个,该电子设备4000的结构并不构成对本申请实施例的限定。
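The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 4001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.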
总线4002可包括一通路,在上述组件之间传送信息。总线4002可以是PCI(Peripheral Component Interconnect,外设部件互连标准)总线或EISA(Extended Industry Standard Architecture,扩展工业标准结构)总线等。总线4002可以分为地址总线、数据总线、控制总线等。为便于表示,图9中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
存储器4003可以是ROM(Read Only Memory,只读存储器)或可存储静态信息和指令的其他类型的静态存储设备,RAM(Random Access Memory,随机存取存储器)或者可存储信息和指令的其他类型的动态存储设备,也可以是EEPROM(Electrically Erasable Programmable Read Only Memory,电可擦可编程只读存储器)、CD-ROM(Compact Disc Read Only Memory,只读光盘)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质、其他磁存储设备、或者能够用于携带或存储计算机程序并能够由计算机读取的任何其他介质,在此不做限定。
存储器4003用于存储执行本申请实施例的计算机程序,并由处理器4001来控制执行。处理器4001用于执行存储器4003中存储的计算机程序,以实现前述方法实施例所示的步骤。
本申请实施例提供了一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时可实现前述方法实施例的步骤及相应内容。
本申请实施例还提供了一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时可实现前述方法实施例的步骤及相应内容。
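The embodiments of this application further provide a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in any optional embodiment of this application.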
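It should be understood that, although the flowcharts of the embodiments of this application indicate the operation steps with arrows, the order in which these steps are performed is not limited to the order indicated by the arrows. Unless explicitly stated herein, in some implementation scenarios of the embodiments of this application, the steps in the flowcharts may be performed in other orders as required. In addition, depending on the actual implementation scenario, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages. Some or all of these sub-steps or stages may be performed at the same time, or each of them may be performed at a different time. Where the execution times differ, the execution order of these sub-steps or stages can be flexibly configured as required, which is not limited in the embodiments of this application.
The foregoing describes only optional implementations of some implementation scenarios of this application. It should be noted that, for a person of ordinary skill in the art, other similar implementation means based on the technical idea of this application, adopted without departing from the technical concept of the solution of this application, also fall within the protection scope of the embodiments of this application.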

Claims (16)

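  1. An object recognition method, performed by a server, comprising:
    obtaining related object data of at least one object to be identified;
    for each object to be identified, predicting a first label of the object to be identified through an object recognition model based on the related object data of the object to be identified, the first label representing the object type, among a plurality of object types, to which an object belongs;
    obtaining a reference data set, the reference data set including related object data and second labels of a plurality of first sample objects with annotation labels, the annotation label of a first sample object representing the real object type, among the plurality of object types, to which that first sample object belongs, and a second label representing the probability that an object belongs to each of the plurality of object types;
    determining, according to the related object data of each object to be identified and each first sample object, a first association relationship among the objects of the at least one object to be identified and the plurality of first sample objects;
    determining a second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship; and
    for each object to be identified, determining a recognition result of the object to be identified according to the second label of the object to be identified.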
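  2. The method according to claim 1, wherein the determining a second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship comprises:
    using the first label of each object to be identified as both the annotation label and the initial second label of that object to be identified;
    performing, based on the first association relationship and according to the annotation labels and second labels of each object to be identified and each first sample object, at least one label propagation between the objects to be identified and the first sample objects, to obtain an updated fifth label of each object to be identified and each first sample object; and
    for each object to be identified, fusing, according to the first association relationship, the updated fifth labels of the objects that have the first association relationship with the object to be identified, to obtain the second label of the object to be identified.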
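  3. The method according to claim 2, wherein each label propagation comprises the following operations:
    for each object among the objects to be identified and the first sample objects, updating the second label of the object according to the first association relationship, based on the second labels of the objects that have an association relationship with the object; and
    for each such object, fusing the updated second label of the object with the annotation label of the object to obtain a fifth label of the object, and using the fifth label of the object as the second label of the object in the next label propagation.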
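  4. The method according to any one of claims 1 to 3, wherein the related object data includes at least one type of related object data, and the first association relationship includes, for each type of related object data, an association relationship of the corresponding type;
    the determining a second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship comprises:
    obtaining a weight corresponding to each type of association relationship; and
    determining the second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, each type of association relationship, and the weight corresponding to each type of association relationship.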
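  5. The method according to any one of claims 1 to 3, further comprising:
    for each object among the at least one object to be identified and the plurality of first sample objects, determining an influence of the object according to the related object data of the object;
    wherein the determining a second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship comprises:
    determining the second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, the influence of each object to be identified and each first sample object, and the first association relationship.
  6. The method according to claim 5, wherein the related object data includes at least one type of related object data, the first association relationship includes, for each type of related object data, an association relationship of the corresponding type, and the influence of each object among the at least one object to be identified and the plurality of first sample objects includes the influence of that object with respect to each type of association relationship.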
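  7. The method according to any one of claims 1 to 3, further comprising:
    determining, according to the first label of each object to be identified and the annotation label of each first sample object, the proportion of the number of objects of each object type among the at least one object to be identified and the plurality of first sample objects;
    wherein the determining a second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship comprises:
    using the proportion of the number of objects of each object type as a weight, weighting the first labels of the objects of the corresponding object type among the at least one object to be identified, and weighting the annotation labels of the objects of the corresponding object type among the plurality of first sample objects; and
    determining the second label of each object to be identified according to the weighted first label of each object to be identified, the weighted annotation label and the second label of each first sample object, and the first association relationship.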
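  8. The method according to claim 1, wherein the object recognition model is trained as follows:
    obtaining a first training data set, the first training data set including related object data of a plurality of second sample objects with annotation labels and related object data of a plurality of unlabeled third sample objects, the real object types of the plurality of second sample objects covering each of the plurality of object types;
    training an initial classification model based on the related object data of the plurality of second sample objects until a first training end condition is satisfied, to obtain a first classification model;
    for each third sample object, predicting the object type of the third sample object through the first classification model based on the related object data of the third sample object, and determining the annotation label of the third sample object according to the object type; and
    continuing to train the first classification model based on the related object data of the plurality of second sample objects and the related object data of the plurality of third sample objects with annotation labels until a second training end condition is satisfied, to obtain the object recognition model.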
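  9. The method according to claim 1, wherein the reference data set is obtained as follows:
    obtaining a second training data set, the second training data set including related object data of a plurality of first sample objects with annotation labels;
    determining, according to the related object data of each first sample object, a second association relationship among the first sample objects in the second training data set;
    using the annotation label of each first sample object as the initial third label of that first sample object, repeating the following operations until the updated third labels of the plurality of first sample objects satisfy a preset condition, and determining the third label of each first sample object at the time the preset condition is satisfied as the second label of that first sample object:
    performing label propagation among the plurality of first sample objects based on the second association relationship and the annotation label and third label of each first sample object, to obtain an updated fourth label of each first sample object; and, for each first sample object, fusing, according to the second association relationship, the fourth labels of the first sample objects that have an association relationship with that first sample object, to obtain a new third label of that first sample object.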
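  10. The method according to claim 9, wherein, after each label propagation, the method further comprises:
    obtaining newly added data, the newly added data including related object data of at least one fourth sample object with an annotation label;
    using each fourth sample object in the newly added data as a newly added first sample object in the second training data set, to update the second training data set;
    determining, according to the related object data of each first sample object in the updated second training data set, the second association relationship among the first sample objects in the updated second training data set, to obtain an updated second association relationship;
    wherein the performing label propagation among the plurality of first sample objects based on the second association relationship and the annotation label and third label of each first sample object, to obtain an updated fourth label of each first sample object, comprises:
    using the annotation label of each newly added first sample object as the third label of that first sample object, and performing label propagation among the updated plurality of first sample objects based on the updated second association relationship and the annotation labels and third labels of the updated first sample objects, to obtain a fourth label of each of the updated first sample objects.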
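  11. The method according to claim 10, wherein the annotation label of each fourth sample object in the newly added data is obtained as follows:
    obtaining related object data of at least one unlabeled fourth sample object; and
    for each fourth sample object among the at least one unlabeled fourth sample object, predicting a first label of the fourth sample object through the object recognition model based on the related object data of that fourth sample object, and using the first label of the fourth sample object as the annotation label of that fourth sample object.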
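  12. The method according to claim 9, wherein the method further comprises:
    determining similar object pairs among the plurality of first sample objects according to the related object data of the plurality of first sample objects;
    wherein satisfying the preset condition includes the value of a loss function satisfying a set condition; and
    the loss function includes a first loss function and a second loss function, wherein, for each label propagation, the value of the first loss function represents the difference between the annotation label and the new third label of each first sample object, and the value of the second loss function represents the difference between the new third labels of the two objects in each similar object pair.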
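  13. An object recognition apparatus, comprising:
    a first prediction module, configured to obtain related object data of at least one object to be identified, and, for each object to be identified, predict a first label of the object to be identified through an object recognition model based on the related object data of the object to be identified, the first label representing the object type, among a plurality of object types, to which an object belongs;
    a reference data set obtaining module, configured to obtain a reference data set, the reference data set including related object data and second labels of a plurality of first sample objects with annotation labels, the annotation label of a first sample object representing the real object type, among the plurality of object types, to which that first sample object belongs, and a second label representing the probability that an object belongs to each of the plurality of object types;
    a second prediction module, configured to determine, according to the related object data of each object to be identified and each first sample object, a first association relationship among the objects of the at least one object to be identified and the plurality of first sample objects, and determine a second label of each object to be identified according to the first label of each object to be identified, the annotation label and second label of each first sample object, and the first association relationship; and
    a recognition result determining module, configured to determine a recognition result of each object to be identified according to the second label of each object to be identified.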
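  14. An electronic device, comprising a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 12.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 12.
  16. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 12.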
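
For illustration only, the label propagation recited in claims 2 and 3 can be sketched as follows. The claims do not specify how neighbouring labels are aggregated or how labels are fused, so the neighbour mean, the convex combination with coefficient `alpha`, the fixed number of `rounds`, and all function and variable names are assumptions of this sketch and not part of the claimed subject matter.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, List[float]]


def propagate_once(annotation: Labels, second: Labels,
                   neighbors: Dict[str, List[str]], alpha: float) -> Labels:
    """One propagation: update from associated objects, then fuse (claim 3)."""
    fifth: Labels = {}
    for obj, nbrs in neighbors.items():
        n = len(second[obj])
        if nbrs:
            # Update the second label from the associated objects (assumed: mean).
            updated = [sum(second[k][c] for k in nbrs) / len(nbrs) for c in range(n)]
        else:
            updated = list(second[obj])
        # Fuse with the annotation label to obtain the fifth label.
        fifth[obj] = [alpha * a + (1 - alpha) * u
                      for a, u in zip(annotation[obj], updated)]
    return fifth


def identify(targets: List[str], annotation: Labels, second: Labels,
             neighbors: Dict[str, List[str]], alpha: float = 0.5,
             rounds: int = 1) -> Dict[str, Tuple[List[float], int]]:
    current = second
    for _ in range(rounds):
        # The fifth label becomes the second label of the next propagation.
        current = propagate_once(annotation, current, neighbors, alpha)
    results = {}
    for obj in targets:
        nbrs = neighbors[obj] or [obj]  # fall back to the object itself
        n = len(current[obj])
        # Final second label: fuse the fifth labels of the associated objects.
        fused = [sum(current[k][c] for k in nbrs) / len(nbrs) for c in range(n)]
        results[obj] = (fused, max(range(n), key=fused.__getitem__))
    return results


# "x" is to be identified; its first label (type 0) is both its annotation
# label and its initial second label. "s1" and "s2" are first sample objects.
annotation = {"x": [1.0, 0.0], "s1": [0.0, 1.0], "s2": [0.0, 1.0]}
second = {"x": [1.0, 0.0], "s1": [0.1, 0.9], "s2": [0.2, 0.8]}
neighbors = {"x": ["s1", "s2"], "s1": ["x", "s2"], "s2": ["x", "s1"]}
results = identify(["x"], annotation, second, neighbors)
```

In this made-up example the model alone places `x` in type 0, while the fused second label after one propagation favours type 1, because both associated sample objects are annotated as type 1.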
PCT/CN2022/114765 2021-09-22 2022-08-25 Object recognition method and apparatus, electronic device and storage medium WO2023045691A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/195,868 US20230281479A1 (en) 2021-09-22 2023-05-10 Object recognition method and apparatus, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111109153.6A CN115859187A (zh) 2021-09-22 2021-09-22 Object recognition method and apparatus, electronic device and storage medium
CN202111109153.6 2021-09-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/195,868 Continuation US20230281479A1 (en) 2021-09-22 2023-05-10 Object recognition method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023045691A1 (zh) 2023-03-30

Family

ID=85652151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114765 WO2023045691A1 (zh) 2021-09-22 2022-08-25 Object recognition method and apparatus, electronic device and storage medium

Country Status (3)

Country Link
US (1) US20230281479A1 (zh)
CN (1) CN115859187A (zh)
WO (1) WO2023045691A1 (zh)

Also Published As

Publication number Publication date
US20230281479A1 (en) 2023-09-07
CN115859187A (zh) 2023-03-28

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 11202306813X

Country of ref document: SG

NENP Non-entry into the national phase

Ref country code: DE