CN109978033B

CN109978033B - Method and device for constructing same-operator recognition model and method and device for identifying same-operator

Info

Publication number: CN109978033B
Application number: CN201910199958.0A
Authority: CN
Inventors: 王萌
Original assignee: 4Paradigm Beijing Technology Co Ltd
Current assignee: 4Paradigm Beijing Technology Co Ltd
Priority date: 2019-03-15
Filing date: 2019-03-15
Publication date: 2020-08-04
Anticipated expiration: 2039-03-15
Also published as: CN109978033A

Abstract

A method and a device for constructing a co-operator recognition model, and a method and a device for identifying a co-operator, are provided. By deeply mining and quantitatively analyzing account-related information, strongly associated features and/or weakly associated features that characterize the association between two accounts are obtained and used as a basis for judging whether the two accounts are controlled by the same operator, so that this judgment can be made more accurately. For example, a co-operator recognition model for recognizing whether two accounts are controlled by the same operator may be constructed by using the strongly associated features and/or weakly associated features as features of training samples.

Description

Method and device for constructing same-operator recognition model and method and device for identifying same-operator

Technical Field

The present invention relates generally to the field of data science, and more particularly, to a method and apparatus for constructing a co-operator recognition model, a co-operator recognition method and apparatus, a system, and a storage medium.

Background

Currently, most websites and internet products provide services to users mainly on the basis of network accounts registered by the users. For example, a user can log in to a registered account on a shopping website to shop online. Because a user can register a large number of accounts by using multiple devices or virtual machines, identifying whether two accounts are controlled by the same natural person is very important. For example, by identifying whether two accounts are controlled by the same operator, a shopping website can find accounts engaged in fraudulent behavior such as placing fake orders.

On the other hand, as consumption habits are upgraded, more and more users choose to pay by card, and as payment technology (especially mobile payment technology) develops, more and more users make mobile payments by binding financial accounts (such as bank card accounts and credit card accounts) to payment software installed on mobile devices. If it can be accurately identified whether two finance-related accounts are controlled by the same operator, risk accounts engaged in illegal behaviors such as credit card speculation, card nurturing, and illegal cash-out can be found.

Therefore, co-operator identification technology is of great significance in fields such as the internet and finance. First, most existing co-operator identification technologies are based on business experience and expert rules; they lack deep mining and quantitative analysis of massive data, and cannot effectively extract valuable information from that data as a basis for judging whether two accounts are controlled by the same operator. Second, existing co-operator identification technologies are strongly coupled to the specific service, generalize poorly, can only solve the problem of a particular vertical scenario, and cannot be popularized or migrated on a large scale. In addition, in risk-control scenarios with a strongly adversarial environment, criminal risk patterns evolve rapidly; existing co-operator identification technologies cannot cope effectively with such changes, and their effectiveness decays quickly over time.

Disclosure of Invention

Exemplary embodiments of the present invention are directed to a co-operator identification scheme to address at least one of the problems set forth above.

According to a first aspect of the present invention, a method for constructing a co-operator recognition model is provided, which includes: acquiring a training data set; the training data set comprises at least one piece of training data, each piece of training data corresponds to one account pair, and the mark of each piece of training data is used for indicating whether the corresponding account pair is controlled by the same operator, wherein the same operator refers to the same natural person or the same group; performing feature extraction processing on the training data set to obtain a training sample set; the features of the training samples in the training sample set comprise strongly-associated features and/or weakly-associated features, the strongly-associated features refer to features related to strongly-associated media shared by the account pairs, the weakly-associated features refer to features related to weakly-associated media shared by the account pairs, the strongly-associated media refer to media with the aggregated account number smaller than a first predetermined threshold, the weakly-associated media refer to media with the aggregated account number larger than a second predetermined threshold, and the media are used for representing associated carriers of a plurality of accounts in a certain dimension; constructing a same operator recognition model based on the training sample set; the same operator identification model is used for identifying whether the two accounts are controlled by the same operator.
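The definitions of strongly and weakly associated media above can be illustrated with a minimal sketch. The threshold values, the record layout `(account_id, dimension, medium_value)`, and the function name `classify_media` are assumptions made for illustration only; the source specifies only that a first and a second predetermined threshold exist, and the sketch additionally assumes the first threshold does not exceed the second.

```python
from collections import defaultdict

# Hypothetical threshold values; the source only names a "first" and a
# "second" predetermined threshold without giving numbers.
STRONG_MAX_ACCOUNTS = 5   # fewer aggregated accounts than this -> strong medium
WEAK_MIN_ACCOUNTS = 50    # more aggregated accounts than this -> weak medium


def classify_media(records, strong_max=STRONG_MAX_ACCOUNTS,
                   weak_min=WEAK_MIN_ACCOUNTS):
    """Classify each medium by how many distinct accounts it aggregates.

    records: iterable of (account_id, dimension, medium_value) tuples.
    Returns {(dimension, medium_value): "strong" | "weak" | "neither"}.
    """
    accounts_by_medium = defaultdict(set)
    for account_id, dimension, value in records:
        accounts_by_medium[(dimension, value)].add(account_id)

    labels = {}
    for medium, accounts in accounts_by_medium.items():
        n = len(accounts)
        if n < strong_max:
            labels[medium] = "strong"
        elif n > weak_min:
            labels[medium] = "weak"
        else:
            labels[medium] = "neither"
    return labels
```

Under these assumed thresholds, a device identifier seen on two accounts would be a strongly associated medium, while a public IP address seen on sixty accounts would be a weakly associated medium.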

Optionally, the step of acquiring a training data set comprises: receiving a training data set from outside; or the step of obtaining a training data set comprises: collecting relevant information of a plurality of accounts, and constructing a training data set based on the relevant information of the plurality of accounts.

Optionally, the step of constructing a training data set comprises: discovering, according to the strongly associated media, account pairs that have shared a strongly associated medium, constructing training data based on the relevant information of the discovered account pairs, and marking the training data corresponding to the discovered account pairs according to service feedback information; and/or discovering account pairs according to the service feedback information, constructing training data based on the relevant information of the discovered account pairs, and marking the training data corresponding to the discovered account pairs.

Optionally, the step of marking the training data corresponding to the discovered account pairs includes: marking the training data corresponding to an account pair according to whether the account pair belongs to the same group, wherein the mark indicates that the account pair is controlled by the same operator if the account pair belongs to the same group, and indicates that the account pair is not controlled by the same operator if the account pair does not belong to the same group; and/or marking the training data corresponding to an account pair according to whether the two accounts in the account pair are risk accounts, wherein the mark indicates that the account pair is controlled by the same operator when both accounts in the account pair are risk accounts, and indicates that the account pair is not controlled by the same operator when only one of the two accounts in the account pair is a risk account.
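The two marking rules can be sketched as follows. The function names, the `group_of` mapping, and the `risk_accounts` set are hypothetical. The source does not say how to mark a pair in which neither account is a risk account, nor a pair with unknown group membership; this sketch returns `None` in the first case and treats the second as "not the same group", both of which are assumptions.

```python
def label_by_group(group_of, account_a, account_b):
    """Return 1 if both accounts belong to the same known group, else 0.

    group_of: hypothetical mapping account_id -> group_id from service feedback.
    Accounts with no known group are treated as not sharing a group (assumption).
    """
    group_a = group_of.get(account_a)
    group_b = group_of.get(account_b)
    return 1 if group_a is not None and group_a == group_b else 0


def label_by_risk(risk_accounts, account_a, account_b):
    """Return 1 if both accounts are risk accounts, 0 if exactly one is.

    Returns None when neither account is a risk account, because the
    source gives no rule for that case.
    """
    flagged = (account_a in risk_accounts) + (account_b in risk_accounts)
    if flagged == 2:
        return 1
    if flagged == 1:
        return 0
    return None
```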

Optionally, the information related to the account comprises at least one of: natural person information, including information related to a natural person acquired when an account is registered and/or used; account information, including information stored by the server and related to the account; operation information, including information related to operation behavior generated by the account; social information, including social relationship information related to the natural person corresponding to the account; information on events in which the account is the passive party; and other information related to the account.

Optionally, the step of constructing the co-operator recognition model includes: obtaining the co-operator recognition model by training in a supervised learning manner; or constructing the co-operator recognition model based on manual experience.

Optionally, the media are divided into first media and second media, wherein a first medium characterizes an original associated carrier of a plurality of accounts in a certain dimension, and a second medium is a new medium derived from the first media.

Optionally, the second medium is a new medium consisting of part of the fields in the first medium; alternatively, the second medium is a new medium composed of a plurality of the first media; or the second medium is obtained by associating the first medium based on a preset association mode.
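Two of the derivation options above can be sketched briefly. The concrete choices here (an IPv4 address as the first medium, its first three octets as the derived medium, and a device identifier combined with a Wi-Fi identifier) are illustrative assumptions and do not come from the source; the third option, association in a preset manner, is not shown because the source does not describe the manner.

```python
def ip_prefix_medium(ip_address):
    """Derive a second medium from part of the fields of a first medium.

    Assumption for illustration: the first medium is a dotted IPv4 address
    and the derived medium keeps only its first three octets.
    """
    return ".".join(ip_address.split(".")[:3])


def composite_medium(*first_media):
    """Derive a second medium by combining several first media into one key."""
    return tuple(first_media)
```

For example, `composite_medium("dev-1", "wifi-9")` yields a single key that two accounts share only if they share both underlying media.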

Optionally, the strongly associated features comprise at least one of: a first association number, a first maximum association degree, and a first cumulative association degree sum; wherein the first association number represents the number of strongly associated media of the same dimension shared by the account pair, the first maximum association degree represents the largest of the first association degrees corresponding to at least part of the strongly associated media of the same dimension shared by the account pair, the first association degree is related to the number of accounts sharing the strongly associated medium, and the first cumulative association degree sum represents the sum of the first association degrees corresponding to at least part of the strongly associated media of the same dimension shared by the account pair; and/or the weakly associated features comprise at least one of: a second association number, a second maximum association degree, and a second cumulative association degree sum; wherein the second association number represents the number of weakly associated media of the same dimension shared by the account pair, the second maximum association degree represents the largest of the second association degrees corresponding to at least part of the weakly associated media of the same dimension shared by the account pair, the second association degree is related to the number of accounts sharing the weakly associated medium, and the second cumulative association degree sum represents the sum of the second association degrees corresponding to at least part of the weakly associated media of the same dimension shared by the account pair.
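The three per-dimension features can be sketched as below. The source says only that the association degree is "related to" the number of accounts sharing a medium; the default `degree = 1 / n` used here is an assumed placeholder, and the function name and data layout are hypothetical. The same function applies to strongly or weakly associated media, depending on which media are passed in.

```python
from collections import defaultdict


def association_features(media_a, media_b, account_count,
                         degree=lambda n: 1.0 / n):
    """Compute per-dimension association features for one account pair.

    media_a, media_b: sets of (dimension, value) media for each account,
        already restricted to strong media (or to weak media).
    account_count: {(dimension, value): number of accounts sharing the medium}.
    degree: assumed mapping from that account count to an association degree.
    Returns {dimension: {"count", "max_degree", "sum_degree"}} for every
    dimension in which the two accounts share at least one medium.
    """
    degrees_by_dimension = defaultdict(list)
    for medium in media_a & media_b:          # media shared by both accounts
        dimension, _ = medium
        degrees_by_dimension[dimension].append(degree(account_count[medium]))

    return {
        dimension: {
            "count": len(degrees),            # association number
            "max_degree": max(degrees),       # maximum association degree
            "sum_degree": sum(degrees),       # cumulative association degree sum
        }
        for dimension, degrees in degrees_by_dimension.items()
    }
```

With the assumed `1 / n` degree, a medium shared by fewer accounts contributes a larger degree, which matches the intuition that rarely shared media are stronger evidence of a common operator.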

Optionally, the features of the training samples in the training sample set further include comparison features, wherein the comparison features characterize the differences between the related information of the two accounts in the account pair.

Optionally, the method further comprises: after the same operator recognition model is used for recognizing whether the two accounts are controlled by the same operator, under the condition that the actual service feedback result is inconsistent with the recognition result, a new training sample is constructed based on the related information of the two accounts, and the same operator recognition model is updated by using the new training sample.

Optionally, both accounts of the account pair are financial related accounts; or one account of the two accounts in the account pair is an existing account, and the other account is a temporary account; or, the two accounts in the account pair are accounts corresponding to different mobile phone numbers.

According to the second aspect of the present invention, there is also provided a method for identifying a co-operator, comprising: obtaining prediction data; wherein the prediction data comprises information about two accounts; performing feature extraction processing on the prediction data to obtain a prediction sample; the characteristics of the prediction sample comprise strong correlation characteristics and/or weak correlation characteristics, the strong correlation characteristics refer to characteristics related to a strong correlation medium shared by the two accounts, the weak correlation characteristics refer to characteristics related to a weak correlation medium shared by the two accounts, the strong correlation medium refers to a medium with the aggregated account number smaller than a first preset threshold value, the weak correlation medium refers to a medium with the aggregated account number larger than a second preset threshold value, and the medium is used for representing a correlation carrier of a plurality of accounts in a certain dimension; and inputting the prediction sample into a same-operator recognition model to obtain a recognition result which is output by the same-operator recognition model and used for predicting whether the two accounts are controlled by the same operator, wherein the same operator refers to the same natural person or the same group.

Optionally, the strongly associated features comprise at least one of: a first association number, a first maximum association degree, and a first cumulative association degree sum; wherein the first association number represents the number of strongly associated media of the same dimension shared by the two accounts, the first maximum association degree represents the largest of the first association degrees corresponding to at least part of the strongly associated media of the same dimension shared by the two accounts, the first association degree is related to the number of accounts sharing the strongly associated medium, and the first cumulative association degree sum represents the sum of the first association degrees corresponding to at least part of the strongly associated media of the same dimension shared by the two accounts; and/or the weakly associated features comprise at least one of: a second association number, a second maximum association degree, and a second cumulative association degree sum; wherein the second association number represents the number of weakly associated media of the same dimension shared by the two accounts, the second maximum association degree represents the largest of the second association degrees corresponding to at least part of the weakly associated media of the same dimension shared by the two accounts, the second association degree is related to the number of accounts sharing the weakly associated medium, and the second cumulative association degree sum represents the sum of the second association degrees corresponding to at least part of the weakly associated media of the same dimension shared by the two accounts.

Optionally, the features of the prediction sample further comprise comparison features, wherein the comparison features characterize the differences between the related information of the two accounts.

Optionally, the related information of the two accounts comprises at least one of: natural person information, including information related to a natural person acquired when an account is registered and/or used; account information, including information stored by the server and related to the account; operation information, including information related to operation behavior generated by the account; social information, including social relationship information related to the natural person corresponding to the account; information on events in which the account is the passive party; and other information related to the account.

Optionally, the two accounts are financial related accounts; or one account of the two accounts is an existing account, and the other account is a temporary account; or, the two accounts are accounts corresponding to different mobile phone numbers.

Optionally, the recognition result is a prediction score, and the method further comprises: constructing a first ordered set according to the order of magnitude of the prediction scores obtained by the co-operator recognition model for a plurality of prediction samples, wherein the first ordered set comprises a plurality of first elements, and each first element comprises the prediction score of the co-operator recognition model for one prediction sample and the actual service feedback score of that prediction sample; for a given prediction score in the first ordered set, calculating a corrected score of the prediction score according to the actual service feedback scores of a predetermined number of first elements near that prediction score, wherein the corrected score is related to the sum of the actual service feedback scores of the predetermined number of first elements and to the predetermined number; and correcting the prediction score according to the calculated corrected score.
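The score correction described above can be sketched as a neighborhood average over the first ordered set. The source states only that the corrected score is related to the sum of the neighboring feedback scores and to the predetermined number; taking their ratio (the mean) is an assumption, as are the function name, the window placement, and the default `k`.

```python
def corrected_scores(pairs, k=3):
    """Compute a corrected score for each prediction score.

    pairs: list of (prediction_score, feedback_score) tuples.
    k: the predetermined number of neighboring elements (assumed default).
    Returns a list of (prediction_score, corrected_score) sorted by
    prediction score; the corrected score is the mean feedback score of
    up to k elements around each position (an assumed formula).
    """
    ordered = sorted(pairs)                   # the "first ordered set"
    n = len(ordered)
    result = []
    for i, (predicted, _) in enumerate(ordered):
        # Center a window of k elements on i, shifted inward at the ends.
        low = max(0, min(i - k // 2, n - k))
        high = min(n, low + k)
        window = ordered[low:high]
        mean_feedback = sum(feedback for _, feedback in window) / len(window)
        result.append((predicted, mean_feedback))
    return result
```

The output list has the shape of the second ordered set used in the next optional step: each element pairs a prediction score with its corrected score.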

Optionally, the method further comprises: constructing a second ordered set according to the order of magnitude of the prediction scores, wherein the second ordered set comprises a plurality of second elements, and each second element comprises the prediction score of the co-operator recognition model for one prediction sample and the corrected score of that prediction score; for a new prediction score obtained by the co-operator recognition model for a new prediction sample, calculating a corrected score of the new prediction score by interpolation based on the second ordered set; and correcting the new prediction score according to the calculated corrected score.
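The interpolation step can be sketched with piecewise-linear interpolation over the second ordered set. The source says only "interpolation"; the linear form, the clamping at both ends, and the function name are assumptions.

```python
from bisect import bisect_left


def interpolate_correction(second_set, new_score):
    """Interpolate a corrected score for a new prediction score.

    second_set: non-empty list of (prediction_score, corrected_score)
        tuples sorted ascending by prediction score.
    Scores outside the observed range are clamped to the nearest end
    (an assumption; the source does not cover this case).
    """
    xs = [predicted for predicted, _ in second_set]
    ys = [corrected for _, corrected in second_set]
    if new_score <= xs[0]:
        return ys[0]
    if new_score >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, new_score)            # xs[j-1] < new_score <= xs[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (new_score - x0) / (x1 - x0)
```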

Optionally, the method further comprises: outputting interpretation information for interpreting the recognition result; wherein the interpretation information includes: first interpretation information relating to strongly and/or weakly correlated features of the prediction samples; and/or second interpretation information relating to features other than strongly and/or weakly correlated features of the predicted sample.

Optionally, the recognition result is a prediction score, and the method further comprises: in the case that the predicted score is greater than a third predetermined threshold and less than a fourth predetermined threshold, checking at least one of the two accounts; and/or performing degradation processing on the operation authority of at least one of the two accounts under the condition that the predicted score is larger than a fourth preset threshold value.
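The two-threshold handling above can be sketched as a simple decision function. The threshold values and the action names are hypothetical. The source uses strict inequalities and does not say what happens when the score equals the fourth threshold; this sketch takes no action in that case.

```python
def decide_action(score, third_threshold=0.6, fourth_threshold=0.9):
    """Map a prediction score to a handling action (threshold values assumed).

    Returns "degrade" above the fourth threshold, "verify" strictly between
    the third and fourth thresholds, and "none" otherwise.
    """
    if score > fourth_threshold:
        return "degrade"      # restrict the operation authority of an account
    if third_threshold < score < fourth_threshold:
        return "verify"       # check at least one of the two accounts
    return "none"
```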

Optionally, the method further comprises: changing the verification mode when it is found that accounts belonging to an account pair controlled by the same operator have completed verification through a third-party platform, or that more than a predetermined number of accounts have completed verification through a third-party platform; and/or lowering the third predetermined threshold and/or the fourth predetermined threshold when it is found that the probability that account pairs whose prediction scores are greater than the third predetermined threshold and smaller than the fourth predetermined threshold belong to the same operator has increased.

According to the third aspect of the present invention, there is also provided an apparatus for constructing a co-operator recognition model, including: a data acquisition unit for acquiring a training data set, wherein the training data set comprises at least one piece of training data, each piece of training data corresponds to one account pair, and the mark of each piece of training data indicates whether the corresponding account pair is controlled by the same operator, the same operator being the same natural person or the same group; a feature extraction unit for performing feature extraction processing on the training data set to obtain a training sample set, wherein the features of the training samples in the training sample set comprise strongly associated features and/or weakly associated features, the strongly associated features are features related to strongly associated media shared by the account pair, the weakly associated features are features related to weakly associated media shared by the account pair, a strongly associated medium is a medium whose number of aggregated accounts is smaller than a first predetermined threshold, a weakly associated medium is a medium whose number of aggregated accounts is larger than a second predetermined threshold, a medium characterizes an associated carrier of a plurality of accounts in a certain dimension, and the marks of the training samples indicate whether the account pair is controlled by the same operator; and a model construction unit for constructing a co-operator recognition model according to the training sample set, wherein the co-operator recognition model is used for identifying whether two accounts are controlled by the same operator.

Optionally, the data acquisition unit receives a training data set from outside; or the data acquisition unit acquires the relevant information of a plurality of accounts and constructs a training data set based on the relevant information of the plurality of accounts.

Optionally, the data acquisition unit discovers, according to the strongly associated media, account pairs that have shared a strongly associated medium, constructs training data based on the related information of the discovered account pairs, and marks the training data corresponding to the discovered account pairs according to service feedback information; and/or the data acquisition unit discovers account pairs according to the service feedback information, constructs training data based on the related information of the discovered account pairs, and marks the training data corresponding to the discovered account pairs.

Optionally, the data acquisition unit marks the training data corresponding to an account pair according to whether the account pair belongs to the same group, wherein the mark indicates that the account pair is controlled by the same operator if the account pair belongs to the same group, and indicates that the account pair is not controlled by the same operator if the account pair does not belong to the same group; and/or the data acquisition unit marks the training data corresponding to an account pair according to whether the two accounts in the account pair are risk accounts, wherein the mark indicates that the account pair is controlled by the same operator when both accounts in the account pair are risk accounts, and indicates that the account pair is not controlled by the same operator when only one of the two accounts in the account pair is a risk account.

Optionally, the model building unit obtains the co-operator recognition model by training in a supervised learning manner; or the model building unit builds the co-operator recognition model based on manual experience.

Optionally, the model building unit is further configured to, after recognizing whether the two accounts are controlled by the same operator by using the same operator recognition model, build a new training sample based on the relevant information of the two accounts when the actual service feedback result is inconsistent with the recognition result, and update the same operator recognition model by using the new training sample.

According to a fourth aspect of the present invention, there is also provided a co-operator identifying apparatus comprising: an obtaining unit configured to obtain prediction data; wherein the prediction data comprises information about two accounts; the extraction unit is used for performing feature extraction processing on the prediction data to obtain a prediction sample; the characteristics of the prediction sample comprise strong correlation characteristics and/or weak correlation characteristics, the strong correlation characteristics refer to characteristics related to a strong correlation medium shared by the two accounts, the weak correlation characteristics refer to characteristics related to a weak correlation medium shared by the two accounts, the strong correlation medium refers to a medium with the aggregated account number smaller than a first preset threshold value, the weak correlation medium refers to a medium with the aggregated account number larger than a second preset threshold value, and the medium is used for representing a correlation carrier of a plurality of accounts in a certain dimension; and the processing unit is used for inputting the prediction sample into a same-operator recognition model so as to obtain a recognition result which is output by the same-operator recognition model and used for predicting whether the two accounts are controlled by the same operator, wherein the same operator refers to the same natural person or the same group.

Optionally, the recognition result is a prediction score, and the apparatus further comprises: a first constructing unit, configured to construct a first ordered set according to the order of magnitude of the prediction scores obtained by the co-operator recognition model for a plurality of prediction samples, wherein the first ordered set includes a plurality of first elements, and each first element includes the prediction score of the co-operator recognition model for one prediction sample and the actual service feedback score of that prediction sample; a first calculating unit, configured to calculate, for a given prediction score in the first ordered set, a corrected score of the prediction score according to the actual service feedback scores of a predetermined number of first elements near that prediction score, wherein the corrected score is related to the sum of the actual service feedback scores of the predetermined number of first elements and to the predetermined number; and a first correcting unit, configured to correct the prediction score according to the calculated corrected score.

Optionally, the apparatus further comprises: a second constructing unit, configured to construct a second ordered set according to the order of magnitude of the prediction scores, wherein the second ordered set includes a plurality of second elements, and each second element includes the prediction score of the co-operator recognition model for one prediction sample and the corrected score of that prediction score; a second calculating unit, configured to calculate, for a new prediction score obtained by the co-operator recognition model for a new prediction sample, a corrected score of the new prediction score by interpolation based on the second ordered set; and a second correcting unit, configured to correct the new prediction score according to the calculated corrected score.

Optionally, the apparatus further comprises: an interpretation unit for outputting interpretation information for interpreting the recognition result; wherein the interpretation information includes: first interpretation information relating to strongly and/or weakly correlated features of the prediction samples; and/or second interpretation information relating to features other than strongly and/or weakly correlated features of the predicted sample.

Optionally, the apparatus further comprises: the verification unit is used for verifying at least one account in the two accounts under the condition that the prediction score is larger than a third preset threshold and smaller than a fourth preset threshold; and/or the degradation processing unit is used for performing degradation processing on the operation authority of at least one of the two accounts under the condition that the prediction score is larger than a fourth preset threshold value.

Optionally, the verification unit changes the verification mode when it is found that accounts belonging to an account pair controlled by the same operator have completed verification through a third-party platform, or that more than a predetermined number of accounts have completed verification through a third-party platform; and/or the degradation processing unit lowers the third predetermined threshold and/or the fourth predetermined threshold when it is found that the probability that account pairs whose prediction scores are greater than the third predetermined threshold and smaller than the fourth predetermined threshold belong to the same operator has increased.

According to a fifth aspect of the present invention, there is also provided a system comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the method according to the first or second aspect of the present invention.

According to a sixth aspect of the present invention, there is also provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method as set forth in the first or second aspect of the present invention.

With the method and device for constructing a co-operator recognition model, the co-operator identification method and device, the system, and the storage medium according to exemplary embodiments of the invention, account-related information is deeply mined and quantitatively analyzed, and the resulting strongly associated features and/or weakly associated features, which characterize the association between two accounts, are used as a basis for judging whether the two accounts are controlled by the same operator, so that this judgment can be made more accurately.

Drawings

These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a flow chart of a method of building a co-operator recognition model according to an exemplary embodiment of the present invention;

FIG. 2 illustrates a flow chart of a co-operator identification method according to an exemplary embodiment of the present invention;

FIG. 3 illustrates a block diagram of an apparatus for constructing a co-operator recognition model according to an exemplary embodiment of the present invention;

FIG. 4 illustrates a block diagram of a co-operator identification apparatus according to an exemplary embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.

According to the invention, through deep mining and quantitative analysis of the information related to the accounts, valuable information can be extracted from the information related to the accounts to be used as a basis for judging whether the two accounts are the same operator or not. Specifically, the invention provides concepts of a strong correlation medium and a weak correlation medium, and can obtain a strong correlation characteristic and/or a weak correlation characteristic capable of representing the correlation between two accounts according to the strong correlation medium and/or the weak correlation medium shared by the two accounts, and the strong correlation characteristic and/or the weak correlation characteristic can be used as a basis for judging whether the two accounts are the same operator or not.

In an exemplary embodiment of the present invention, a co-operator recognition model for recognizing whether two accounts are controlled by the same operator may be constructed with a strong correlation feature and/or a weak correlation feature capable of characterizing a correlation between the two accounts as features of a training sample. For example, modeling can be performed based on machine learning technology, so that the machine automatically mines rules and patterns in data, and the defect of excessive dependence on business experience and expert rules is avoided. And after the constructed co-operator identification model is used for predicting the prediction sample, the co-operator identification model can be updated according to the actual service feedback result so as to improve the prediction effect of the co-operator identification model. Therefore, the invention can be realized as a set of complete and self-learning same-operator identification scheme, thereby ensuring that the performance of the system does not obviously attenuate with time.

Fig. 1 illustrates a flowchart of a method of constructing a co-operator recognition model according to an exemplary embodiment of the present invention. The method shown in fig. 1 may be implemented entirely by a computer program in software, and the method shown in fig. 1 may also be executed by a specifically configured computing device.

Referring to fig. 1, in step S110, a training data set is acquired. The training data set comprises at least one piece of training data, each piece of training data corresponds to one account pair, and the mark of each piece of training data is used for indicating whether the corresponding account pair is controlled by the same operator, wherein the same operator refers to the same natural person or the same group.

Step S120, performing feature extraction processing on the training data set to obtain a training sample set; wherein the features of the training samples in the training sample set comprise strongly correlated features and/or weakly correlated features.

The strong association features are features related to a strong association medium shared by the account pair, and the weak association features are features related to a weak association medium shared by the account pair. A strong association feature may be computed from the strong association media shared by the account pair, and a weak association feature may be computed from the weak association media shared by the account pair. The specific meanings of both are given in the description below and are not repeated here.

A strong association medium is a medium on which the number of aggregated accounts is less than a first predetermined threshold, and a weak association medium is a medium on which the number of aggregated accounts is greater than a second predetermined threshold. The first predetermined threshold is less than or equal to the second predetermined threshold, and the specific values of both thresholds may be set according to the actual situation, which is not described in detail here.

The medium is used to characterize the association carrier of a plurality of accounts in a dimension. In one embodiment of the present invention, the media may be divided into a first medium and a second medium, where the first medium characterizes the original association carrier of the plurality of accounts in a dimension and the second medium is a new medium derived from the first medium. For example, the second medium may be a new medium composed of partial fields of the first medium, and/or a new medium composed of a plurality of first media, and/or a new medium obtained from the first medium based on a preset association manner. The association manner may be one of various mappings, such as a many-to-one mapping. For example, ip addresses may be mapped to specific geographical areas such as province, city, district, etc. to obtain a new medium, i.e., "geographical location information at the time of account login", so that in some cases multiple ip addresses are mapped to the same province, city, or district. Other first media may be mapped as well; for example, a mobile phone number may be mapped to its home location, which is not described in detail here. Thus, the strong association medium and/or weak association medium shared by the account pair may be a first medium or a second medium.

For example, the first medium may be, but is not limited to, a medium corresponding to the complete field information of a dimension, such as an ip address, a mobile phone number, an app version, a mailbox, a bank card number, a network access type, a transaction time, a transaction party, a transaction amount, a transaction channel, a transaction type, and a device model. The second medium may be, but is not limited to, a new medium composed of partial fields of the first medium, such as several digits of a mobile phone number (e.g., the first three, four, five, or six digits), several segments of an ip address, a mailbox suffix, or several digits of a bank card number; or a new medium composed of a plurality of first media, such as a combination of an ip address and a date, or a combination of a transaction party and a transaction amount. Taking the ip address as the first medium, the second medium may also be geographical location information obtained by mapping the ip address to a specific region such as a province, city, or district.

In step S130, constructing a co-operator recognition model based on the training sample set; the same operator identification model is used for identifying whether the two accounts are controlled by the same operator.

The same-operator recognition model may be obtained by training in a supervised learning (i.e., supervised machine learning) manner, or may be constructed based on human experience. Accordingly, the same-operator recognition model may be a machine learning model, or a model constructed from manual experience, such as a scorecard model. The structure and construction process of the same-operator recognition model are not described in detail here.

The constructed same-operator identification model is used for identifying whether the two accounts are controlled by the same operator. In the case that the same-operator identification model is used to identify whether two accounts are controlled by the same operator, the characteristics of the information related to the two accounts can be extracted and used as prediction samples, and the characteristics of the prediction samples are input into the same-operator identification model to obtain prediction results of the same-operator identification model for the prediction samples, that is, prediction probabilities (that is, prediction scores mentioned below) obtained by predicting whether the two accounts are controlled by the same operator.
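As an illustration of how a prediction score might be produced, the following is a minimal sketch that assumes a plain logistic regression trained by stochastic gradient descent. The source does not prescribe any particular model; the function names, the learning rate, the epoch count, and the toy feature vectors (an association count and a maximum association degree) are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Fit weights and bias with plain SGD (assumed model, not the patent's)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the raw score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_score(x, weights, bias):
    """Prediction score: estimated probability that the pair shares an operator."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy training samples: [association count, maximum association degree].
toy_samples = [[3, 0.25], [2, 0.2], [0, 0.0], [1, 0.01]]
toy_labels = [1, 1, 0, 0]  # 1 = same operator, 0 = not the same operator
toy_weights, toy_bias = train_logistic(toy_samples, toy_labels)
```

Calling `predict_score` on the feature vector of a new account pair then yields a value between 0 and 1 that can be compared against a business-defined threshold.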

In the embodiment of the present invention, in step S110, a pre-constructed training data set may be received from the outside, or related information of a plurality of accounts may be collected, and a training data set is constructed based on the related information of the plurality of accounts.

The following is an exemplary description of the process of constructing the training data set.

Data acquisition

First, information about a plurality of accounts may be collected. The related information of the account mentioned here may be various information related to the account, such as one or more of information that may include, but is not limited to, natural person information, account information, operation information, social information, information that the account is a passive party of the event, and other information related to the account.

The natural person information may include information related to the natural person obtained when the account is registered and/or used. The natural person here refers to the natural person recorded when the account is registered and/or used. The natural person information may be basic information about the natural person stored by the server corresponding to the account, including information actively submitted by the operator when registering and/or using the account and information obtained by querying a third-party data source. Such basic information may include, but is not limited to, name, gender, age, ethnicity, occupation, registered household address, residential address, employer address, contact information, education level, income, marital status, and the like.

The account information may include information stored on the server side relating to the account.

The operation information may include information related to operation behavior generated by the account. Taking a financial account as an example, the operation information may include transaction behavior information generated when an operator uses the account, such as transaction records of consumption, transfer, repayment, and the like, and specifically may include, but is not limited to, transaction time, transaction counterparty, amount, channel, type, POS terminal, ip, device, geographic location, and the like. The operation information may also include non-transaction behavior information generated when the operator uses the account, such as login, password modification, adjustment, browsing, clicking, commenting, and coupon events, and specifically may include, but is not limited to, the occurrence time, type, acceptor id, ip, device, and geographic location of an event (i.e., a non-transaction behavior event).

The social information includes social relationship type information related to a natural person corresponding to the account. The social information mentioned here may include a collection of various social relationship type information of natural persons registered in an account used by an operator, not social relationship type information registered by the account itself. For example, social information may include, but is not limited to, cell phone contact lists, call records, frequent contacts, family relationships, alumni relationships, social relationships of various types of social software, and so on.

The information that the account is a passive party of an event may include information about events in which the account is on the receiving side, such as being recommended for registration, receiving a transfer, being added to a marketing list, and the like.

Data cleaning and storing

After the information related to the plurality of accounts is collected, the collected raw data may be processed so that the processed data has a proper format or form. By way of example, after the information related to the plurality of accounts is collected, the data can be arranged into structured data so as to facilitate feature calculation subsequently. For example, it may be stored in multiple copies in a database in daily partitions (e.g., slice tables) or in full tables (e.g., linked lists).

Constructing a training data set

Before the training data set is constructed, the collected related information of the plurality of accounts can be analyzed to determine the media included in each account and the values of the media. The media referred to herein may include the first and second media mentioned above. As an example, a medium corresponding to a complete field of a certain dimension in the related information of the account may be used as the first medium. For example, the medium corresponding to the field of the complete mobile phone number in the related information of the account is the mobile phone number, and the value of the medium is the complete mobile phone number.

In order to enhance the discovery capability of the association characteristics between accounts and better discover the association mode in the account pair controlled by the same operator, the collected related information of the accounts can be specially processed to obtain more new media. The new medium referred to herein is the second medium referred to above.

As one example of the present invention, an original field used to characterize a first medium may be fuzzified to obtain a new medium. For example, the first segment, the first two segments, and the first three segments of a complete ipv4 address field may be taken to obtain three new media. Taking the ipv4 address as a medium, for an ipv4 address whose value is "12.34.56.78", fuzzification yields the new media "first segment of the ipv4 address", "first two segments of the ipv4 address", and "first three segments of the ipv4 address", whose values are "12", "12.34", and "12.34.56" respectively. Similarly, the first 3 digits, the first 4 digits, the first 5 digits, the first 6 digits, and so on may be taken from the complete mobile phone number field to obtain a plurality of new media (i.e., new media such as the first 3 digits of the mobile phone number, the first 4 digits of the mobile phone number, and the like). In addition, fields such as the bank card number, address, mailbox, and position information may be fuzzified to obtain new media. It should be noted that the fuzzification does not replace the original field but adds a new type of medium (i.e., the second medium). The resulting new media can be used to characterize coarse-grained associations.
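The fuzzification described above can be sketched as follows. This is a minimal illustration only; the media type names (`IpPrefix1`, `MobilePrefix3`, and so on) and the function names are hypothetical and do not come from the source.

```python
def ipv4_prefix_media(ip):
    """Derive three coarse-grained media: first 1, 2, and 3 segments of an IPv4 address."""
    segments = ip.split(".")
    return {f"IpPrefix{n}": ".".join(segments[:n]) for n in (1, 2, 3)}


def mobile_prefix_media(mobile, lengths=(3, 4, 5, 6)):
    """Derive new media from the first few digits of a mobile phone number."""
    return {f"MobilePrefix{n}": mobile[:n] for n in lengths}


def fuzzify(account_media):
    """Add derived media to an account's media without replacing the original fields."""
    derived = dict(account_media)  # the original fields are kept as they are
    if "Ip" in account_media:
        derived.update(ipv4_prefix_media(account_media["Ip"]))
    if "Mobile" in account_media:
        derived.update(mobile_prefix_media(account_media["Mobile"]))
    return derived


media = fuzzify({"Ip": "12.34.56.78", "Mobile": "13812341234"})
```

Because the original `Ip` and `Mobile` entries remain in the result, fine-grained and coarse-grained associations can be evaluated side by side.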

As another example of the present invention, the original fields used to characterize the first medium may also be field combined to obtain a new medium. For example, ip addresses may be combined with dates, channels may be combined with browser versions, and so forth. Such a combination may improve the pertinence of weakly associated media, for example, a plurality of weakly associated media may become one strongly associated medium by combination. It should be noted that this combination process does not replace the original field, but adds a new type of media. This operation will enhance the association degree of the weak association medium, so that a more specific association pattern between account pairs can be better discovered.

As another example of the present invention, the first medium may be further processed (e.g., mapped) based on a preset association manner to obtain a new medium. For example, the ip address may be mapped to a specific geographical area such as province, city, district, etc. to obtain a new medium, i.e., "geographical location information at the time of account login".
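Field combination and many-to-one mapping can be sketched in the same spirit. The lookup table below is a hypothetical stand-in for a real geolocation source, and the region names, separator, and function names are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical ip-to-region table; a real system would query a geolocation database.
IP_TO_REGION = {
    "12.34.56.78": "ProvinceA/CityX",
    "12.34.56.79": "ProvinceA/CityX",
    "98.76.54.32": "ProvinceB/CityY",
}


def combine_media(*values: str, sep: str = "|") -> str:
    """Combine several first-media values into one more specific medium value."""
    return sep.join(values)


def map_ip_to_region(ip: str, table=IP_TO_REGION) -> Optional[str]:
    """Many-to-one mapping: several ip addresses may map to the same region."""
    return table.get(ip)


ip_and_date = combine_media("12.34.56.78", "2019-03-15")
login_region = map_ip_to_region("12.34.56.78")
```

Combination narrows a medium (fewer accounts share the same ip on the same date), whereas mapping widens it (more accounts share the same region).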

After the media included in each account and their values are determined, a hyper-parameter C may be determined. If the number of different accounts aggregated on a medium of a certain dimension is less than C, the medium is considered a strong association medium, and the features computed from it are called strong association features. Otherwise, the medium is considered a weak association medium, and the features computed from it are called weak association features. Empirically, C generally ranges from 50 to 1000. As an example, the collected information may be summarized into an ATV (Account, Type, Value) table of <account, media type, media value>, from which an ATVC (Account, Type, Value, Count) table of <account, media type, media value, number of aggregated accounts> is calculated. According to the set hyper-parameter C, the ATVC table may then be split into a strong association media table SATVC (Strong ATVC) and a weak association media table WATVC (Weak ATVC).

As an example, the SATVC table may be represented as follows:

Account	Type	Value	Cnt
A00001	Mobile	13812341234	4
A00001	Ip	12.34.56.78	120
A00002	Mobile	13812341234	4
…	…	…	…

The WATVC table may be expressed as follows:

Account	Type	Value	Cnt
A00001	MerchantName	payment for general purpose	280000
A00001	emailpost	@abc123.com	2920
A00002	MobilePrefix	13800	420398
…	…	…	…
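The construction of the ATVC table and its split by the hyper-parameter C can be sketched as follows. This is an illustrative sketch under the rule stated above (fewer than C aggregated accounts means strong, otherwise weak); the function names are hypothetical, and the tiny threshold used in the usage lines is for demonstration only, far below the empirical range of 50 to 1000.

```python
from collections import defaultdict


def build_atvc(atv_rows):
    """Turn (account, type, value) rows into (account, type, value, count) rows.

    The count is the number of distinct accounts aggregated on the medium.
    """
    rows = list(dict.fromkeys(atv_rows))  # de-duplicate while keeping order
    accounts_per_medium = defaultdict(set)
    for account, media_type, value in rows:
        accounts_per_medium[(media_type, value)].add(account)
    return [(a, t, v, len(accounts_per_medium[(t, v)])) for a, t, v in rows]


def split_by_threshold(atvc_rows, c):
    """Split ATVC rows into SATVC (count < c) and WATVC (count >= c)."""
    satvc = [row for row in atvc_rows if row[3] < c]
    watvc = [row for row in atvc_rows if row[3] >= c]
    return satvc, watvc


atv = [
    ("A00001", "Mobile", "13812341234"),
    ("A00002", "Mobile", "13812341234"),
    ("A00001", "MobilePrefix", "13812"),
    ("A00002", "MobilePrefix", "13812"),
    ("A00003", "MobilePrefix", "13812"),
]
satvc, watvc = split_by_threshold(build_atvc(atv), c=3)  # toy threshold
```

In this toy data the full mobile number is shared by two accounts and lands in SATVC, while the five-digit prefix is shared by three accounts and lands in WATVC.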

In the invention, account pairs sharing a strong association medium can be found according to the strong association media, and training data can be constructed based on the related information of the found account pairs. For example, two accounts sharing a strong association medium may be selected according to the SATVC table to form an account pair, so as to obtain corresponding training data. In the SATVC table shown above, account A00001 and account A00002 share the strong association medium of mobile phone number (Mobile), so account A00001 and account A00002 may form an account pair, and their related information is used as a piece of training data.

And/or, account pairs can be discovered according to business feedback information, and training data can be constructed based on the related information of the discovered account pairs. For example, according to a risk account list fed back by the service, the information of any two risk accounts may be combined into one piece of training data, or the information of one risk account and one risk-free account may be combined into one piece of training data. For another example, an account pair may be selected from multiple accounts belonging to a group according to group information fed back by the service, and training data may be constructed according to the information of the selected account pair.
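Discovering account pairs from a strong association media table can be sketched as below. The row layout follows the SATVC table shown earlier; the function name and the returned structure are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_account_pairs(satvc_rows):
    """Return {(acc1, acc2): [shared (type, value) media]} for accounts sharing a strong medium."""
    accounts_by_medium = defaultdict(set)
    for account, media_type, value, _count in satvc_rows:
        accounts_by_medium[(media_type, value)].add(account)

    pairs = defaultdict(list)
    for medium, accounts in accounts_by_medium.items():
        # Every unordered pair of accounts on the same medium becomes a candidate pair.
        for acc1, acc2 in combinations(sorted(accounts), 2):
            pairs[(acc1, acc2)].append(medium)
    return dict(pairs)


satvc_rows = [
    ("A00001", "Mobile", "13812341234", 4),
    ("A00001", "Ip", "12.34.56.78", 120),
    ("A00002", "Mobile", "13812341234", 4),
]
candidate_pairs = find_account_pairs(satvc_rows)
```

With the three rows listed, the only candidate pair is (A00001, A00002), linked through the shared mobile phone number.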

Training data marking

The mark of a piece of training data is used to indicate whether the account pair is controlled by the same operator.

In the case where account pairs sharing a strong association medium are found according to the strong association media and training data is constructed based on the related information of the found account pairs, the training data corresponding to a found account pair can be marked according to the service feedback information. For example, whether an account is a risk account may be determined according to a risk list fed back by the service. When both accounts in the account pair corresponding to the training data are risk accounts, the training data may be marked as controlled by the same operator, that is, as a positive sample. Conversely, when one account of the account pair is a risk account and the other is a risk-free account, or when both accounts are risk-free accounts, the training data may be marked as not controlled by the same operator, that is, as a negative sample. As another example, the marking may also be performed according to a group list fed back by the service: an account pair belonging to the same group may be marked as controlled by the same operator, i.e., as a positive sample, whereas an account pair not belonging to the same group may be marked as not controlled by the same operator, i.e., as a negative sample.

In the case where account pairs are found according to the service feedback information and training data is constructed based on the related information of the found account pairs, the training data can be marked according to the type of the account pair forming the training data and/or whether that account pair shares a strong association medium.

Taking a risk scenario as an example, the training data corresponding to an account pair can be marked according to whether the account pair belongs to the same group. The mark indicates that the account pair is controlled by the same operator if the account pair belongs to the same group, and that the account pair is not controlled by the same operator if it does not. For example, assuming there are M groups whose member sets are G₁, G₂, …, G_M, an account pair whose two accounts belong to the same group (e.g., G₁) may be marked as a positive sample, meaning that the account pair corresponding to the training data is controlled by the same operator (i.e., the same group). An account pair whose two accounts belong to different groups may be marked as a negative sample, meaning that the account pair corresponding to the training data is not controlled by the same operator.
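The group-based marking rule can be sketched as a small function. Representing the groups as a list of Python sets, and the labels as 1 (positive) and 0 (negative), are assumptions made for illustration.

```python
def label_by_group(acc1, acc2, groups):
    """Return 1 if both accounts belong to the same group, otherwise 0.

    groups: list of sets of account ids, one set per group (hypothetical layout).
    """
    return int(any(acc1 in members and acc2 in members for members in groups))


groups = [{"A00001", "A00002"}, {"A00003", "A00004"}]
same_group_label = label_by_group("A00001", "A00002", groups)
cross_group_label = label_by_group("A00001", "A00003", groups)
```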

In addition, the training data corresponding to an account pair can be marked according to whether the two accounts in the account pair are risk accounts. When both accounts in the account pair are risk accounts and they share a strong association medium, the account pair can be marked as controlled by the same operator, that is, the training data corresponding to the account pair is marked as a positive sample. When both accounts in the account pair are risk accounts but the account pair does not share a strong association medium, the training data corresponding to the account pair can be marked as not controlled by the same operator, that is, as a negative sample. When only one of the two accounts in the account pair is a risk account, the training data corresponding to the account pair may likewise be marked as not controlled by the same operator, that is, as a negative sample.
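The three cases in this risk-based rule can be written out as a sketch. The function name and argument layout are hypothetical. The case in which neither account is a risk account is not covered by the passage above, so the sketch returns `None` for it rather than guessing a label.

```python
from typing import Optional


def label_by_risk(acc1, acc2, risk_accounts, shares_strong_medium) -> Optional[int]:
    """Label an account pair: 1 = same operator, 0 = not the same operator."""
    risky = (acc1 in risk_accounts, acc2 in risk_accounts)
    if all(risky):
        # Both risky: positive only if a strong association medium is shared.
        return 1 if shares_strong_medium else 0
    if any(risky):
        # Exactly one risky account: negative sample.
        return 0
    # Neither account is risky: not specified by this rule (assumption: leave unlabeled).
    return None


risk_accounts = {"A00001", "A00002"}
label = label_by_risk("A00001", "A00002", risk_accounts, shares_strong_medium=True)
```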

In an embodiment of the present invention, in step S120, the feature extraction processing is performed on each piece of training data in the training data set to obtain a training sample set, and the feature calculation process includes:

Feature calculation

For each piece of training data in the training data set, a strong correlation medium and/or a weak correlation medium shared by an account pair corresponding to the training data may be determined first.

In the case that the account pair corresponding to the training data shares a strong association medium, the corresponding strong association feature (Strong Connection Feature) may be calculated according to the strong association media shared by the account pair. By way of example, the strong association features may include, but are not limited to, one or more of a first association number, a first maximum association degree, and a cumulative sum of first association degrees. The first association number characterizes the number of strong association media of the same dimension shared by the account pair. The first maximum association degree characterizes the largest of the first association degrees corresponding to at least some of the strong association media of the same dimension shared by the account pair. The first association degree is related to the number of accounts sharing the strong association medium, and this relationship can be defined in various ways. For example, the first association degree may be negatively correlated with the number of accounts sharing the strong association medium: the more accounts share the medium, the smaller the first association degree, and the fewer accounts share it, the larger the first association degree. This relationship can be characterized in various ways; for example, it may be inversely proportional. Alternatively, the first association degree may be negatively correlated with the number of accounts using the strong association medium only in general, with a partial non-negative correlation (e.g., a positive correlation) permitted. Optionally, a time decay factor may also be taken into account when calculating the first association degree. Taking ip as the shared strong association medium, the time interval within which a given ip is shared may further be considered. The cumulative sum of first association degrees characterizes the sum of the first association degrees corresponding to at least some of the strong association media of the same dimension shared by the account pair.

The following table shows the strong association features between account A00001 and account A00002.

Acc1	Acc2	Ip_cnt	Ip_max	Ip_sum	…
A00001	A00002	3	0.009	0.012	…
…	…	…	…	…	…

The second row of the table indicates that accounts A00001 and A00002 share the strong association medium Ip. Ip_cnt indicates that the association number (i.e., the first association number) of accounts A00001 and A00002 in the Ip dimension is 3, i.e., they have used 3 ips in common. Ip_max indicates that the strongest association degree (i.e., the first maximum association degree) of accounts A00001 and A00002 in the Ip dimension is 0.009, and Ip_sum indicates that the cumulative sum of their association degrees in the Ip dimension is 0.012. The association degree is negatively correlated with the number of accounts sharing an ip: the more accounts share an ip, the smaller (weaker) the association degree of accounts A00001 and A00002 on that ip; conversely, the fewer accounts share it, the larger (stronger) the association degree.
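The three features for one dimension can be sketched as follows, assuming the inverse-proportional option mentioned above, i.e., an association degree of 1 divided by the number of accounts sharing the medium. That formula is only one of the permitted definitions, no time decay is applied, and the function and key names are hypothetical.

```python
def association_features(shared_media_counts):
    """Compute count, maximum, and cumulative sum of association degrees for one dimension.

    shared_media_counts: for each medium of this dimension shared by the account pair,
    the number of accounts aggregated on that medium.
    """
    # Assumed definition: association degree is inversely proportional to the count.
    degrees = [1.0 / count for count in shared_media_counts]
    return {
        "cnt": len(degrees),
        "max": max(degrees, default=0.0),
        "sum": sum(degrees),
    }


# A pair sharing three ips that are used by 4, 10, and 20 accounts respectively.
ip_features = association_features([4, 10, 20])
```

The same function applies to weak association media; only the input counts differ.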

In the case that the account pair corresponding to the training data shares a weak association medium, the corresponding weak association feature (Weak Connection Feature) may be calculated according to the weak association media shared by the account pair. By way of example, the weak association features may include, but are not limited to, one or more of a second association number, a second maximum association degree, and a cumulative sum of second association degrees. The second association number characterizes the number of weak association media of the same dimension shared by the account pair. The second maximum association degree characterizes the largest of the second association degrees corresponding to at least some of the weak association media of the same dimension shared by the account pair. The second association degree is related to the number of accounts using the weak association medium; the specific relationship between the two may be determined according to the actual situation (for example, a negative correlation), and the second association degree may be calculated in the manner described above for the first association degree, which is not repeated here. The cumulative sum of second association degrees characterizes the sum of the second association degrees corresponding to at least some of the weak association media of the same dimension shared by the account pair. For further details on these three features, reference may be made to the descriptions of the first association number, the first maximum association degree, and the cumulative sum of first association degrees.

The features extracted from a piece of training data, together with its mark, constitute a training sample that can be used for modeling.

In one embodiment of the present invention, optionally, the features of a training sample may further include comparison features; that is, the features extracted from each piece of training data also include comparison features. The comparison features characterize differences between the related information of the two accounts in the account pair. A comparison feature is a feature other than the association features (strong association features and weak association features) that nevertheless helps to identify whether the two accounts are controlled by the same operator. For example, the comparison features may include, but are not limited to, a difference in registration time, a difference in previous transaction time, an address similarity, a mailbox similarity, a mobile phone number similarity, a home location distance, a permanent location distance, whether a friend relationship exists, whether call records exist, whether mutual transfers have occurred, and the like.
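A few of these comparison features can be sketched as follows. The source does not define how similarity is measured; the sketch assumes the ratio from Python's standard `difflib.SequenceMatcher`, and the field names (`registered_at`, `email`, `mobile`) are hypothetical.

```python
from datetime import datetime
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Similarity in [0, 1]; an assumed measure, not one given by the source."""
    return SequenceMatcher(None, a, b).ratio()


def comparison_features(info1, info2):
    """Compute a few difference and similarity features for an account pair."""
    reg_diff_seconds = abs((info1["registered_at"] - info2["registered_at"]).total_seconds())
    return {
        "reg_time_diff_days": reg_diff_seconds / 86400,
        "email_similarity": string_similarity(info1["email"], info2["email"]),
        "mobile_similarity": string_similarity(info1["mobile"], info2["mobile"]),
    }


account_a = {"registered_at": datetime(2019, 3, 1), "email": "x@abc123.com", "mobile": "13812341234"}
account_b = {"registered_at": datetime(2019, 3, 15), "email": "x@abc123.com", "mobile": "13812349999"}
pair_features = comparison_features(account_a, account_b)
```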

In an embodiment of the present invention, after the same-operator recognition model is used to recognize whether the two accounts are controlled by the same operator, if the actual service feedback result is inconsistent with the recognition result, a new training sample may be constructed based on the relevant information of the two accounts, and the same-operator recognition model is updated by using the new training sample.

For example, after identifying whether two accounts are controlled by the same operator by using the same operator identification model, corresponding business decision operation can be executed for one or more of the two accounts according to the identification result, and the business feedback result is collected. And under the condition that the service feedback result shows that the prediction result is inaccurate, new training data can be formed based on the relevant information of the two accounts, the new training data is marked according to the service feedback, and the characteristics of the training data are calculated to obtain a new training sample. The co-operator recognition model may be incrementally trained based on the new training samples to further improve the accuracy of the model.
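The selection of feedback cases for model updating can be sketched as below. The dictionary layouts, the 0.5 decision threshold, and the function name are assumptions; feature calculation and incremental training on the selected pairs are outside the scope of the sketch.

```python
def collect_feedback_samples(predictions, feedback, threshold=0.5):
    """Return (pair, actual_label) for pairs whose prediction disagrees with business feedback.

    predictions: {(acc1, acc2): prediction score in [0, 1]}
    feedback:    {(acc1, acc2): actual label, 1 = same operator, 0 = not}
    """
    new_training_data = []
    for pair, score in predictions.items():
        if pair not in feedback:
            continue  # no business feedback collected for this pair yet
        predicted_label = int(score >= threshold)
        if predicted_label != feedback[pair]:
            # Mark the new training data with the label from business feedback.
            new_training_data.append((pair, feedback[pair]))
    return new_training_data


predictions = {("A00001", "A00002"): 0.91, ("A00003", "A00004"): 0.20}
feedback = {("A00001", "A00002"): 0, ("A00003", "A00004"): 0}
to_retrain = collect_feedback_samples(predictions, feedback)
```

Here only the first pair is returned, since the model predicted the same operator while the feedback says otherwise.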

The invention can be applied to various application scenarios. For example, the invention can be used for same-operator identification between finance-related accounts, so as to identify accounts that present business risks such as fraud, order brushing, credit hyping, card maintenance, cashing out, and the like. The invention can also be used for same-operator identification of Internet accounts, for example when a mobile phone number is changed or a password is forgotten.

According to different application scenes, the types of account pairs corresponding to training samples in a training sample set used in the process of constructing the same operator recognition model are different. For example, when constructing the same-operator recognition model for recognizing whether two accounts related to finance are controlled by the same operator, two accounts in the account pair corresponding to the training sample may be accounts related to finance, such as a debit card account and a credit card account. When a same-operator recognition model for recognizing whether the accounts corresponding to different mobile phone numbers are controlled by the same operator is constructed, two accounts in the account pair corresponding to the training sample can be accounts corresponding to different mobile phone numbers, for example, the accounts can be accounts bound with different mobile phone numbers. In the case of constructing a recognition model of the same operator for recognizing the same operator when the password is forgotten, one account of the account pair corresponding to the training sample is an existing account, and the other account is a temporary account, where a session before login may be used as the temporary account, and the related information of the temporary account may include, but is not limited to, device information, a list of application software installed in the device, browsing behavior before login, and the like.

The invention can also be applied to other scenes with the same identification requirements of operators. For example, the invention may also be used to identify whether two different devices correspond to the same operator, or whether the same device corresponds to the same operator at different time periods. For another example, the present invention can also be used to identify whether two different mobile phone numbers and identification numbers (for example, an application using the identification number as an account) correspond to the same operator. Therefore, the two accounts in the account pair corresponding to the training sample can also be two different devices, two different time periods in which the same device is located, a mobile phone number, an identification number and the like.

The process of constructing the same-operator recognition model has thus been described in conjunction with FIG. 1.

Fig. 2 shows a flowchart of a same operator identification method according to an exemplary embodiment of the present invention. The method shown in fig. 2 may be implemented entirely by a computer program in software, and the method shown in fig. 2 may also be executed by a specifically configured computing device.

Referring to fig. 2, in step S210, prediction data is obtained. Wherein the prediction data comprises information about two accounts.

Step S220, performing feature extraction processing on the prediction data to obtain a prediction sample.

The two accounts referred to here are the two accounts for which it needs to be predicted whether they are controlled by the same operator. The type of the two accounts differs according to the application scenario. As described above, the invention may be used for same-operator identification between finance-related accounts, so as to identify accounts that carry business risks such as cheating, fraud, fake-order brushing, credit inflation, card nurturing, and cash-out. It can also be used for same-operator identification of Internet accounts, for example when a mobile phone number is changed or a password is forgotten. Thus, as an example, the two accounts may be finance-related accounts; or one account may be an existing account and the other a temporary account; or the two accounts may be accounts corresponding to different mobile phone numbers. In addition, the two accounts may also be two accounts corresponding to different identification numbers, two different devices, two different time periods of the same device, and the like.

The information related to the account may be various information related to the account, such as one or more of a variety of information that may include, but is not limited to, natural person information, account information, operational information, social information, information that the account is a passive party to the event, and other information related to the account. For the information about the natural person, the account information, the operation information, the social information, the information that the account is used as the event passive party, and other information related to the account, reference may be made to the above description, which is not repeated herein.

The features of the prediction samples comprise strongly correlated features and/or weakly correlated features, the strongly correlated features refer to features related to a strongly correlated medium shared by the two accounts, and the weakly correlated features refer to features related to a weakly correlated medium shared by the two accounts. For the concepts of medium, strong correlation medium and weak correlation medium, reference may be made to the above description, and details are not repeated here.

Based on the relevant information of the two accounts, the media and the media value included in each account can be determined. From this, a strongly and/or weakly associated medium that is common to both accounts may be determined. From the strongly associated medium that is shared by both accounts, the corresponding strongly associated features can be calculated. Corresponding weak association features may be computed based on the weak association medium that is shared by the two accounts.

The strong association features may include, but are not limited to, one or more of a first association number, a first maximum association degree, and a first association degree cumulative sum. The first association number represents the number of strong association media that the two accounts share in the same dimension. The first maximum association degree represents the largest of the first association degrees corresponding to at least some of the strong association media that the two accounts share in the same dimension, where the first association degree is related to (for example, negatively correlated with) the number of accounts sharing the strong association medium. The first association degree cumulative sum represents the sum of the first association degrees corresponding to at least some of the strong association media that the two accounts share in the same dimension.

The weak association features may include, but are not limited to, one or more of a second association number, a second maximum association degree, and a second association degree cumulative sum. The second association number represents the number of weak association media that the two accounts share in the same dimension. The second maximum association degree represents the largest of the second association degrees corresponding to at least some of the weak association media that the two accounts share in the same dimension, where the second association degree is related to (for example, negatively correlated with) the number of accounts sharing the weak association medium. The second association degree cumulative sum represents the sum of the second association degrees corresponding to at least some of the weak association media that the two accounts share in the same dimension.
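As an illustration of how the three quantities might be computed, the following minimal Python sketch assumes that each account is described by a mapping from a dimension name to the set of media values it uses, and that the number of accounts aggregated on each medium is known. Taking the association degree as the reciprocal of that number is only one possible negatively correlated choice; it and all names in the sketch are hypothetical.

```python
from typing import Dict, Set, Tuple

Medium = Tuple[str, str]  # (dimension, value), e.g. ("device", "d1")


def association_features(
    account_a: Dict[str, Set[str]],
    account_b: Dict[str, Set[str]],
    medium_account_count: Dict[Medium, int],
) -> Dict[str, float]:
    """Association number, maximum association degree and cumulative sum
    over the media that both accounts share within the same dimension."""
    degrees = []
    for dimension in account_a.keys() & account_b.keys():
        for value in account_a[dimension] & account_b[dimension]:
            n_accounts = medium_account_count[(dimension, value)]
            # Assumption: degree = 1 / n, so widely shared media count less.
            degrees.append(1.0 / n_accounts)
    return {
        "association_number": float(len(degrees)),
        "max_association_degree": max(degrees, default=0.0),
        "association_degree_sum": sum(degrees),
    }


# Hypothetical example data: two accounts sharing one device and one IP.
a = {"device": {"d1"}, "ip": {"1.1.1.1", "2.2.2.2"}}
b = {"device": {"d1"}, "ip": {"2.2.2.2"}}
counts = {("device", "d1"): 2, ("ip", "2.2.2.2"): 4}
features = association_features(a, b, counts)
```

Calling the function once with only the strong association media and once with only the weak association media would yield the first and the second group of features, respectively.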

In one embodiment of the invention, the features of the prediction sample further comprise a comparison feature for characterizing a difference between the related information of the two accounts. For example, the comparison characteristics may include, but are not limited to, a difference in registration time, a difference in previous transaction time, an address similarity, a mailbox similarity, a mobile phone number similarity, a home location distance, a permanent location distance, whether a friend relationship exists, whether call records exist, whether mutual transfer has occurred, and the like.

For the specific implementation process of the feature extraction process, reference may be made to the above description, and details are not repeated here.

In step S230, the prediction sample is input to the co-operator identification model to obtain the identification result output by the co-operator identification model for predicting whether the two accounts are controlled by the same operator.

The same-operator recognition model may be a model constructed by the method shown in fig. 1; it may be a machine learning model, or a model constructed based on human experience, such as a scorecard model.

Taking the case where the same-operator recognition model is a machine learning model as an example, the output recognition result is the predicted probability that the two accounts are controlled by the same operator, i.e., a prediction score between 0 and 1. The prediction scores output by the trained same-operator recognition model may be biased to some extent. According to the invention, the prediction score that the model outputs for a prediction sample can therefore be corrected a posteriori according to actual business feedback information, so as to correct the prediction bias of the same-operator recognition model.

The following is an exemplary description of the process of performing a posteriori correction by means of interpolation calculation. It should be appreciated that the posterior correction may also be implemented in other ways, such as by way of a bucket calculation.

As an example, a first ordered set may first be constructed by sorting, by magnitude, the prediction scores that the same-operator recognition model outputs for a plurality of prediction samples. The first ordered set includes a plurality of first elements, and each first element includes the prediction score of the same-operator recognition model for one prediction sample and the actual business feedback score of that prediction sample. The actual business feedback score represents the true result of whether the prediction sample is controlled by the same operator; it takes the value 0 or 1 and can be obtained from the actual business feedback information. In this embodiment, an actual business feedback score of 0 indicates that the prediction sample is not controlled by the same operator, and an actual business feedback score of 1 indicates that it is. A corrected score for a given prediction score may then be calculated from the actual business feedback scores in a predetermined number of first elements near that prediction score in the first ordered set. The corrected score is related to the sum of the actual business feedback scores in the predetermined number of first elements (for example positively correlated, such as directly proportional) and to the predetermined number itself (for example negatively correlated, such as inversely proportional). The prediction score can then be corrected based on the calculated corrected score.

For example, the first ordered set X may be represented as $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_1 \le x_2 \le \ldots \le x_n$. Here $x_i$ is the prediction score that the model outputs for prediction sample i, a real number between 0 and 1, and $y_i$ is the actual business feedback score of prediction sample i, which is 0 or 1. Then, for a prediction score $x_i$ in the first ordered set, its corrected score $p_i$ can be expressed as

$$p_i = \frac{\sum_{j=i-\mathrm{width}/2}^{i+\mathrm{width}/2} y_j}{\mathrm{width}+1}$$

where width is the window size used to estimate the posterior response rate and is a positive integer. The value of width can be set according to actual conditions; in general, a value of about 100 works well when samples are sufficient. The following takes width = 4 as an example. Let X = {…, (0.18,1), (0.19,0), (0.2,0), (0.21,1), (0.22,0), (0.23,0), (0.24,1), (0.25,1), (0.26,0), (0.27,1), (0.28,1)}. If the corrected score needs to be calculated at 0.2 and 0.25, then for $p_{0.2}$ a total of 2 responses occur from 0.18 to 0.22, so the numerator is 2 and the denominator is 5, giving $p_{0.2} = 2/5 = 0.4$. For $p_{0.25}$ a total of 3 responses occur from 0.23 to 0.27, so the numerator is 3 and the denominator is 5, giving $p_{0.25} = 3/5 = 0.6$. In this way the corrected score of any prediction score in the first ordered set can be calculated, and any prediction score in the first ordered set can thus be corrected a posteriori.
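The window calculation in the width = 4 example can be sketched in Python as follows. This is a minimal illustration only: it assumes an even width, uses only the elements of X that are listed explicitly, and rejects windows that would run past either end of the set, since boundary handling is not specified in the description. The function name is hypothetical.

```python
from typing import List, Tuple


def corrected_score(ordered: List[Tuple[float, int]], i: int, width: int) -> float:
    """Posterior response rate over the width + 1 elements centred on index i.

    `ordered` is the first ordered set: (prediction score, feedback score)
    pairs sorted by prediction score.
    """
    half = width // 2
    if i - half < 0 or i + half >= len(ordered):
        raise ValueError("window extends beyond the ordered set")
    window = ordered[i - half : i + half + 1]
    return sum(y for _, y in window) / (width + 1)


X = [(0.18, 1), (0.19, 0), (0.20, 0), (0.21, 1), (0.22, 0), (0.23, 0),
     (0.24, 1), (0.25, 1), (0.26, 0), (0.27, 1), (0.28, 1)]

p_020 = corrected_score(X, 2, width=4)  # window 0.18..0.22: 2 responses of 5
p_025 = corrected_score(X, 7, width=4)  # window 0.23..0.27: 3 responses of 5
```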

Further, after the plurality of prediction scores have been corrected a posteriori, a second ordered set may be constructed by sorting according to the magnitude of the prediction scores. The second ordered set includes a plurality of second elements, and each second element includes the prediction score of the same-operator recognition model for one prediction sample and the corrected score of that prediction score. The corrected score may be calculated in the manner described above.

For a prediction score that the same-operator recognition model outputs for a new prediction sample, if a second element with the same prediction score exists in the second ordered set, the prediction score of the new prediction sample may be corrected based on the corresponding corrected score. If no such second element exists in the second ordered set, the corrected score of the prediction score may be calculated by interpolation on the basis of the second ordered set, and the new prediction score corrected according to the calculated corrected score. The posterior probability of all prediction samples (including future new samples) can therefore be calculated by interpolation. The interpolation may include, but is not limited to, linear interpolation, quadratic interpolation, cubic interpolation, spline interpolation, and the like.

For example, the second ordered set P may be represented as $\{(x_{i_1}, p_{i_1}), (x_{i_2}, p_{i_2}), \ldots, (x_{i_M}, p_{i_M})\}$. Taking linear interpolation as an example, the corrected score of the prediction score $x_i$ that the same-operator recognition model outputs for a new prediction sample may be calculated as

$$p_{\mathrm{cor}} = p_{x_-} + \frac{x_i - x_-}{x_+ - x_-}\,\left(p_{x_+} - p_{x_-}\right)$$

where $p_{\mathrm{cor}}$ is the corrected score of the prediction score $x_i$, $p_{x_-}$ is the corrected score of the largest prediction score in the second ordered set P that is smaller than $x_i$, $p_{x_+}$ is the corrected score of the smallest prediction score in P that is larger than $x_i$, $x_+$ is the prediction score corresponding to $p_{x_+}$, and $x_-$ is the prediction score corresponding to $p_{x_-}$.
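A minimal Python sketch of the lookup-then-interpolate step is given below. It assumes a non-empty second ordered set sorted by prediction score with distinct scores. Clamping scores that fall outside the covered range to the nearest end point is an assumption of the sketch, not something stated in the description, and the function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def interpolate_corrected(second_set: List[Tuple[float, float]], x: float) -> float:
    """Corrected score for prediction score x from (score, corrected) pairs."""
    xs = [score for score, _ in second_set]
    pos = bisect_left(xs, x)
    if pos < len(xs) and xs[pos] == x:
        return second_set[pos][1]      # exact match: reuse the stored value
    if pos == 0:
        return second_set[0][1]        # below the range: clamp (assumption)
    if pos == len(xs):
        return second_set[-1][1]       # above the range: clamp (assumption)
    (x_lo, p_lo), (x_hi, p_hi) = second_set[pos - 1], second_set[pos]
    # Linear interpolation between the two neighbouring second elements.
    return p_lo + (x - x_lo) / (x_hi - x_lo) * (p_hi - p_lo)
```

The binary search keeps each lookup logarithmic in the size of the second ordered set, which may matter when the set is built from a large number of prediction samples.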

Taking the elements of the first ordered set X shown above as an example, and using the corrected scores 2/5 = 0.4 at 0.2 and 3/5 = 0.6 at 0.25, the corrected score for a prediction score of 0.22 may be calculated as:

$$p_{\mathrm{cor}} = 0.4 + \frac{0.22 - 0.2}{0.25 - 0.2} \times (0.6 - 0.4) = 0.48$$

The above describes, by way of example, the process of constructing the first ordered set and the second ordered set from the prediction scores that the trained same-operator recognition model outputs for a plurality of prediction samples and the actual business feedback scores of those prediction samples, and of performing posterior correction, based on the first ordered set and the second ordered set, on the prediction scores that the model outputs for prediction samples.

In addition, the first ordered set can also be constructed in the process of constructing (i.e. training) the same-operator identification model, or after the same-operator identification model is constructed, based on the prediction score obtained by predicting the training sample by the same-operator identification model and the mark score of the training sample (which can be obtained according to the label of the training sample, the mark score of the positive sample is 1, and the mark score of the negative sample is 0). At this point, the first element in the first ordered set includes the predicted score of the training sample and the labeled score of the training sample. And the modified scores for each of the predicted scores in the first sorted set may also be calculated and the second sorted set constructed in a similar manner as described above. Therefore, after the same operator identification model is subsequently used for predicting the prediction sample to obtain the prediction score, the prediction score of the prediction sample can be corrected according to the first ordered set and/or the second ordered set. For example, for the predicted scores that appear in the second sorted set, the predicted scores may be directly modified with corresponding modified scores in the second sorted set. For the predicted scores that do not appear in the second sorted set, the modified scores for the predicted scores may be calculated by interpolation. For a specific calculation process, see the above description, which is not repeated herein.

At the same time as outputting the recognition result (e.g., the prediction score), interpretation information for interpreting the recognition result may also be output. As an example, the interpretation information may comprise first interpretation information relating to strongly and/or weakly associated features of the predicted sample, and/or may further comprise second interpretation information relating to other features than the strongly and weakly associated features, such as comparison features.

As an example, the scoring interpretations (i.e., interpretation information) output while the scores are output may be classified into interpretation information related to strong associations, weak associations and comparison features according to features, or may be classified into interpretation information of categories such as natural person information, account information, transaction records, non-dynamic account records, social relations and the like according to data sources, so as to help business analysts quickly locate case clues. Assuming that the model scores 0.97 for A and B, the following clues may be output:

According to the recognition result that the same-operator recognition model outputs for the prediction sample, corresponding business processing can be executed based on preset decision logic. For example, business impact may be produced through manual or automatic decision-making based on the recognition result of the same-operator recognition model. Manual decision-making relies mainly on business experience, data analysis techniques, and case analysis.

The decision logic may also be adjusted and/or optimized by way of pattern and/or data analysis. For example, suppose the decision logic is set to block high-risk users, require SMS verification for medium-risk users, and let low-risk users pass. If case investigation shows that black-market operators use code-receiving platforms to complete the SMS verification, so that this secondary verification no longer controls risk effectively, face recognition can be introduced as a supplementary verification method. As another example, if group cases are severe during a certain period and analysis finds that the case rate among users rated medium-risk by the original model has risen significantly, the thresholds can be adjusted so that the decision results meet the risk control requirements.

As an example, in the case where the prediction score predicted by the co-operator identification model for the prediction sample is greater than the third predetermined threshold and less than the fourth predetermined threshold, the verification may be performed for at least one of the two accounts. In the case that the predicted score is larger than a fourth predetermined threshold, the operation authority of at least one of the two accounts can be downgraded. Wherein, in case the prediction score is smaller than a third predetermined threshold, no processing may be done.
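The threshold logic above can be sketched as a small Python function. The enum, the function name, and the threshold values used in any call are hypothetical. The description does not say what happens when the score exactly equals a threshold, so the sketch treats equality as falling into the lower band, which is an assumption.

```python
from enum import Enum


class Action(Enum):
    NONE = "no processing"
    VERIFY = "verify at least one of the two accounts"
    DOWNGRADE = "downgrade the operation authority of at least one account"


def decide(score: float, third_threshold: float, fourth_threshold: float) -> Action:
    """Map a prediction score to a business action using two thresholds."""
    if third_threshold >= fourth_threshold:
        raise ValueError("third threshold must be below fourth threshold")
    if score > fourth_threshold:
        return Action.DOWNGRADE
    if score > third_threshold:
        return Action.VERIFY
    return Action.NONE
```

Keeping the thresholds as parameters rather than constants makes it straightforward to lower them later when business feedback indicates that risk in the middle band has increased.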

Further, the verification method may be changed when it is found that accounts in an account pair controlled by the same operator have completed verification through a third-party platform, or that more than a predetermined number of accounts have completed verification through a third-party platform; and/or the third predetermined threshold and/or the fourth predetermined threshold may be lowered when account pairs with a prediction score greater than the first predetermined threshold and less than the second predetermined threshold are found to have an increased probability of belonging to the same operator.

In summary, the present invention can be used in risk control scenarios to identify risky accounts; for example, by identifying whether two accounts are controlled by the same operator, it can be determined whether the two accounts carry business risks such as cheating, fake-order brushing, credit inflation, card nurturing, and cash-out. The invention can also be used in non-risk-control scenarios, such as same-operator identification when a mobile phone number is changed or a password is forgotten. Same-operator identification when a mobile phone number is changed means that the same-operator recognition model constructed by the invention is used to identify whether the account corresponding to the new mobile phone number and the account corresponding to the old mobile phone number are controlled by the same natural person; if so, the accounts can be merged or associated. Same-operator identification when a password is forgotten means that the user, before logging in, is treated as a temporary user id, i.e., a temporary account. The features (strong association features, weak association features, and comparison features) between the temporary account and the account whose password was forgotten are then determined from the temporary account information, such as device information, the mobile phone app list, and browsing behavior, together with the information of the account whose password was forgotten, and it is judged whether the two belong to the same natural person. If they are identified as the same natural person, operations such as password reset and user authentication can be performed on the account whose password was forgotten.

The method for constructing the same-operator recognition model can also be implemented as an apparatus for constructing the same-operator recognition model. Fig. 3 illustrates a block diagram of an apparatus for constructing a same-operator recognition model according to an exemplary embodiment of the present invention. The functional units of the apparatus may be implemented in hardware, software, or a combination of hardware and software that embodies the principles of the present invention. It will be appreciated by those skilled in the art that the functional units described in fig. 3 may be combined or divided into sub-units to implement the principles of the invention described above. Thus, the description herein may support any possible combination, division, or further definition of the functional units described herein.

The functional units that the apparatus for constructing the same-operator recognition model may have, and the operations that each functional unit may perform, are briefly described below; for details, reference may be made to the above description, which is not repeated here.

Referring to fig. 3, the apparatus 300 for constructing the same-operator recognition model includes a data obtaining unit 310, a feature extraction unit 320, and a model building unit 330.

The data obtaining unit 310 is configured to obtain a training data set. The training data set comprises at least one piece of training data, each piece of training data corresponds to one account pair, and the mark of each piece of training data indicates whether the corresponding account pair is controlled by the same operator, where the same operator refers to the same natural person or the same group.

The feature extraction unit 320 is configured to perform feature extraction processing on the training data set to obtain a training sample set; the characteristics of the training samples in the training sample set comprise strong association characteristics and/or weak association characteristics, the strong association characteristics refer to characteristics related to a strong association medium shared by account pairs, the weak association characteristics refer to characteristics related to a weak association medium shared by the account pairs, the strong association medium refers to a medium with the aggregated account number smaller than a first preset threshold, the weak association medium refers to a medium with the aggregated account number larger than a second preset threshold, the medium is used for representing association carriers of a plurality of accounts in a certain dimension, and marks of the training samples are used for indicating whether the account pairs are controlled by the same operator; the same operator refers to the same natural person or the same party. For the medium, the strong correlation medium, the weak correlation medium, the strong correlation characteristic, and the weak correlation characteristic, reference may be made to the above description, and details are not repeated here.

Optionally, the features of the training samples in the training sample set may further include comparison features for characterizing differences between the related information of the two accounts in the account pair. A comparison feature is a feature other than the association features (strong association features and weak association features) that nevertheless helps identify whether the two accounts are controlled by the same operator. For example, the comparison features may include, but are not limited to, a difference in registration time, a difference in previous transaction time, an address similarity, a mailbox similarity, a mobile phone number similarity, a home location distance, a permanent location distance, whether a friend relationship exists, whether call records exist, whether mutual transfers have occurred, and the like.

The data obtaining unit 310 may receive the training data set from the outside. Alternatively, the data obtaining unit 310 may collect the related information of multiple accounts and construct the training data set based on that information. The related information of an account mentioned here may be any information related to the account, and may include, but is not limited to, one or more of natural person information, account information, operation information, social information, information about the account as the passive party of an event, and other information related to the account. For these kinds of information, reference may be made to the above description, which is not repeated here.

As an example, the data obtaining unit 310 may find, according to the strong association media, account pairs that have shared a strong association medium, construct training data based on the related information of the account pairs found, and mark the training data corresponding to those account pairs according to business feedback information; and/or the data obtaining unit 310 may also find account pairs according to business feedback information, construct training data based on the related information of the account pairs found, and mark the training data corresponding to those account pairs. For the process of constructing the training data set, reference may be made to the above description, which is not repeated here.

Optionally, the data obtaining unit 310 may mark training data corresponding to the account pair according to whether the account pair belongs to the same group; the mark is used for indicating that the account pair is controlled by the same operator when the account pair belongs to the same group, and indicating that the account pair is not controlled by the same operator when the account pair does not belong to the same group; and/or the data obtaining unit 310 may also mark the training data corresponding to the account pair according to whether both the two accounts in the account pair are risk accounts; the mark is used for indicating that the account pair is controlled by the same operator when both the two accounts in the account pair are risk accounts, and the mark is used for indicating that the account pair is not controlled by the same operator when only one of the two accounts in the account pair is a risk account.
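The two marking rules can be sketched in Python as follows. The function names are hypothetical, and the mark values 1 (controlled by the same operator) and 0 (not controlled by the same operator) follow the mark scores used elsewhere in the description. The case in which neither account is a risk account is not covered by the text, so the sketch leaves such a pair unmarked; this is an assumption.

```python
from typing import Optional


def mark_by_group(group_a: str, group_b: str) -> int:
    """Mark 1 when both accounts belong to the same group, otherwise 0."""
    return 1 if group_a == group_b else 0


def mark_by_risk(a_is_risk: bool, b_is_risk: bool) -> Optional[int]:
    """Mark 1 when both accounts are risk accounts, 0 when exactly one is."""
    if a_is_risk and b_is_risk:
        return 1
    if a_is_risk != b_is_risk:
        return 0
    return None  # neither is a risk account: not specified, left unmarked
```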

The model building unit 330 is configured to build a co-operator recognition model according to the training sample set, where the co-operator recognition model is used to recognize whether two accounts are controlled by the same operator.

As an example, the model building unit 330 may train the same-operator recognition model using supervised learning; or the model building unit 330 may build the same-operator recognition model based on human experience.

Optionally, the model building unit 330 may be further configured to, after recognizing whether the two accounts are controlled by the same operator using the same operator recognition model, build a new training sample based on the relevant information of the two accounts if the actual service feedback result is inconsistent with the recognition result, and update the same operator recognition model using the new training sample.

It should be understood that, according to an exemplary embodiment of the present invention, a specific implementation of the apparatus 300 for constructing a co-operator recognition model may be implemented with reference to the related specific implementation described in conjunction with fig. 1, and will not be described in detail herein.

The same-operator identification method can also be implemented as a same-operator identification apparatus. Fig. 4 illustrates a block diagram of a same-operator identification apparatus according to an exemplary embodiment of the present invention. The functional units of the same-operator identification apparatus may be implemented in hardware, software, or a combination of hardware and software that embodies the principles of the present invention. It will be appreciated by those skilled in the art that the functional units described in fig. 4 may be combined or divided into sub-units to implement the principles of the invention described above. Thus, the description herein may support any possible combination, division, or further definition of the functional units described herein.

The functional units that the same-operator identification apparatus may have, and the operations that each functional unit may perform, are briefly described below; for details, reference may be made to the above description, which is not repeated here.

Referring to fig. 4, the same-operator identifying apparatus 400 includes an obtaining unit 410, an extracting unit 420, and a processing unit 430.

The obtaining unit 410 is configured to obtain prediction data; wherein the prediction data comprises information about two accounts.

The extracting unit 420 is configured to perform feature extraction processing on the prediction data to obtain a prediction sample.

The related information of the account mentioned here may be various information related to the account, such as one or more of information that may include, but is not limited to, natural person information, account information, operation information, social information, information that the account is a passive party of the event, and other information related to the account. For the information about the natural person, the account information, the operation information, the social information, the information that the account is used as the event passive party, and other information related to the account, reference may be made to the above description, which is not repeated herein.

The characteristics of the prediction sample comprise strong correlation characteristics and/or weak correlation characteristics, the strong correlation characteristics refer to characteristics related to a strong correlation medium shared by two accounts, the weak correlation characteristics refer to characteristics related to a weak correlation medium shared by the two accounts, the strong correlation medium refers to a medium with the number of aggregated accounts being smaller than a first preset threshold value, the weak correlation medium refers to a medium with the number of aggregated accounts being larger than a second preset threshold value, and the medium is used for representing correlation carriers of a plurality of accounts in a certain dimension. For the medium, the strong correlation medium, the weak correlation medium, the strong correlation characteristic, and the weak correlation characteristic, reference may be made to the above description, and details are not repeated here.

Optionally, the features of the prediction sample may further comprise a comparison feature for characterizing a difference between the related information of the two accounts. A comparison feature is a feature that is not an association feature (strong association feature or weak association feature) but that helps to identify whether the two accounts are controlled by the same operator. For example, the comparison features may include, but are not limited to, a difference in registration time, a difference in previous transaction time, an address similarity, a mailbox similarity, a mobile phone number similarity, a home location distance, a permanent location distance, whether a friend relationship exists, whether call records exist, whether mutual transfers have occurred, and the like.
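A few of the comparison features listed above can be sketched as follows. This is only a hedged illustration: the account record layout (keys `id`, `registered_at`, `email`, `friends`) is hypothetical, and the use of `difflib.SequenceMatcher` as the mailbox similarity measure is an assumption of this sketch, since the embodiment does not specify a particular similarity measure.

```python
from datetime import datetime
from difflib import SequenceMatcher


def comparison_features(a, b):
    """Compute a few comparison features for two account records.

    a, b: dicts with hypothetical keys 'id', 'registered_at' (datetime),
    'email' (str) and 'friends' (set of account ids).
    """
    # Difference in registration time, in seconds.
    reg_diff = abs((a["registered_at"] - b["registered_at"]).total_seconds())
    # Mailbox similarity in [0, 1]; SequenceMatcher is an assumed choice.
    mailbox_sim = SequenceMatcher(None, a["email"], b["email"]).ratio()
    # Whether a friend relationship exists in either direction.
    is_friend = int(b["id"] in a["friends"] or a["id"] in b["friends"])
    return {
        "registration_time_diff_s": reg_diff,
        "mailbox_similarity": mailbox_sim,
        "is_friend": is_friend,
    }
```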

The processing unit 430 is configured to input the prediction sample into the same-operator recognition model to obtain a recognition result, output by the same-operator recognition model, that predicts whether the two accounts are controlled by the same operator, where the same operator refers to the same natural person or the same group.

As an example, the recognition result is a predicted score, and the same-operator recognition apparatus 400 may further include a first constructing unit, a first calculating unit, and a first correcting unit, which are not shown in the drawing.

The first constructing unit is used for constructing a first ordered set according to the magnitude order of the predicted scores obtained by the same-operator recognition model for a plurality of prediction samples, wherein the first ordered set comprises a plurality of first elements, and each first element comprises the predicted score of the same-operator recognition model for one prediction sample and the actual service feedback score of that prediction sample. The first calculating unit is used for calculating a corrected score for a given predicted score according to the actual service feedback scores in a preset number of first elements near that predicted score in the first ordered set, wherein the corrected score is related to the sum of the actual service feedback scores in the preset number of first elements (for example, positively correlated with it, or directly proportional to it) and related to the preset number (for example, negatively correlated with it, or inversely proportional to it). The first correcting unit is used for correcting the predicted score according to the calculated corrected score.
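One possible reading of the correction described above is that the corrected score is the mean of the actual service feedback scores in a window of neighbouring first elements, i.e. their sum divided by the preset number. The following sketch assumes that reading. The function name, the window placement (centred where possible and shifted at the ends of the set), and the parameter `k` standing for the preset number are assumptions of the sketch.

```python
def corrected_scores(elements, k):
    """Return (predicted_score, corrected_score) pairs in ascending order.

    elements: list of (predicted_score, feedback_score) first elements.
    k: hypothetical name for the preset number of neighbouring elements.
    The corrected score is assumed to be sum(feedback) / count over the window.
    """
    ordered = sorted(elements, key=lambda e: e[0])   # the first ordered set
    n = len(ordered)
    result = []
    for i, (pred, _) in enumerate(ordered):
        # Centre the window on element i, shifting it at the boundaries.
        lo = max(0, min(i - k // 2, n - k))
        window = ordered[lo:lo + k]
        corrected = sum(fb for _, fb in window) / len(window)
        result.append((pred, corrected))
    return result
```

With binary feedback scores (1 when the business confirms the pair as the same operator, 0 otherwise), the corrected score under this reading is the observed same-operator rate among samples with similar predicted scores.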

Further, the same-operator identifying apparatus 400 may further include a second constructing unit, a second calculating unit, and a second correcting unit, which are not shown in the drawings.

The second constructing unit is used for constructing a second ordered set according to the magnitude order of the predicted scores, wherein the second ordered set comprises a plurality of second elements, and each second element comprises the predicted score of the same-operator recognition model for one prediction sample and the corrected score of that predicted score. The second calculating unit is used for calculating, for a new predicted score obtained by the same-operator recognition model for a new prediction sample, a corrected score of the new predicted score by interpolation based on the second ordered set. The second correcting unit is used for correcting the new predicted score according to the calculated corrected score.
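The interpolation step can be sketched as follows. The embodiment only states that interpolation is used; linear interpolation between the two neighbouring second elements, and clamping to the end values outside the covered range, are assumptions of this sketch, as is the function name.

```python
from bisect import bisect_left


def interpolate_correction(second_set, new_score):
    """Corrected score for a new predicted score, by linear interpolation.

    second_set: list of (predicted_score, corrected_score) second elements,
    sorted by predicted_score in ascending order with distinct predicted scores.
    Scores outside the covered range are clamped to the nearest end value
    (an assumption; the source does not say how such scores are handled).
    """
    preds = [p for p, _ in second_set]
    if new_score <= preds[0]:
        return second_set[0][1]
    if new_score >= preds[-1]:
        return second_set[-1][1]
    hi = bisect_left(preds, new_score)          # first element >= new_score
    (p0, c0), (p1, c1) = second_set[hi - 1], second_set[hi]
    t = (new_score - p0) / (p1 - p0)
    return c0 + t * (c1 - c0)
```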

As an example, the same-operator identifying apparatus 400 may further include an interpretation unit not shown in the drawings. The interpretation unit is used for outputting interpretation information used for interpreting the recognition result, wherein the interpretation information comprises: first interpretation information relating to strongly and/or weakly correlated features of the prediction samples; and/or second interpretation information related to the category of the related information.

As an example, the same-operator identifying apparatus 400 may further include a verification unit and/or a degradation processing unit, which are not shown in the drawings.

The verification unit is used for verifying at least one of the two accounts when the predicted score is larger than a third preset threshold and smaller than a fourth preset threshold. The degradation processing unit is used for degrading the operation authority of at least one of the two accounts when the predicted score is larger than the fourth preset threshold.
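The two-threshold handling described above can be sketched as a simple decision function. The function name, the parameter names `lower` and `upper` for the two thresholds, the returned labels, and the treatment of a score exactly equal to the upper threshold are assumptions of this sketch; the embodiment does not specify the boundary case.

```python
def decide_action(score, lower, upper):
    """Map a predicted score to a handling action.

    lower, upper: hypothetical names for the two preset thresholds that
    bound the verification range (lower < upper is assumed).
    """
    if score > upper:
        return "degrade"   # degrade the operation authority of an account
    if lower < score <= upper:
        # A score equal to `upper` is treated as "verify" here; this
        # boundary choice is an assumption, not stated in the source.
        return "verify"    # verify at least one of the two accounts
    return "allow"         # no additional handling


# Example with hypothetical thresholds 0.6 and 0.9.
actions = [decide_action(s, 0.6, 0.9) for s in (0.3, 0.75, 0.95)]
```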

The verification unit changes the verification mode when it is found that the accounts in an account pair controlled by the same operator have completed verification through a third-party platform, or that more than a preset number of accounts have completed verification through a third-party platform; and/or the degradation processing unit reduces the third threshold and/or the fourth threshold when it is found that the probability that an account pair whose predicted score is larger than the third threshold and smaller than the fourth threshold belongs to the same operator has increased.

It should be understood that the specific implementation of the same-operator identifying apparatus 400 according to the exemplary embodiment of the present invention may be implemented with reference to the related specific implementation described in conjunction with fig. 2, and will not be described in detail herein.

The apparatuses shown in figs. 3 and 4 may each be configured as software, hardware, firmware, or any combination thereof that performs certain functions. These apparatuses may correspond, for example, to an application-specific integrated circuit, to pure software code, or to a combination of software and hardware elements or modules. Further, one or more functions implemented by these apparatuses may also be collectively performed by components in a physical entity device (e.g., a processor, a client, a server, or the like).

The method for constructing the co-operator recognition model, the co-operator recognition method, and the corresponding apparatuses according to the exemplary embodiments of the present invention are described above with reference to figs. 1 to 4. It should be understood that the above-described methods may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, there may be provided a computer-readable storage medium on which is recorded a computer program for executing the method of constructing the co-operator recognition model shown in fig. 1 or the co-operator recognition method shown in fig. 2.

The computer program in the computer-readable medium may be executed in an environment deployed in a computer device such as a client, a host, a proxy device, or a server. It should be noted that, when performing the method for constructing the co-operator recognition model or the co-operator recognition method, the computer program may also perform additional steps beyond those shown in figs. 1 and 2, or perform more specific processing when performing those steps. The contents of these additional steps and this further processing have already been described with reference to figs. 1 and 2 and are not repeated here.

It should be noted that the same-operator recognition model building device and the same-operator recognition device according to the exemplary embodiment of the present invention may rely entirely on the execution of the computer program to realize the corresponding functions; that is, each device corresponds to a respective step in the functional architecture of the computer program, so that the whole system is invoked through a special software package (e.g., a lib library) to realize the corresponding functions.

Alternatively, each of the devices shown in fig. 3 and 4 may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.

For example, exemplary embodiments of the present invention may also be implemented as a computing device including a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform the method of constructing a co-operator recognition model or the co-operator recognition method.

In particular, the computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions described above.

The computing device need not be a single computing device, but can be any device or collection of circuits capable of executing the instructions (or sets of instructions) described above, individually or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).

In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.

Some of the operations described in the method of constructing the co-operator recognition model or the co-operator recognition method according to the exemplary embodiments of the present invention may be implemented by software, some of the operations may be implemented by hardware, and further, the operations may be implemented by a combination of hardware and software.

The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.

The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.

Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.

Operations involved in the method of constructing a co-operator recognition model or the co-operator recognition method according to exemplary embodiments of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate along non-exact boundaries.

For example, as described above, a computing device for constructing a co-operator recognition model or recognizing a co-operator according to an exemplary embodiment of the present invention may include a storage component and a processor, wherein the storage component stores therein a set of computer-executable instructions that, when executed by the processor, perform the above-mentioned method of constructing a co-operator recognition model or the above-mentioned method of co-operator recognition.

While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims

1. A construction method of a same-operator recognition model comprises the following steps:

collecting related information of a plurality of accounts;

analyzing the collected related information of the plurality of accounts to determine media and values thereof included by each account, wherein the media are used for representing associated carriers of the plurality of accounts in a certain dimension, the media are divided into a first media and a second media, the first media are used for representing original associated carriers of the plurality of accounts in the certain dimension, and the second media are new media derived based on the first media;

constructing a training data set; the training data set comprises at least one piece of training data, each piece of training data corresponds to one account pair, the mark of each piece of training data is used for indicating whether the corresponding account pair is controlled by the same operator, the same operator refers to the same natural person or the same group, and the step of constructing the training data set comprises the following steps: selecting two accounts sharing the same strong correlation medium to form an account pair, constructing training data based on the related information of the constructed account pair, and marking the training data corresponding to the constructed account pair according to service feedback information, wherein the strong correlation medium is a medium with the aggregated account number smaller than a first preset threshold value;

performing feature extraction processing on the training data set to obtain a training sample set; the features of the training samples in the training sample set comprise strongly-associated features, and the strongly-associated features refer to features related to strongly-associated media shared by the account pairs;

constructing a same operator recognition model based on the training sample set; the same operator identification model is used for identifying whether the two accounts are controlled by the same operator.

2. The method of claim 1, wherein the step of constructing a training data set further comprises:

and discovering an account pair according to the service feedback information, constructing training data based on the relevant information of the discovered account pair, and marking the training data corresponding to the discovered account pair.

3. The method of claim 2, wherein the step of marking the training data corresponding to the discovered account pair comprises:

marking training data corresponding to the account pair according to whether the account pair belongs to the same group; wherein the mark is used to indicate that the account pair is controlled by the same operator if the account pair belongs to the same group, and to indicate that the account pair is not controlled by the same operator if the account pair does not belong to the same group; and/or

marking training data corresponding to the account pair according to whether the two accounts in the account pair are risk accounts; wherein the mark is used to indicate that the account pair is controlled by the same operator when both accounts in the account pair are risk accounts, and to indicate that the account pair is not controlled by the same operator when only one of the two accounts in the account pair is a risk account.

4. The method of claim 1, wherein the information related to the account comprises at least one of:

natural person information including information related to a natural person acquired when an account is registered and/or used;

the account information comprises information which is stored by the server and related to the account;

operation information including information related to operation behavior generated by the account;

social information including social relationship type information related to a natural person corresponding to the account;

the account is used as the information of the event passive party;

other information related to the account.

5. The method of claim 1, wherein the step of constructing a co-operator recognition model comprises:

training in a supervised learning mode to obtain a same operator recognition model; or

constructing a same operator recognition model based on a manual experience mode.

6. The method of claim 1, wherein,

the second medium is a new medium composed of a partial field in the first medium;

alternatively, the second medium is a new medium composed of a plurality of the first media;

or the second medium is obtained by associating the first medium based on a preset association mode.

7. The method of claim 1, wherein,

the strongly associated features include at least one of: a first association number, a first maximum association degree, and a first cumulative sum of association degrees; the first association number is used for representing the number of strong association media of the same dimension shared by the account pair; the first maximum association degree is used for representing the largest of the first association degrees corresponding to at least part of the strong association media of the same dimension shared by the account pair, wherein the first association degree is related to the number of accounts sharing the strong association medium; and the first cumulative sum of association degrees is used for representing the sum of the first association degrees corresponding to at least part of the strong association media of the same dimension shared by the account pair;

and/or

the features of the training samples in the training sample set further include weak association features, the weak association features refer to features related to a weak association medium shared by the account pair, the weak association medium refers to a medium with an aggregated account number greater than a second predetermined threshold, and the weak association features include at least one of: a second association number, a second maximum association degree, and a second cumulative sum of association degrees; the second association number is used for representing the number of weak association media of the same dimension shared by the account pair; the second maximum association degree is used for representing the largest of the second association degrees corresponding to at least part of the weak association media of the same dimension shared by the account pair, wherein the second association degree is related to the number of accounts sharing the weak association medium; and the second cumulative sum of association degrees is used for representing the sum of the second association degrees corresponding to at least part of the weak association media of the same dimension shared by the account pair.

8. The method of claim 1, wherein the features of the training samples in the set of training samples further comprise:

a comparison feature to characterize a difference between the relevant information of the account pair.

9. The method of claim 1, further comprising:

and after the same operator recognition model is used for recognizing whether the two accounts are controlled by the same operator, under the condition that the actual service feedback result is inconsistent with the recognition result, constructing a new training sample based on the related information of the two accounts, and updating the same operator recognition model by using the new training sample.

10. The method of claim 1, wherein,

two accounts of the pair of accounts are financial-related accounts;

or one account of the two accounts in the account pair is an existing account, and the other account is a temporary account;

or, two accounts in the account pair are accounts corresponding to different mobile phone numbers.

11. A co-operator identification method, comprising:

obtaining prediction data; wherein the prediction data comprises information about two accounts;

determining media and media values included by each account based on the related information of the two accounts, wherein the media are used for representing associated carriers of a plurality of accounts in a certain dimension, the media are divided into a first media and a second media, the first media are used for representing original associated carriers of the plurality of accounts in the certain dimension, and the second media are new media derived based on the first media;

performing feature extraction processing on the prediction data to obtain a prediction sample; wherein the features of the prediction sample comprise strongly correlated features, the strongly correlated features refer to features related to a strongly correlated medium shared by the two accounts, and the strongly correlated medium refers to a medium with the number of aggregated accounts being smaller than a first predetermined threshold;

inputting the prediction sample into a same-operator recognition model to obtain a recognition result output by the same-operator recognition model and used for predicting whether the two accounts are controlled by the same operator, wherein the same operator refers to the same natural person or the same group, and the same-operator recognition model is obtained by using the construction method of any one of claims 1 to 10.

12. The method of claim 11, wherein,

13. The method of claim 11, wherein,

the strongly associated features include at least one of: a first association number, a first maximum association degree, and a first cumulative sum of association degrees; the first association number is used for representing the number of strong association media of the same dimension shared by the two accounts; the first maximum association degree is used for representing the largest of the first association degrees corresponding to at least part of the strong association media of the same dimension shared by the two accounts, wherein the first association degree is related to the number of accounts sharing the strong association medium; and the first cumulative sum of association degrees is used for representing the sum of the first association degrees corresponding to at least part of the strong association media of the same dimension shared by the two accounts;

and/or

the features of the prediction sample further include a weak association feature, the weak association feature being a feature related to a weak association medium shared by the two accounts, the weak association medium being a medium with an aggregated account number greater than a second predetermined threshold, the weak association feature including at least one of: a second association number, a second maximum association degree, and a second cumulative sum of association degrees; the second association number is used for representing the number of weak association media of the same dimension shared by the two accounts; the second maximum association degree is used for representing the largest of the second association degrees corresponding to at least part of the weak association media of the same dimension shared by the two accounts, wherein the second association degree is related to the number of accounts sharing the weak association medium; and the second cumulative sum of association degrees is used for representing the sum of the second association degrees corresponding to at least part of the weak association media of the same dimension shared by the two accounts.

14. The method of claim 11, wherein the features of the prediction sample further comprise:

a comparison feature for characterizing a difference between the relevant information of the two accounts.

15. The method of claim 11, wherein the information related to the two accounts comprises at least one of:

the account is used as the information of the event passive party;

other information related to the account.

16. The method of claim 11, wherein,

the two accounts are accounts relating to finance;

or one account of the two accounts is an existing account, and the other account is a temporary account;

or, the two accounts are accounts corresponding to different mobile phone numbers.

17. The method of claim 11, wherein the recognition result is a prediction score, the method further comprising:

constructing a first ordered set according to the magnitude sequence of the prediction scores obtained by predicting the same operator identification model aiming at a plurality of prediction samples, wherein the first ordered set comprises a plurality of first elements, and each first element comprises the prediction score of the same operator identification model aiming at one prediction sample and the actual service feedback score of the prediction sample;

calculating a corrected score of the predicted score according to actual service feedback scores of a predetermined number of first elements near a predicted score in the first ordered set, wherein the corrected score is related to the sum of the actual service feedback scores of the predetermined number of first elements and is related to the predetermined number;

and correcting the predicted score according to the calculated corrected score.

18. The method of claim 17, further comprising:

constructing a second ordered set according to the magnitude order of the predicted scores, wherein the second ordered set comprises a plurality of second elements, and each second element comprises the predicted score of the same operator recognition model for one predicted sample and a modified score of the predicted score;

calculating a corrected value of the predicted value by an interpolation mode based on the second ordered set aiming at the predicted value obtained by predicting the same operator identification model aiming at the new predicted sample;

and correcting the new prediction score according to the calculated correction score.

19. The method of claim 11, further comprising: outputting interpretation information for interpreting the recognition result;

wherein the interpretation information includes: first interpretation information relating to strongly and/or weakly correlated features of the prediction samples; and/or second interpretation information relating to features other than strongly and/or weakly associated features of the prediction samples.

20. The method of claim 11, wherein the recognition result is a prediction score, the method further comprising:

verifying for at least one of the two accounts if the predicted score is greater than a third predetermined threshold and less than a fourth predetermined threshold;

and/or

and when the predicted score is larger than a fourth preset threshold value, performing degradation processing on the operation authority of at least one of the two accounts.

21. The method of claim 20, further comprising:

changing the verification mode when it is found that the accounts in an account pair controlled by the same operator have completed verification through a third-party platform, or that more than a preset number of accounts have completed verification through a third-party platform;

and/or

and in the case that the probability that the account pair with the predicted score larger than a third predetermined threshold and smaller than a fourth predetermined threshold belongs to the same operator is found to be increased, reducing the third predetermined threshold and/or the fourth predetermined threshold.

22. An apparatus for constructing a co-operator recognition model, comprising:

the data acquisition unit is used for acquiring related information of a plurality of accounts, analyzing the acquired related information of the plurality of accounts to determine media and values thereof included by each account, and constructing a training data set based on the related information of the plurality of accounts; wherein the media are used for representing associated carriers of the plurality of accounts in a certain dimension, the media are divided into a first media and a second media, the first media are used for representing original associated carriers of the plurality of accounts in the certain dimension, and the second media are new media derived based on the first media; the training data set comprises at least one piece of training data, each piece of training data corresponds to one account pair, the mark of each piece of training data is used for indicating whether the corresponding account pair is controlled by the same operator, the same operator refers to the same natural person or the same group, and constructing the training data set comprises: discovering account pairs sharing the same strong correlation medium according to the strong correlation medium, constructing training data based on the related information of the discovered account pairs, and marking the training data corresponding to the discovered account pairs according to service feedback information, wherein the strong correlation medium is a medium with the aggregated account number smaller than a first preset threshold value;

the characteristic extraction unit is used for extracting the characteristics of the training data set to obtain a training sample set; the features of the training samples in the training sample set comprise strongly-associated features, and the strongly-associated features refer to features related to strongly-associated media shared by the account pairs;

the model construction unit is used for constructing a same operator recognition model according to the training sample set; the same operator identification model is used for identifying whether the two accounts are controlled by the same operator.

23. The apparatus of claim 22, wherein,

the data acquisition unit also discovers account pairs according to the service feedback information, constructs training data based on the discovered related information of the account pairs, and marks the training data corresponding to the discovered account pairs.

24. The apparatus of claim 23, wherein,

the data acquisition unit marks training data corresponding to the account pair according to whether the account pair belongs to the same group; wherein the mark is used to indicate that the account pair is controlled by the same operator if the account pair belongs to the same group, and to indicate that the account pair is not controlled by the same operator if the account pair does not belong to the same group; and/or

the data acquisition unit marks training data corresponding to the account pair according to whether the two accounts in the account pair are risk accounts; wherein the mark is used to indicate that the account pair is controlled by the same operator when both accounts in the account pair are risk accounts, and to indicate that the account pair is not controlled by the same operator when only one of the two accounts in the account pair is a risk account.

25. The apparatus of claim 22, wherein the information related to the account comprises at least one of:

information about the account acting as the passive party of an event;

other information related to the account.

26. The apparatus of claim 22, wherein,

the model construction unit obtains the same-operator recognition model by training in a supervised learning manner; or

the model construction unit constructs the same-operator recognition model based on manual experience.

27. The apparatus of claim 22, wherein,

28. The apparatus of claim 22, wherein,

and/or

29. The apparatus of claim 22, wherein the characteristics of the training samples in the set of training samples further comprise:

30. The apparatus of claim 22, further comprising:

the model construction unit is further configured to, when an actual service feedback result is inconsistent with the recognition result after the same-operator recognition model has been used to identify whether two accounts are controlled by the same operator, construct a new training sample based on the information related to the two accounts and update the same-operator recognition model with the new training sample.

31. The apparatus of claim 22, wherein,

the two accounts are accounts relating to finance;

32. A co-operator identification device comprising:

an obtaining unit configured to obtain prediction data, wherein the prediction data comprises information related to two accounts, and to determine the media included in each account and the values of those media based on the information related to the two accounts, wherein a medium represents an associated carrier of a plurality of accounts in a certain dimension, the media are divided into first media and second media, a first medium represents an original associated carrier of the plurality of accounts in that dimension, and a second medium is a new medium derived from a first medium;

an extraction unit configured to perform feature extraction on the prediction data to obtain a prediction sample; wherein the features of the prediction sample comprise strongly associated features, a strongly associated feature being a feature related to a strongly associated medium shared by the two accounts, and a strongly associated medium being a medium whose number of aggregated accounts is smaller than a first predetermined threshold;

a processing unit, configured to input the prediction sample into a co-operator recognition model to obtain a recognition result output by the co-operator recognition model and used for predicting whether the two accounts are controlled by the same operator, where the same operator refers to the same natural person or the same group, and the co-operator recognition model is obtained by using the construction apparatus according to any one of claims 22 to 31.

33. The apparatus of claim 32, wherein,

34. The apparatus of claim 32, wherein,

and/or

35. The apparatus of claim 32, wherein the characteristics of the prediction samples further comprise:

36. The apparatus of claim 32, wherein the information related to the two accounts comprises at least one of:

information about the account acting as the passive party of an event;

other information related to the account.

37. The apparatus of claim 32, wherein,

the two accounts are accounts relating to finance;

38. The apparatus of claim 32, wherein the recognition result is a predictive score, the apparatus further comprising:

a first constructing unit, configured to construct a first ordered set according to a magnitude order of prediction scores obtained by the same-operator identification model through prediction for multiple prediction samples, where the first ordered set includes multiple first elements, and each first element includes a prediction score of the same-operator identification model for one prediction sample and an actual service feedback score of the prediction sample;

a first calculating unit, configured to calculate a corrected score for a prediction score according to the actual service feedback scores in a predetermined number of first elements near that prediction score in the first ordered set, where the corrected score is related to the sum of the actual service feedback scores in the predetermined number of first elements and to the predetermined number;

and a first correcting unit, configured to correct the prediction score according to the calculated corrected score.
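
One way to read the correction step is as a moving average over the ordered set. The sketch below is an assumption, not the claimed formula: the claim only says the corrected score is related to the sum of the neighbouring feedback scores and to the predetermined number, and here that relation is taken to be the sum divided by the number of neighbours actually used. The names `corrected_scores` and `window` are hypothetical, and the window is truncated at both ends of the ordered set.

```python
def corrected_scores(samples, window):
    """Return [(prediction_score, corrected_score), ...] sorted by prediction.

    samples: list of (prediction_score, feedback_score) tuples.
    window: stands in for the "predetermined number" of nearby elements.
    """
    # First ordered set: elements sorted by prediction score.
    ordered = sorted(samples, key=lambda s: s[0])
    half = window // 2
    result = []
    for i, (pred, _) in enumerate(ordered):
        # Neighbours by position in the ordered set, truncated at the ends.
        lo = max(0, i - half)
        hi = min(len(ordered), i + half + 1)
        neighbours = ordered[lo:hi]
        total = sum(feedback for _, feedback in neighbours)
        # Assumption: corrected score = mean feedback of the neighbours.
        result.append((pred, total / len(neighbours)))
    return result


# Hypothetical data: feedback 1 means the pair was confirmed as the same operator.
samples = [(0.9, 1), (0.1, 0), (0.5, 1), (0.3, 0), (0.7, 1)]
table = corrected_scores(samples, window=3)
```

The resulting list pairs each prediction score with its corrected score, which is the shape of the second ordered set used in the dependent claim that follows.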

39. The apparatus of claim 38, further comprising:

a second constructing unit, configured to construct a second ordered set according to the magnitude order of the prediction scores, where the second ordered set includes a plurality of second elements, and each second element includes a prediction score of the same-operator recognition model for one prediction sample and the corrected score of that prediction score;

a second calculating unit, configured to calculate, for a new prediction score obtained by the same-operator recognition model for a new prediction sample, a corrected score of the new prediction score by interpolation based on the second ordered set;

and a second correcting unit, configured to correct the new prediction score according to the calculated corrected score.
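
The interpolation step can be illustrated with piecewise-linear interpolation over the second ordered set. The claim does not name the interpolation method, so linear interpolation, clamping outside the known range, the function name `interpolate_correction`, and the sample table are all assumptions made for this sketch.

```python
from bisect import bisect_left


def interpolate_correction(table, new_score):
    """Linearly interpolate a corrected score for new_score.

    table: list of (prediction_score, corrected_score) tuples sorted by
    prediction score, i.e. a hypothetical "second ordered set".
    """
    preds = [p for p, _ in table]
    # Assumption: clamp to the end points outside the known range.
    if new_score <= preds[0]:
        return table[0][1]
    if new_score >= preds[-1]:
        return table[-1][1]
    # preds[j - 1] < new_score <= preds[j], so x0 < x1 below.
    j = bisect_left(preds, new_score)
    (x0, y0), (x1, y1) = table[j - 1], table[j]
    return y0 + (y1 - y0) * (new_score - x0) / (x1 - x0)


# Hypothetical second ordered set.
table = [(0.1, 0.0), (0.5, 0.4), (0.9, 1.0)]
corrected = interpolate_correction(table, 0.3)
```

A new prediction score of 0.3 falls halfway between the first two entries, so its corrected score lies halfway between 0.0 and 0.4.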

40. The apparatus of claim 32, further comprising:

an interpretation unit configured to output interpretation information for interpreting the recognition result;

41. The apparatus of claim 32, further comprising:

a verification unit, configured to verify at least one of the two accounts when the prediction score output by the same-operator recognition model is greater than a third predetermined threshold and smaller than a fourth predetermined threshold;

and/or

a degradation processing unit, configured to degrade the operation authority of at least one of the two accounts when the prediction score is greater than the fourth predetermined threshold.
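
The two-threshold handling in this claim can be sketched as a small decision function. The names `decide_action`, `verify_threshold`, and `degrade_threshold` and the returned strings are hypothetical. The claim does not say what happens when the score exactly equals a threshold, so the sketch takes no action in that case.

```python
def decide_action(score, verify_threshold, degrade_threshold):
    """Map a prediction score to a handling action.

    verify_threshold / degrade_threshold stand in for the third and
    fourth predetermined thresholds (verify_threshold < degrade_threshold).
    """
    if score > degrade_threshold:
        return "degrade"  # restrict the operation authority of an account
    if verify_threshold < score < degrade_threshold:
        return "verify"   # ask at least one account to pass verification
    return "none"         # at or below the third threshold, or on a boundary
```

For example, with thresholds of 0.6 and 0.9, a score of 0.75 leads to verification and a score of 0.95 leads to degradation.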

42. The apparatus of claim 41, wherein,

the verification unit changes the verification mode when verification of an account in an account pair controlled by the same operator has been completed through a third-party platform, or when verification of more than a predetermined number of accounts has been completed through the third-party platform;

and/or

the degradation processing unit reduces the third predetermined threshold and/or the fourth predetermined threshold upon finding that the probability that an account pair with a prediction score greater than the third predetermined threshold and smaller than the fourth predetermined threshold belongs to the same operator has increased.

43. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 21.

44. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 21.