CN115481687A - Account identification method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115481687A
Authority
CN
China
Prior art keywords
account
similar
candidate
target
target account
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211137741.5A
Other languages
Chinese (zh)
Inventor
张戎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211137741.5A
Publication of CN115481687A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to an account identification method and apparatus, an electronic device, and a storage medium. The account identification method includes: for the feature information of each dimension of a target account, searching a plurality of accounts to be identified for candidate accounts that are similar to the target account in the feature information of that dimension; determining a comprehensive similarity between each candidate account and the target account based on the feature information of each dimension of each candidate account and of the target account; and screening out, from all the candidate accounts and according to the comprehensive similarity between each candidate account and the target account, suspicious accounts similar to the target account, wherein the user of a suspicious account is different from the user of the target account. The account identification method and apparatus, electronic device, and storage medium address the problem of how to identify imitating accounts, and the method as a whole has a controllable amount of computation and low computational cost.

Description

Account identification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an account identification method and apparatus, an electronic device, and a storage medium.
Background
In the information age, users can interact with content through accounts they have registered on an internet platform. Some users on such a platform may imitate other accounts in order to gain more followers or views, so that the imitator's account resembles the imitated account.
Disclosure of Invention
The present disclosure provides an account identification method and apparatus, an electronic device, and a storage medium, which address the problem of how to identify imitating accounts and ensure the recall rate and accuracy of screening for similar suspicious accounts. The technical solution of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, an account identification method is provided, including: for the feature information of each dimension of a target account, searching a plurality of accounts to be identified for candidate accounts that are similar to the target account in the feature information of that dimension; determining a comprehensive similarity between each candidate account and the target account based on the feature information of each dimension of each candidate account and of the target account; and screening out, from all the candidate accounts and according to the comprehensive similarity between each candidate account and the target account, suspicious accounts similar to the target account, wherein the user of a suspicious account is different from the user of the target account.
Optionally, the feature information of an account includes at least one of the following: the name of the account, the avatar of the account, the background image of the account's personal homepage, the content published by the account, and the content live-streamed by the account.
Optionally, the searching, for the feature information of each dimension of the target account, a candidate account that is similar to the target account in the feature information of the dimension from among a plurality of accounts to be identified includes: and searching candidate account numbers similar to the target account number in the feature information of the dimension from the plurality of account numbers to be identified by using each search method in at least one search method aiming at the feature information of each dimension of the target account number.
Optionally, the searching, for feature information of each dimension of the target account, of a candidate account that is similar to the target account in the feature information of the dimension from the plurality of accounts to be identified using each of at least one search method includes: aiming at the characteristic information of each dimension of the target account, searching accounts with vectors similar to the vectors converted from the characteristic information of the dimension of the target account from the plurality of accounts to be identified by using a vector-based similar search method; and/or searching an account with the same dimension characteristic information as that of the target account from the plurality of accounts to be identified according to the characteristic information of each dimension of the target account.
Optionally, the determining, based on feature information of each candidate account and each dimension of the target account, a comprehensive similarity between each candidate account and the target account includes: determining the similarity of the target account and the characteristic information of each candidate account about each dimension, and determining the comprehensive similarity between each candidate account and the target account based on the determined similarity of the characteristic information about each dimension.
Optionally, the screening, according to the comprehensive similarity between each candidate account and the target account, suspicious accounts similar to the target account from all candidate accounts includes: predicting whether each candidate account is similar to the target account by using each prediction method in at least one prediction method according to the comprehensive similarity between each candidate account and the target account; screening all candidate account numbers based on a prediction result obtained by using the at least one prediction method to obtain at least one similar account number; and determining suspicious account numbers similar to the target account number from the at least one similar account number based on the user information of the target account number and the user information of the at least one similar account number.
Optionally, the predicting whether each candidate account is similar to the target account by using each of at least one prediction method according to the comprehensive similarity between each candidate account and the target account includes: inputting the comprehensive similarity between each candidate account and the target account into a rule engine to obtain a first prediction result which is output by the rule engine and is related to whether each candidate account is similar to the target account; and/or inputting the comprehensive similarity between each candidate account and the target account into a pre-trained similar prediction model to obtain a second prediction result which is output by the similar prediction model and is related to whether each candidate account is similar to the target account or not.
Optionally, the determining, based on the user information of the target account and the user information of the at least one similar account, a suspicious account similar to the target account from the at least one similar account includes: for each similar account, determining the similar account as a suspicious account similar to the target account under the condition that the binding information of the target account is different from the binding information of the similar account and/or under the condition that the position information of the target account is different from the position information of the similar account.
Optionally, the account identification method further includes: and monitoring each suspicious account number similar to the target account number.
According to a second aspect of the embodiments of the present disclosure, there is provided an account identification apparatus, including: a candidate account searching unit configured to: aiming at the characteristic information of each dimension of a target account, searching candidate accounts which are similar to the target account in the characteristic information of the dimension from a plurality of accounts to be identified; a similarity determination unit configured to: determining comprehensive similarity between each candidate account and the target account based on feature information of each dimension of each candidate account and the target account; a suspicious account screening unit configured to: and screening out suspicious account numbers similar to the target account number from all the candidate account numbers according to the comprehensive similarity between each candidate account number and the target account number, wherein the suspicious account numbers are different from users of the target account number.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform an account identification method according to the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform an account identification method according to the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement an account identification method according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
According to the account identification method and apparatus, electronic device, and storage medium of the present disclosure, searching for candidate accounts on the feature information of each dimension and then screening similar suspicious accounts from those candidates ensures the recall rate and accuracy of the screening and addresses the problem of how to identify imitating accounts, while the overall amount of computation remains controllable and the computational cost low.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating an account identification method according to an exemplary embodiment.
Fig. 2 is an overall framework diagram illustrating an account identification method according to an exemplary embodiment.
Fig. 3 is a block diagram illustrating an account identification apparatus according to an example embodiment.
Fig. 4 is a block diagram of an electronic device 400 according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in other sequences than those illustrated or described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
Here, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; and (3) including A and B. Likewise, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; and (3) performing step one and step two.
In the information age, users can interact with content through accounts they have registered on an internet platform. Similar accounts may exist on such a platform, and the personalized settings and published content of similar accounts may be similar or identical. For example, in order to gain more followers or views, some users may set the name, avatar, published content, and so on of their own account according to the corresponding information of other accounts that have more followers or views. They thereby imitate those accounts, so that the imitator's account resembles the imitated account.
The account identification method and apparatus, electronic device, and storage medium of the present disclosure search for candidate accounts on the feature information of each dimension and then screen similar suspicious accounts from those candidates. This ensures the recall rate and accuracy of the screening and addresses the problem of how to identify imitating accounts, while the overall amount of computation remains controllable and the computational cost low.
An account identification method, apparatus, electronic device, and storage medium according to the present disclosure will be described in detail below with reference to fig. 1 to 4.
Fig. 1 is a flowchart illustrating an account identification method according to an exemplary embodiment. Referring to fig. 1, in step 101, for feature information of each dimension of a target account, a candidate account that is similar to the target account in the feature information of the dimension may be searched from a plurality of accounts to be identified. That is, an account having characteristic information similar to that of the dimension of the target account is searched for as a candidate account from among a plurality of accounts to be identified.
Note that "at least one" in the exemplary embodiments of the present disclosure means one or more.
According to an exemplary embodiment of the present disclosure, the account information (including but not limited to account characteristic information, account user information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) involved are both information and data that are authorized or sufficiently authorized by the account user.
According to an exemplary embodiment of the present disclosure, the target account may be an account on a certain internet platform, and the plurality of accounts to be identified may be accounts on the internet platform other than the target account.
According to an exemplary embodiment of the present disclosure, the feature information of an account may include, but is not limited to, multi-dimensional User Generated Content (UGC) data. For example, the feature information of an account may include, but is not limited to, at least one of the following: the name of the account, the avatar of the account, the background image of the account's personal homepage, the content published by the account, and the content live-streamed by the account. The name of the account may be, but is not limited to, the account's nickname. The content published by the account may be multimedia content, which may include, but is not limited to, text, images, audio, video, and published content formed by combining these. Using the multi-dimensional data listed above, exemplary embodiments of the present disclosure can make both the candidate accounts found and the similar suspicious accounts finally screened out more accurate.
According to an exemplary embodiment of the present disclosure, one or more search methods may be adopted to search for candidate account numbers, and based on this, first, for feature information of each dimension of the target account number, a candidate account number similar to the target account number in the feature information of the dimension may be searched for from the plurality of account numbers to be identified using each of at least one search method.
Here, the search methods may include, but are not limited to, a vector-based similarity search method and a search method based on identical feature information. In the vector-based similarity search method, the feature information of the target account and of the plurality of accounts to be identified is converted into vectors, and the accounts whose vectors have a high similarity to the target account's vector are then searched for. The search method based on identical feature information may search for accounts for which at least one kind of feature information is identical to that of the target account.
Specifically, for the feature information of each dimension of the target account, a vector-based similarity search method may be used to search the plurality of accounts to be identified for accounts whose vector for that dimension is similar to the vector converted from the target account's feature information of that dimension; and/or the plurality of accounts to be identified may be searched for accounts whose feature information of that dimension is the same as that of the target account. Accounts whose feature information of a dimension is the same as that of the target account may be, for example but not limited to, at least one of the following: accounts with the same name as the target account, accounts with the same avatar as the target account, accounts with the same personal homepage background image as the target account, accounts that have published at least one piece of content identical to content published by the target account, accounts that have live-streamed at least one piece of content identical to content live-streamed by the target account, and the like. By searching with the above search methods, exemplary embodiments of the present disclosure achieve multi-route recall of candidate accounts. It should be noted that the two search methods listed here are merely exemplary and the disclosure is not limited thereto; schemes that use other search methods, or that use the various search methods in a different order, also fall within the scope of protection.
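The two search routes described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the brute-force cosine comparison, the 0.9 cutoff, the function names, and the sample data are all assumptions made for illustration, and a production system would more likely use an approximate nearest-neighbor index.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def vector_search(target_vec, account_vecs, min_similarity=0.9):
    """Route 1: accounts whose vector for this dimension is close to the target's.

    min_similarity is a hypothetical cutoff, not a value taken from the source.
    """
    return {acc for acc, vec in account_vecs.items()
            if cosine_similarity(target_vec, vec) >= min_similarity}

def exact_search(target_value, account_values):
    """Route 2: accounts whose feature value for this dimension equals the target's."""
    return {acc for acc, val in account_values.items() if val == target_value}

# Hypothetical data for a single dimension (the account name).
name_vectors = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
names = {"u1": "star_chef", "u2": "star_cheff", "u3": "star_chef"}

# Candidates for this dimension are the union of both routes.
candidates = vector_search([1.0, 0.0], name_vectors) | exact_search("star_chef", names)
```

Here the exact-match route recovers an account (`u3`) that the vector route misses, which is the point of recalling through more than one route.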
For the feature information of each dimension of the target account, the accounts found by each of the at least one search method may be merged, and all of the merged accounts are candidate accounts. For example, suppose two search methods are used for the feature information of the first dimension of the target account, with the first search method finding 5 candidate accounts and the second finding 6, and two search methods are used for the feature information of the second dimension, with the first finding 8 candidate accounts and the second finding 2. The full set of candidate accounts is then the 21 accounts found (5 + 6 + 8 + 2).
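The merge step can be sketched as a set union. The function name and the placeholder account identifiers are hypothetical. Note that a union removes duplicates, so the count of 21 in the example holds only when the four result lists do not overlap; the source does not say how overlapping results are handled.

```python
def merge_candidates(results_by_dimension):
    """Union the accounts returned by every search method for every dimension.

    results_by_dimension maps a dimension name to a list of result lists,
    one result list per search method.
    """
    merged = set()
    for per_method_results in results_by_dimension.values():
        for accounts in per_method_results:
            merged |= set(accounts)
    return merged

# Placeholder identifiers reproducing the 5 + 6 + 8 + 2 example.
results = {
    "dimension_1": [[f"a{i}" for i in range(5)], [f"b{i}" for i in range(6)]],
    "dimension_2": [[f"c{i}" for i in range(8)], [f"d{i}" for i in range(2)]],
}
all_candidates = merge_candidates(results)
print(len(all_candidates))  # 21, because these four lists are disjoint
```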
The exemplary embodiment of the present disclosure can ensure the recall rate of the candidate account by searching using at least one search method for each kind of feature information of the target account. It should be understood that each kind of feature information for the target account is merely used as an example, and actually, the present disclosure may also be used for searching more flexibly by combining a plurality of kinds of feature information for the target account for different scenarios.
According to an exemplary embodiment of the present disclosure, an internet platform has a large number of accounts, and in some scenarios a candidate account search may need to be performed for a plurality of target accounts. In that case the search may be performed on sets. Specifically, a target account set containing at least one target account may first be obtained; step 101 may then be performed for each target account in the target account set to search for the candidate accounts of that target account; and finally a candidate account set may be determined from the search results of all target accounts in the target account set. For example, the correspondence between the candidate account set and the target account set may be expressed as the following formula (1):
{(user_id,user_id′):user_id≈user_id′,user_id∈A,user_id′∈C} (1)
where A is the target account set, C is the candidate account set, user_id is a target account in the target account set, and user_id′ is a candidate account in the candidate account set.
As an example, the correspondence between the candidate account number set and the target account number set may be represented as the following table 1:
Table 1. Correspondence between the candidate account set and the target account set
Target account in A    Candidate account in C
XXXX1                  YYYY1
XXXX2                  YYYY2
XXXX3                  YYYY3
XXXX4                  YYYY4
In the table, A is the target account set and C is the candidate account set. Target account XXXX1 corresponds to candidate account YYYY1 (YYYY1 was found by searching on XXXX1), target account XXXX2 corresponds to candidate account YYYY2 (YYYY2 was found by searching on XXXX2), target account XXXX3 corresponds to candidate account YYYY3 (YYYY3 was found by searching on XXXX3), and target account XXXX4 corresponds to candidate account YYYY4 (YYYY4 was found by searching on XXXX4).
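The set-based search and the correspondence of formula (1) can be sketched as building a set of (target, candidate) pairs. The function name and the `search_candidates` callback are hypothetical stand-ins for step 101; the lookup table below simply mirrors Table 1.

```python
def build_candidate_pairs(target_accounts, search_candidates):
    """Run the candidate search for every target account in A.

    Returns the set of (user_id, candidate_id) pairs and the candidate set C.
    search_candidates is a hypothetical callable standing in for step 101.
    """
    pairs = set()
    for user_id in target_accounts:
        for candidate_id in search_candidates(user_id):
            pairs.add((user_id, candidate_id))
    candidate_set = {candidate_id for _, candidate_id in pairs}
    return pairs, candidate_set

# Hypothetical search results mirroring Table 1.
table_1 = {
    "XXXX1": ["YYYY1"],
    "XXXX2": ["YYYY2"],
    "XXXX3": ["YYYY3"],
    "XXXX4": ["YYYY4"],
}
pairs, C = build_candidate_pairs(table_1.keys(), lambda uid: table_1.get(uid, []))
```

Keeping the pairs, rather than only the set C, preserves which target account each candidate was found for, which the later similarity and screening steps rely on.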
In step 102, a comprehensive similarity between each candidate account and the target account may be determined based on feature information of each dimension of each candidate account and the target account.
According to an exemplary embodiment of the disclosure, the similarity between each candidate account and the target account can be expressed through the similarity between the feature information of each dimension of the candidate account and the feature information of the same dimension of the target account. Specifically, the similarity between the target account and each candidate account may be determined for the feature information of each dimension, and the comprehensive similarity between each candidate account and the target account may then be determined from these per-dimension similarities. For example, the similarities determined for the individual dimensions may together be taken as the comprehensive similarity. Because the comprehensive similarity between the target account and each candidate account is determined and the subsequent screening is based on it, the suspicious accounts similar to the target account that are subsequently screened out can be more accurate.
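One way to read the example in this paragraph is that the comprehensive similarity is simply the record of per-dimension similarities. The sketch below follows that reading, which is an interpretation rather than something the source states outright. The dimension keys, the function name, and the placeholder `exact_match` measure are all hypothetical.

```python
DIMENSIONS = ("name", "avatar", "background", "published", "live")

def comprehensive_similarity(target, candidate, similarity_fns):
    """Return {dimension: similarity} for every dimension both accounts provide.

    similarity_fns maps a dimension name to a function (a, b) -> float.
    """
    return {
        dim: similarity_fns[dim](target[dim], candidate[dim])
        for dim in DIMENSIONS
        if dim in similarity_fns and dim in target and dim in candidate
    }

def exact_match(a, b):
    """Placeholder similarity measure: 1.0 for identical values, else 0.0."""
    return 1.0 if a == b else 0.0

similarity_fns = {dim: exact_match for dim in DIMENSIONS}
target = {"name": "star_chef", "avatar": "hash_01", "background": "hash_77"}
candidate = {"name": "star_chef", "avatar": "hash_02", "published": "post"}
profile = comprehensive_similarity(target, candidate, similarity_fns)
```

Dimensions missing from either account are skipped here; how missing feature information should be treated is a design choice the source leaves open.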
According to an exemplary embodiment of the present disclosure, for each candidate account, the similarity of each kind of feature information (name of the account, avatar of the account, background image of the personal homepage of the account, content issued by the account, content live broadcast by the account) listed in the above embodiments may be determined between the candidate account and the target account.
For example, the name similarity between each candidate account and the target account may be determined in, but is not limited to, any of the following ways: based on the Jaccard coefficient (Jaccard similarity), based on edit distance, based on the longest common subsequence, and the like. The avatar similarity and the personal homepage background image similarity between each candidate account and the target account may be determined in, but are not limited to, any of the following ways: based on cosine distance, based on L2 distance, based on hash distance, and the like. The published content similarity between each candidate account and the target account may be determined in, but is not limited to, any of the following ways: based on the cosine distance between vectors, based on L2 distance, and the like. The live content similarity between each candidate account and the target account may be determined in, but is not limited to, any of the following ways: based on whether the same or similar facial data is present, based on whether the current geographic locations of the candidate account and the target account are close, and the like.
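The three name-similarity options named above can be sketched in plain Python. The measures themselves are standard, but the choices made here are assumptions: the Jaccard coefficient is computed over character sets, and edit distance is normalized by the longer string's length to give a score in [0, 1]. The source does not specify either detail.

```python
def jaccard(a, b):
    """Jaccard coefficient over the sets of characters in two names."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def edit_similarity(a, b):
    """Edit distance mapped to [0, 1]; the normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def lcs_length(a, b):
    """Length of the longest common subsequence of two names."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

For instance, `edit_distance("star_chef", "star_cheff")` is 1, so a name that differs by one inserted character scores about 0.9 under this normalization.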
In step 103, suspicious account numbers similar to the target account number may be screened from all candidate account numbers according to the comprehensive similarity between each candidate account number and the target account number, where the suspicious account numbers are different from users of the target account number.
According to an exemplary embodiment of the disclosure, the user of a suspicious account similar to the target account is different from the user of the target account. In other words, the user of the suspicious account may be an imitator, and the user of the target account is correspondingly the imitated party.
According to an exemplary embodiment of the present disclosure, first, it may be predicted whether each candidate account is similar to the target account by using each of at least one prediction method according to the comprehensive similarity between each candidate account and the target account. Here, the prediction method may include, but is not limited to, a prediction method based on a rule engine, a prediction method based on a similar prediction model trained in advance.
Specifically, the comprehensive similarity between each candidate account and the target account may be input to a rule engine to obtain a first prediction result output by the rule engine as to whether each candidate account is similar to the target account, and/or the comprehensive similarity between each candidate account and the target account may be input to a pre-trained similarity prediction model to obtain a second prediction result output by the similarity prediction model as to whether each candidate account is similar to the target account.
Here, a threshold and a determination condition for the similarity of each kind of feature information may be preset in the rule engine. If the determination condition is satisfied, the rule engine outputs a first prediction result indicating that the candidate account is similar to the target account; if it is not satisfied, the rule engine outputs a first prediction result indicating that the candidate account is not similar to the target account. For example, the rule engine is preset with a name similarity threshold of 0.9, an avatar similarity threshold of 0.9, a personal homepage background image similarity threshold of 0.9, a published content similarity threshold of 0.8, and a live content similarity threshold of 0.8, and the determination condition is: the name similarity is greater than 0.9, the avatar similarity is greater than 0.9, the personal homepage background image similarity is greater than 0.9, the published content similarity is greater than 0.8, and the live content similarity is greater than 0.8. If the determination condition is satisfied, "1" is output to indicate that the candidate account is similar to the target account; otherwise, "0" is output to indicate that the candidate account is not similar to the target account.
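The example rule can be sketched as a threshold check. The thresholds are the ones given in the example above. Everything else is an assumption for illustration: the dictionary keys, the function name, the reading of the listed conditions as a conjunction (all must hold), and the treatment of a missing dimension as similarity 0.0.

```python
# Thresholds from the example; the dictionary keys are hypothetical names.
THRESHOLDS = {
    "name": 0.9,
    "avatar": 0.9,
    "background": 0.9,
    "published": 0.8,
    "live": 0.8,
}

def rule_engine_predict(similarities, thresholds=THRESHOLDS):
    """Return 1 if every per-dimension similarity strictly exceeds its threshold, else 0.

    A dimension missing from `similarities` is treated as 0.0 (an assumption).
    """
    for dim, threshold in thresholds.items():
        if similarities.get(dim, 0.0) <= threshold:
            return 0
    return 1

similar_pair = {"name": 0.95, "avatar": 0.97, "background": 0.92,
                "published": 0.85, "live": 0.81}
borderline_pair = dict(similar_pair, name=0.9)  # equal to the threshold, not greater

print(rule_engine_predict(similar_pair))     # 1
print(rule_engine_predict(borderline_pair))  # 0
```

A real rule engine would likely allow other combinations of conditions, such as requiring only a subset of dimensions to pass; the conjunction used here is the strictest reading of the example.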
Here, the pre-trained similarity prediction model may be, but is not limited to, a pre-trained classifier (including, but not limited to, a random forest, a gradient boosting decision tree (GBDT), XGBoost, and the like) that predicts whether each candidate account is similar to the target account. The classifier may be obtained by training on training samples, where the training samples may be at least one account pair labeled as similar or not similar, together with the similarity of each account pair on each kind of feature information. By predicting with the above prediction methods, exemplary embodiments of the present disclosure can screen similar accounts accurately. It should be noted that the two prediction methods listed here are merely exemplary and the present disclosure is not limited thereto; schemes that use other prediction methods, or that use the various prediction methods in a different order, also fall within the scope of protection.
Then, all candidate accounts may be screened based on the prediction results obtained using the at least one prediction method, resulting in at least one similar account. Specifically, the prediction results obtained using the at least one prediction method may be combined to decide whether each candidate account is similar to the target account, and the candidate accounts determined to be similar are screened out as similar accounts. For example, a candidate account may be screened out as soon as the prediction result of any one prediction method is "similar"; as another example, a candidate account may be screened out only when the prediction results of all prediction methods are "similar". By using at least one prediction method, exemplary embodiments of the present disclosure can make the screened candidate accounts more accurate.
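The two combination strategies mentioned above (screen out a candidate when any prediction method says "similar", or only when all of them do) can be sketched as below. The function names and the `mode` parameter are hypothetical, and each prediction is assumed to be encoded as 1 (similar) or 0 (not similar).

```python
from typing import Dict, Iterable, List


def combine_predictions(predictions: Iterable[int], mode: str = "any") -> bool:
    """Combine per-method predictions (1 = similar, 0 = not similar)."""
    preds = list(predictions)
    if not preds:
        return False  # no prediction method was run
    if mode == "any":
        return any(p == 1 for p in preds)
    if mode == "all":
        return all(p == 1 for p in preds)
    raise ValueError(f"unknown mode: {mode!r}")


def screen_candidates(results: Dict[str, Iterable[int]], mode: str = "any") -> List[str]:
    """Keep the candidate accounts whose combined prediction is 'similar'."""
    return [
        candidate
        for candidate, preds in results.items()
        if combine_predictions(preds, mode)
    ]
```

The "any" mode favors recall, while the "all" mode favors precision; which one suits a given platform is a policy choice that the text leaves open.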
Finally, a suspicious account number similar to the target account number can be determined from the at least one similar account number based on the user information of the target account number and the user information of the at least one similar account number.
Specifically, each similar account may be determined to be a suspicious account similar to the target account if the binding information of the target account differs from the binding information of the similar account, and/or if the location information of the target account differs from the location information of the similar account. Here, the binding information may be, but is not limited to, a bound mobile phone number or a bound email address; the location information may be, but is not limited to, geographical location information.
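A minimal sketch of this user-information check follows. The `UserInfo` fields, the `require_both` flag (which models the "and/or" in the text), and the decision to compare phone number and email address together as the binding information are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserInfo:
    # Hypothetical fields standing in for binding and location information.
    phone: Optional[str] = None
    email: Optional[str] = None
    location: Optional[str] = None


def is_suspicious(target: UserInfo, similar: UserInfo, require_both: bool = False) -> bool:
    """Flag a similar account whose user information differs from the target's."""
    binding_differs = (target.phone, target.email) != (similar.phone, similar.email)
    location_differs = target.location != similar.location
    if require_both:
        return binding_differs and location_differs
    return binding_differs or location_differs
```

An account that shares both binding and location information with the target is presumably a second account of the same user, so it is not flagged.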
According to an exemplary embodiment of the present disclosure, an internet platform has a large number of accounts, and in some scenarios similar accounts may need to be screened for multiple target accounts at once. In that case, the screening may be implemented on a set basis. Specifically, starting from the candidate account set obtained by the search in the above embodiments, step 102 and step 103 may first be performed for each candidate account in the candidate account set to determine whether the candidate account is similar to its corresponding target account; a similar account set may then be obtained from the results of that determination. For example, the correspondence between the similar account set and the target account set may be represented by the following formula (2):
B = {user : user ≈ user′ and user′ ∈ A}    (2)
where A is the target account set, B is the similar account set, user′ is a target account in the target account set, and user is an account in the similar account set that is similar to the target account. It should be noted that the similar account set is a subset of the candidate account set. As an example, the screening process for a similar account set can be represented as Table 2 below:
Table 2. Screening table of similar account sets
[Table 2 appears as an image in the original publication.]
In the table, A is the target account set and C is the candidate account set; target account XXXX1 corresponds to candidate account YYYY1, target account XXXX2 corresponds to candidate account YYYY2, target account XXXX3 corresponds to candidate account YYYY3, and target account XXXX4 corresponds to candidate account YYYY4. The values "0" and "1" output by the rule engine are first prediction results, and the values "0" and "1" output by the similar prediction model are second prediction results; in both cases, "0" indicates that the candidate account is not similar to the target account, and "1" indicates that the candidate account is similar to the target account.
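Formula (2) reads naturally as a set comprehension. The sketch below is one way to build the similar account set B from the target account set A and the candidates recalled for each target; the function name, the `candidates_by_target` mapping, and the `is_similar` predicate (standing in for the combined prediction result) are hypothetical.

```python
from typing import Callable, Iterable, Mapping, Set


def similar_account_set(
    target_set: Iterable[str],
    candidates_by_target: Mapping[str, Iterable[str]],
    is_similar: Callable[[str, str], bool],
) -> Set[str]:
    """B = {user : user ≈ user' and user' ∈ A}, restricted to recalled candidates."""
    return {
        user
        for target in target_set
        for user in candidates_by_target.get(target, ())
        if is_similar(user, target)
    }
```

Because only recalled candidates are ever tested, the result is a subset of the candidate account set, as the text requires.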
According to an exemplary embodiment of the present disclosure, after step 103, each suspicious account similar to the target account may also be monitored. The monitoring method may include, but is not limited to, deleting the similar content of a suspicious account similar to the target account, limiting the content browsing volume of the suspicious account, prohibiting use of the suspicious account, and the like. By monitoring each suspicious account similar to the target account, the exemplary embodiment of the present disclosure can effectively keep simulated (imitation) accounts under control.
Fig. 2 is an overall framework diagram illustrating an account identification method according to an exemplary embodiment. The account identification method of the present disclosure is generally described below with reference to fig. 2.
Referring to fig. 2, the overall framework of the account identification method includes a recall layer, a fine selection layer, and a disposal layer.
First, a target account set can be obtained, wherein the target account set comprises at least one target account, and the at least one target account is an account on a certain internet platform.
Then, in the recall layer, for each target account in the target account set, candidate accounts that are similar to the target account in the feature information of each dimension are searched for among the accounts on the internet platform other than the target account, yielding a candidate account set.
Specifically, for the feature information of each dimension of the target account, candidate accounts that are similar to the target account in the feature information of that dimension may be searched for, using each of at least one search method, among the accounts on the internet platform other than the target account. The feature information of an account may include: the name of the account, the avatar of the account, the background image of the personal homepage of the account, the content published by the account, and the content live-streamed by the account. Accordingly, each search method may be used to perform the following searches among the accounts on the internet platform other than the target account: searching for accounts whose name is similar to that of the target account, searching for accounts whose avatar is similar to that of the target account, searching for accounts whose personal homepage background image is similar to that of the target account, searching for accounts whose published content is similar to that of the target account, and searching for accounts whose live content is similar to that of the target account. The search results are then merged to obtain at least one candidate account corresponding to the target account.
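The recall step described above, running every search method on every feature dimension and merging the hits, can be sketched as follows. The `SearchFn` signature, the dimension names, and `recall_candidates` are hypothetical; a real search method would query the platform's own indexes.

```python
from typing import Callable, Iterable, List, Sequence, Set

# A search method maps (target account id, dimension) to similar account ids.
SearchFn = Callable[[str, str], Iterable[str]]

DIMENSIONS = ("name", "avatar", "background_image", "published_content", "live_content")


def recall_candidates(
    target_id: str,
    search_methods: List[SearchFn],
    dimensions: Sequence[str] = DIMENSIONS,
) -> Set[str]:
    """Union the results of every search method over every dimension."""
    candidates: Set[str] = set()
    for dimension in dimensions:
        for search in search_methods:
            candidates.update(search(target_id, dimension))
    candidates.discard(target_id)  # the target itself is never a candidate
    return candidates
```

Taking the union keeps recall high at this stage; the later fine selection step is responsible for precision.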
Next, in the fine selection layer, the comprehensive similarity between each candidate account and its corresponding target account is determined based on the feature information of each candidate account in the candidate account set and the feature information of the corresponding target account. The candidate accounts are screened according to this comprehensive similarity to obtain similar accounts, and whether each similar account is suspicious is determined based on the user information of that similar account and the user information of the corresponding target account, thereby obtaining a suspicious account set.
Specifically, for each candidate account, the name similarity, avatar similarity, personal homepage background image similarity, published content similarity, and live content similarity between the candidate account and its corresponding target account are determined. These similarities are input as features into the rule engine and the similar prediction model, respectively, to obtain a first prediction result output by the rule engine and a second prediction result output by the similar prediction model. The candidate accounts are then screened based on these prediction results to obtain similar accounts, and whether each similar account is suspicious is determined based on the user information of that similar account and the user information of the corresponding target account.
Finally, for the disposal layer, each of the suspicious accounts that are similar to the target accounts in the set of target accounts may be monitored (i.e., each account in the set of suspicious accounts is monitored). The monitoring method may include, but is not limited to, deleting similar content of a suspicious account similar to a target account, controlling a content browsing volume of the suspicious account similar to the target account, prohibiting use of the suspicious account similar to the target account, and the like.
Fig. 3 is a block diagram illustrating an account identification apparatus according to an example embodiment. Referring to fig. 3, the account identification apparatus 300 includes a candidate account search unit 301, a similarity determination unit 302, and a suspicious account screening unit 303.
The candidate account searching unit 301 may search, for feature information of each dimension of a target account, a candidate account that is similar to the target account in the feature information of the dimension from among a plurality of accounts to be identified.
The similarity determination unit 302 may determine a comprehensive similarity between each candidate account and the target account based on feature information of each dimension of each candidate account and the target account.
The suspicious account screening unit 303 may screen suspicious accounts similar to the target account from all candidate accounts according to the comprehensive similarity between each candidate account and the target account, where users of the suspicious accounts and users of the target accounts are different.
As an example, the feature information of an account may include at least one of: the name of the account, the avatar of the account, the background image of the personal homepage of the account, the content published by the account, and the content live-streamed by the account.
As an example, the candidate account search unit 301 may be configured to: for the feature information of each dimension of the target account, search, using each of at least one search method, the plurality of accounts to be identified for candidate accounts that are similar to the target account in the feature information of that dimension.
As an example, the candidate account search unit 301 may be configured to: aiming at the characteristic information of each dimension of the target account, searching accounts with vectors similar to the vectors converted from the characteristic information of the dimension of the target account from the plurality of accounts to be identified by using a vector-based similar search method; and/or searching an account with the same dimension characteristic information as that of the target account from the plurality of accounts to be identified according to the characteristic information of each dimension of the target account.
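The two search methods named above, a vector-based similar search and an exact match on the feature value, can be illustrated with the brute-force sketch below. Cosine similarity is used here as an assumed similarity measure, and the function names and the `top_k` and `min_sim` parameters are hypothetical; this section does not specify a particular index, and a large platform would presumably use an approximate nearest-neighbor index instead of a linear scan.

```python
import math
from typing import Dict, Hashable, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vector_search(
    target_vec: Sequence[float],
    account_vecs: Dict[str, Sequence[float]],
    top_k: int = 10,
    min_sim: float = 0.0,
) -> List[str]:
    """Return up to top_k account ids whose vectors are most similar to the target's."""
    scored = [(cosine_similarity(target_vec, vec), acc) for acc, vec in account_vecs.items()]
    scored = [(score, acc) for score, acc in scored if score >= min_sim]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [acc for _, acc in scored[:top_k]]


def exact_match_search(target_value: Hashable, account_values: Dict[str, Hashable]) -> List[str]:
    """Return account ids whose feature value equals the target's (e.g. identical names)."""
    return [acc for acc, value in account_values.items() if value == target_value]
```

The vector search catches near-duplicates such as a slightly altered avatar, while the exact match cheaply catches verbatim copies.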
As an example, the similarity determination unit 302 may be configured to: determining the similarity of the target account and the characteristic information of each candidate account about each dimension, and determining the comprehensive similarity between each candidate account and the target account based on the determined similarity of the characteristic information about each dimension.
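One way to picture this step is to compute one similarity per dimension and collect the results as the comprehensive similarity. The sketch below treats the comprehensive similarity as a mapping from dimension to score, which is an assumption consistent with the similarities later being fed to the rule engine and the prediction model as features. The use of `difflib.SequenceMatcher` for names is a stand-in technique, not the method of the disclosure, and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Mapping


def name_similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a, b).ratio()


def comprehensive_similarity(
    target: Mapping[str, object],
    candidate: Mapping[str, object],
    scorers: Mapping[str, Callable[[object, object], float]],
) -> Dict[str, float]:
    """Score every dimension that has a scorer and collect the results."""
    return {
        dimension: scorer(target[dimension], candidate[dimension])
        for dimension, scorer in scorers.items()
    }
```

Image and content dimensions would need their own scorers (for example, a similarity between embedding vectors), registered in `scorers` under the corresponding dimension names.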
As an example, the suspicious account screening unit 303 may be configured to: predicting whether each candidate account is similar to the target account by using each prediction method in at least one prediction method according to the comprehensive similarity between each candidate account and the target account; screening all candidate account numbers based on a prediction result obtained by using the at least one prediction method to obtain at least one similar account number; and determining suspicious account numbers similar to the target account number from the at least one similar account number based on the user information of the target account number and the user information of the at least one similar account number.
As an example, the suspicious account screening unit 303 may be configured to: inputting the comprehensive similarity between each candidate account and the target account into a rule engine to obtain a first prediction result which is output by the rule engine and is related to whether each candidate account is similar to the target account; and/or inputting the comprehensive similarity between each candidate account and the target account into a pre-trained similar prediction model to obtain a second prediction result which is output by the similar prediction model and is related to whether each candidate account is similar to the target account or not.
As an example, the suspicious account screening unit 303 may be configured to: for each similar account, determining the similar account as a suspicious account similar to the target account under the condition that the binding information of the target account is different from the binding information of the similar account and/or under the condition that the position information of the target account is different from the position information of the similar account.
As an example, the account number identification apparatus may further include: a monitoring unit (not shown). The monitoring unit may be configured to monitor each suspect account number that is similar to the target account number.
Fig. 4 is a block diagram of an electronic device 400 according to an example embodiment.
Referring to fig. 4, the electronic device 400 includes at least one memory 401 and at least one processor 402, the at least one memory 401 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 402, perform an account identification method in accordance with the present disclosure.
By way of example, the electronic device 400 may be a personal computer, a tablet device, a personal digital assistant, a smart phone, or another device capable of executing the above set of instructions. Here, the electronic device 400 need not be a single electronic device; it can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) individually or in combination. The electronic device 400 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 400, the processor 402 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 402 may execute instructions or code stored in the memory 401, wherein the memory 401 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 401 may be integrated with the processor 402, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 401 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 401 and the processor 402 may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the processor 402 can read files stored in the memory.
In addition, the electronic device 400 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of electronic device 400 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform an account identification method according to the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD+R, DVD-RW, DVD+RW, BD-ROM, BD-R LTH, BD-RE, Blu-ray or optical disk storage, a hard disk drive (HDD), a solid state disk (SSD), card storage (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (xD) card), a magnetic tape, a floppy disk, an optical data storage device, a hard disk, a solid state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer device, such as a client, a host, a proxy appliance, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, the instructions of which are executable by a processor of a computer device to perform an account identification method according to the present disclosure.
According to the account identification method, the account identification device, the electronic equipment, and the storage medium of the present disclosure, searching for candidate accounts based on feature information and then screening similar suspicious accounts from those candidates ensures both the recall rate and the accuracy of suspicious account screening. This solves the problem of how to identify simulated accounts, while keeping the overall amount of computation controllable and the computational cost low.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. An account identification method is characterized by comprising the following steps:
aiming at the characteristic information of each dimension of a target account, searching candidate accounts which are similar to the target account in the characteristic information of the dimension from a plurality of accounts to be identified;
determining comprehensive similarity between each candidate account and the target account based on feature information of each dimension of each candidate account and the target account;
and screening out suspicious accounts similar to the target account from all the candidate accounts according to the comprehensive similarity between each candidate account and the target account, wherein the users of the suspicious accounts are different from the user of the target account.
2. The account identification method according to claim 1, wherein the characteristic information of the account includes at least one of:
the name of the account, the avatar of the account, the background image of the personal homepage of the account, the content published by the account, and the content live-streamed by the account.
3. The account identification method according to claim 1, wherein searching, for the feature information of each dimension of the target account, a candidate account that is similar to the target account in the feature information of the dimension from among a plurality of accounts to be identified, comprises:
and searching candidate account numbers similar to the target account number in the feature information of the dimension from the plurality of account numbers to be identified by using each search method in at least one search method aiming at the feature information of each dimension of the target account number.
4. The account identification method according to claim 3, wherein the searching, for the feature information of each dimension of the target account, for a candidate account that is similar to the target account in the feature information of the dimension from the plurality of accounts to be identified using each of at least one search method comprises:
aiming at the characteristic information of each dimension of the target account, searching accounts with vectors similar to the vectors converted from the characteristic information of the dimension of the target account from the plurality of accounts to be identified by using a vector-based similar search method;
and/or searching an account with the same dimension characteristic information as that of the target account from the plurality of accounts to be identified according to the characteristic information of each dimension of the target account.
5. The account identification method according to claim 1, wherein the determining of the comprehensive similarity between each candidate account and the target account based on feature information of each dimension of each candidate account and the target account comprises:
determining the similarity of the target account and the characteristic information of each candidate account about each dimension, and determining the comprehensive similarity between each candidate account and the target account based on the determined similarity of the characteristic information about each dimension.
6. The account identification method according to claim 5, wherein the screening out suspicious accounts similar to the target account from all the candidate accounts according to the comprehensive similarity between each candidate account and the target account comprises:
predicting whether each candidate account is similar to the target account by using each prediction method in at least one prediction method according to the comprehensive similarity between each candidate account and the target account;
screening all candidate account numbers based on a prediction result obtained by using the at least one prediction method to obtain at least one similar account number;
and determining a suspicious account number similar to the target account number from the at least one similar account number based on the user information of the target account number and the user information of the at least one similar account number.
7. The account identification method of claim 6, wherein the predicting whether each candidate account is similar to the target account using each of at least one prediction method according to the comprehensive similarity between each candidate account and the target account comprises:
inputting the comprehensive similarity between each candidate account and the target account into a rule engine to obtain a first prediction result which is output by the rule engine and is related to whether each candidate account is similar to the target account;
and/or inputting the comprehensive similarity between each candidate account and the target account into a pre-trained similar prediction model to obtain a second prediction result which is output by the similar prediction model and is related to whether each candidate account is similar to the target account or not.
8. The account identification method of claim 6, wherein the determining a suspicious account similar to the target account from the at least one similar account based on the user information of the target account and the user information of the at least one similar account comprises:
for each similar account, determining the similar account as a suspicious account similar to the target account under the condition that the binding information of the target account is different from the binding information of the similar account and/or under the condition that the position information of the target account is different from the position information of the similar account.
9. The account identification method according to claim 1, further comprising:
and monitoring each suspicious account number similar to the target account number.
10. An account identification device, comprising:
a candidate account searching unit configured to: aiming at the characteristic information of each dimension of a target account, searching candidate accounts which are similar to the target account in the characteristic information of the dimension from a plurality of accounts to be identified;
a similarity determination unit configured to: determining comprehensive similarity between each candidate account and the target account based on feature information of each dimension of each candidate account and the target account;
a suspicious account screening unit configured to: screen out suspicious accounts similar to the target account from all the candidate accounts according to the comprehensive similarity between each candidate account and the target account, wherein the users of the suspicious accounts are different from the user of the target account.
11. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the account identification method of any of claims 1 to 9.
12. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform an account identification method according to any one of claims 1 to 9.
CN202211137741.5A 2022-09-19 2022-09-19 Account identification method and device, electronic equipment and storage medium Pending CN115481687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211137741.5A CN115481687A (en) 2022-09-19 2022-09-19 Account identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211137741.5A CN115481687A (en) 2022-09-19 2022-09-19 Account identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115481687A true CN115481687A (en) 2022-12-16

Family

ID=84424157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211137741.5A Pending CN115481687A (en) 2022-09-19 2022-09-19 Account identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115481687A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination