CN113283908B

CN113283908B - Target group identification method and device

Info

Publication number: CN113283908B
Application number: CN202110641677.3A
Authority: CN
Inventors: 王璐
Original assignee: Wuhan Douyu Network Technology Co Ltd
Current assignee: Wuhan Douyu Network Technology Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2023-07-18
Anticipated expiration: 2041-06-09
Also published as: CN113283908A

Abstract

An embodiment of the invention discloses a target group identification method and device. The method comprises: acquiring entity information stored in a platform log, determining connection relations between the pieces of entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations; determining a group to be identified according to the graph to be identified and at least one preset abnormal mode; deleting the entity information and connection relations of the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified; and, when the quantity of entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to that group and the at least one abnormal mode, and determining a target group according to the risk value. The technical scheme of the embodiment enables accurate identification of target groups that improperly recharge and purchase virtual currency.

Description

Target group identification method and device

Technical Field

The embodiment of the invention relates to the technical field of big data risk control, in particular to a target group identification method and device.

Background

On live-streaming platforms, virtual currency is sometimes purchased through improper recharging, for example by paying with an illegally obtained credit card. Such behavior can cause abnormal consumption and harm the interests of the platform.

At present, illegal recharging of virtual currency can be identified and intercepted by checking whether the city in which the user recharges is the same as the city in which the user consumes. However, some parties tamper with the IP address by improper means to hide the city information. In that case, the illegal recharging and purchase of virtual currency cannot be recognized, and the interests of the platform are seriously harmed.

Disclosure of Invention

The embodiment of the invention provides a target group identification method and device, which are used for accurately identifying target groups with the behaviors of illegally recharging and purchasing virtual currency.

In a first aspect, an embodiment of the present invention provides a method for identifying a target group, where the method includes:

acquiring entity information stored in a platform log, determining a connection relation between the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relation;

determining a group to be identified according to the graph to be identified and at least one preset abnormal mode;

deleting the entity information and connection relations in the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified;

and when the quantity of the entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to each group to be identified and the at least one abnormal mode, and determining a target group according to the risk value.

In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a target group, where the apparatus includes:

a graph-to-be-identified construction module, used for acquiring entity information stored in a platform log, determining the connection relations between the pieces of entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations;

a group-to-be-identified determining module, used for determining the group to be identified according to the graph to be identified and at least one preset abnormal mode;

a graph-to-be-identified updating module, used for deleting the entity information and connection relations in the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified;

and a target group determining module, used for determining the risk value of each group to be identified according to that group and the at least one abnormal mode when the quantity of entity information in the graph to be identified meets the preset condition, and determining the target group according to the risk value.

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of identifying a target community according to any of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for identifying a target community according to any of the embodiments of the present invention.

According to the technical scheme of the embodiments of the invention, entity information stored in a platform log is acquired, the connection relations between the pieces of entity information are determined according to the entity information, and a graph to be identified is constructed based on the entity information and the connection relations. A community to be identified is determined according to the graph to be identified and at least one preset abnormal mode; the entity information and connection relations in that community are then deleted from the graph to be identified, so that the graph is updated and a new community to be identified is determined. When the quantity of entity information in the graph to be identified meets a preset condition, the risk value of each community to be identified is determined according to that community and the at least one abnormal mode, and the target community is determined according to the risk value. This solves the problem that illegal recharging and consumption of virtual currency are difficult to identify accurately, and achieves the technical effect of accurately identifying target communities that illegally recharge and purchase virtual currency.

Drawings

In order to more clearly illustrate the technical solution of the exemplary embodiments of the present invention, a brief description is given below of the drawings required for describing the embodiments. It is obvious that the drawings presented are only drawings of some of the embodiments of the invention to be described, and not all the drawings, and that other drawings can be made according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for identifying a target group according to an embodiment of the invention;

FIG. 2 is a schematic diagram of abnormal mode I according to a first embodiment of the present invention;

FIG. 3 is a schematic diagram of abnormal mode II according to a first embodiment of the present invention;

FIG. 4 is a schematic diagram of abnormal mode III according to a first embodiment of the present invention;

FIG. 5 is a flowchart of a method for identifying a target group according to a second embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an identification device for a target group according to a third embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1 is a flow chart of a target group identification method according to an embodiment of the present invention. The method may be applied to identifying a group that improperly recharges and consumes virtual currency, and may be performed by a target group identification device. The device may be implemented in the form of software and/or hardware; the hardware may be an electronic device, and optionally the electronic device may be a mobile terminal, a PC, or the like.

As shown in fig. 1, the method of this embodiment specifically includes the following steps:

S110, acquiring entity information stored in a platform log, determining connection relations among the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations.

The platform log may be a behavior log related to virtual currency stored in the platform. The entity information may include information of the platform user, and may also include information related to the user's manipulation of the virtual currency. The connection relationship may be whether or not the entity information is connected. The graph to be identified can be a graph structure constructed according to the information of each entity and the connection relation and is used for subsequently determining the group to be identified so as to identify the target group.

By way of example, the entity information may include a user account, a user mailbox, a user recharging IP (Internet Protocol) address, a user recharging device identification, a user payment IP, a user payment device identification, a third-party platform account, a consignee name, a receiving address, and the like. The user account may be an account registered by the user on the platform, such as a UID (user identification), and the attribute information bound to the user account may include the user's mobile phone number. The user mailbox may be the mailbox filled in by the user when registering an account on the platform. The user recharging IP may be the IP address used when the user recharges virtual currency, and the attribute information bound to it may include the city to which the user recharging IP belongs. The user recharging device identification may be the identification of the device used by the user to recharge virtual currency. The user payment IP may be the IP address used when the user consumes virtual currency, and the attribute information bound to it may include the city to which the user payment IP belongs. The user payment device identification may be the identification of the device used by the user when consuming virtual currency. The third-party platform account may be the account on a recharging platform through which the user recharges the user account, for example a payment platform account, an Apple store account, or a bank card account; the attribute information bound to the third-party platform account may include the user's mobile phone number on the third-party platform. The consignee name may be the name filled in by the user when registering on the third-party platform. The receiving address may be the address filled in by the user when registering on the third-party platform, and may include the province, city, street, house number, and the like.

Specifically, the entity information required for identifying the target community may be obtained from the platform log. The connection relations between the pieces of entity information are established according to relations such as recharging, consumption, and binding, and the graph to be identified is constructed from the entity information and the connection relations. The graph to be identified describes the associations between the entities so that the target group can be identified.

For example, if user account A uses the IP address IP1 during payment, the entity information of user account A has a connection relation with the entity information of IP1.
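By way of illustration only, the construction of the graph to be identified could be sketched as follows. The record layout, the field names (`account`, `recharge_ip`, `pay_ip`, `pay_device`) and the adjacency-map representation are assumptions made for this sketch and are not prescribed by the embodiment.

```python
# Hypothetical log records; the field names are illustrative assumptions.
LOG_RECORDS = [
    {"account": "A", "recharge_ip": "IP1", "pay_ip": "IP2", "pay_device": "D1"},
    {"account": "B", "recharge_ip": "IP1", "pay_ip": "IP3", "pay_device": "D1"},
]


def build_graph(records):
    """Return an adjacency map {node: set of neighbours}.

    Each node is a (entity_type, value) pair, and every entity seen in a
    record is connected to the account of that record.
    """
    graph = {}
    for record in records:
        account = ("account", record["account"])
        graph.setdefault(account, set())
        for field, value in record.items():
            if field == "account" or value is None:
                continue
            entity = (field, value)
            graph[account].add(entity)
            graph.setdefault(entity, set()).add(account)
    return graph


graph = build_graph(LOG_RECORDS)
```

In this sketch, accounts A and B become linked through the shared nodes `("recharge_ip", "IP1")` and `("pay_device", "D1")`, which is the kind of aggregation the graph is meant to expose.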

It should be noted that, for a user who recharges and consumes virtual currency normally, the user recharging IP, user recharging device identification, user payment IP, and user payment device identification rarely change. A group that improperly recharges and consumes virtual currency, by contrast, often switches IP addresses and devices, so these four items change frequently; the changed information usually shows a certain degree of aggregation, which makes identification easier. Some groups that illegally recharge virtual currency also disguise their IP addresses and device information to avoid being recognized by the platform. Even then, the real information bound on the platform and on the third-party platform necessarily exposes the relationship within the group. Such real information, for example the mobile phone number, the consignee name, and the receiving address, therefore plays an important role in constructing the graph to be identified and in identifying the target group.

It should be further noted that the benefit of constructing the graph to be identified is that it simplifies the connection relations between the pieces of entity information and exposes the core information linking them.

S120, determining a group to be identified according to the graph to be identified and at least one preset abnormal mode.

The abnormal mode may be a preset pattern graph for distinguishing abnormal behaviors. The community to be identified may be a portion of the graph to be identified whose pieces of entity information are associated with one another.

Specifically, the graph to be identified can be divided according to at least one preset abnormal mode, and entity information corresponding to each divided part is used as a group to be identified.

Optionally, before determining the group to be identified according to the graph to be identified and at least one preset abnormal mode, at least one abnormal mode can be constructed; wherein the anomaly pattern includes at least one of:

Abnormal mode I: the source city entity information and the destination city entity information corresponding to the account entity information are different.

The account entity information may be a user account in the entity information, the source city entity information may be city information to which an IP bound with a user recharging IP in the entity information belongs, and the destination city entity information may be city information to which an IP bound with a user payment IP in the entity information belongs.

Specifically, when virtual currency is recharged and consumed normally, the cities to which the IP addresses used for recharging and for consumption belong are consistent, i.e., the recharging and the consumption are completed in the same city. Improper recharging and consumption of virtual currency, such as proxy recharging of virtual currency, is instead usually recharged in one city and consumed in another. Therefore, a pattern in which the source city entity information and the destination city entity information corresponding to the account entity information are different may be regarded as an abnormal mode, as shown in fig. 2.

Abnormal mode II: the source device identification entity information and the destination device identification entity information corresponding to the account entity information are different.

The source device identification entity information may be the user recharging device identification in the entity information, and the destination device identification entity information may be the user payment device identification in the entity information.

Specifically, when virtual currency is recharged and consumed normally, the devices used for recharging and for consumption should be the same, i.e., the device identifications should be identical. Improper recharging and consumption of virtual currency, such as proxy recharging of virtual currency, is instead usually recharged on one device and consumed on another. Accordingly, a pattern in which the source device identification entity information and the destination device identification entity information corresponding to the account entity information are different may be regarded as an abnormal mode, as shown in fig. 3.

Abnormal mode III: the first mobile phone number entity information corresponding to the account entity information and the second mobile phone number entity information corresponding to the third-party account entity information are different.

The first mobile phone number entity information may be a user mobile phone number bound with a user account in the entity information, and the second mobile phone number entity information may be a user mobile phone number bound with a third party platform account of the user.

Specifically, when virtual currency is recharged and consumed normally, the second mobile phone number, bound to the third-party account used for recharging, and the first mobile phone number, bound to the user account that consumes the virtual currency, should be the same; that is, the user account and the user's third-party platform account are bound to the same mobile phone number. When virtual currency is recharged and consumed improperly, however, user accounts are typically recharged through illegitimate third-party platform accounts, for example a third-party platform account that does not belong to the user, so the mobile phone numbers bound to the user account and to the third-party platform account differ. Therefore, a pattern in which the first mobile phone number entity information corresponding to the account entity information differs from the second mobile phone number entity information of the third-party account entity information corresponding to that account entity information may be regarded as an abnormal mode, as shown in fig. 4.

In addition to the above three abnormal modes, other patterns covering abnormal behavior related to improper recharging and consumption of virtual currency may be added; they are not described in detail in this embodiment.

It should also be noted that the benefit of constructing abnormal modes is that patterns of improper recharging and consumption recorded in the platform's history can be added to the abnormal mode set, so that the prior information in the history is fully used. Using any single abnormal mode as the basis for identifying the target group may lead to misjudgment and missed judgment, but combining several abnormal modes can noticeably increase the accuracy and coverage of target group identification.
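As a non-limiting sketch, the three abnormal modes described above can be expressed as simple inequality checks over the entities linked to one account. The `AccountView` structure and its field names are hypothetical conveniences for this example; in the embodiment the modes are pattern graphs matched against the graph to be identified.

```python
from dataclasses import dataclass


@dataclass
class AccountView:
    """Hypothetical flattened view of the entities linked to one account."""
    source_city: str        # city of the user recharging IP
    dest_city: str          # city of the user payment IP
    source_device: str      # user recharging device identification
    dest_device: str        # user payment device identification
    account_phone: str      # phone number bound to the user account
    third_party_phone: str  # phone number bound to the third-party account


def matched_patterns(view):
    """Return the labels of the abnormal modes that this account view matches."""
    checks = {
        "I": view.source_city != view.dest_city,
        "II": view.source_device != view.dest_device,
        "III": view.account_phone != view.third_party_phone,
    }
    return [label for label, hit in checks.items() if hit]


normal = AccountView("Wuhan", "Wuhan", "dev-1", "dev-1", "138-0000", "138-0000")
suspect = AccountView("Wuhan", "Beijing", "dev-1", "dev-2", "138-0000", "139-1111")
print(matched_patterns(normal))   # []
print(matched_patterns(suspect))  # ['I', 'II', 'III']
```

Keeping the modes in one mapping makes it straightforward to add further modes later, which mirrors the extensibility noted above.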

S130, deleting the entity information and connection relations in the community to be identified from the graph to be identified, so as to update the graph to be identified and determine a new community to be identified.

Specifically, after determining one community to be identified, the next community to be identified needs to be determined from the graph to be identified. Therefore, the entity information and the connection relation in the community to be identified can be deleted from the graph to be identified, and the rest of the entity information and the connection relation are used as a new graph to be identified. Further, the next community to be identified may be determined from the new graph to be identified.
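The extract-then-delete loop of S130, together with the stopping condition of S140, might be sketched as below. The community finder used here (a plain connected-component search) is only a stand-in assumption so that the loop is runnable; the names `extract_communities`, `connected_component` and `min_nodes` are likewise hypothetical.

```python
def connected_component(graph):
    """Stand-in community finder (an assumption): the connected component
    that contains the first node of the adjacency map."""
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def extract_communities(graph, min_nodes, find_community=connected_component):
    """Repeatedly determine a community and delete it from the graph.

    Stops once the number of remaining nodes is <= min_nodes; the nodes
    left over are treated as belonging to no community.
    """
    remaining = {node: set(neighbours) for node, neighbours in graph.items()}
    communities = []
    while len(remaining) > min_nodes:
        community = find_community(remaining)
        communities.append(community)
        for node in community:
            del remaining[node]
        for neighbours in remaining.values():
            neighbours -= community  # drop edges into the deleted community
    return communities, remaining


demo = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}, "e": set()}
communities, leftover = extract_communities(demo, min_nodes=1)
```

Passing the finder as a parameter keeps the deletion loop independent of how a single community is actually determined.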

S140, when the quantity of entity information in the graph to be identified meets the preset condition, determining the risk value of each group to be identified according to that group and at least one abnormal mode, and determining the target group according to the risk value.

The preset condition may be a preset quantity of entity information. For example, when the quantity of entity information in the graph to be identified is smaller than or equal to the preset quantity, the remaining entity information can be considered not to belong to any group to be identified. The risk value may be a value measuring how likely the community to be identified is to have improperly recharged and consumed virtual currency. The target community may be an identified community that has improperly recharged and consumed virtual currency.

Specifically, by the time the quantity of entity information in the graph to be identified satisfies the preset condition, a plurality of communities to be identified have been determined. For each community to be identified, the risk value can then be calculated from the abnormal modes and the graph structure formed by that community, and whether the community is a target group can be judged from the magnitude of the risk value.

Alternatively, the risk value of the community to be identified may be compared with a preset risk value threshold, and the community to be identified with the risk value higher than the risk value threshold is taken as the target community.

It should be noted that the choice of risk value threshold depends on the tolerance for false recognition: if the tolerance is higher, the threshold can be lowered appropriately to identify more suspected communities; otherwise, the threshold can be raised appropriately.
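The threshold comparison can be illustrated with a short sketch. The risk values below are made-up placeholders, since how a risk value is computed is not covered by this passage; the function name and the thresholds are also assumptions.

```python
def select_target_communities(risk_values, threshold):
    """Return the ids of communities whose risk value exceeds the threshold."""
    return [cid for cid, risk in risk_values.items() if risk > threshold]


# Hypothetical risk values, for illustration only.
risk_values = {"community-1": 0.91, "community-2": 0.35, "community-3": 0.62}

strict = select_target_communities(risk_values, threshold=0.8)
lenient = select_target_communities(risk_values, threshold=0.5)
```

Lowering the threshold from 0.8 to 0.5 in this example flags one additional community, which illustrates the trade-off between coverage and false recognition described above.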

Optionally, after the target group is identified, a secondary verification can be performed manually, so as to ensure the accuracy of target group identification. And, corresponding treatments can also be performed for the target community, for example: for user accounts in the target group, measures such as account freezing or transaction prohibition are adopted for treatment.

Example 2

Fig. 5 is a flow chart of a target group identification method according to a second embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment further describes how the communities to be identified are determined and how the risk value of each community to be identified is determined. Terms that are the same as or correspond to those in the above embodiments are not explained again here.

As shown in fig. 5, the method of this embodiment specifically includes the following steps:

S210, acquiring entity information stored in a platform log, determining connection relations among the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations.

Specifically, after the entity information stored in the platform log is acquired, the connection relationship between the entity information can be determined. And generating a graph structure according to the entity information and the connection relation, and recording the graph structure as a graph to be identified.

Alternatively, the graph to be identified may be constructed in the following manner:

and determining each node information of the graph to be identified based on each entity information, and constructing the graph to be identified based on the node information and the connection relation.

Wherein each node information corresponds to one entity information or to two or more entity information having the same content.

Specifically, two or more pieces of entity information having the same content may be treated as one piece of node information, and the graph to be identified may then be constructed based on the node information and the connection relations. The advantage is that this avoids an inaccurate graph caused by pieces of entity information that have the same content but were recorded differently, for example because of differing writing habits.

It may be determined whether at least two entity information are entity information having the same content based on the following manner:

Manner one: if the at least two pieces of entity information are not address entity information, determining the character string corresponding to each piece of entity information, and judging whether the character strings are identical, so as to determine whether the at least two pieces of entity information have the same content.

Wherein the address entity information may be a shipping address in the entity information.

Specifically, it is first judged whether the types of the at least two pieces of entity information are consistent. If the types are inconsistent, the pieces of entity information cannot have the same content; if the types are consistent, it is then judged whether the pieces of entity information are address entity information. If they are not address entity information, they can be compared directly by character string matching: if the character strings are identical, the pieces of entity information are considered to have the same content; if the character strings differ, they are considered not to have the same content.

Manner two: if the at least two pieces of entity information are address entity information, determining the character strings to be matched of the at least two pieces of entity information based on preset address dividing elements, and determining whether the at least two pieces of entity information have the same content according to the length of each character string to be matched.
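The type check followed by exact string matching for non-address entities might look like the following sketch. Representing an entity as a `(type, value)` pair and using the type label `"address"` are assumptions made for illustration.

```python
def same_content(entity_a, entity_b):
    """Decide whether two non-address entities have the same content.

    Entities are (entity_type, value) pairs. Entities of different types
    never match; address entities are rejected here because they need the
    separate address comparison rather than exact string matching.
    """
    type_a, value_a = entity_a
    type_b, value_b = entity_b
    if type_a != type_b:
        return False
    if type_a == "address":
        raise ValueError("address entities require the address comparison")
    return value_a == value_b


print(same_content(("phone", "13800000000"), ("phone", "13800000000")))  # True
print(same_content(("phone", "13800000000"), ("mailbox", "13800000000")))  # False
```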

The preset address dividing element can be element information of a country, a province, a city, a county, a village and the like.

Specifically, address entity information may differ because of different ways of expressing an address or deviations made when filling it in, so that different pieces of address entity information point to the same place. Whether at least two pieces of address entity information have the same content can therefore be determined preliminarily by checking whether their preset address dividing elements are identical. If the preset address dividing elements are inconsistent, the pieces of address entity information do not have the same content. If the preset address dividing elements are consistent, the pieces of address entity information may have the same content and further judgment is needed: the preset address dividing elements are deleted from the address entity information, and the remaining part of each is used as a character string to be matched, so that whether the pieces of entity information have the same content is determined according to the length of each character string to be matched.

Optionally, the manner of determining the character strings to be matched of the at least two entity information may be:

if the target fields corresponding to the preset address dividing elements of the at least two pieces of entity information are different, determining that the at least two pieces of entity information do not have the same content; if the target fields corresponding to the preset address dividing elements of the at least two pieces of entity information are the same, deleting the preset address dividing elements from the at least two pieces of entity information, and taking the remaining parts as the character strings to be matched. The target field may be the specific information in each preset address dividing element.

Further, it may be determined whether at least two entity information are entity information having the same content according to the length of each character string to be matched based on the following manner:

determining the transformation cost of two character strings to be matched according to the following formula:

$$c(s_1, s_2) = \frac{\mathrm{gap}(s_1, s_2)}{|s_1| + |s_2|}$$

wherein $c(s_1, s_2)$ represents the transformation cost between the character string to be matched $s_1$ and the character string to be matched $s_2$, $|s_1|$ represents the length of $s_1$, $|s_2|$ represents the length of $s_2$, and $\mathrm{gap}(s_1, s_2)$ represents the length difference between $s_1$ and $s_2$.

Specifically, $\mathrm{gap}(s_1, s_2)$ is calculated as follows. First, taking $s_1$ as the reference, the initial characters of $s_1$ and $s_2$ are aligned, and the number of characters by which $s_2$ falls short of $s_1$ is counted. Then, taking $s_2$ as the reference, the number of characters by which $s_1$ falls short of $s_2$ is counted. Finally, the two counts are summed to obtain the length difference between $s_1$ and $s_2$. The length difference is divided by the sum of the lengths of $s_1$ and $s_2$, and the result is used to measure whether $s_1$ and $s_2$ denote the same address.

If the transformation cost is smaller than the preset cost threshold, determining that the entity information corresponding to the two character strings to be matched is the same entity information.

The preset cost threshold may be a preset value for determining whether the entity information corresponding to the two character strings to be matched is the same entity information.

Illustratively, the two pieces of address entity information are, respectively, "Hubei Province, Wuhan City, Hongshan District, Optical Valley Software Park F3" and "Hubei Province, Wuhan City, Hongshan District, Optical Valley Software Park F3 building 17". The preset address dividing elements of the two pieces of address entity information (Hubei Province, Wuhan City, Hongshan District) are determined to be identical. Two character strings to be matched are then obtained: $s_1$: "Optical Valley Software Park F3" and $s_2$: "Optical Valley Software Park F3 building 17". $s_2$ has no fewer characters than $s_1$, so the first count is 0. The characters by which $s_1$ falls short of $s_2$ are "building 17", so the second count is 4 (lengths are counted in characters of the original-language strings). Thus $\mathrm{gap}(s_1, s_2) = 4$, $|s_1| = 7$, $|s_2| = 11$, and the transformation cost is determined as $c(s_1, s_2) = 4 / (7 + 11) \approx 0.22$.
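The arithmetic of the example can be checked with a small sketch. Two assumptions are made here: aligning the initial characters is interpreted as matching the longest common prefix, and the strings `"PARK-F3"` and `"PARK-F3/17F"` are stand-ins chosen only to have the same lengths (7 and 11) as the strings in the example. The cost threshold of 0.3 is likewise a hypothetical value.

```python
def length_gap(s1, s2):
    """Number of characters left unmatched in either string after aligning
    the first characters (assumption: the longest common prefix is matched)."""
    prefix = 0
    for ch1, ch2 in zip(s1, s2):
        if ch1 != ch2:
            break
        prefix += 1
    return (len(s1) - prefix) + (len(s2) - prefix)


def transformation_cost(s1, s2):
    """Length gap divided by the sum of the two lengths."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0.0  # two empty strings are trivially identical
    return length_gap(s1, s2) / total


# Stand-in strings with the same lengths as the example (7 and 11 characters).
s1 = "PARK-F3"
s2 = "PARK-F3/17F"

gap = length_gap(s1, s2)             # 0 + 4 = 4
cost = transformation_cost(s1, s2)   # 4 / (7 + 11)

COST_THRESHOLD = 0.3                 # hypothetical preset cost threshold
same_address = cost < COST_THRESHOLD
```

With these stand-ins the cost is 4/18, about 0.22, so under the assumed threshold the two strings would be treated as the same address.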

S220, determining a weight matrix based on the graph to be identified and at least one preset abnormal mode.

The weight matrix may be a matrix for measuring correlation between each entity information and the abnormal pattern in the graph to be identified.

Specifically, the weight matrix may be generated according to the number of times that every two pieces of entity information in the graph to be identified belong to the same abnormal mode.

Optionally, for every two pieces of entity information, the weight value between the two pieces of entity information may be determined according to the following formula:

$$W^{M}_{ij} = \#((i, j) \in M)$$

wherein $W^{M}_{ij}$ represents the weight value between entity information i and entity information j, M represents the abnormal mode set, and $\#((i, j) \in M)$ represents the total number of times that entity information i and entity information j jointly belong to the set M;

and the weight matrix is determined according to each weight value.

Specifically, the total number of times that every two pieces of entity information belong to the abnormal mode set is counted to determine the weight value $W^{M}_{ij}$ of the two pieces of entity information, from which the weight matrix $W^{M}$ can be determined.

The weight matrix $W^{M}$ determined in the above manner is a symmetric matrix whose main diagonal elements are all 0.
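A minimal Python sketch of this counting step follows. It assumes, for illustration only, that each detected occurrence of an abnormal mode is available as a set of entity indices; the name `build_weight_matrix` and the sample occurrences are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set


def build_weight_matrix(n: int, occurrences: Iterable[Set[int]]) -> List[List[int]]:
    """W[i][j] = number of abnormal-mode occurrences that contain both entity i and entity j."""
    w = [[0] * n for _ in range(n)]
    for members in occurrences:
        # Every unordered pair inside one occurrence is counted once, in both directions.
        for i, j in combinations(sorted(members), 2):
            w[i][j] += 1
            w[j][i] += 1
    return w


# Hypothetical occurrences over four entities indexed 0..3.
occurrences = [{0, 1}, {0, 1, 2}, {2, 3}]
W = build_weight_matrix(4, occurrences)
```

Because each pair is incremented in both directions and no entity is paired with itself, the result is symmetric with a zero main diagonal, as stated above.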

S230, determining a degree matrix of the weight matrix based on the weight matrix, and determining a Laplace matrix based on the weight matrix and the degree matrix.

The degree matrix may be a diagonal matrix formed by the sums of the element values of each column of the weight matrix. The Laplace matrix may be the difference between the degree matrix and the weight matrix.

Specifically, the diagonal values of the degree matrix may be determined according to the following formula:

$$D_{ii} = \sum_{j} W^{M}_{ij}$$

wherein $D_{ii}$ represents the i-th diagonal value, and $W^{M}_{ij}$ represents the weight value between entity information i and entity information j;

and the degree matrix of the weight matrix is determined according to the diagonal values.

Further, the Laplace matrix is determined according to the following formula:

$$L^{M} = D^{M} - W^{M}$$

wherein $L^{M}$ represents the Laplace matrix, $D^{M}$ represents the degree matrix, and $W^{M}$ represents the weight matrix.

Illustratively, given a weight matrix $W^{M}$, the degree matrix $D^{M}$ is determined first, and the Laplace matrix $L^{M}$ can then be determined.
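As a worked illustration with hypothetical values (the numbers below are chosen for this sketch and are not taken from the source), consider three pieces of entity information. Each diagonal value of the degree matrix is the sum of the corresponding row (equivalently, column, since the weight matrix is symmetric) of the weight matrix, and the Laplace matrix is the difference of the two:

```latex
W^{M} = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad
D^{M} = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
L^{M} = D^{M} - W^{M} = \begin{pmatrix} 3 & -2 & -1 \\ -2 & 2 & 0 \\ -1 & 0 & 1 \end{pmatrix}
```

Every row of the resulting Laplace matrix sums to 0, so the all-ones vector is an eigenvector with eigenvalue 0.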

S240, determining the second smallest eigenvalue of the Laplace matrix and the eigenvector corresponding to the second smallest eigenvalue according to the Laplace matrix.

The second smallest eigenvalue may be the second smallest value among all eigenvalues of the Laplace matrix. The eigenvector may be the eigenvector corresponding to the second smallest eigenvalue.

Specifically, the eigenvalues and eigenvectors of the Laplace matrix may be calculated, and the eigenvalues sorted, to obtain the eigenvector corresponding to the second smallest eigenvalue, which satisfies:

$$L^{M} x = \lambda_{2} x$$

wherein $L^{M}$ represents the Laplace matrix, $\lambda_{2}$ represents the second smallest eigenvalue of the Laplace matrix, and x represents the eigenvector corresponding to the second smallest eigenvalue $\lambda_{2}$.
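The text does not say how the eigenvector is computed. As one possibility, the following Python sketch approximates it by power iteration on the shifted matrix cI − L while repeatedly removing the constant component (the eigenvector of eigenvalue 0). The function names, the shift c, the iteration count and the six-node sample graph (two triangles joined by one edge) are assumptions made for this illustration; in practice a numerical library's symmetric eigensolver would normally be used instead.

```python
import math
from typing import List


def fiedler_vector(lap: List[List[float]], iters: int = 500) -> List[float]:
    """Approximate the eigenvector of the second smallest eigenvalue of a Laplace matrix."""
    n = len(lap)
    # Gershgorin bound: with non-negative weights every eigenvalue of L lies in
    # [0, 2 * max degree], so c*I - L has no negative eigenvalue.
    c = 2.0 * max(lap[i][i] for i in range(n))
    if c == 0.0:
        c = 1.0
    x = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the constant (eigenvalue 0) component
        y = [c * x[i] - sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return x
        x = [v / norm for v in y]
    mean = sum(x) / n
    return [v - mean for v in x]


def rayleigh_quotient(lap: List[List[float]], x: List[float]) -> float:
    """x^T L x / x^T x, which equals the eigenvalue when x is an eigenvector."""
    lx = [sum(lap[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return sum(a * b for a, b in zip(x, lx)) / sum(v * v for v in x)


# Hypothetical graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and one edge between 2 and 3.
L = [
    [2, -1, -1, 0, 0, 0],
    [-1, 2, -1, 0, 0, 0],
    [-1, -1, 3, -1, 0, 0],
    [0, 0, -1, 3, -1, -1],
    [0, 0, 0, -1, 2, -1],
    [0, 0, 0, -1, -1, 2],
]
vec = fiedler_vector(L)
lambda2 = rayleigh_quotient(L, vec)
```

For this sample graph the signs of the returned elements separate nodes 0 to 2 from nodes 3 to 5, which is the property the later sorting and division steps rely on. The sign of the whole vector is arbitrary.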

S250, sorting the entity information in the graph to be identified according to the element values in the eigenvector.

Specifically, the eigenvector includes a plurality of element values, each of which corresponds to one piece of entity information. It can be understood that the i-th element in the eigenvector corresponds to the i-th piece of entity information used when constructing the weight matrix. Further, the element values in the eigenvector are sorted in ascending order, and the entity information is reordered accordingly.
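The reordering can be sketched in a few lines of Python. The entity labels and element values below are hypothetical, and `sort_entities` is an illustrative name.

```python
from typing import List, Sequence, Tuple


def sort_entities(entities: Sequence[str], eigenvector: Sequence[float]) -> Tuple[List[str], List[float]]:
    """Reorder entities by ascending eigenvector element; element i belongs to entity i."""
    if len(entities) != len(eigenvector):
        raise ValueError("one eigenvector element is required per entity")
    order = sorted(range(len(entities)), key=lambda i: eigenvector[i])
    return [entities[i] for i in order], [eigenvector[i] for i in order]


# Hypothetical entities and element values.
entities = ["account_a", "device_b", "city_c", "phone_d"]
values = [0.31, -0.52, 0.07, -0.11]
ordered_entities, ordered_values = sort_entities(entities, values)
```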

It should be noted that S220 to S250 have the following benefits. First, the weights between the pieces of entity information in the graph to be identified can be calculated from the preset abnormal modes: the more abnormal modes a pair of entity information hits, the more abnormal the relationship between them. In this way, the risk level and the graph structure of the graph to be identified are effectively combined, which facilitates later processing. Second, computing the Laplace matrix of the graph expresses this information in a mathematical form from which the relevant information can be extracted by matrix methods. Because the eigenvector corresponding to the second smallest eigenvalue of the Laplace matrix is an approximation of the optimal segmentation vector, this eigenvector is selected as the representation of the entity information.

S260, determining the division effectiveness corresponding to each preset division value according to each preset division value, the sorted entity information and the graph to be identified, and determining the community to be identified according to the division effectiveness.

The preset division values may be the values within a preset interval. The division effectiveness may be a value that measures the extent to which abnormal modes are destroyed when the community to be identified is divided off.

Specifically, each preset division value may be determined according to a preset interval. For example, the preset interval is [P, M], wherein P and M are constants and P is smaller than M (here M denotes the upper bound of the interval, not the abnormal mode set), and any value between P and M can be used as a preset division value. The eigenvector corresponding to the second smallest eigenvalue can be divided according to each preset division value; because each element of the eigenvector corresponds to one piece of entity information, the sorted entity information is divided as well. The graph to be identified can then be divided according to the divided entity information, and the number of abnormal modes destroyed by the division is determined so as to determine the division effectiveness corresponding to each preset division value. Since a greater division effectiveness indicates that fewer abnormal modes are destroyed, the community obtained with the preset division value having the greatest division effectiveness can be used as the community to be identified, and the remainder can be used for the next round of division.

Optionally, the division effectiveness corresponding to each preset division value may be determined based on the following steps:

Step one, for each preset division value, determining the number of destroyed modes produced by dividing the graph to be identified according to the preset division value and the sorted entity information.

Step two, determining the division effectiveness corresponding to the preset division value based on the number of destroyed modes and the number of pieces of entity information covered by at least one abnormal mode.

Specifically, the division effectiveness corresponding to the preset division value may be determined according to the following formula:

wherein v(cut) represents the division effectiveness corresponding to the preset division value cut, cutM ∈ M represents an abnormal mode destroyed when the graph to be identified is divided according to the preset division value cut and the sorted entity information, #(cutM ∈ M) represents the number of destroyed modes produced by that division, P ∈ M represents that entity information P belongs to at least one abnormal mode, and #(P ∈ M) represents the number of pieces of entity information covered by at least one abnormal mode.

It should be noted that the principle of the above formula is as follows: a good division destroys as few of the determined abnormal modes as possible, so the fewer abnormal modes the division cuts through, the greater the division effectiveness.
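The counting in step one can be sketched in Python as follows. Because the formula for v(cut) does not appear in this text, the sketch does not compute v(cut) itself; it relies only on the stated principle that fewer destroyed modes mean greater effectiveness, and therefore selects the preset division value with the fewest destroyed modes (ties go to the first candidate). Representing each abnormal-mode occurrence as a set of entity labels, treating an occurrence as destroyed when its members fall on both sides of the division, and all names and sample values are assumptions of this illustration.

```python
from typing import Dict, List, Sequence, Set


def count_destroyed(cut: float, value_of: Dict[str, float], occurrences: Sequence[Set[str]]) -> int:
    """Number of abnormal-mode occurrences whose members end up on both sides of the cut."""
    destroyed = 0
    for members in occurrences:
        sides = {value_of[e] <= cut for e in members}
        if len(sides) == 2:  # some members at or below the cut, some above it
            destroyed += 1
    return destroyed


def best_cut(cuts: Sequence[float], value_of: Dict[str, float], occurrences: Sequence[Set[str]]) -> float:
    """ASSUMPTION: the most effective cut is the one that destroys the fewest occurrences."""
    return min(cuts, key=lambda c: count_destroyed(c, value_of, occurrences))


# Hypothetical eigenvector element values per entity and abnormal-mode occurrences.
value_of = {"a": -0.5, "b": -0.4, "c": -0.3, "d": 0.2, "e": 0.6}
occurrences: List[Set[str]] = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"d", "e"}, {"c", "d"}]
cuts = [-0.45, -0.35, 0.0]

destroyed_per_cut = [count_destroyed(c, value_of, occurrences) for c in cuts]
chosen = best_cut(cuts, value_of, occurrences)
```

In this sample, dividing at 0.0 separates {a, b, c} from {d, e} and breaks only the single occurrence that links c and d.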

S270, deleting the entity information and connection relations in the community to be identified from the graph to be identified, so as to update the graph to be identified and determine a new community to be identified.

It should be noted that the preset division value with the greatest division effectiveness is selected as the optimal result, the entity information is divided using this preset division value, and the result is taken as the first division result. The remainder after the first division is taken as the new graph to be identified, and steps S220 to S260 are executed repeatedly to divide it; the iteration continues until the number of pieces of entity information remaining after a division meets the preset condition, at which point the division stops.
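The iteration can be sketched as the following Python loop. The preset condition is not specified in this passage, so the sketch assumes it to be "at most `min_size` pieces of entity information remain"; `split_once` is a hypothetical stand-in for steps S220 to S260 and is replaced here by a trivial function so that the example runs.

```python
from typing import Callable, List, Set, Tuple


def peel_communities(
    entities: Set[str],
    split_once: Callable[[Set[str]], Set[str]],
    min_size: int,
) -> Tuple[List[Set[str]], Set[str]]:
    """Repeatedly divide off one community and continue on the remainder."""
    remaining = set(entities)
    communities: List[Set[str]] = []
    # ASSUMPTION: the preset condition is "no more than min_size entities remain".
    while len(remaining) > min_size:
        community = split_once(remaining) & remaining
        if not community:
            break  # no progress is possible, so stop instead of looping forever
        communities.append(community)
        remaining -= community  # delete the community from the graph to be identified
    return communities, remaining


def take_two_smallest(remaining: Set[str]) -> Set[str]:
    """Trivial stand-in for S220 to S260: returns the two alphabetically first entities."""
    return set(sorted(remaining)[:2])


all_entities = {"e1", "e2", "e3", "e4", "e5", "e6", "e7"}
found, rest = peel_communities(all_entities, take_two_smallest, min_size=3)
```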

S280, when the number of pieces of entity information in the graph to be identified meets the preset condition, determining a risk value of each community to be identified according to each community to be identified and the at least one abnormal mode, and determining the target group according to the risk values.

Specifically, for each community to be identified, the risk value can be calculated from the abnormal modes and the graph structure formed by the community to be identified. Whether the community to be identified is a target group can then be judged according to the magnitude of the risk value.

Optionally, the risk value of each community to be identified may be determined based on the following steps:

Step one, for each community to be identified, determining the number of pieces of entity information in the community that contain each abnormal mode as the number of sub-entities to be identified.

Specifically, for each community to be identified, the number of pieces of entity information in the community that contain each abnormal mode may be determined as the number of sub-entities to be identified. For example, in community A to be identified, the number of sub-entities to be identified containing abnormal mode X is 5, the number containing abnormal mode Y is 7, and so on.

Step two, determining the number of pieces of entity information in the graph to be identified that contain each abnormal mode as the total number of entities to be identified.

Specifically, the number of pieces of entity information in the graph to be identified that contain each abnormal mode may be determined as the total number of entities to be identified. For example, in the graph to be identified, the total number of entities to be identified containing abnormal mode X is 16, the total number containing abnormal mode Y is 36, and so on.

Step three, determining the risk value of the community to be identified according to the number of sub-entities to be identified, the total number of entities to be identified and the number of pieces of entity information in the community to be identified.

Specifically, the risk value of the community to be identified may be determined according to the following formula:

wherein s(G) represents the risk value of the community G to be identified, #G represents the number of pieces of entity information in the community G to be identified, m represents the m-th abnormal mode, and M represents the abnormal mode set; for each abnormal mode m, the formula further uses the total number of entities to be identified that contain the m-th abnormal mode in the graph to be identified, and the number of sub-entities to be identified that contain the m-th abnormal mode in the community G to be identified.

It should be noted that the principle of the above formula is as follows: the risk value of the community to be identified is related to two factors. The first is the size of the community to be identified: the larger the community, the higher the degree of aggregation, which the above formula measures using log(#G). The logarithm is taken because the relationship between the risk value and the community size is not linear: once the community size reaches a certain level, the risk value is already large and does not increase significantly any further. The second is the number of pieces of entity information in the community that contain abnormal modes: the larger this number, the larger the risk value. To describe the second factor accurately, the above formula measures the risk level of each abnormal mode m by taking into account its scarcity in the graph to be identified: the scarcer an abnormal mode, the more it differs from normal behavior, that is, the greater its degree of risk. At the same time, the proportion of the entity information in the community to be identified that contains the abnormal mode is taken into account: the higher the proportion, the more pronounced the abnormal mode of the community and the greater the risk value.
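Steps one and two, together with the factors just described, can be sketched in Python as follows. The formula for s(G) does not appear in this text, so the combination used in `assumed_risk_value` (the logarithm of the community size multiplied by the sum over modes of scarcity times proportion, with scarcity taken as the reciprocal of the total count) is purely an assumption for illustration and should not be read as the formula of the method. The mapping from each abnormal mode to the set of entity labels containing it, and all names and sample data, are likewise hypothetical.

```python
import math
from typing import Dict, Set


def mode_counts(entities: Set[str], mode_members: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each abnormal mode, count how many of the given entities contain it."""
    return {mode: len(members & entities) for mode, members in mode_members.items()}


def assumed_risk_value(
    community: Set[str],
    graph_entities: Set[str],
    mode_members: Dict[str, Set[str]],
) -> float:
    """ASSUMPTION: illustrative combination of the described factors, not the source formula."""
    size = len(community)
    if size == 0:
        return 0.0
    sub = mode_counts(community, mode_members)         # step one: sub-entities to be identified
    total = mode_counts(graph_entities, mode_members)  # step two: total entities to be identified
    score = 0.0
    for mode in mode_members:
        if total[mode] == 0:
            continue
        scarcity = 1.0 / total[mode]   # scarcer modes weigh more (assumed form)
        proportion = sub[mode] / size  # share of the community containing the mode
        score += scarcity * proportion
    return math.log(size) * score      # size factor measured with a logarithm


graph_entities = {"e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"}
mode_members = {"X": {"e1", "e2", "e3", "e4"}, "Y": {"e2", "e5"}}
community_a = {"e1", "e2", "e3"}
community_b = {"e6", "e7", "e8"}

risk_a = assumed_risk_value(community_a, graph_entities, mode_members)
risk_b = assumed_risk_value(community_b, graph_entities, mode_members)
```

Under this assumed combination, a community whose members contain no abnormal mode scores 0, and a community in which abnormal modes are concentrated scores higher.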

According to the technical solution of this embodiment, entity information stored in the platform log is acquired, the connection relations between the pieces of entity information are determined according to the entity information, and the graph to be identified is constructed based on the entity information and the connection relations. A weight matrix is determined based on the graph to be identified and at least one preset abnormal mode; a degree matrix of the weight matrix is determined based on the weight matrix; and a Laplace matrix is determined based on the weight matrix and the degree matrix. The second smallest eigenvalue of the Laplace matrix and the corresponding eigenvector are then determined, and the entity information in the graph to be identified is sorted according to the element values in the eigenvector. The division effectiveness corresponding to each preset division value is determined according to the preset division values, the sorted entity information and the graph to be identified, and the community to be identified is determined according to the division effectiveness. The entity information and connection relations in the community to be identified are deleted from the graph to be identified so as to update the graph to be identified and determine a new community to be identified. When the number of pieces of entity information in the graph to be identified meets the preset condition, the risk value of each community to be identified is determined according to each community to be identified and the at least one abnormal mode, and the target group is determined according to the risk values. This addresses the difficulty of accurately identifying improper recharging to purchase virtual currency, and achieves accurate identification of target groups that engage in such behavior.

Example III

Fig. 6 is a schematic structural diagram of a target group identification device according to a third embodiment of the present invention, where the device includes: a graph to be identified construction module 310, a community to be identified determining module 320, a graph to be identified updating module 330 and a target group determining module 340.

The graph to be identified construction module 310 is configured to acquire entity information stored in a platform log, determine the connection relations between the pieces of entity information according to the entity information, and construct a graph to be identified based on the entity information and the connection relations; the community to be identified determining module 320 is configured to determine a community to be identified according to the graph to be identified and at least one preset abnormal mode; the graph to be identified updating module 330 is configured to delete the entity information and connection relations in the community to be identified from the graph to be identified, so as to update the graph to be identified and determine a new community to be identified; the target group determining module 340 is configured to determine, when the number of pieces of entity information in the graph to be identified meets a preset condition, a risk value of each community to be identified according to each community to be identified and the at least one abnormal mode, and determine a target group according to the risk values.

Optionally, the graph to be identified construction module is further configured to determine each piece of node information of the graph to be identified based on each piece of entity information, and construct the graph to be identified based on the node information and the connection relations; wherein each piece of node information corresponds to one piece of entity information or to two or more pieces of entity information having the same content; whether at least two pieces of entity information are entity information having the same content is determined as follows: if the at least two pieces of entity information are not address entity information, determining the character string corresponding to each piece of entity information, and judging whether the character strings are identical to determine whether the at least two pieces of entity information have the same content; if the at least two pieces of entity information are address entity information, determining the character strings to be matched of the at least two pieces of entity information based on preset address dividing elements, and determining whether the at least two pieces of entity information have the same content according to the length of each character string to be matched.

Optionally, the graph to be identified construction module is further configured to determine that the at least two pieces of entity information are not entity information with the same content if the target fields corresponding to the preset address dividing elements of the at least two pieces of entity information are different; if the target fields corresponding to the preset address dividing elements of the at least two pieces of entity information are the same, delete the preset address dividing elements from the at least two pieces of entity information and take the remainders as the character strings to be matched of the at least two pieces of entity information; and determine the transformation cost of two character strings to be matched according to the following formula:

$$c(s_1, s_2) = \frac{gap(s_1, s_2)}{|s_1| + |s_2|}$$

wherein c(s₁, s₂) represents the transformation cost of the character strings to be matched s₁ and s₂, |s₁| represents the length of the character string to be matched s₁, |s₂| represents the length of the character string to be matched s₂, and gap(s₁, s₂) represents the length difference between the character strings to be matched s₁ and s₂;

and if the transformation cost is smaller than a preset cost threshold, determine that the entity information corresponding to the two character strings to be matched is the same entity information.

Optionally, the apparatus further includes: the abnormal pattern construction module is used for constructing at least one abnormal pattern; wherein the abnormal pattern includes at least one of:

the source city entity information corresponding to the account entity information is different from the destination city entity information;

the source equipment identification entity information corresponding to the account entity information is different from the destination equipment identification entity information;

the first mobile phone number entity information corresponding to the account entity information is different from the second mobile phone number entity information of the third party account entity information corresponding to the account entity information.
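As an illustration, the three abnormal modes listed above can be expressed as simple predicates over a recharge record. This is a minimal Python sketch; the `RechargeRecord` type, its field names and the mode labels are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RechargeRecord:
    """Hypothetical flattened view of the entity information attached to one account."""
    account: str
    source_city: str
    destination_city: str
    source_device_id: str
    destination_device_id: str
    account_phone: str
    third_party_phone: str


def matched_abnormal_modes(record: RechargeRecord) -> List[str]:
    """Return the labels of the abnormal modes that this record hits."""
    checks = [
        ("city_mismatch", record.source_city != record.destination_city),
        ("device_mismatch", record.source_device_id != record.destination_device_id),
        ("phone_mismatch", record.account_phone != record.third_party_phone),
    ]
    return [label for label, hit in checks if hit]


record = RechargeRecord(
    account="u1",
    source_city="Wuhan",
    destination_city="Beijing",
    source_device_id="dev-1",
    destination_device_id="dev-1",
    account_phone="13800000000",
    third_party_phone="13900000000",
)
hits = matched_abnormal_modes(record)
```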

Optionally, the community to be identified determining module is further configured to determine a weight matrix based on the graph to be identified and at least one preset abnormal mode; determine a degree matrix of the weight matrix based on the weight matrix, and determine a Laplace matrix based on the weight matrix and the degree matrix; determine the second smallest eigenvalue of the Laplace matrix and the eigenvector corresponding to the second smallest eigenvalue according to the Laplace matrix; sort the entity information in the graph to be identified according to the element values in the eigenvector; and determine the division effectiveness corresponding to each preset division value according to each preset division value, the sorted entity information and the graph to be identified, and determine the community to be identified according to the division effectiveness.

Optionally, the community to be identified determining module is further configured to determine, for every two pieces of entity information, the weight value between the two pieces of entity information according to the following formula:

$$W^{M}_{ij} = \#((i, j) \in M)$$

determine the weight matrix according to each weight value;

determine the diagonal values of the degree matrix according to the following formula:

$$D_{ii} = \sum_{j} W^{M}_{ij}$$

determine the degree matrix of the weight matrix according to the diagonal values;

and determine the Laplace matrix according to the following formula:

$$L^{M} = D^{M} - W^{M}$$

wherein $L^{M}$ represents the Laplace matrix, $D^{M}$ represents the degree matrix, and $W^{M}$ represents the weight matrix.

Optionally, the community to be identified determining module is further configured to determine, for each preset division value, the number of destroyed modes produced by dividing the graph to be identified according to the preset division value and the sorted entity information; and determine the division effectiveness corresponding to the preset division value based on the number of destroyed modes and the number of pieces of entity information covered by the at least one abnormal mode.

Optionally, the community to be identified determining module is further configured to determine the division effectiveness corresponding to the preset division value according to the following formula:

wherein v(cut) represents the division effectiveness corresponding to the preset division value cut, cutM ∈ M represents an abnormal mode destroyed when the graph to be identified is divided according to the preset division value cut and the sorted entity information, #(cutM ∈ M) represents the number of destroyed modes produced by that division, P ∈ M represents that entity information P belongs to at least one abnormal mode, and #(P ∈ M) represents the number of pieces of entity information covered by the at least one abnormal mode.

Optionally, the target group determining module is further configured to determine, for each community to be identified, the number of pieces of entity information in the community that contain each abnormal mode as the number of sub-entities to be identified; determine the number of pieces of entity information in the graph to be identified that contain each abnormal mode as the total number of entities to be identified; and determine the risk value of the community to be identified according to the number of sub-entities to be identified, the total number of entities to be identified and the number of pieces of entity information in the community to be identified.

Optionally, the target group determining module is further configured to determine the risk value of the community to be identified according to the following formula:

wherein s(G) represents the risk value of the community G to be identified, #G represents the number of pieces of entity information in the community G to be identified, m represents the m-th abnormal mode, and M represents the abnormal mode set; for each abnormal mode m, the formula further uses the total number of entities to be identified that contain the m-th abnormal mode in the graph to be identified, and the number of sub-entities to be identified that contain the m-th abnormal mode in the community G to be identified.

The target group identification device provided by the embodiment of the invention can execute the target group identification method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

It should be noted that each unit and module included in the above apparatus are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present invention.

Example IV

Fig. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Fig. 7 shows a block diagram of an exemplary electronic device 40 suitable for use in implementing the embodiments of the present invention. The electronic device 40 shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 7, the electronic device 40 is in the form of a general purpose computing device. Components of electronic device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, a bus 403 that connects the various system components (including the system memory 402 and the processing units 401).

Bus 403 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Electronic device 40 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 40 and includes both volatile and non-volatile media, removable and non-removable media.

The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 404 and/or cache memory 405. Electronic device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive"). Although not shown in fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 403 through one or more data medium interfaces. The system memory 402 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.

A program/utility 408 having a set (at least one) of program modules 407 may be stored in, for example, the system memory 402. Such program modules 407 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these, or some combination thereof, may include an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.

The electronic device 40 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), one or more devices that enable a user to interact with the electronic device 40, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 40 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 411. Also, electronic device 40 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 412. As shown, network adapter 412 communicates with other modules of electronic device 40 over bus 403. It should be appreciated that although not shown in fig. 7, other hardware and/or software modules may be used in connection with electronic device 40, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processing unit 401 executes various functional applications and data processing by running a program stored in the system memory 402, for example, implements the target group identification method provided by the embodiment of the present invention.

Example V

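A fifth embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a target group identification method, the method comprising:

acquiring entity information stored in a platform log, determining the connection relations between the pieces of entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations;

determining a community to be identified according to the graph to be identified and at least one preset abnormal mode;

deleting the entity information and connection relations in the community to be identified from the graph to be identified, so as to update the graph to be identified and determine a new community to be identified;

and when the number of pieces of entity information in the graph to be identified meets a preset condition, determining a risk value of each community to be identified according to each community to be identified and the at least one abnormal mode, and determining a target group according to the risk values.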

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions that will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to these embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A method of identifying a target group, comprising:

when the quantity of entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to each group to be identified and the at least one abnormal mode, and determining a target group according to the risk value;

the determining the group to be identified according to the graph to be identified and at least one preset abnormal mode comprises the following steps:

determining a weight matrix based on the graph to be identified and at least one preset abnormal mode;

determining a degree matrix of the weight matrix based on the weight matrix, and determining a Laplacian matrix based on the weight matrix and the degree matrix;

determining the second-smallest eigenvalue of the Laplacian matrix and the eigenvector corresponding to the second-smallest eigenvalue according to the Laplacian matrix;

sorting entity information in the graph to be identified according to the element values in the eigenvector;

determining the partition effectiveness corresponding to each preset partition value according to each preset partition value, the sorted entity information and the graph to be identified, and determining the group to be identified according to the partition effectiveness.
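The spectral steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `fiedler_vector`, the shifted power iteration used in place of a full eigendecomposition, and the toy six-node graph are assumptions made purely for illustration. The sketch assumes a connected graph with at least two nodes.

```python
import math


def fiedler_vector(L, iterations=2000):
    """Approximate the eigenvector belonging to the second-smallest
    eigenvalue of a symmetric graph Laplacian L (list of lists).

    Hypothetical helper: uses power iteration on B = c*I - L while
    projecting out the constant vector (the eigenvector of eigenvalue 0).
    """
    n = len(L)
    # By Gershgorin's theorem every eigenvalue of L is <= 2 * max degree,
    # so this shift makes B = c*I - L positive definite.
    c = 2 * max(L[i][i] for i in range(n)) + 1
    x = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the constant component
        x = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    return x


# Toy graph (assumed data): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
L = [[0.0] * n for _ in range(n)]
for i, j in edges:
    L[i][j] -= 1.0
    L[j][i] -= 1.0
    L[i][i] += 1.0
    L[j][j] += 1.0

vec = fiedler_vector(L)
# Sort the entities by their element values in the eigenvector.
order = sorted(range(n), key=lambda i: vec[i])
print(order)
```

In this toy graph the sorted order places the members of one triangle before the members of the other, so a division in the middle of the order separates the two triangles.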

2. The method according to claim 1, wherein the constructing a graph to be identified based on the respective entity information and the connection relationship includes:

determining each piece of node information of the graph to be identified based on the entity information, and constructing the graph to be identified based on the node information and the connection relation;

wherein each piece of node information corresponds to one piece of entity information, or to two or more pieces of entity information having the same content; whether at least two pieces of entity information have the same content is determined as follows:

if the at least two pieces of entity information are not address entity information, determining a character string corresponding to each piece of entity information, and judging whether the character strings are identical, so as to determine whether the at least two pieces of entity information have the same content;

if the at least two pieces of entity information are address entity information, determining character strings to be matched of the at least two pieces of entity information based on preset address dividing elements, and determining whether the at least two pieces of entity information have the same content according to the length of each character string to be matched.

3. The method according to claim 2, wherein the determining the character string to be matched of the at least two entity information based on the preset address dividing element comprises:

if the target fields corresponding to the preset address dividing elements of the at least two entity information are different, determining that the at least two entity information is not entity information with the same content;

if the target fields corresponding to the preset address dividing elements of the at least two entity information are the same, deleting the preset address dividing elements in the at least two entity information, and taking the rest as character strings to be matched of the at least two entity information;

correspondingly, the determining whether the at least two entity information are entity information with the same content according to the lengths of the character strings to be matched includes:

4. The method according to claim 1, further comprising, before said determining the community to be identified according to said graph to be identified and at least one preset anomaly pattern:

constructing at least one anomaly pattern;

wherein the abnormal pattern includes at least one of:

5. The method according to claim 1, wherein the determining a weight matrix based on the graph to be identified and at least one preset anomaly pattern comprises:

for each two pieces of entity information, determining weight values of the two pieces of entity information according to the following formula:

$$W^{M}_{ij} = \#\big((i, j) \in M\big)$$

determining a weight matrix according to each weight value;

accordingly, the determining the degree matrix of the weight matrix based on the weight matrix includes:

the diagonal values of the degree matrix are determined according to the following formula:

accordingly, the determining the Laplacian matrix based on the weight matrix and the degree matrix includes:

the Laplacian matrix is determined according to the following formula:

$$L^{M} = D^{M} - W^{M}$$
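The matrices of claim 5 can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation: the function name `motif_matrices` and the representation of pattern instances as tuples of node indices are hypothetical. The weight of a pair (i, j) is taken as the number of anomaly-pattern instances in which both entities take part, following the weight formula above. Because the degree formula is not reproduced in this text, the sketch assumes the usual definition, in which each diagonal entry of D is the row sum of W.

```python
from itertools import combinations


def motif_matrices(num_nodes, pattern_instances):
    """Build the weight matrix W, degree matrix D and Laplacian L = D - W.

    pattern_instances: iterable of tuples of node indices, one tuple per
    occurrence of an anomaly pattern (hypothetical input format).
    """
    W = [[0] * num_nodes for _ in range(num_nodes)]
    for instance in pattern_instances:
        # Every pair of entities that co-occurs in this instance gains weight 1.
        for i, j in combinations(sorted(set(instance)), 2):
            W[i][j] += 1
            W[j][i] += 1
    # Assumed degree definition: D[i][i] is the row sum of W.
    D = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        D[i][i] = sum(W[i])
    L = [[D[i][j] - W[i][j] for j in range(num_nodes)] for i in range(num_nodes)]
    return W, D, L


# Two overlapping three-entity pattern instances (assumed data).
instances = [(0, 1, 2), (1, 2, 3)]
W, D, L = motif_matrices(4, instances)
print(W[1][2], D[1][1])
```

Entities 1 and 2 appear together in both instances, so their weight is 2, and every row of the resulting Laplacian sums to zero.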

6. The method according to claim 1, wherein determining the partition validity corresponding to each preset partition value according to each preset partition value, the sorted entity information and the graph to be identified includes:

for each preset partition value, determining the number of destruction modes generated by dividing the graph to be identified according to the preset partition value and the sorted entity information;

determining partition validity corresponding to the preset partition value based on the number of destruction modes and the number of entity information covered by the at least one abnormal mode.
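The two quantities named in claim 6 can be computed as in the following Python sketch. The function names, the tuple representation of pattern instances, and the reading of a preset partition value as a position in the sorted entity order are all assumptions made for illustration. The sketch deliberately stops at the two counts and does not combine them into a validity score, since the combining formula is not reproduced in this text.

```python
def destroyed_patterns(order, cut, pattern_instances):
    """Count pattern instances that have members on both sides of the cut.

    order: entity indices sorted by eigenvector value.
    cut:   assumed to be a position in that order; order[:cut] is one side.
    """
    left = set(order[:cut])
    count = 0
    for instance in pattern_instances:
        inside = sum(1 for node in instance if node in left)
        # The instance is destroyed if it is split between the two sides.
        if 0 < inside < len(instance):
            count += 1
    return count


def covered_entities(pattern_instances):
    """Count the distinct entities that appear in at least one pattern instance."""
    return len({node for instance in pattern_instances for node in instance})


# Assumed data: a sorted order and three pattern instances.
order = [0, 1, 2, 3, 4, 5]
instances = [(0, 1, 2), (3, 4, 5), (2, 3, 4)]
for cut in (2, 3, 4):
    print(cut, destroyed_patterns(order, cut, instances))
print(covered_entities(instances))
```

With this data, dividing at position 3 splits only the instance (2, 3, 4), while dividing at position 4 splits two instances.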

7. The method of claim 6, wherein the determining the partition validity corresponding to the preset partition value based on the number of destruction modes and the number of entity information covered by the at least one abnormal mode comprises:

determining the partition validity corresponding to the preset partition value according to the following formula:

wherein $v(cut)$ represents the partition validity corresponding to a preset partition value $cut$; $cutM \in M$ represents a mode destroyed when the graph to be identified is divided according to the preset partition value $cut$ and the sorted entity information; $\#(cutM \in M)$ represents the number of destruction modes generated by dividing the graph to be identified according to the preset partition value $cut$ and the sorted entity information; $P \in M$ represents that the entity information $P$ belongs to the at least one abnormal mode; and $\#(P \in M)$ represents the number of entity information covered by the at least one abnormal mode.

8. The method of claim 1, wherein said determining a risk value for each community to be identified based on each community to be identified and said at least one anomaly pattern comprises:

for each group to be identified, determining the number of entity information containing various abnormal modes of the group to be identified as the number of sub-entities to be identified;

determining the number of entity information containing various abnormal modes in the graph to be identified as the total number of entities to be identified;

and determining the risk value of the community to be identified according to the number of the sub-entities to be identified, the total number of the entities to be identified and the number of the entity information of the community to be identified.
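The three inputs to the risk value in claim 8 can be gathered as in the short Python sketch below. It rests on one interpretation of the claim wording, namely that an entity "contains" an abnormal mode when it takes part in at least one instance of that mode. The function name `risk_inputs` and the data are hypothetical, and the sketch does not combine the counts into a risk value, since the formula is not reproduced in this text.

```python
def risk_inputs(group, all_entities, pattern_instances):
    """Return (sub_entities, total_entities, group_size) for one group.

    sub_entities:   entities of the group that take part in a pattern instance.
    total_entities: entities of the whole graph that take part in a pattern instance.
    group_size:     number of entities in the group.
    """
    covered = {node for instance in pattern_instances for node in instance}
    sub_entities = len(set(group) & covered)
    total_entities = len(set(all_entities) & covered)
    group_size = len(set(group))
    return sub_entities, total_entities, group_size


# Assumed data: an eight-entity graph, one candidate group, two pattern instances.
counts = risk_inputs(group=[0, 1, 2, 6],
                     all_entities=range(8),
                     pattern_instances=[(0, 1, 2), (2, 3, 4)])
print(counts)
```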

9. The method according to claim 8, wherein the determining the risk value of the community to be identified according to the number of sub-entities to be identified, the total number of entities to be identified and the number of entity information of the community to be identified includes:

determining the risk value of the community to be identified according to the following formula:

10. An identification device for a target group, comprising:

the target group determining module is used for determining risk values of all groups to be identified according to the groups to be identified and the at least one abnormal mode when the number of entity information in the graph to be identified meets a preset condition, and determining target groups according to the risk values;

the group to be identified determining module is further used for determining a weight matrix based on the graph to be identified and at least one preset abnormal mode; determining a degree matrix of the weight matrix based on the weight matrix, and determining a Laplacian matrix based on the weight matrix and the degree matrix; determining the second-smallest eigenvalue of the Laplacian matrix and the eigenvector corresponding to the second-smallest eigenvalue according to the Laplacian matrix; sorting entity information in the graph to be identified according to the element values in the eigenvector; determining the partition validity corresponding to each preset partition value according to each preset partition value, the sorted entity information and the graph to be identified, and determining the group to be identified according to the partition validity.