CN113283908A - Target group identification method and device

Info

Publication number: CN113283908A
Application number: CN202110641677.3A
Authority: CN
Inventors: 王璐
Original assignee: Wuhan Douyu Network Technology Co Ltd
Current assignee: Wuhan Douyu Network Technology Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-08-20
Anticipated expiration: 2041-06-09
Also published as: CN113283908B

Abstract

An embodiment of the invention discloses a method and a device for identifying a target group. The method comprises: acquiring entity information stored in a platform log, determining the connection relations among the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations; determining a group to be identified according to the graph to be identified and at least one preset abnormal mode; deleting the entity information and connection relations of the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified; and, when the quantity of entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to each group to be identified and the at least one abnormal mode, and determining the target group according to the risk values. The technical solution of the embodiment accurately identifies target groups that improperly recharge and purchase virtual currency.

Description

Target group identification method and device

Technical Field

The embodiment of the invention relates to the technical field of big data risk control, in particular to a target group identification method and device.

Background

On live broadcast platforms, virtual currency is sometimes recharged and purchased improperly, for example by using an illegal overseas credit card. Such behavior causes abnormal consumption and harms the interests of the platform.

At present, improper recharging and purchasing of virtual currency can be identified and intercepted by checking whether a user's recharge city and consumption city are the same. However, some parties tamper with their IP addresses to hide city information. In that case the improper recharging and purchasing cannot be identified, and the interests of the platform are seriously harmed.

Disclosure of Invention

The embodiment of the invention provides a method and a device for identifying a target group, which are used for accurately identifying the target group with an illegal behavior of recharging and purchasing virtual money.

In a first aspect, an embodiment of the present invention provides a method for identifying a target group, where the method includes:

acquiring entity information stored in a platform log, determining a connection relation between the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relation;

determining a group to be identified according to the graph to be identified and at least one preset abnormal mode;

deleting the entity information and connection relations of the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified;

and when the quantity of the entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to each group to be identified and the at least one abnormal mode, and determining a target group according to the risk value.

In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a target group, where the apparatus includes:

a to-be-identified graph building module, used for acquiring entity information stored in a platform log, determining the connection relations among the entity information according to the entity information, and building the graph to be identified based on the entity information and the connection relations;

a to-be-identified group determining module, used for determining a group to be identified according to the graph to be identified and at least one preset abnormal mode;

a to-be-identified graph updating module, used for deleting the entity information and connection relations of the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified;

and a target group determining module, used for determining, when the quantity of entity information in the graph to be identified meets a preset condition, a risk value of each group to be identified according to each group to be identified and the at least one abnormal mode, and determining the target group according to the risk values.

In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:

one or more processors;

a storage device for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for identifying a target group according to any embodiment of the present invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying a target group according to any one of the embodiments of the present invention.

In the technical solution of the embodiment of the invention, entity information stored in a platform log is acquired, the connection relations among the entity information are determined according to the entity information, and a graph to be identified is constructed based on the entity information and the connection relations, so that different groups to be identified can be determined. A group to be identified is determined according to the graph to be identified and at least one preset abnormal mode, and the entity information and connection relations of that group are deleted from the graph to be identified, so as to update the graph and determine a new group to be identified. When the quantity of entity information in the graph to be identified meets a preset condition, a risk value of each group to be identified is determined according to each group to be identified and the at least one abnormal mode, and the target group is determined according to the risk values. This solves the problem that improper recharging and consumption of virtual currency are difficult to identify accurately, and achieves the technical effect of accurately identifying target groups that improperly recharge and purchase virtual currency.

Drawings

In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, a brief description is given below of the drawings used in describing the embodiments. It should be clear that the described figures are only views of some of the embodiments of the invention to be described, not all, and that for a person skilled in the art, other figures can be derived from these figures without inventive effort.

Fig. 1 is a flowchart of a method for identifying a target group according to a first embodiment of the present invention;

Fig. 2 is a schematic diagram of abnormal mode I according to the first embodiment of the present invention;

Fig. 3 is a schematic diagram of abnormal mode II according to the first embodiment of the present invention;

Fig. 4 is a schematic diagram of abnormal mode III according to the first embodiment of the present invention;

Fig. 5 is a flowchart of a method for identifying a target group according to a second embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an apparatus for identifying a target group according to a third embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a method for identifying a target group according to a first embodiment of the present invention. This embodiment is applicable to identifying groups that improperly recharge and consume virtual currency. The method may be executed by a target group identification apparatus, which may be implemented in software and/or hardware; the hardware may be an electronic device, optionally a mobile terminal, a PC terminal, or the like.

As shown in fig. 1, the method of this embodiment specifically includes the following steps:

S110, acquiring entity information stored in the platform log, determining the connection relations among the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations.

The platform log may be a behavior log related to virtual currency that is stored on the platform. The entity information may include information about platform users and information about users' operations on virtual currency. A connection relation indicates whether two pieces of entity information are connected. The graph to be identified may be a graph structure constructed from the entity information and the connection relations; it is used to determine the groups to be identified and, from them, the target group.

Illustratively, the entity information may include a user account, a user mailbox, a user recharge IP (Internet Protocol) address, a user recharge device identifier, a user payment IP, a user payment device identifier, a third-party platform account, a consignee name, a consignee address, and the like. The user account may be the account registered by the user on the platform, for example a user identification (UID); the attribute information bound to the user account may include the user's mobile phone number. The user mailbox may be the mailbox filled in by the user when registering the account on the platform. The user recharge IP may be the IP address used when the user recharges virtual currency; the attribute information bound to it may include the city to which the IP belongs. The user recharge device identifier may be the identifier of the device used when the user recharges virtual currency. The user payment IP may be the IP address used when the user consumes virtual currency; the attribute information bound to it may include the city to which the IP belongs. The user payment device identifier may be the identifier of the device used when the user consumes virtual currency. The third-party platform account may be the account on a recharge platform that is used to recharge the user account, for example a payment platform account, an Apple store account, or a bank card account; the attribute information bound to it may include the user's mobile phone number on the third-party platform. The consignee name may be the name that the user fills in when registering with the third-party platform. The consignee address may be the address that the user fills in when registering with the third-party platform, and may include the province, city, street, house number, and the like.

Specifically, the entity information required for identifying the target group may be acquired from the platform log. Connection relations are then established among the entity information according to relations such as recharging, consuming, and binding, and the graph to be identified is constructed from the entity information and the connection relations. The graph describes how the entities are associated, so that the target group can be identified.

Illustratively, if the user account A uses the IP address IP1 during payment, the entity information of the user account A and the entity information of the IP1 have a connection relationship.
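The graph construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the field names (`pay_ip`, `recharge_ip`, and so on), and the choice of an adjacency-set dictionary are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log records: each one links a user account to the entities
# observed in a single recharge or payment event (field names are assumed).
LOG_RECORDS = [
    {"account": "A", "pay_ip": "IP1", "pay_device": "D1"},
    {"account": "A", "recharge_ip": "IP2", "third_party_account": "T1"},
    {"account": "B", "pay_ip": "IP1"},
]


def build_graph(records):
    """Return an undirected graph as {node: set of neighbouring nodes}.

    A node is an (entity_type, value) pair. An edge joins the account of a
    record to every other entity that appears in the same record.
    """
    graph = defaultdict(set)
    for record in records:
        account = ("account", record["account"])
        graph[account]  # touch the key so isolated accounts still become nodes
        for entity_type, value in record.items():
            if entity_type == "account":
                continue
            node = (entity_type, value)
            graph[account].add(node)
            graph[node].add(account)
    return dict(graph)


graph = build_graph(LOG_RECORDS)
```

In this sketch, accounts A and B become indirectly connected because both are adjacent to the node `("pay_ip", "IP1")`, which is the kind of shared-entity association the graph is meant to expose.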

It should be noted that, for a user who recharges and consumes virtual currency normally, the user recharge IP, user recharge device identifier, user payment IP, and user payment device identifier rarely change. A group that improperly recharges and consumes virtual currency, by contrast, frequently changes IP addresses and devices, so these four items change as well; such information usually shows a degree of aggregation and is easy to identify. Some of these groups disguise their IP addresses and device information to avoid being identified by the platform. However, even when the IP address and device information are disguised, the real information bound on the platform and on the third-party platform inevitably exposes the group relationship. Real information such as the mobile phone number, consignee name, and consignee address therefore plays an important role in constructing the graph to be identified and in identifying the target group.

It should be further noted that the advantage of constructing the graph to be identified is that it both simplifies the connection relations among the entity information and exposes the core information shared among the entity information.

S120, determining a group to be identified according to the graph to be identified and at least one preset abnormal mode.

An abnormal mode may be a preset pattern diagram used to distinguish abnormal behavior. A group to be identified may be a portion of the graph to be identified whose entity information is mutually associated.

Specifically, the graph to be recognized may be divided according to at least one preset abnormal pattern, and the entity information corresponding to each divided part is used as a group to be recognized.

Optionally, before the group to be identified is determined according to the graph to be identified and the at least one preset abnormal mode, the at least one abnormal mode may first be constructed, wherein the abnormal modes include at least one of the following:

Abnormal mode I: the source city entity information and the destination city entity information corresponding to the account entity information are different.

The account entity information may be a user account in the entity information. The source city entity information may be the city to which the user recharge IP belongs, and the destination city entity information may be the city to which the user payment IP belongs.

Specifically, when virtual currency is recharged and consumed normally, the IP addresses used for recharging and for consumption should belong to the same city, that is, recharging and consumption are completed in the same city. Improper recharging and consumption, such as substitute recharging of virtual currency, usually recharge in one city and consume in another. Therefore, the mode in which the source city entity information and the destination city entity information corresponding to the account entity information differ may be used as an abnormal mode, as shown in Fig. 2.

Abnormal mode II: the source device identification entity information and the destination device identification entity information corresponding to the account entity information are different.

The source device identification entity information may be the user recharge device identifier in the entity information, and the destination device identification entity information may be the user payment device identifier in the entity information.

Specifically, when virtual currency is recharged and consumed normally, the same device should be used for recharging and for consumption, that is, the device identifiers should be consistent. Improper recharging and consumption, such as substitute recharging of virtual currency, typically recharge on one device and consume on another. Therefore, the mode in which the source device identification entity information and the destination device identification entity information corresponding to the account entity information differ may be used as an abnormal mode, as shown in Fig. 3.

Abnormal mode III: the first mobile phone number entity information corresponding to the account entity information is different from the second mobile phone number entity information of the third-party account entity information corresponding to the account entity information.

The first mobile phone number entity information may be the mobile phone number bound to the user account, and the second mobile phone number entity information may be the mobile phone number bound to the user's third-party platform account.

Specifically, when virtual currency is recharged and consumed normally, the second mobile phone number, bound to the third-party account used for recharging, should be the same as the first mobile phone number, bound to the user account that consumes the virtual currency. When virtual currency is recharged and consumed improperly, some user accounts are recharged through improper third-party platform accounts, for example third-party platform accounts that do not belong to the user, and the mobile phone numbers bound to the user account and to the third-party platform account differ. Therefore, the mode in which the first mobile phone number entity information corresponding to the account entity information differs from the second mobile phone number entity information of the corresponding third-party account entity information may be used as an abnormal mode, as shown in Fig. 4.

It should be noted that, in addition to the above three abnormal modes, further modes covering other abnormal behaviors related to improper recharging and consumption of virtual currency may be added; they are not described one by one in this embodiment.

It should also be noted that the benefit of constructing abnormal modes is that the modes of improper recharging and consumption recorded in the platform's history can be added to the abnormal mode set, so that the prior information in the history is fully used. Any single abnormal mode, used alone as the basis for identifying the target group, may produce false positives and missed detections; combining the abnormal modes can significantly increase the accuracy and coverage of target group identification.
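The three abnormal modes can be expressed as simple inequality checks on the entities linked to one account. The sketch below assumes a hypothetical flattened `AccountView` record holding one value per entity type; a real graph may attach several recharge IPs or devices to one account, which this illustration ignores.

```python
from dataclasses import dataclass


@dataclass
class AccountView:
    """Hypothetical flattened view of the entities linked to one account."""
    recharge_city: str       # city of the user recharge IP (source city)
    pay_city: str            # city of the user payment IP (destination city)
    recharge_device: str     # user recharge device identifier
    pay_device: str          # user payment device identifier
    account_phone: str       # phone number bound to the user account
    third_party_phone: str   # phone number bound to the third-party account


def matched_modes(view):
    """Return the numbers (1, 2, 3) of the abnormal modes the account matches."""
    modes = []
    if view.recharge_city != view.pay_city:            # abnormal mode I
        modes.append(1)
    if view.recharge_device != view.pay_device:        # abnormal mode II
        modes.append(2)
    if view.account_phone != view.third_party_phone:   # abnormal mode III
        modes.append(3)
    return modes
```

Returning the list of matched modes, rather than a single flag, keeps the modes combinable, in line with the remark that combining modes improves accuracy and coverage.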

S130, deleting the entity information and connection relations of the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified.

Specifically, after one group to be identified has been determined, the next group to be identified must be determined from the graph. The entity information and connection relations of the group just determined are therefore deleted from the graph to be identified, and the remaining entity information and connection relations form a new graph to be identified. The next group to be identified is then determined from the new graph.
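The delete-and-repeat procedure can be sketched as a loop over a shrinking graph. The group finder is passed in as a callback because this passage does not fix how a single group is found; the stand-in used here, the connected component of an arbitrary node, is a deliberately simple substitute and not the method of the embodiments. The names `extract_groups`, `remove_group`, and `min_nodes` are hypothetical.

```python
def remove_group(graph, group):
    """Return a new graph without the nodes of `group` and their edges."""
    return {
        node: {n for n in neighbours if n not in group}
        for node, neighbours in graph.items()
        if node not in group
    }


def extract_groups(graph, find_group, min_nodes):
    """Repeatedly extract one group and delete it from the graph.

    `find_group` is a hypothetical callback returning the node set of one
    group to be identified. The loop stops once the number of remaining
    nodes is less than or equal to `min_nodes` (the preset condition).
    """
    groups = []
    while len(graph) > min_nodes:
        group = find_group(graph)
        if not group:
            break
        groups.append(group)
        graph = remove_group(graph, group)
    return groups, graph


def component_of_first_node(graph):
    """Stand-in group finder: the connected component of an arbitrary node."""
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```

Because `remove_group` builds a new dictionary, the caller's original graph is left unchanged, which makes it easy to rerun the extraction with a different group finder.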

S140, when the quantity of the entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to each group to be identified and at least one abnormal mode, and determining a target group according to the risk value.

The preset condition may be a preset number of pieces of entity information. For example, when the number of pieces of entity information in the graph to be identified is less than or equal to the preset number, the remaining entity information may be considered not to belong to any group to be identified. The risk value may be a value that measures whether a group to be identified improperly recharges and consumes virtual currency. The target group may be a group identified as improperly recharging and consuming virtual currency.

Specifically, by the time the number of pieces of entity information in the graph to be identified satisfies the preset condition, a plurality of groups to be identified have been determined. For each group to be identified, the risk value can be calculated from the abnormal modes and the graph structure formed by that group. Whether the group to be identified is a target group can then be judged according to the magnitude of its risk value.

Optionally, the risk value of the group to be identified may be compared with a preset risk value threshold, and the group to be identified whose risk value is higher than the risk value threshold may be used as the target group.

It should be noted that the risk value threshold depends on the tolerance for misidentification: if the tolerance is high, the threshold may be lowered appropriately to identify more suspected groups; otherwise, it may be raised appropriately.
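The threshold comparison and its trade-off can be illustrated in a few lines. The group identifiers and risk values below are made-up numbers for illustration only, and `select_target_groups` is a hypothetical helper name.

```python
def select_target_groups(risk_values, threshold):
    """Return the ids of groups whose risk value is above the threshold."""
    return [gid for gid, risk in risk_values.items() if risk > threshold]


# Made-up risk values, purely illustrative.
risk_values = {"g1": 0.9, "g2": 0.55, "g3": 0.2}

strict = select_target_groups(risk_values, threshold=0.8)   # low tolerance for misidentification
lenient = select_target_groups(risk_values, threshold=0.5)  # high tolerance, more suspects flagged
```

Lowering the threshold flags more groups, so it pairs naturally with a follow-up review of the flagged groups.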

Optionally, after the target group is identified, a secondary manual verification may be performed to ensure the accuracy of the identification. The target group may also be handled accordingly, for example by freezing the user accounts in the target group or prohibiting their transactions.

Example two

Fig. 5 is a flowchart of a method for identifying a target group according to a second embodiment of the present invention. On the basis of the foregoing embodiment, this embodiment details how a group to be identified is determined and how the risk value of each group to be identified is determined. Terms that are the same as or correspond to those in the foregoing embodiment are not explained again here.

As shown in fig. 5, the method of this embodiment specifically includes the following steps:

S210, acquiring entity information stored in the platform log, determining the connection relations among the entity information according to the entity information, and constructing a graph to be identified based on the entity information and the connection relations.

Specifically, after the entity information stored in the platform log is acquired, the connection relationship between the entity information may also be determined. And generating a graph structure according to the entity information and the connection relation, and recording as a graph to be identified.

Optionally, the graph to be recognized may be constructed based on the following manner:

Each piece of node information of the graph to be identified is determined based on the entity information, and the graph to be identified is constructed based on the node information and the connection relations.

Each piece of node information corresponds to one piece of entity information, or to two or more pieces of entity information having the same content.

Specifically, two or more pieces of entity information having the same content may be treated as one piece of node information, and the graph to be identified is then constructed from the node information and the connection relations. This avoids an inaccurate graph caused by entity information that has the same content but was recorded differently, for example because of different writing habits.

Whether at least two entity information are entity information having the same content may be determined based on:

Step one: if the at least two pieces of entity information are not address entity information, the character string corresponding to each piece of entity information is determined, and whether the character strings are the same is judged, so as to determine whether the at least two pieces of entity information have the same content.

The address entity information may be a receiving address in the entity information.

Specifically, it is first judged whether the types of the at least two pieces of entity information are consistent. If the types are not consistent, the pieces of entity information cannot have the same content. If the types are consistent, it is judged whether the pieces of entity information are address entity information. If they are not address entity information, they can be compared directly by character string matching: if the character strings are the same, the pieces of entity information are considered to have the same content; if the character strings are different, they do not have the same content.

Step two: if the at least two pieces of entity information are address entity information, the character strings to be matched of the at least two pieces of entity information are determined based on the preset address partition elements, and whether the at least two pieces of entity information have the same content is determined according to the length of each character string to be matched.

The preset address division elements can be element information of countries, provinces, cities, counties, villages and the like.

Specifically, addresses may be expressed differently or filled in with different levels of detail, so different pieces of address entity information may actually point to the same location. Whether at least two pieces of address entity information have the same content can therefore be judged preliminarily by checking whether their preset address partition elements are consistent. If the preset address partition elements are inconsistent, the pieces of address entity information do not have the same content. If the preset address partition elements are consistent, the pieces of address entity information may have the same content, and a further judgment is needed: the preset address partition elements are deleted from each piece of address entity information, and the remaining part is used as the character string to be matched, so that whether the pieces of entity information have the same content can be determined according to the length of each character string to be matched.

Optionally, the mode of determining the character strings to be matched of the at least two pieces of entity information may be:

if the target fields corresponding to the preset address partition elements of the at least two pieces of entity information are different, determining that the at least two pieces of entity information are not entity information with the same content; and if the target fields corresponding to the preset address partition elements of the at least two pieces of entity information are the same, deleting the preset address partition elements in the at least two pieces of entity information, and taking the rest parts as character strings to be matched of the at least two pieces of entity information. The target field may be specific information in each preset address partition element.
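The type check, the exact string match for non-address entities, and the partition-element comparison for addresses can be sketched as below. The `Entity` tuple, its `partition` field, and the assumption that partition elements have already been parsed out of the address are all hypothetical choices made for this illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    """Hypothetical entity record; `partition` is only used for addresses."""
    kind: str                          # e.g. "account", "pay_ip", "address"
    value: str                         # the raw string of the entity
    partition: Tuple[str, ...] = ()    # e.g. (province, city, district)


def same_non_address(a: Entity, b: Entity) -> bool:
    """Step one: same type and identical character strings."""
    return a.kind == b.kind and a.value == b.value


def remainder(address: Entity) -> str:
    """Delete the partition elements and return the rest of the address."""
    text = address.value
    for element in address.partition:
        text = text.replace(element, "", 1)
    return " ".join(text.split())


def strings_to_match(a: Entity, b: Entity) -> Optional[Tuple[str, str]]:
    """Step two, first half: None if the partition elements differ,
    otherwise the two strings to be matched."""
    if a.partition != b.partition:
        return None
    return remainder(a), remainder(b)
```

A `None` result means the two addresses are already known to differ, while a returned pair still needs the length-based comparison before the addresses can be treated as the same node.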

Further, whether at least two pieces of entity information are entity information having the same content may be determined according to the length of each character string to be matched based on the following manner:

determining the transformation cost of two character strings to be matched according to the following formula:

where $c(s_1, s_2)$ denotes the transformation cost between the strings to be matched $s_1$ and $s_2$, $|s_1|$ denotes the length of $s_1$, $|s_2|$ denotes the length of $s_2$, and $\mathrm{gap}(s_1, s_2)$ denotes the length difference between $s_1$ and $s_2$.

Specifically, $\mathrm{gap}(s_1, s_2)$ is calculated as follows. First, taking $s_1$ as the reference, the initial characters of $s_1$ and $s_2$ are aligned, and the number of characters that $s_2$ lacks relative to $s_1$ is counted. Then, taking $s_2$ as the reference, the number of characters that $s_1$ lacks relative to $s_2$ is counted. Finally, the two counts are summed to obtain the length difference between $s_1$ and $s_2$. The length difference is then normalized by the lengths of $s_1$ and $s_2$, so as to measure whether $s_1$ and $s_2$ are the same address.

If the transformation cost is less than a preset cost threshold, the entity information corresponding to the two character strings to be matched is determined to be the same entity information.

The preset cost threshold may be a preset value used to judge whether the entity information corresponding to the two character strings to be matched is the same entity information.

Illustratively, suppose the two pieces of address entity information are "Optics Valley Software Park F3, Hongshan District, Wuhan City, Hubei Province" and "Optics Valley Software Park, Building F3, 17th floor, Hongshan District, Wuhan City, Hubei Province". The preset address partition elements in both are Hubei Province, Wuhan City, and Hongshan District, so they are completely consistent. Deleting them yields the two strings to be matched: $s_1$, "Optics Valley Software Park F3" (光谷软件园F3), and $s_2$, "Optics Valley Software Park, Building F3, 17th floor" (光谷软件园F3栋17楼). The character counts below refer to the Chinese strings. $s_2$ lacks no characters relative to $s_1$, so the first count is 0. $s_1$ lacks the 4 characters 栋17楼 relative to $s_2$, so the second count is 4. Thus $\mathrm{gap}(s_1, s_2) = 4$, $|s_1| = 7$, $|s_2| = 11$, and the transformation cost is then determined from these values.
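The length difference and a transformation cost can be sketched as below. Two assumptions are made and should be read as such: the "characters one string lacks relative to the other" are counted as a multiset difference, which ignores character positions, and the cost divides the gap by the sum of the two lengths, because the source text does not preserve the exact normalization. The strings are hypothetical ASCII stand-ins chosen to have lengths 7 and 11, like the example above.

```python
from collections import Counter


def gap(s1: str, s2: str) -> int:
    """Length difference: characters s2 lacks relative to s1, plus
    characters s1 lacks relative to s2 (multiset difference, an assumption)."""
    missing_from_s2 = sum((Counter(s1) - Counter(s2)).values())
    missing_from_s1 = sum((Counter(s2) - Counter(s1)).values())
    return missing_from_s2 + missing_from_s1


def transformation_cost(s1: str, s2: str) -> float:
    """ASSUMED normalization: gap divided by the sum of both lengths."""
    total = len(s1) + len(s2)
    return gap(s1, s2) / total if total else 0.0


def same_address(s1: str, s2: str, cost_threshold: float) -> bool:
    """Treat the addresses as identical when the cost is below the threshold."""
    return transformation_cost(s1, s2) < cost_threshold


s1 = "ParkF03"        # hypothetical stand-in, 7 characters
s2 = "ParkF03B17F"    # hypothetical stand-in, 11 characters
example_gap = gap(s1, s2)
example_cost = transformation_cost(s1, s2)
```

Under the assumed normalization the cost always lies between 0 and 1, so a single threshold can be applied to addresses of any length.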

S220, determining a weight matrix based on the graph to be recognized and at least one preset abnormal mode.

The weight matrix may be a matrix for measuring the correlation between each entity information in the graph to be identified and the abnormal pattern.

Specifically, the weight matrix may be generated according to the number of times that every two pieces of entity information in the graph to be identified belong to the same abnormal pattern.

Optionally, for every two pieces of entity information, the weight values of the two pieces of entity information may be determined according to the following formula:

$$W^M_{ij} = \#\big((i, j) \in M\big)$$

where $W^M_{ij}$ represents the weight value between entity information $i$ and entity information $j$, $M$ represents the abnormal pattern set, and $\#((i, j) \in M)$ represents the total number of times that entity information $i$ and entity information $j$ belong to the set $M$;

and determining a weight matrix according to the weight values.

Specifically, the total number of times that every two pieces of entity information belong to the abnormal pattern set is counted to determine their weight value $W^M_{ij}$, from which the weight matrix $W^M$ can be determined.

Note that the weight matrix $W^M$ determined in this manner is a symmetric matrix whose main diagonal elements are all 0.
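The counting rule above can be sketched as follows. The sketch assumes that each hit of an abnormal pattern is represented as the set of entities it involves. The entity names and the `weight_matrix` helper are hypothetical and serve only as illustration.

```python
from itertools import combinations


def weight_matrix(entities, pattern_instances):
    """W[i][j] = number of pattern instances containing both entity i and entity j."""
    index = {e: k for k, e in enumerate(entities)}
    n = len(entities)
    w = [[0] * n for _ in range(n)]
    for instance in pattern_instances:
        members = sorted(index[e] for e in set(instance))
        for i, j in combinations(members, 2):
            w[i][j] += 1
            w[j][i] += 1  # keep the matrix symmetric
    return w


# Hypothetical entities and abnormal pattern hits.
entities = ["acct_a", "acct_b", "dev_1", "city_x"]
pattern_instances = [
    {"acct_a", "dev_1"},
    {"acct_a", "acct_b", "dev_1"},
    {"acct_b", "city_x"},
]
W = weight_matrix(entities, pattern_instances)
for row in W:
    print(row)
```

Because pairs are only formed between distinct entities, the main diagonal stays 0, matching the note above.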

And S230, determining a degree matrix of the weight matrix based on the weight matrix, and determining a Laplace matrix based on the weight matrix and the degree matrix.

The degree matrix may be a diagonal matrix whose diagonal elements are the sums of the element values of each column of the weight matrix. The Laplacian matrix may be the difference between the degree matrix and the weight matrix.

Specifically, the diagonal values may be determined according to the following formula:

$$D_{ii} = \sum_{j} W^M_{ij}$$

where $D_{ii}$ denotes the $i$-th diagonal value and $W^M_{ij}$ represents the weight value between entity information $i$ and entity information $j$;

and determining a degree matrix of the weight matrix according to the diagonal values.

Further, the Laplacian matrix is determined according to the following formula:

$$L^M = D^M - W^M$$

where $L^M$ denotes the Laplacian matrix, $D^M$ represents the degree matrix, and $W^M$ represents the weight matrix.

Illustratively, given a weight matrix $W^M$, the degree matrix $D^M$ is determined from it, and finally the Laplacian matrix $L^M$ is determined. (The example matrices are not reproduced in this text.)
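As a worked illustration of S230, the sketch below builds the degree matrix and the Laplacian matrix from a weight matrix. The 4 x 4 matrix is a hypothetical example, not the one from the publication. Because the weight matrix is symmetric, row sums are used in place of column sums.

```python
def degree_matrix(w):
    """Diagonal matrix whose i-th diagonal entry is the sum of row i of w."""
    n = len(w)
    return [[sum(w[i]) if i == j else 0 for j in range(n)] for i in range(n)]


def laplacian(w):
    """L = D - W for a symmetric weight matrix w."""
    d = degree_matrix(w)
    n = len(w)
    return [[d[i][j] - w[i][j] for j in range(n)] for i in range(n)]


# Hypothetical symmetric weight matrix with a zero main diagonal.
W = [
    [0, 1, 2, 0],
    [1, 0, 1, 1],
    [2, 1, 0, 0],
    [0, 1, 0, 0],
]
D = degree_matrix(W)
L = laplacian(W)
for row in L:
    print(row)
```

Every row of the resulting Laplacian matrix sums to 0, which is a quick sanity check for an implementation.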

S240, determining, according to the Laplacian matrix, the second smallest eigenvalue of the Laplacian matrix and the eigenvector corresponding to the second smallest eigenvalue.

The second smallest eigenvalue may be the second smallest value among all eigenvalues of the Laplacian matrix. The eigenvector may be the eigenvector corresponding to the second smallest eigenvalue.

Specifically, the eigenvalues and eigenvectors of the Laplacian matrix may be calculated, and the eigenvalues sorted to obtain the eigenvector corresponding to the second smallest eigenvalue:

$$L^M x = \lambda_2 x$$

where $L^M$ denotes the Laplacian matrix, $\lambda_2$ represents the second smallest eigenvalue of the Laplacian matrix, and $x$ represents the eigenvector corresponding to $\lambda_2$.

S250, sorting the entity information in the graph to be identified according to the element values in the eigenvector.

Specifically, the eigenvector includes a plurality of element values, each corresponding to one piece of entity information: the $i$-th element of the eigenvector corresponds to the $i$-th entity information used in constructing the weight matrix. The element values of the eigenvector are then sorted in ascending order, and the entity information is reordered accordingly.

It should be noted that the benefits of S220-S250 are as follows. The weights between the entity information in the graph to be identified are calculated from the preset abnormal patterns: the more abnormal patterns a pair of entity information hits, the more abnormal their relation. In this way, the risk degree and the structure of the graph to be identified are effectively combined, which facilitates subsequent processing. Computing the Laplacian matrix of the graph is a mathematical processing step whose advantage is that the above information is expressed in a form from which matrix methods can extract the relevant information. Since the eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix approximates the optimal partition vector, it is selected as the representation of the entity information.
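Steps S240 and S250 can be sketched as follows. The text only says that eigenvalues and eigenvectors are computed and sorted; the power iteration used here is a substitute technique chosen to keep the example dependency-free, and in practice a library eigensolver would normally be used. The six-node Laplacian (two triangles joined by one edge) and the entity names are hypothetical.

```python
import math


def fiedler_vector(lap, iterations=2000):
    """Approximate the eigenvector of the second smallest eigenvalue of a
    symmetric graph Laplacian by power iteration on (c*I - L), restricted
    to vectors orthogonal to the all-ones vector."""
    n = len(lap)
    c = 2.0 * max(lap[i][i] for i in range(n))  # upper bound on eigenvalues of L
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the all-ones component
        y = [c * x[i] - sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def sort_entities(entities, vector):
    """Reorder entities by ascending eigenvector element value."""
    order = sorted(range(len(entities)), key=lambda i: vector[i])
    return [entities[i] for i in order]


# Hypothetical Laplacian: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
L = [
    [2, -1, -1, 0, 0, 0],
    [-1, 2, -1, 0, 0, 0],
    [-1, -1, 3, -1, 0, 0],
    [0, 0, -1, 3, -1, -1],
    [0, 0, 0, -1, 2, -1],
    [0, 0, 0, -1, -1, 2],
]
entities = ["e0", "e1", "e2", "e3", "e4", "e5"]
x = fiedler_vector(L)
print(sort_entities(entities, x))
```

In this example the two triangles end up at opposite ends of the ordering, which is what makes the sorted sequence a convenient input for the division step.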

And S260, determining the division effectiveness corresponding to each preset division value according to each preset division value, the sorted entity information and the to-be-identified graph, and determining the to-be-identified group according to the division effectiveness.

The preset division value may be any numerical value within a preset interval. The division effectiveness may be a numerical value measuring the extent to which abnormal patterns are broken when the graph is divided to obtain the group to be identified.

Specifically, each preset division value may be determined according to a preset interval. For example, if the preset interval is [P, M], where P and M are constants and P is less than M, any value between P and M can be used as a preset division value. The eigenvector corresponding to the second smallest eigenvalue is divided according to each preset division value; since each element of the eigenvector corresponds to one piece of entity information, the sorted entity information is divided as well. The graph to be identified can then be divided according to the divided entity information, and the number of abnormal patterns broken in the division is determined so as to determine the division effectiveness corresponding to each preset division value. The larger the division effectiveness, the less the abnormal patterns are affected. The group divided based on the preset division value with the maximum division effectiveness can be used as the group to be identified, and the remaining part can be used for the next division.

Optionally, the division effectiveness corresponding to each preset division value may be determined based on the following steps:

Step one, for each preset division value, determining the number of broken patterns generated by dividing the graph to be identified according to the preset division value and the sorted entity information.

Step two, determining the division effectiveness corresponding to the preset division value based on the number of broken patterns and the number of entity information covered by at least one abnormal pattern.

Specifically, the division effectiveness corresponding to the preset division value may be determined according to the following formula:

where $V(\mathrm{cut})$ represents the division effectiveness corresponding to the preset division value cut; $\mathrm{cut}_M \in M$ denotes a pattern that is broken when the graph to be identified is divided according to the preset division value cut and the sorted entity information; $\#(\mathrm{cut}_M \in M)$ represents the number of broken patterns generated by that division; $P \in M$ denotes entity information $P$ belonging to at least one abnormal pattern; and $\#(P \in M)$ represents the number of entity information covered by at least one abnormal pattern.

It should be noted that the principle of the above formula is that a good division should leave the identified abnormal patterns intact as far as possible. The number of abnormal patterns crossed by the division should therefore be as small as possible, in which case the division effectiveness is as large as possible.
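Step one above can be sketched as follows. The effectiveness formula itself is not reproduced in this text, so the sketch computes only the number of broken patterns for each candidate division value. It assumes that a division value acts as a threshold on the eigenvector element values and that each pattern hit is given as the entities it involves. All names and values are hypothetical.

```python
def broken_patterns(values, pattern_instances, cut):
    """Count pattern instances that have entities on both sides of the cut.

    values: mapping from entity to its eigenvector element value.
    """
    broken = 0
    for instance in pattern_instances:
        sides = {values[e] <= cut for e in instance}
        if len(sides) == 2:  # entities fall on both sides of the cut
            broken += 1
    return broken


# Hypothetical eigenvector element values and abnormal pattern hits.
values = {"a": -0.5, "b": -0.4, "c": -0.1, "d": 0.3, "e": 0.7}
pattern_instances = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("d", "e"), ("d", "e"),  # two different patterns hitting the same pair
    ("c", "d"),
]
counts = {cut: broken_patterns(values, pattern_instances, cut)
          for cut in (-0.45, -0.2, 0.0, 0.5)}
best_cut = min(counts, key=counts.get)
print(counts, best_cut)
```

Since the number of covered entities does not depend on the division value, any effectiveness measure that decreases with the number of broken patterns would prefer the same division value as `best_cut` here.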

And S270, deleting the entity information and the connection relation in the community to be identified from the graph to be identified so as to update the graph to be identified and determine a new community to be identified.

It should be noted that the preset division value with the maximum division effectiveness is selected as the optimal result, the entity information is divided using that preset division value, and the division result is taken as the result of the first division. The remaining part of the first division is taken as a new graph to be identified, steps S220-S260 are repeated on it, and the iteration continues until the number of entity information in the remaining part meets the preset condition, at which point division stops.
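The iteration described above can be sketched as a loop that repeatedly splits off one group and removes it from what remains. The sketch tracks only entities, not connection relations, and assumes the preset condition is "fewer than `min_size` entities remain". The `split` callable is a hypothetical stand-in for steps S220-S260.

```python
def peel_communities(entities, split, min_size):
    """Repeatedly split off one community until too few entities remain.

    split: callable taking the remaining entities and returning the
           entities of the next community (stands in for S220-S260).
    """
    remaining = set(entities)
    communities = []
    while len(remaining) >= min_size:
        community = set(split(remaining)) & remaining
        if not community:  # nothing could be split off; avoid looping forever
            break
        communities.append(community)
        remaining -= community  # S270: delete the community from the graph
    return communities, remaining


def toy_split(remaining):
    """Toy stand-in for S220-S260: take the three smallest labels."""
    return sorted(remaining)[:3]


communities, leftover = peel_communities(range(8), toy_split, min_size=3)
print(communities, leftover)
```

Passing the division procedure in as a callable keeps the stopping rule and the bookkeeping separate from the spectral steps.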

S280, when the quantity of the entity information in the graph to be identified meets a preset condition, determining a risk value of each group to be identified according to each group to be identified and at least one abnormal mode, and determining a target group according to the risk value.

Specifically, the risk value of each group to be identified may be calculated from the graph structure formed by the abnormal patterns and that group. Whether the group to be identified is a target group can then be judged according to the magnitude of the risk value.

Alternatively, the risk value for each group to be identified may be determined based on the following steps:

step one, aiming at each group to be identified, determining the number of entity information of each abnormal mode contained in the group to be identified as the number of sub-entities to be identified.

Specifically, for each group to be identified, the number of entity information in the group that belongs to each abnormal pattern may be determined as the number of sub-entities to be identified. For example, in group to be identified A, the number of sub-entities to be identified is 5 for abnormal pattern X, 7 for abnormal pattern Y, and so on.

And step two, determining the entity information quantity containing each abnormal mode in the graph to be recognized as the total entity quantity to be recognized.

Specifically, the number of entity information in the graph to be identified that belongs to each abnormal pattern may be determined as the total number of entities to be identified. For example, in the graph to be identified, the total number of entities to be identified is 16 for abnormal pattern X, 36 for abnormal pattern Y, and so on.

And step three, determining the risk value of the group to be identified according to the number of the sub-entities to be identified, the total number of the entities to be identified and the entity information number of the group to be identified.

In particular, the risk value of the group to be identified may be determined according to the following formula:

where $s(g)$ represents the risk value of the group to be identified $g$; $\#g$ represents the number of entity information in $g$; $m$ represents the $m$-th abnormal pattern; $M$ represents the abnormal pattern set; $\#G(m)$ represents the total number of entities to be identified for the $m$-th abnormal pattern in the graph to be identified; and $\#g(m)$ represents the number of sub-entities to be identified for the $m$-th abnormal pattern in $g$.

It should be noted that the principle of the above formula is that the risk value of a community to be identified depends on two factors. The first is the size of the community: the larger the community, the higher its degree of aggregation, which the formula measures with $\log(\#g)$. The logarithm is used because the relationship between the risk value and community size is not linear: once a community reaches a certain size, its risk value is already large and does not increase significantly. The second is the number of entity information in the community that jointly belong to abnormal patterns: the larger the number, the larger the risk value. To describe the second factor accurately, the formula measures the risk degree of each abnormal pattern $m$ by its scarcity in the graph to be identified: a scarcer abnormal pattern differs more from normal behavior and therefore carries a higher degree of risk. The formula also accounts for the proportion of entity information in the community that belongs to the abnormal pattern: the higher the proportion, the more significant the abnormal pattern is in the community and the greater the risk value.
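The inputs named in steps one to three can be sketched as follows. The risk formula itself is not reproduced in this text, so the sketch stops at the quantities the text describes: the per-pattern counts in the group and in the graph, the logarithmic size term, and the per-pattern proportion within the group. How these are combined into a risk value is left open. All data and names are hypothetical.

```python
import math


def pattern_counts(members, pattern_entities):
    """For each pattern, count how many of `members` belong to it."""
    return {name: len(members & ents) for name, ents in pattern_entities.items()}


# Hypothetical graph, group, and pattern membership.
graph_entities = set(range(10))
group = {0, 1, 2, 3}
pattern_entities = {
    "X": {0, 1, 2, 7},
    "Y": {2, 3, 4, 5, 6, 8},
}

sub_counts = pattern_counts(group, pattern_entities)             # per pattern, in the group
total_counts = pattern_counts(graph_entities, pattern_entities)  # per pattern, in the graph
size_term = math.log(len(group))                                 # logarithm of the group size
share = {m: sub_counts[m] / len(group) for m in pattern_entities}
print(sub_counts, total_counts, round(size_term, 3), share)
```

A pattern with a small graph-wide count is the scarce kind the text treats as riskier, and a large share means the pattern dominates the group.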

In the technical scheme of this embodiment of the invention, entity information stored in a platform log is acquired, the connection relation between the entity information is determined according to the entity information, and a graph to be identified is constructed based on the entity information and the connection relation. A weight matrix is determined based on the graph to be identified and at least one preset abnormal pattern; a degree matrix is determined based on the weight matrix; and a Laplacian matrix is determined based on the weight matrix and the degree matrix. The second smallest eigenvalue of the Laplacian matrix and its corresponding eigenvector are then determined, and the entity information in the graph to be identified is sorted according to the element values of the eigenvector so that it can be divided. The division effectiveness corresponding to each preset division value is determined according to the preset division value, the sorted entity information, and the graph to be identified, and a group to be identified is determined according to the division effectiveness. The entity information and connection relations in the group to be identified are deleted from the graph to be identified so as to update the graph and determine a new group to be identified. When the number of entity information in the graph to be identified meets a preset condition, a risk value of each group to be identified is determined according to each group and the at least one abnormal pattern, and a target group is determined according to the risk value. This solves the problem that improper recharging and consumption of virtual currency are difficult to identify accurately, and achieves the technical effect of accurately identifying target groups that engage in improper recharging and purchase of virtual currency.

Example three

Fig. 6 is a schematic structural diagram of an apparatus for identifying a target group according to a third embodiment of the present invention, where the apparatus includes: a graph to be identified building module 310, a community to be identified determining module 320, a graph to be identified updating module 330 and a target community determining module 340.

The graph to be identified building module 310 is configured to obtain entity information stored in a platform log, determine a connection relationship between the entity information according to the entity information, and build a graph to be identified based on the entity information and the connection relationship; the group to be identified determining module 320 is configured to determine a group to be identified according to the graph to be identified and at least one preset abnormal pattern; the graph to be identified updating module 330 is configured to delete entity information and connection relationships in the group to be identified from the graph to be identified, so as to update the graph to be identified and determine a new group to be identified; and the target group determining module 340 is configured to determine a risk value of each group to be identified according to each group to be identified and the at least one abnormal pattern when the quantity of the entity information in the graph to be identified meets a preset condition, and determine a target group according to the risk value.

Optionally, the graph to be identified building module is further configured to determine, based on the entity information, node information of the graph to be identified, and build the graph to be identified based on the node information and the connection relationship; wherein each of the node information corresponds to one entity information or two or more entity information having the same content; determining whether at least two entity information are entity information having the same content based on: if the at least two pieces of entity information are not address entity information, determining a character string corresponding to each piece of entity information, and judging whether the character strings are the same to determine whether the at least two pieces of entity information are entity information with the same content; if the at least two pieces of entity information are address entity information, determining character strings to be matched of the at least two pieces of entity information based on a preset address partition element, and determining whether the at least two pieces of entity information are entity information with the same content according to the length of each character string to be matched.

Optionally, the to-be-recognized graph constructing module is further configured to determine that the at least two pieces of entity information are not entity information with the same content if the target fields corresponding to the preset address partition elements of the at least two pieces of entity information are different; if the target fields corresponding to the preset address partition elements of the at least two pieces of entity information are the same, deleting the preset address partition elements of the at least two pieces of entity information, and taking the rest parts as character strings to be matched of the at least two pieces of entity information; determining the transformation cost of two character strings to be matched according to the following formula:

where $c(s_1, s_2)$ represents the transformation cost between the strings to be matched $s_1$ and $s_2$, $|s_1|$ represents the length of $s_1$, $|s_2|$ represents the length of $s_2$, and $\mathrm{gap}(s_1, s_2)$ represents the length difference between $s_1$ and $s_2$;

and if the transformation cost is less than a preset cost threshold, determining that the entity information corresponding to the two strings to be matched is the same entity information.

Optionally, the apparatus further comprises: the abnormal pattern construction module is used for constructing at least one abnormal pattern; wherein the abnormal pattern comprises at least one of:

the source city entity information and the destination city entity information corresponding to the account number entity information are different;

the source equipment identification entity information and the destination equipment identification entity information corresponding to the account number entity information are different;

the first mobile phone number entity information corresponding to the account number entity information is different from the second mobile phone number entity information of the third party account number entity information corresponding to the account number entity information.

Optionally, the group to be identified determining module is further configured to determine a weight matrix based on the graph to be identified and at least one preset abnormal pattern; determine a degree matrix of the weight matrix based on the weight matrix, and determine a Laplacian matrix based on the weight matrix and the degree matrix; determine, according to the Laplacian matrix, the second smallest eigenvalue of the Laplacian matrix and the eigenvector corresponding to the second smallest eigenvalue; sort the entity information in the graph to be identified according to the element values in the eigenvector; and determine the division effectiveness corresponding to each preset division value according to each preset division value, the sorted entity information and the graph to be identified, and determine a group to be identified according to the division effectiveness.

Optionally, the to-be-identified group determining module is further configured to determine, for every two pieces of entity information, a weight value of the two pieces of entity information according to the following formula:

$$W^M_{ij} = \#\big((i, j) \in M\big)$$

determining a weight matrix according to each weight value;

the diagonal values are determined according to the following formula:

$$D_{ii} = \sum_{j} W^M_{ij}$$

determining a degree matrix of the weight matrix according to the diagonal values;

the Laplacian matrix is determined according to the following formula:

$$L^M = D^M - W^M$$

where $L^M$ denotes the Laplacian matrix, $D^M$ represents the degree matrix, and $W^M$ represents the weight matrix.

Optionally, the group to be identified determining module is further configured to determine, for each preset division value, the number of broken patterns generated by dividing the graph to be identified according to the preset division value and the sorted entity information; and determine the division effectiveness corresponding to the preset division value based on the number of broken patterns and the number of entity information covered by the at least one abnormal pattern.

Optionally, the group to be identified determining module is further configured to determine the division effectiveness corresponding to the preset division value according to the following formula:

where $V(\mathrm{cut})$ represents the division effectiveness corresponding to the preset division value cut; $\mathrm{cut}_M \in M$ denotes a pattern that is broken when the graph to be identified is divided according to the preset division value cut and the sorted entity information; $\#(\mathrm{cut}_M \in M)$ represents the number of broken patterns generated by that division; $P \in M$ denotes entity information $P$ belonging to at least one abnormal pattern; and $\#(P \in M)$ represents the number of entity information covered by the at least one abnormal pattern.

Optionally, the target group determining module is further configured to determine, for each group to be identified, that the number of entity information in each abnormal mode included in the group to be identified is the number of sub-entities to be identified; determining the entity information quantity containing each abnormal mode in the graph to be identified as the total entity quantity to be identified; and determining the risk value of the group to be identified according to the number of the sub-entities to be identified, the total number of the entities to be identified and the number of the entity information of the group to be identified.

Optionally, the target group determining module is further configured to determine a risk value of the group to be identified according to the following formula:

where $s(g)$ represents the risk value of the group to be identified $g$; $\#g$ represents the number of entity information in $g$; $m$ represents the $m$-th abnormal pattern; $M$ represents the abnormal pattern set; $\#G(m)$ represents the total number of entities to be identified for the $m$-th abnormal pattern in the graph to be identified; and $\#g(m)$ represents the number of sub-entities to be identified for the $m$-th abnormal pattern in $g$.

The target group identification device provided by the embodiment of the invention can execute the target group identification method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.

Example four

Fig. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 7 illustrates a block diagram of an exemplary electronic device 40 suitable for use in implementing embodiments of the present invention. The electronic device 40 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 7, electronic device 40 is embodied in the form of a general purpose computing device. The components of electronic device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401).

Bus 403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 40 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 40 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)404 and/or cache memory 405. The electronic device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. System memory 402 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in system memory 402, such program modules 407 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.

The electronic device 40 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), with one or more devices that enable a user to interact with the electronic device 40, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 40 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interface 411. Also, the electronic device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 412. As shown, the network adapter 412 communicates with the other modules of the electronic device 40 over the bus 403. It should be appreciated that although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 401 executes various functional applications and data processing by running a program stored in the system memory 402, for example, to implement the identification method of a target group provided by the embodiment of the present invention.

Example five

An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for identifying a target community, the method including:

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for identifying a target group, comprising:

2. The method according to claim 1, wherein the constructing the graph to be identified based on the entity information and the connection relationship comprises:

determining each piece of node information of the graph to be identified based on each piece of entity information, and constructing the graph to be identified based on the node information and the connection relation;

wherein each piece of node information corresponds to one piece of entity information or to two or more pieces of entity information having the same content; whether at least two pieces of entity information have the same content is determined as follows:

if the at least two pieces of entity information are not address entity information, determining a character string corresponding to each piece of entity information, and judging whether the character strings are the same to determine whether the at least two pieces of entity information are entity information with the same content;

if the at least two pieces of entity information are address entity information, determining character strings to be matched of the at least two pieces of entity information based on a preset address partition element, and determining whether the at least two pieces of entity information are entity information with the same content according to the length of each character string to be matched.
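As an illustration of the non-address branch above, the following minimal Python sketch merges entity strings that are exactly equal into a single node. The record layout, the field names, and the rule that entities appearing in the same log record are connected are assumptions of this sketch, not details stated in the claim; the address-matching branch is deliberately left out.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(records):
    """Build nodes and edges from log records (illustrative only).

    records: list of dicts mapping a hypothetical entity type
             (e.g. "user_id") to its string value.
    Entities of the same type with identical strings share one node.
    Assumption: entities that co-occur in one record are connected.
    """
    nodes = {}                  # (entity_type, value) -> node id
    edges = defaultdict(int)    # (node_a, node_b) -> co-occurrence count
    for record in records:
        ids = []
        for entity_type, value in record.items():
            key = (entity_type, value)
            if key not in nodes:        # exact string comparison
                nodes[key] = len(nodes)
            ids.append(nodes[key])
        for a, b in combinations(sorted(set(ids)), 2):
            edges[(a, b)] += 1
    return nodes, dict(edges)
```

For example, two records that share the same device string yield three nodes and two edges, because the device appears only once in the graph.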

3. The method according to claim 2, wherein the determining the character strings to be matched of the at least two entity information based on the preset address partition element comprises:

if the target fields corresponding to the preset address partition elements of the at least two pieces of entity information are different, determining that the at least two pieces of entity information are not entity information with the same content;

if the target fields corresponding to the preset address partition elements of the at least two pieces of entity information are the same, deleting the preset address partition elements of the at least two pieces of entity information, and taking the remaining parts as the character strings to be matched of the at least two pieces of entity information;

correspondingly, the determining whether the at least two pieces of entity information are entity information with the same content according to the length of each character string to be matched includes:

4. The method according to claim 1, before determining the group to be identified according to the graph to be identified and at least one preset abnormal pattern, further comprising:

constructing at least one abnormal pattern;

wherein the abnormal pattern comprises at least one of:

5. The method according to claim 1, wherein the determining the group to be identified according to the graph to be identified and at least one preset abnormal pattern comprises:

determining a weight matrix based on the graph to be identified and the at least one preset abnormal pattern;

determining a degree matrix of the weight matrix based on the weight matrix, and determining a Laplacian matrix based on the weight matrix and the degree matrix;

determining, according to the Laplacian matrix, the second-smallest eigenvalue of the Laplacian matrix and the eigenvector corresponding to the second-smallest eigenvalue;

sorting the entity information in the graph to be identified according to the element values in the eigenvector;

and determining the partitioning validity corresponding to each preset partition value according to each preset partition value, the sorted entity information and the graph to be identified, and determining the group to be identified according to the partitioning validity.
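The eigenvector and sorting steps of claim 5 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the graph is connected and has at least one edge, the degree of a node is taken as the row sum of the weight matrix, and the eigenvector is approximated by power iteration rather than by whatever solver an actual implementation would use. The function name `spectral_order` is hypothetical.

```python
import math


def spectral_order(W, iterations=500):
    """Sort node indices by the eigenvector of the second-smallest
    eigenvalue of L = D - W (a sketch, not the patented procedure).

    Power iteration is run on (c*I - L) while the constant vector, which
    belongs to eigenvalue 0 of L for a connected graph, is projected out.
    """
    n = len(W)
    degree = [sum(row) for row in W]        # assumed: D_ii = sum_j W_ij
    c = 2.0 * max(degree)                   # upper bound on eigenvalues of L
    v = [float(i) for i in range(n)]        # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]           # remove the constant component
        # w = (c*I - L) v = c*v - D v + W v
        w = [c * v[i] - degree[i] * v[i]
             + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("graph must be connected and have an edge")
        v = [x / norm for x in w]
    order = sorted(range(n), key=lambda i: v[i])
    return order, v
```

On a graph made of two triangles joined by a single edge, the sorted order places one triangle entirely before the other, so a split in the middle of the ordering separates the two dense parts.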

6. The method according to claim 5, wherein the determining a weight matrix based on the graph to be identified and at least one preset abnormal pattern comprises:

for every two pieces of entity information, determining the weight values of the two pieces of entity information according to the following formula:

$W^{M}_{ij} = \#\big((i, j) \in M\big)$

determining a weight matrix according to each weight value;

correspondingly, the determining a degree matrix of the weight matrix based on the weight matrix includes:

the degree value is determined according to the following formula:

determining a degree matrix of the weight matrix according to the degree values;

correspondingly, the determining the Laplacian matrix based on the weight matrix and the degree matrix includes:

the Laplacian matrix is determined according to the following equation:

$L^{M} = D^{M} - W^{M}$
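A small Python sketch of the three matrices in claim 6 follows. It reads the weight of a pair (i, j) as the number of abnormal-pattern instances in which both entities appear, and it assumes that the degree value of an entity is the sum of its row in the weight matrix, since the degree formula itself is not reproduced in this text. The function name and the representation of pattern instances as tuples of node indices are hypothetical.

```python
from itertools import combinations


def pattern_matrices(n, instances):
    """Return (W, D, L) for n nodes and a list of pattern instances.

    instances: tuples of node indices, one tuple per occurrence of the
               abnormal pattern M in the graph (assumed representation).
    W[i][j] counts the instances that contain both i and j.
    """
    W = [[0] * n for _ in range(n)]
    for inst in instances:
        for i, j in combinations(sorted(set(inst)), 2):
            W[i][j] += 1
            W[j][i] += 1
    # Assumed degree definition: D[i][i] is the sum of row i of W.
    D = [[sum(W[i]) if i == j else 0 for j in range(n)] for i in range(n)]
    # Laplacian as given in the claim: L = D - W.
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return W, D, L
```

With this construction every row of L sums to zero, which is the property the eigenvector step of claim 5 relies on.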

7. The method according to claim 5, wherein the determining the partitioning validity corresponding to each preset partition value according to each preset partition value, the sorted entity information and the graph to be identified comprises:

for each preset partition value, determining the number of failure modes generated by dividing the graph to be identified according to the preset partition value and the sorted entity information;

and determining the partitioning validity corresponding to the preset partition value based on the number of the failure modes and the number of pieces of entity information covered by the at least one abnormal pattern.
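The two quantities that claim 7 feeds into the partitioning validity can be counted as in the Python sketch below. It assumes that a "failure mode" is an abnormal-pattern instance whose members end up on both sides of the split, and that a preset partition value is the number of sorted entities placed on the first side; both readings are assumptions of this sketch. How the two counts are combined into a validity score is not reproduced here, so the sketch stops at the counts.

```python
def split_statistics(order, instances, split_sizes):
    """Count covered entities and cut pattern instances per split.

    order:       node indices sorted by eigenvector value.
    instances:   tuples of node indices, one per pattern occurrence.
    split_sizes: hypothetical preset partition values (prefix lengths).
    Returns (covered, cuts): covered is the number of nodes that appear
    in at least one instance; cuts maps each prefix length to the number
    of instances with members on both sides of the split.
    """
    covered = len({node for inst in instances for node in inst})
    cuts = {}
    for k in split_sizes:
        left = set(order[:k])
        count = 0
        for inst in instances:
            members = set(inst)
            inside = len(members & left)
            if 0 < inside < len(members):   # instance straddles the split
                count += 1
        cuts[k] = count
    return covered, cuts
```

A split that breaks few pattern instances while the patterns cover many entities is the kind of split the validity measure is meant to favor.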

8. The method according to claim 7, wherein the determining the partitioning validity corresponding to the preset partition value based on the number of failure modes and the number of pieces of entity information covered by the at least one abnormal pattern comprises:

determining the partitioning validity corresponding to the preset partition value according to the following formula:

9. The method according to claim 1, wherein the determining the risk value of each group to be identified according to each group to be identified and the at least one abnormal pattern comprises:

for each group to be identified, determining the number of pieces of entity information of each abnormal pattern contained in the group to be identified as the number of sub-entities to be identified;

determining the number of pieces of entity information of each abnormal pattern contained in the graph to be identified as the total number of entities to be identified;

and determining the risk value of the group to be identified according to the number of sub-entities to be identified, the total number of entities to be identified and the number of pieces of entity information of the group to be identified.
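The three inputs named in claim 9 can be gathered as in the following Python sketch. It interprets the entity information "of" an abnormal pattern as the set of nodes that take part in at least one instance of that pattern; this reading, the pattern names, and the function name are assumptions. The formula that turns the counts into a risk value is not reproduced in this text, so only the counts are computed.

```python
def risk_inputs(group, patterns):
    """Collect the counts that the risk value of claim 9 depends on.

    group:    iterable of node ids forming one group to be identified.
    patterns: dict mapping a hypothetical pattern name to a list of
              instance tuples found in the whole graph.
    Returns (counts, size): counts maps each pattern name to
    (sub_entities_in_group, total_entities_in_graph); size is the
    number of entities in the group.
    """
    group = set(group)
    counts = {}
    for name, instances in patterns.items():
        involved = {node for inst in instances for node in inst}
        counts[name] = (len(involved & group), len(involved))
    return counts, len(group)
```

For instance, a group that holds most of the entities involved in a pattern while being small itself supplies the kind of counts that would be expected to produce a high risk value.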

10. The method according to claim 9, wherein the determining the risk value of the group to be identified according to the number of sub-entities to be identified, the total number of entities to be identified, and the number of entity information of the group to be identified comprises:

determining a risk value for the group to be identified according to the following formula:

11. An apparatus for identifying a target group, comprising: