CN109213801A - Data digging method and device based on incidence relation - Google Patents

Data digging method and device based on incidence relation

Info

Publication number
CN109213801A
Authority
CN
China
Prior art keywords
main body
incidence relation
subset
target type
connection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810903048.1A
Other languages
Chinese (zh)
Inventor
梁琛
刘子奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810903048.1A
Publication of CN109213801A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03: Data mining

Abstract

This specification provides a data mining method based on incidence relations, where the incidence relations are established among a number of main bodies. The main bodies are of at least two types, at least one of which is a mining target type. The method includes: dividing all main bodies into a number of connection subsets according to the incidence relations between the main bodies, where each connection subset includes at least one member main body and contains every main body that has an incidence relation with any of its member main bodies; and performing data mining using the connection subsets that contain at least two member main bodies of the mining target type.

Description

Data digging method and device based on incidence relation
Technical field
This specification relates to the technical field of data processing, and in particular to a data mining method and device based on incidence relations.
Background
With the development and spread of the Internet, network-based activities of all kinds continuously generate data, and many enterprises, governments, and even individuals hold large amounts of user data. Data mining technology can discover valuable knowledge, patterns, rules, and other information in large volumes of data and support scientific research, business decisions, process control, and so on; it has become an important way of putting data to use.
In some application scenarios, the data records to be mined reflect incidence relations between main bodies of the same or different types. For example, a transfer record establishes an incidence relation between the payer and the payee; in an account login record, there is an incidence relation between the account and the device used to log in. Data mining based on incidence relations is widely applied in fields such as network security and marketing. Mining requirements in these fields usually change constantly as the business develops, so speeding up data mining is important for meeting business needs in time.
Summary of the invention
In view of this, this specification provides a data mining method based on incidence relations, where the incidence relations are established among a number of main bodies; the main bodies are of at least two types, at least one of which is a mining target type; the method includes:
dividing all main bodies into a number of connection subsets according to the incidence relations between the main bodies, where each connection subset includes at least one member main body, and a connection subset contains every main body that has an incidence relation with any of its member main bodies; and
performing data mining using the connection subsets that contain at least two member main bodies of the mining target type.
This specification also provides a data mining device based on incidence relations, where the incidence relations are established among a number of main bodies; the main bodies are of at least two types, at least one of which is a mining target type; the device includes:
a connection subset unit, configured to divide all main bodies into a number of connection subsets according to the incidence relations between the main bodies, where each connection subset includes at least one member main body, and a connection subset contains every main body that has an incidence relation with any of its member main bodies; and
a mining execution unit, configured to perform data mining using the connection subsets that contain at least two member main bodies of the mining target type.
This specification provides a computer device including a memory and a processor, where the memory stores a computer program executable by the processor, and the processor, when running the computer program, performs the steps of the above data mining method based on incidence relations.
This specification also provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the above data mining method based on incidence relations.
As can be seen from the above technical solutions, in the embodiments of this specification, all main bodies that have incidence relations with one another are placed in the same connection subset, and of all the connection subsets, those that contain at least two member main bodies of the mining target type are used as the data source for data mining. Because connection subsets that contain no member main body of the mining target type, or only one, have an almost negligible influence on the mining result, the embodiments reduce the amount of data to be processed, speed up data mining, and improve mining efficiency without substantially affecting the mining result.
Brief description of the drawings
Fig. 1 is a flowchart of a data mining method based on incidence relations in an embodiment of this specification;
Fig. 2 is an example structure diagram of a maximal connected subgraph that contains only one node of the mining target type, in an application example of this specification;
Fig. 3 is an example structure diagram of a maximal connected subgraph that contains two or more nodes of the mining target type, in an application example of this specification;
Fig. 4 is a hardware structure diagram of equipment that runs an embodiment of this specification;
Fig. 5 is a logical structure diagram of a data mining device based on incidence relations in an embodiment of this specification.
Detailed description
The embodiments of this specification propose a new data mining method based on incidence relations. All main bodies are divided into a number of connection subsets such that each connection subset contains every main body that has an incidence relation with any of its member main bodies, and all connection subsets that contain at least two member main bodies of the mining target type are used as the data source for data mining. This is equivalent to deleting the connection subsets that contain no member main body of the mining target type, or only one, which reduces the amount of data to be processed and speeds up data mining. Because all incidence relations between member main bodies of the mining target type are retained, the mining result is essentially unaffected.
The embodiments of this specification can run on any equipment with computing and storage capabilities, such as a mobile phone, tablet computer, personal computer (PC), notebook computer, or server; the functions of the embodiments can also be implemented by logical nodes running on two or more pieces of equipment.
In the embodiments of this specification, incidence relations between main bodies can be extracted from the data source used for data mining. The data source can be records of various network activities. A network activity may involve users, for example a user initiating a request through an account, a server responding to a user request, or user A buying goods from user B; it may also involve only non-user nodes in the network, for example a business server requesting data from a database server. A main body can be a participant in a network activity, or some or all of the resources needed to carry out the activity. Participants can be user accounts, servers that provide a network service, and so on. Resources can be the identifier of the user device (that is, the unique identifier of the device, such as the Device-ID of an Android device or the unique device identifier of an Apple device), the IMEI (International Mobile Equipment Identity) of the user device, the identifier of the WiFi (Wireless Fidelity) network through which the user device accesses the network, the user's mobile phone number, the MAC (Media Access Control) address of the user device or of the equipment running the server, the IP address of the user device or of the equipment running the server, and so on; in some specific business processes they can also be the user's ID card number, bank card number, and so on.
Because the parties that take part in network activities and the resources used to carry them out are varied, the main bodies involved in network activities usually have different types. How main body types are divided can be decided according to how different main bodies affect the mining result in the actual application scenario; the embodiments of this specification do not limit this. For example, in a first application scenario, the number of devices an account uses has a certain influence on the mining result, so accounts can be one main body type and the devices users use another. In a second application scenario, the same network activity has a different influence on the mining result depending on whether it is carried out with a personal account or a collective account, so personal accounts can be one main body type and collective accounts another. In a third application scenario, the network activities recorded in the data source can be carried out without logging in, and whether an activity uses the same account or the same device has essentially no influence on the mining result, so accounts and the devices users use can be treated as a single main body type.
A network activity usually requires the participation of several main bodies, and a specific network activity establishes incidence relations between the main bodies it involves. For example, if user A uses mobile phone C to buy goods from user B, this purchase can establish an incidence relation between each pair of the three main bodies user A, mobile phone C, and user B.
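The pairwise relations in the purchase example can be sketched in a few lines of Python. This is only an illustration: the function name `relations_from_activity` and the `"user:A"` style labels are assumptions made here, and the specification leaves open which activities establish which relations.

```python
from itertools import combinations


def relations_from_activity(participants):
    """Return every unordered pair of distinct main bodies involved in one activity."""
    unique = sorted(set(participants))  # de-duplicate and fix an order
    return list(combinations(unique, 2))


# One purchase: user A uses mobile phone C to buy goods from user B.
pairs = relations_from_activity(["user:A", "device:C", "user:B"])
for left, right in pairs:
    print(left, "<->", right)
```

Three main bodies give three pairwise incidence relations; an activity that involves n main bodies would give n(n-1)/2 under this assumption.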
In most practical application scenarios, data mining is aimed mainly at main bodies of one or more specific types. In other words, the incidence relations between main bodies of these specific types are the focus of the mining, whereas incidence relations between a main body of a specific type and a main body of a non-specific type, and between main bodies of non-specific types, have a rather limited influence on the mining result. In the embodiments of this specification, these specific types are called mining target types. In an actual application scenario, which main body types serve as mining target types can be decided according to factors such as the specific mining requirements, how main body types are divided, and how main bodies of different types affect the mining result; this is not limited. For example, in an application scenario of identifying black-market gangs, the account is usually the mining target type; in an application scenario of predicting users' consumption behavior from client devices, both the mobile phone and the tablet computer main body types can serve as mining target types.
In the embodiments of this specification, the flow of the data mining method based on incidence relations is shown in Fig. 1.
In the embodiments of this specification, the data source for data mining includes main bodies of at least two types, at least one of which is a mining target type. Based on the network activities recorded in the data source, incidence relations are established among a number of main bodies of the same or different types.
It should be noted that which participants in the network activities of the data source, and/or which resources needed to carry out the network activities, are taken as main bodies, and what features a network activity must have for it to establish incidence relations between these main bodies, can be chosen according to the characteristics of the actual application scenario and the mining requirements; this is not limited.
Step 110: divide all main bodies into a number of connection subsets according to the incidence relations between the main bodies. Each connection subset includes at least one member main body, and a connection subset contains every main body that has an incidence relation with any of its member main bodies.
From the network activities recorded in the data source, all the main bodies involved in these activities and the incidence relations formed between them can be obtained. All main bodies are divided into a number of connection subsets according to the incidence relations, so that all main bodies that have incidence relations with one another become member main bodies of the same connection subset, and the member main bodies of one connection subset have no incidence relation with the member main bodies of any other connection subset. That is, every main body that has an incidence relation with a member main body of a connection subset is itself a member main body of that connection subset. In this way, the member main bodies of each connection subset are connected directly or indirectly through incidence relations, while there are no incidence relations between member main bodies of different connection subsets.
The embodiments of this specification do not limit the specific way in which connection subsets are divided according to the incidence relations between main bodies. Examples follow.
In one implementation, a graph can be constructed with main bodies as nodes and incidence relations as edges. Because the embodiments of this specification have main bodies of at least two types, the constructed graph is a heterogeneous graph. The maximal connected subgraphs of the heterogeneous graph are found; each maximal connected subgraph corresponds to one connection subset, and all nodes of a maximal connected subgraph are all the member main bodies of the corresponding connection subset. All nodes of the heterogeneous graph make up the set of all main bodies, and obtaining the maximal connected subgraphs is the process of placing main bodies that have incidence relations with one another into the same connection subset. Thus each node of a maximal connected subgraph is a member main body of the corresponding connection subset, and the edges of the maximal connected subgraph correspond to the incidence relations between the member main bodies of that connection subset.
The embodiments of this specification likewise do not limit the specific way of generating the maximal connected subgraphs. For example, any of various existing connectivity algorithms can be used to obtain the maximal connected subgraphs of the heterogeneous graph.
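As one illustration of an existing connectivity algorithm, the sketch below uses a disjoint-set (union-find) structure. The specification does not name this algorithm; it is chosen here only as a familiar example, and the function name `connected_components` and the edge-list input format are assumptions.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components with a disjoint-set (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # union the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical incidence relations between accounts and devices.
components = connected_components(
    [("acct1", "dev1"), ("acct2", "dev1"), ("acct3", "dev2")]
)
```

Each returned set plays the role of one connection subset. Main bodies that appear in no incidence relation are not part of the edge list, so this sketch does not return them; they would form single-member subsets.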
As another example, the maximal connected subgraphs can be generated as follows. For an edge in the heterogeneous graph, generate a new set whose elements are the two endpoints of that edge. If at least one of the two endpoints is an element of an existing set, merge that existing set into the new set (the existing set ceases to exist once it has been merged). After all existing sets have been checked, add the new set to the existing sets. After all edges in the heterogeneous graph have been traversed, all the elements of each resulting existing set are all the nodes of one maximal connected subgraph. Specifically, take an edge from the heterogeneous graph, suppose its two endpoints are node a and node b, and generate a new set T_ab with node a and node b as its elements. Check the existing sets one by one: if node a is an element of an existing set P, merge P into T_ab; if node b is an element of an existing set Q, merge Q into T_ab. After all existing sets have been checked, T_ab becomes an existing set. Repeat this process for every edge in the heterogeneous graph; each resulting existing set corresponds to one maximal connected subgraph.
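The set-merging procedure just described can be sketched as follows. The function name `maximal_connected_subgraphs` and the representation of edges as pairs are assumptions made for illustration; the logic follows the text: one new set per edge, absorb every existing set that shares an endpoint, then keep the new set as an existing set.

```python
def maximal_connected_subgraphs(edges):
    """Return the node sets of the maximal connected subgraphs of an edge list."""
    existing = []  # the "existing sets" built so far
    for a, b in edges:
        new_set = {a, b}  # new set T_ab for this edge
        remaining = []
        for current in existing:
            if a in current or b in current:
                new_set |= current  # merged set no longer exists on its own
            else:
                remaining.append(current)
        remaining.append(new_set)  # the new set becomes an existing set
        existing = remaining
    return existing


# Edges a-b, c-d and a-e leave two sets: {a, b, e} and {c, d}.
result = maximal_connected_subgraphs([("a", "b"), ("c", "d"), ("a", "e")])
```

Every edge scans all existing sets, so the sketch favors clarity over speed; for very large graphs a disjoint-set structure or a distributed connectivity algorithm would usually be preferable.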
Step 120: perform data mining using the connection subsets that contain at least two member main bodies of the mining target type.
After all connection subsets have been obtained, count the member main bodies of the mining target type in each connection subset. If a connection subset contains no member main body of the mining target type, or only one, that connection subset is not used in data mining. In other words, all connection subsets that contain at least two member main bodies of the mining target type serve as the data source for data mining.
When a connection subset contains only one main body of the mining target type, it reflects only the incidence relations between that main body and main bodies of non-target types, and among main bodies of non-target types; it cannot reflect incidence relations between main bodies of the mining target type. When a connection subset contains no main body of the mining target type, it reflects only incidence relations among main bodies of non-target types, and likewise cannot reflect incidence relations between main bodies of the mining target type. Because incidence relations between main bodies of the mining target type and main bodies of non-target types, and among main bodies of non-target types, have a rather limited influence on the mining result, deleting these two kinds of connection subsets from the data source before mining reduces the amount of data to be processed and speeds up mining without substantially affecting the mining result.
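The selection rule of step 120 can be sketched as a simple filter. The function name `select_for_mining`, the `is_target` predicate, and the `(type, id)` encoding of main bodies are assumptions made for illustration; the threshold of two comes from the text.

```python
def select_for_mining(subsets, is_target, min_targets=2):
    """Keep connection subsets with at least `min_targets` target-type members."""
    selected = []
    for subset in subsets:
        target_count = sum(1 for member in subset if is_target(member))
        if target_count >= min_targets:
            selected.append(subset)
    return selected


# Hypothetical data: members are (type, id) pairs; "account" is the mining target type.
subsets = [
    {("account", "u1"), ("device", "d1"), ("device", "d2")},   # one account: dropped
    {("account", "u2"), ("account", "u3"), ("device", "d3")},  # two accounts: kept
    {("device", "d4")},                                        # no account: dropped
]
kept = select_for_mining(subsets, lambda member: member[0] == "account")
```

With several mining target types, `is_target` would return true for a member of any of them.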
The embodiments of this specification do not limit the specific way data mining is performed, the algorithms used, and so on. For example, feature extraction can first be performed on the connection subsets that contain at least two member main bodies of the mining target type, and the extracted features then used as the input of a machine learning model for data mining; alternatively, those connection subsets can be used directly as the input of a machine learning model.
As another example, in the implementation described above that divides connection subsets based on maximal connected subgraphs, a graph algorithm can be used to extract network structure features from the maximal connected subgraphs that contain at least two nodes of the mining target type, and the extracted network structure features then used for further data mining.
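To make the idea of network structure features concrete, the sketch below computes a few simple per-subgraph statistics. It is a stand-in only: the feature set (node count, edge count, target-node count, maximum degree) and the function name `structure_features` are assumptions, and real deployments would more likely use a graph-embedding method.

```python
from collections import defaultdict


def structure_features(component, edges, is_target):
    """Compute simple structural features of one maximal connected subgraph."""
    inside = [(u, v) for u, v in edges if u in component and v in component]
    degree = defaultdict(int)
    for u, v in inside:
        degree[u] += 1
        degree[v] += 1
    return {
        "nodes": len(component),
        "edges": len(inside),
        "target_nodes": sum(1 for node in component if is_target(node)),
        "max_degree": max(degree.values(), default=0),
    }


# Hypothetical subgraph: two accounts that logged in on the same device.
edges = [("acct:1", "dev:1"), ("acct:2", "dev:1"), ("acct:3", "dev:2")]
features = structure_features(
    {"acct:1", "acct:2", "dev:1"}, edges, lambda node: node.startswith("acct:")
)
```

The resulting dictionaries could be fed to a downstream model as one feature row per retained subgraph.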
As can be seen, in the embodiments of this specification, all main bodies are divided into a number of connection subsets, each of which contains every main body that has an incidence relation with any of its member main bodies, and of all the connection subsets, those that contain at least two member main bodies of the mining target type are used as the data source for data mining. This is equivalent to deleting the connection subsets that contain no member main body of the mining target type, or only one. It reduces the amount of data to be processed, speeds up data mining, and improves mining efficiency, while its influence on the mining result is almost negligible.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In an application example of this specification, a third-party shopping platform offers users a return-freight insurance service: after a user buys freight insurance, the user can receive a certain amount of compensation for the return freight when returning purchased goods. To prevent black-market gangs from using freight insurance for large-scale insurance fraud, fraudulent accounts that operate as a gang need to be discovered in time.
Because the number of user devices a black-market gang uses is usually limited, it is inevitable that several accounts will use the same user device in the course of the fraud. Therefore, for data mining aimed at discovering black-market gangs, all login records of accounts on user devices within a predetermined time period serve as the data source, and accounts and user devices are the two main body types. Because the purpose of the mining is to discover gang accounts through the anomaly of several accounts logging in on the same user device, the account is taken as the mining target type.
Each account in the login records of the data source is taken as a node, each user device is taken as a node, and each login performed by an account using a user device (that is, the incidence relation between an account node and a user device node) is taken as an edge, producing a heterogeneous graph with two types of nodes.
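Building the two-type graph from login records might look like the sketch below. The record format (pairs of account ID and device ID), the function name `build_login_graph`, and the type-tagged node keys are assumptions made for illustration.

```python
def build_login_graph(login_records):
    """Build a heterogeneous graph from (account_id, device_id) login records.

    Returns a dict mapping each node to its type, and a set of edges.
    """
    node_types = {}
    edges = set()  # a set, so repeated logins yield a single edge
    for account_id, device_id in login_records:
        # Tag each node with its type so an account ID and a device ID
        # that happen to be equal never collide.
        account = ("account", account_id)
        device = ("device", device_id)
        node_types[account] = "account"
        node_types[device] = "device"
        edges.add((account, device))
    return node_types, edges


# Hypothetical login records; the first login is recorded twice.
records = [("u1", "d1"), ("u2", "d1"), ("u1", "d1"), ("u3", "d2")]
node_types, edges = build_login_graph(records)
```

Because every record contributes one account node and one device node, every edge in this graph joins nodes of different types.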
Generate a new set whose elements are the two endpoints of an edge in the heterogeneous graph; if at least one of the two endpoints is an element of an existing set, merge that existing set into the new set; after all existing sets have been checked, the new set becomes an existing set. After all edges in the heterogeneous graph have been traversed, all the elements of each resulting existing set are all the nodes of one maximal connected subgraph. In a specific example, one possible course of processing is as follows:
Take the first edge from the heterogeneous graph. Suppose its two endpoints are node a and node b; generate a new set T_ab. Because no existing set exists yet, add the new set T_ab to the existing sets; after the addition, the only existing set is T_ab.
Take the second edge from the heterogeneous graph. Suppose its two endpoints are node c and node d; generate a new set T_cd. Check the existing sets one by one: if an existing set contains node c, merge it into the new set T_cd; if an existing set contains node d, merge it into the new set T_cd. Because neither node c nor node d is an element of the existing set T_ab, no merging is needed. Add the new set to the existing sets; after the addition, the existing sets are T_ab and T_cd.
Take the third edge from the heterogeneous graph. Suppose its two endpoints are node a and node e; generate a new set T_ae. Check the existing sets one by one: if an existing set contains node a, merge it into the new set T_ae; if an existing set contains node e, merge it into the new set T_ae. Because node a is an element of the existing set T_ab, merge T_ab into the new set T_ae; after merging, T_ae has three elements: node a, node b, and node e. Add the new set T_ae to the existing sets; after the addition, the existing sets are T_ae and T_cd, and the former existing set T_ab no longer exists because it has been merged into T_ae.
After the above process has been repeated for each remaining edge in the heterogeneous graph, each resulting existing set corresponds to one maximal connected subgraph, and all the elements of an existing set are all the nodes of the corresponding maximal connected subgraph.
After all maximal connected subgraphs have been obtained, count the account-type nodes (that is, the nodes of the mining target type) in each maximal connected subgraph, and delete the maximal connected subgraphs that contain only one account-type node. Because in this application example a login must be performed with an account, and every login record establishes an incidence relation between at least one account and one user device, no maximal connected subgraph in this application example contains only user-device-type nodes and no account-type node.
Suppose two of the maximal connected subgraphs are as shown in Fig. 2 and Fig. 3, where a dot represents an account-type node and a rectangle represents a user-device-type node. The maximal connected subgraph in Fig. 2 has 4 user-device-type nodes and 1 account-type node; the maximal connected subgraph in Fig. 3 has 2 user-device-type nodes and 2 account-type nodes. The maximal connected subgraph in Fig. 2 is therefore deleted, and the one in Fig. 3 is retained.
When data mining for discovering black-market gangs is carried out, all maximal connected subgraphs that have not been deleted are used as the data source.
The inventors found in testing that, for a heterogeneous graph with hundreds of millions of nodes, extracting network structure features with the Node2Vec (node-to-vector) algorithm took as long as 45 hours when the data source contained all login records, but only 8 hours under the same parameters when only the maximal connected subgraphs containing at least two account-type nodes were used as the data source, a speed-up of roughly 5.6 times.
Corresponding to the above process, the embodiments of this specification also provide a data mining device based on incidence relations. The device can be implemented in software, in hardware, or in a combination of software and hardware. Taking software implementation as an example, the device in the logical sense is formed when the CPU (Central Processing Unit) of the equipment on which it resides reads the corresponding computer program instructions into memory and runs them. At the hardware level, in addition to the CPU, memory, and storage shown in Fig. 4, the equipment on which the data mining device resides usually also includes other hardware such as a chip for sending and receiving wireless signals, and/or a board for network communication.
Fig. 5 shows a data mining device based on incidence relations provided by an embodiment of this specification, where the incidence relations are established among a number of main bodies; the main bodies are of at least two types, at least one of which is a mining target type. The device includes a connection subset unit and a mining execution unit. The connection subset unit is configured to divide all main bodies into a number of connection subsets according to the incidence relations between the main bodies, where each connection subset includes at least one member main body, and a connection subset contains every main body that has an incidence relation with any of its member main bodies. The mining execution unit is configured to perform data mining using the connection subsets that contain at least two member main bodies of the mining target type.
In one example, the connection subset unit is specifically configured to: construct a heterogeneous graph with main bodies as nodes and incidence relations as edges, generate the maximal connected subgraphs of the heterogeneous graph, and take all nodes of each maximal connected subgraph as the member main bodies of one connection subset.
In the above example, the connection subset unit generating the maximal connected subgraphs of the heterogeneous graph includes: generating a new set whose elements are the two endpoints of an edge in the heterogeneous graph; if at least one of the two endpoints is an element of an existing set, merging that existing set into the new set; after all existing sets have been checked, adding the new set to the existing sets; and, after all edges in the heterogeneous graph have been traversed, taking all the elements of each resulting existing set as all the nodes of one maximal connected subgraph.
Optionally, the mining execution unit is specifically configured to: use a graph algorithm to extract network structure features from the maximal connected subgraphs that contain at least two nodes of the mining target type.
Optionally, the mining execution unit is specifically configured to: perform feature extraction on the connection subsets that contain at least two member main bodies of the mining target type; or use the connection subsets that contain at least two member main bodies of the mining target type as the input of a machine learning model.
Optionally, the main body types include accounts and user devices; the incidence relation includes an account performing a login using a user device; and the mining target type includes accounts.
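The two-unit structure of the device can be sketched in software as two small classes composed by a third. All class and method names here are hypothetical, the connection subset unit uses a breadth-first search purely as one possible connectivity algorithm, and the `mine` callable stands for whatever mining step is applied.

```python
from collections import defaultdict, deque


class ConnectionSubsetUnit:
    """Divides main bodies into connection subsets (breadth-first search)."""

    def divide(self, relations):
        neighbours = defaultdict(set)
        for u, v in relations:
            neighbours[u].add(v)
            neighbours[v].add(u)
        seen, subsets = set(), []
        for start in neighbours:
            if start in seen:
                continue
            seen.add(start)
            subset, queue = {start}, deque([start])
            while queue:
                for nxt in neighbours[queue.popleft()]:
                    if nxt not in seen:
                        seen.add(nxt)
                        subset.add(nxt)
                        queue.append(nxt)
            subsets.append(subset)
        return subsets


class MiningExecutionUnit:
    """Mines only subsets with at least two members of the mining target type."""

    def __init__(self, is_target, mine):
        self.is_target = is_target
        self.mine = mine

    def execute(self, subsets):
        kept = [s for s in subsets if sum(map(self.is_target, s)) >= 2]
        return self.mine(kept)


class DataMiningDevice:
    """Chains the connection subset unit and the mining execution unit."""

    def __init__(self, subset_unit, execution_unit):
        self.subset_unit = subset_unit
        self.execution_unit = execution_unit

    def run(self, relations):
        return self.execution_unit.execute(self.subset_unit.divide(relations))


# Hypothetical login relations; the "mining" step here just counts retained subsets.
device = DataMiningDevice(
    ConnectionSubsetUnit(),
    MiningExecutionUnit(lambda node: node.startswith("acct:"), mine=len),
)
retained = device.run([("acct:1", "dev:1"), ("acct:2", "dev:1"), ("acct:3", "dev:2")])
```

Keeping the two units separate lets the connectivity algorithm or the mining step be swapped without touching the other.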
An embodiment of this specification provides a computer device that includes a memory and a processor. The memory stores a computer program that can be run by the processor; when the processor runs the stored computer program, it performs the steps of the data mining method based on association relationships described in the embodiments of this specification. For a detailed description of each step of the method, refer to the preceding content, which is not repeated here.
An embodiment of this specification provides a computer-readable storage medium that stores computer programs; when run by a processor, these computer programs perform the steps of the data mining method based on association relationships described in the embodiments of this specification. For a detailed description of each step of the method, refer to the preceding content, which is not repeated here.
The foregoing describes merely preferred embodiments of this specification and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will understand that embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical memory) that contain computer-usable program code.

Claims (14)

1. A data mining method based on association relationships, wherein the association relationships are established among a plurality of entities; the entities are of at least two types, at least one of which is a mining target type; the method comprising:
partitioning all entities into a plurality of connected subsets according to the association relationships among the entities, wherein each connected subset includes at least one member entity and, for each member entity, includes all entities that have an association relationship with that member entity; and
performing data mining using the connected subsets that contain at least two member entities of the mining target type.
2. The method according to claim 1, wherein partitioning all entities into a plurality of connected subsets according to the association relationships among the entities comprises: constructing a heterogeneous graph with the entities as nodes and the association relationships as edges, generating the maximal connected subgraphs of the heterogeneous graph, and taking all nodes of each maximal connected subgraph as the member entities of one connected subset.
3. The method according to claim 2, wherein generating the maximal connected subgraphs of the heterogeneous graph comprises: for an edge in the heterogeneous graph, generating a new set whose elements are the two endpoints of the edge; if at least one of the two endpoints is an element of an existing set, merging that existing set into the new set; after all existing sets have been traversed, adding the new set to the existing sets; and, after all edges in the heterogeneous graph have been traversed, taking all elements of each resulting set as all nodes of one maximal connected subgraph.
4. The method according to claim 1, wherein performing data mining using the connected subsets that contain at least two member entities of the mining target type comprises: using a graph algorithm to extract network-structure features from the maximal connected subgraphs that contain at least two nodes of the mining target type.
5. The method according to claim 1, wherein performing data mining using the connected subsets that contain at least two member entities of the mining target type comprises: performing feature extraction on the connected subsets that contain at least two member entities of the mining target type; or using the connected subsets that contain at least two member entities of the mining target type as input to a machine learning model.
6. The method according to claim 1, wherein the entity types include accounts and user devices; the association relationship includes an account performing a login using a user device; and the mining target type includes accounts.
7. A data mining device based on association relationships, wherein the association relationships are established among a plurality of entities; the entities are of at least two types, at least one of which is a mining target type; the device comprising:
a connected-subset unit, configured to partition all entities into a plurality of connected subsets according to the association relationships among the entities, wherein each connected subset includes at least one member entity and, for each member entity, includes all entities that have an association relationship with that member entity; and
a mining execution unit, configured to perform data mining using the connected subsets that contain at least two member entities of the mining target type.
8. The device according to claim 7, wherein the connected-subset unit is specifically configured to: construct a heterogeneous graph with the entities as nodes and the association relationships as edges, generate the maximal connected subgraphs of the heterogeneous graph, and take all nodes of each maximal connected subgraph as the member entities of one connected subset.
9. The device according to claim 8, wherein the connected-subset unit generating the maximal connected subgraphs of the heterogeneous graph comprises: for an edge in the heterogeneous graph, generating a new set whose elements are the two endpoints of the edge; if at least one of the two endpoints is an element of an existing set, merging that existing set into the new set; after all existing sets have been traversed, adding the new set to the existing sets; and, after all edges in the heterogeneous graph have been traversed, taking all elements of each resulting set as all nodes of one maximal connected subgraph.
10. The device according to claim 7, wherein the mining execution unit is specifically configured to: use a graph algorithm to extract network-structure features from the maximal connected subgraphs that contain at least two nodes of the mining target type.
11. The device according to claim 7, wherein the mining execution unit is specifically configured to: perform feature extraction on the connected subsets that contain at least two member entities of the mining target type; or use the connected subsets that contain at least two member entities of the mining target type as input to a machine learning model.
12. The device according to claim 7, wherein the entity types include accounts and user devices; the association relationship includes an account performing a login using a user device; and the mining target type includes accounts.
13. A computer device, comprising a memory and a processor, wherein the memory stores a computer program that can be run by the processor, and when the processor runs the computer program, the steps of the method according to any one of claims 1 to 6 are performed.
14. A computer-readable storage medium storing a computer program, wherein when the computer program is run by a processor, the steps of the method according to any one of claims 1 to 6 are performed.
CN201810903048.1A 2018-08-09 2018-08-09 Data digging method and device based on incidence relation Pending CN109213801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810903048.1A CN109213801A (en) 2018-08-09 2018-08-09 Data digging method and device based on incidence relation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810903048.1A CN109213801A (en) 2018-08-09 2018-08-09 Data digging method and device based on incidence relation

Publications (1)

Publication Number Publication Date
CN109213801A true CN109213801A (en) 2019-01-15

Family

ID=64988431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810903048.1A Pending CN109213801A (en) 2018-08-09 2018-08-09 Data digging method and device based on incidence relation

Country Status (1)

Country Link
CN (1) CN109213801A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020147595A1 (en) * 2019-01-16 2020-07-23 阿里巴巴集团控股有限公司 Method, system and device for obtaining relationship expression between entities, and advertisement recalling system
WO2020147594A1 (en) * 2019-01-16 2020-07-23 阿里巴巴集团控股有限公司 Method, system, and device for obtaining expression of relationship between entities, and advertisement retrieval system
CN112541022A (en) * 2020-12-18 2021-03-23 网易(杭州)网络有限公司 Abnormal object detection method, abnormal object detection device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060149674A1 (en) * 2004-12-30 2006-07-06 Mike Cook System and method for identity-based fraud detection for transactions using a plurality of historical identity records
CN103778151A (en) * 2012-10-23 2014-05-07 阿里巴巴集团控股有限公司 Method and device for identifying characteristic group and search method and device
CN105740274A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Undirected graph-based user account searching method and device
CN105812195A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Method and device for computer to identify batch accounts
CN106301978A (en) * 2015-05-26 2017-01-04 阿里巴巴集团控股有限公司 The recognition methods of gang member account, device and equipment
CN107193894A (en) * 2017-05-05 2017-09-22 北京小度信息科技有限公司 Data processing method, individual discrimination method and relevant apparatus
CN107592296A (en) * 2017-08-02 2018-01-16 阿里巴巴集团控股有限公司 The recognition methods of rubbish account and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
柴明锐等: "《数据挖掘技术及在石油地质中的应用》", 30 September 2017 *
潘海为: "《医学图像数据挖掘关键技术的研究》", 30 September 2007, 黑龙江人民出版社 *
赵妍: "《面向大数据的挖掘方法研究》", 31 July 2016, 电子科技大学出版社 *

Similar Documents

Publication Publication Date Title
Béres et al. Blockchain is watching you: Profiling and deanonymizing ethereum users
Pinna et al. A petri nets model for blockchain analysis
Gupta et al. Towards detecting fake user accounts in facebook
CN108389118A (en) The asset management system, method and device, electronic equipment
US20170286190A1 (en) Structural and temporal semantics heterogeneous information network (hin) for process trace clustering
CN111046429B (en) Method and device for establishing relationship network based on privacy protection
CN110941664A (en) Knowledge graph construction method, detection method, device, equipment and storage medium
CN107294974B (en) Method and device for identifying target group
WO2016022720A2 (en) Method and apparatus of identifying a transaction risk
CN109063966A (en) The recognition methods of adventure account and device
CN109344326A (en) A kind of method for digging and device of social circle
CN111046237B (en) User behavior data processing method and device, electronic equipment and readable medium
CN106897930A (en) A kind of method and device of credit evaluation
CN109213801A (en) Data digging method and device based on incidence relation
CN110224859B (en) Method and system for identifying a group
US11270227B2 (en) Method for managing a machine learning model
CN107592296A (en) The recognition methods of rubbish account and device
CN112215616A (en) Method and system for automatically identifying abnormal fund transaction based on network
CN113240505A (en) Graph data processing method, device, equipment, storage medium and program product
CN108694664A (en) Checking method and device, the electronic equipment of operation system
Kumar et al. RETRACTED ARTICLE: Big data analytics to identify illegal activities on Bitcoin Blockchain for IoMT
Wang et al. Identifying DApps and user behaviors on ethereum via encrypted traffic
US20200402061A1 (en) Cryptocurrency transaction pattern based threat intelligence
CN112508630B (en) Abnormal conversation group detection method and device, computer equipment and storage medium
CN113923028A (en) Network micro-isolation strategy self-generation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190115
