CN106777284A - A graph random-walk representation method based on label information - Google Patents

A graph random-walk representation method based on label information

Info

Publication number
CN106777284A
Authority
CN
China
Prior art keywords
node
random walk
label information
probability value
neighbor node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611245749.8A
Other languages
Chinese (zh)
Inventor
李涛
王次臣
李华康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201611245749.8A
Publication of CN106777284A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a graph random-walk representation method based on label information. Graph data is first loaded, and a data structure is built that maps each graph node to its neighbor nodes and its label information. For each node in the graph, the probability of walking to each of its neighbor nodes is computed, and repeated random selection from the node's neighbors is implemented such that the probability of each neighbor being selected matches the computed value. Using these probability values and the other walk parameters, the walk is started and a number of walk paths are obtained. Training on the walk paths yields word vectors, that is, the vector representation of each graph node. Finally, a multi-label classification task is performed on the graph nodes to check the classification performance of the algorithm. The vector representations of graph nodes generated by the invention better reflect label feature information in the multi-label classification task, so that the accuracy of multi-label classification improves markedly as the proportion parameter governing label-guided walking increases.

Description

A graph random-walk representation method based on label information
Technical field
For the multi-label classification of nodes in large-scale graph data, the present invention provides a random-walk method between graph nodes that is guided by partial label information, and uses it to learn vector representations of graph nodes.
Background art
Representation learning for graph data makes it possible to mine and analyze graph data with machine learning algorithms. In graph data, a node typically represents an entity object and an edge represents some relationship between two entity objects; moreover, any edge can be uniquely identified by a pair of nodes. Therefore, of the two elements of a graph, nodes and edges, current graph representation learning algorithms all treat a node as one data sample and learn a feature representation for each node. Representing graph nodes as feature vectors has three benefits. First, once graph nodes are represented as feature vectors, mature existing algorithms can be applied to mine the graph data, which avoids designing a separate algorithm for each different graph structure.
Second, vector data comes with mature concepts for data analysis, such as the distance and inner product between vectors, as well as mature analysis tools. Representing graph nodes as vectors makes it convenient to carry out data analysis using the concepts and properties of vectors.
Third, for large-scale graph data with complex connections, the latent relationships in the graph are hard to obtain directly. By representing graph nodes as low-dimensional vectors, the relationships between nodes can be analyzed and displayed visually.
Traditional graph node representation learning methods include algorithms based on spectral methods, on optimization, and on probabilistic generative models. With the spread of deep learning, researchers have in recent years proposed graph node representation learning algorithms based on random walks.
Random-walk-based graph node representation learning draws on the theory behind word2vec and on the idea, used in knowledge graphs, of building a semantic network from entities, attributes, and the links between them. Reasoning in the reverse direction, it treats the nodes of an ordinary graph structure as analogous to words in natural language processing, and treats each connected path in the graph as analogous to a sentence. The method used in probabilistic language models to estimate the co-occurrence relations between words (that is, all the conditional probability parameters) is used to explore the connection structure between graph nodes, and the method for generating word vectors is used to generate vector representations of graph nodes. The node vectors obtained through this analogy reflect the structural features of each node's connections to its surrounding neighbors and provide a low-dimensional vector representation of graph nodes. This offers a new approach to implementing or optimizing data mining and analysis algorithms on graph data, such as node classification, link prediction, and community detection.
A graph is a structure that places few constraints on the data and organizes stored data rather loosely, so graph data may contain relationships that are redundant or even wrong with respect to the learning objective. When walking on the graph, a completely random, unguided walk introduces a large amount of noise, which affects the extraction of node features. For different graph processing scenarios or learning objectives, the criteria by which graph nodes are considered similar differ. By defining rules that guide the walk, two nodes that are similar under the relevant criterion can be given feature representations that are also close in space. Multi-label classification is a common problem in graph data mining and is currently the main task used to evaluate graph node representation learning algorithms. On different graph datasets, labels may carry different meanings; in a social network graph, for example, a label may represent a user's hobby or the community the user belongs to.
Summary of the invention
For the multi-label classification task, the present invention addresses the random process of the walk in random-walk-based graph node feature representation learning, and uses partial label information to guide the walk.
To achieve the above objective, the present invention proposes a graph random-walk representation method based on label information, comprising the following steps:
S1: Load the graph data and build a data structure that maps each graph node to its neighbor nodes and its label information;
S2: For each node in the graph, compute the probability of walking to each of its neighbor nodes, and implement repeated random selection from the node's neighbors such that the probability of each neighbor being selected matches the computed value;
S3: Using the probability values obtained in the previous step and the other walk parameters, start the walk and obtain a number of walk paths;
S4: Train on the walk paths to obtain word vectors, that is, the vector representation of each graph node;
S5: Perform a multi-label classification task on the graph nodes to check the classification performance of the algorithm.
Further, in step S2, the probability of walking to each neighbor node of a node is computed from the label attributes of the node and of its neighbor nodes, together with a specified adjustable label-information proportion parameter p.
In step S2, random selection from the neighbor nodes of a node is implemented using the alias method.
The other walk parameters in step S3 include the walk length.
The training in step S4 is performed by calling the word2vec algorithm.
The beneficial effects of the present invention are:
1. By setting a parameter of the walk process, the proportion of next steps that are guided by label information can be adjusted, so that the amount of classification-label information contained in the feature representation of a graph node is flexibly adjustable. This provides flexibility between fitting the walk more closely to the multi-label classification objective and learning the whole graph more broadly.
2. The vector representations of graph nodes generated by the algorithm better reflect label feature information in the multi-label classification task, so that the accuracy of multi-label classification improves markedly as the proportion parameter governing label-guided walking increases.
Brief description of the drawings
Fig. 1 shows the overall execution process of the algorithm of the invention.
Fig. 2 is a flow chart of computing, for each graph node, the probability of walking to each of its neighbor nodes.
Fig. 3 shows the detailed walk procedure when labels guide the walk.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
The overall computation of the algorithm is as follows:
Step one: Load the graph data and build a data structure that maps each graph node to its neighbor nodes and its label information;
Step two: For each node in the graph, compute the probability of walking to each of its neighbor nodes from the label attributes of the node and of its neighbor nodes, together with the specified adjustable label-information proportion parameter p, and use the alias method to implement repeated random selection from the node's neighbors such that the probability of each neighbor being selected matches the computed value;
Step three: Using the probability values obtained in the previous step and the other walk parameters, such as the walk length, start the walk and obtain a number of walk paths;
Step four: Using the walk paths, call the word2vec algorithm for training to obtain word vectors, that is, the vector representation of each graph node;
Step five: Perform a multi-label classification task on the graph nodes to check the classification performance of the algorithm.
Fig. 1 shows the overall execution process of the invention, which specifically includes:
Step 1: Load the graph data and the label information. To simplify later processing, datasets in different formats are converted into a uniform dictionary structure similar to an adjacency list: the value of each graph node serves as a key of the dictionary, and the neighbor nodes of that node, or the label information of that node, are organized into a list that serves as the corresponding value. This yields a dictionary G that represents the graph data and a dictionary T that represents the label information;
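The dictionary structures of Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_graph`, the edge-list input format, and the treatment of the graph as undirected are all assumptions.

```python
from collections import defaultdict

def build_graph(edges, node_labels):
    """Build the adjacency dictionary G and the label dictionary T.

    edges: iterable of (u, v) pairs (hypothetical input format).
    node_labels: dict mapping a node to an iterable of its labels.
    """
    G = defaultdict(list)
    for u, v in edges:
        # Assumption: the graph is undirected, so each endpoint
        # is recorded as a neighbor of the other, without duplicates.
        if v not in G[u]:
            G[u].append(v)
        if u not in G[v]:
            G[v].append(u)
    # Every node gets a label list; unlabeled nodes get an empty list.
    T = {node: list(node_labels.get(node, [])) for node in G}
    return dict(G), T

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
node_labels = {"a": ["x"], "b": ["x", "y"], "c": ["y"]}
G, T = build_graph(edges, node_labels)
```

Keeping the neighbors in a list rather than a set preserves a stable index for each neighbor, which the later probability sequences rely on.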
Step 2: Compute the walk probabilities. The principle is as follows. Let the graph data to be processed contain N nodes, and let the current node of the walk be C. A node is to be selected from the neighbor nodes of C as the next node on the walk path. Suppose node C has E neighbor nodes, expressed as

$$\mathrm{neighbors}(C) = \{n_1, n_2, n_3, \ldots, n_E\}, \quad 0 \le E < N \tag{0.1}$$

The neighbor nodes in neighbors(C) that share a common label with node C are expressed as

$$\mathrm{common}(C) = \{m_1, m_2, m_3, \ldots, m_k\}, \quad 0 \le k \le E \tag{0.2}$$

Clearly common(C) is a subset of neighbors(C). Let node D be the node selected as the next node to walk to from C, where D belongs to neighbors(C). In this algorithm, the probability that node D belongs to common(C) is required to be

$$P(D \in \mathrm{common}(C)) = p, \quad D \in \mathrm{neighbors}(C) \tag{0.3}$$

where the probability p is a walk parameter set before the walk starts. To achieve this, a new set of variables must be computed for node C; that is, a calculation assigns to each neighbor node of C a probability of being walked to.
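One assignment that satisfies requirement (0.3) is sketched below. It is an assumption made for illustration, not necessarily the expression used by the inventors: the mass p is split evenly over the k neighbors in common(C), and the remaining mass 1 - p is split evenly over the other E - k neighbors, for 0 < k < E.

```latex
\[
P(n_i \mid C) =
\begin{cases}
\dfrac{p}{k}, & n_i \in \mathrm{common}(C), \\[2ex]
\dfrac{1 - p}{E - k}, & n_i \in \mathrm{neighbors}(C) \setminus \mathrm{common}(C).
\end{cases}
\]
```

Summing over common(C) gives k · (p/k) = p, which matches (0.3), and the total over all E neighbors is p + (1 - p) = 1.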
Fig. 2 details one implementation flow for computing the probability of walking to each neighbor node of each graph node. First, for each node, the indices of the neighbors that share a common label with the node are collected, along with the number of such neighbors. If no neighbor shares a common label with the node, every neighbor is assigned the same walk probability. Otherwise, the walk probability of each neighbor node is computed using formula 1.4.
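The flow of Fig. 2 might look like the sketch below. The name `walk_probabilities` is hypothetical, and the even split (p/k for neighbors sharing a label, (1 - p)/(E - k) for the rest) is an assumed stand-in for the patent's formula. The uniform fallback when every neighbor shares a label is likewise an assumption; the text only specifies the fallback when none does.

```python
def walk_probabilities(node, G, T, p):
    """Return one walk probability per neighbor of `node`, in neighbor order."""
    neighbors = G[node]
    own_labels = set(T.get(node, []))
    # Indices of neighbors that share at least one label with `node`.
    common_idx = [i for i, n in enumerate(neighbors)
                  if own_labels & set(T.get(n, []))]
    E, k = len(neighbors), len(common_idx)
    if E == 0:
        return []
    if k == 0 or k == E:
        # No neighbor (or, by assumption, every neighbor) shares a label:
        # all neighbors get the same probability.
        return [1.0 / E] * E
    # Assumed split: the non-sharing neighbors divide 1 - p evenly ...
    probs = [(1.0 - p) / (E - k)] * E
    # ... and the label-sharing neighbors divide p evenly.
    for i in common_idx:
        probs[i] = p / k
    return probs

G = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
T = {"a": ["x"], "b": ["x", "y"], "c": ["y"], "d": []}
probs_c = walk_probabilities("c", G, T, p=0.8)
```

Here node "c" shares the label "y" only with "b", so "b" receives probability p and the two remaining neighbors split 1 - p.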
Next, this set of probability values is passed to the alias_setup method of the AliasMethod algorithm to build the alias_nodes variable. alias_nodes is again a dictionary whose keys are all the nodes in the graph. Each value consists of two sequences, each the same length as the node's neighbor list, obtained by adjusting the walk probability sequence. The alias_draw method of the AliasMethod algorithm compares random numbers against these two sequences and returns an index. When alias_draw is called repeatedly, the probability distribution of the returned indices matches the specified walk probability sequence.
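A common formulation of the alias method is sketched below for illustration. The names `alias_setup` and `alias_draw` follow the text, but the bodies are assumptions rather than the patent's code: setup builds a table of alias indices `J` and a table of acceptance thresholds `q`, and each draw costs constant time.

```python
import random

def alias_setup(probs):
    """Build the alias table J and threshold table q for a discrete distribution."""
    n = len(probs)
    q = [pr * n for pr in probs]   # probabilities scaled so the average is 1
    J = [0] * n                    # alias index for each slot
    smaller = [i for i, v in enumerate(q) if v < 1.0]
    larger = [i for i, v in enumerate(q) if v >= 1.0]
    while smaller and larger:
        small = smaller.pop()
        large = larger.pop()
        J[small] = large
        # The large slot donates the mass needed to fill the small slot.
        q[large] = q[large] - (1.0 - q[small])
        if q[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)
    # Slots left over only through rounding error are treated as full.
    for i in smaller + larger:
        q[i] = 1.0
    return J, q

def alias_draw(J, q, rng=random):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = int(rng.random() * len(J))
    return i if rng.random() < q[i] else J[i]

J, q = alias_setup([0.1, 0.8, 0.1])
index = alias_draw(J, q)
```

The setup cost is linear in the number of neighbors and is paid once per node, which suits a walk that samples from the same neighbor distribution many times.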
Step 3: Start the walk. Fig. 3 shows the concrete procedure by which the previously computed probability sequences guide the walk between nodes. Each time the next node of the walk is selected, alias_draw is called on the alias_nodes entry of the current node to select the index of the next node, which implements the guided walk.
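A guided walk over precomputed probabilities can be sketched as follows. To stay self-contained, this sketch samples with the standard library's `random.choices` instead of alias_draw; both draw from the same distribution, so the substitution changes the cost per step but not the behavior. The names `guided_walk` and `generate_walks`, the `num_walks` parameter, and the shuffling of start nodes are assumptions; the text names only the walk length as a parameter.

```python
import random

def guided_walk(start, G, probs, walk_length, rng):
    """Walk from `start`, choosing each next node by the given probabilities."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = G[current]
        if not neighbors:
            break  # isolated node: the walk cannot continue
        # Stand-in for alias_draw: weighted choice among the neighbors.
        nxt = rng.choices(neighbors, weights=probs[current], k=1)[0]
        walk.append(nxt)
    return walk

def generate_walks(G, probs, num_walks, walk_length, seed=0):
    """Start `num_walks` walks from every node (assumed schedule)."""
    rng = random.Random(seed)
    nodes = list(G)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(guided_walk(node, G, probs, walk_length, rng))
    return walks

G = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
probs = {"a": [0.8, 0.2], "b": [0.4, 0.6], "c": [0.1, 0.8, 0.1], "d": [1.0]}
walks = generate_walks(G, probs, num_walks=2, walk_length=5)
```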
Step 4: Using the set of walk paths obtained in the previous step, call the word2vec method to compute the vector representation of each graph node.
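word2vec treats each walk path as a sentence and each node as a word. The sketch below only illustrates the data such training consumes: it converts walks into token lists and enumerates the (center, context) pairs that the skip-gram variant of word2vec learns from. The function names and the window size are assumptions, and the training itself would be delegated to an existing word2vec implementation.

```python
def walks_to_sentences(walks):
    """Turn each walk into a list of string tokens, one token per node."""
    return [[str(node) for node in walk] for walk in walks]

def skipgram_pairs(sentence, window):
    """List (center, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

walks = [[1, 2, 3, 2], [3, 4]]
sentences = walks_to_sentences(walks)
pairs = skipgram_pairs(sentences[0], window=1)
```

Nodes that often appear near each other on walk paths yield many shared pairs, which is what pulls their vectors together during training.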
Step 5: Perform multi-label classification on the feature vectors of the graph nodes using a common classification algorithm (such as a logistic regression model).
In summary, existing random-walk-based representation learning algorithms for large-scale graph nodes walk too randomly, so the resulting feature representations of graph nodes match the node features needed in the application scenario poorly. To address this problem, the present invention provides a method that uses label information to guide the walk in the multi-label classification task. By setting a proportion parameter p, the strength of the guidance that label information exerts during the walk can be adjusted, so that the degree to which the feature representations of graph nodes match the label features of the nodes in the multi-label classification scenario is flexibly adjustable.
The foregoing is only a preferred embodiment of the present invention and does not limit the invention. Although the invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A graph random-walk representation method based on label information, characterized by comprising the following steps:
S1: Load the graph data and build a data structure that maps each graph node to its neighbor nodes and its label information;
S2: For each node in the graph, compute the probability of walking to each of its neighbor nodes, and implement repeated random selection from the node's neighbors such that the probability of each neighbor being selected matches the computed value;
S3: Using the probability values obtained in the previous step and the other walk parameters, start the walk and obtain a number of walk paths;
S4: Train on the walk paths to obtain word vectors, that is, the vector representation of each graph node;
S5: Perform a multi-label classification task on the graph nodes to check the classification performance of the algorithm.
2. The graph random-walk representation method based on label information according to claim 1, characterized in that, in step S2, the probability of walking to each neighbor node of a node is computed from the label attributes of the node and of its neighbor nodes and from a specified adjustable label-information proportion parameter p.
3. The graph random-walk representation method based on label information according to claim 1, characterized in that, in step S2, random selection from the neighbor nodes of a node is implemented using the alias method.
4. The graph random-walk representation method based on label information according to claim 1, characterized in that the other walk parameters in step S3 include the walk length.
5. The graph random-walk representation method based on label information according to claim 1, characterized in that the training in step S4 is performed by calling the word2vec algorithm.
CN201611245749.8A 2016-12-29 2016-12-29 A graph random-walk representation method based on label information Pending CN106777284A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611245749.8A CN106777284A (en) 2016-12-29 2016-12-29 A graph random-walk representation method based on label information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611245749.8A CN106777284A (en) 2016-12-29 2016-12-29 A graph random-walk representation method based on label information

Publications (1)

Publication Number Publication Date
CN106777284A 2017-05-31

Family

ID=58928833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611245749.8A Pending CN106777284A (en) 2016-12-29 2016-12-29 A graph random-walk representation method based on label information

Country Status (1)

Country Link
CN (1) CN106777284A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531