CN109727635A - A method for extracting representative instances of uncertain graphs - Google Patents

A method for extracting representative instances of uncertain graphs

Info

Publication number
CN109727635A
Authority
CN
China
Prior art keywords
uncertain
regression
regression model
representative instance
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811486200.7A
Other languages
Chinese (zh)
Inventor
徐周波
杨健
刘华东
梁轩瑜
黄文文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN201811486200.7A priority Critical patent/CN109727635A/en
Publication of CN109727635A publication Critical patent/CN109727635A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention proposes a method for extracting representative instances of uncertain graphs. The method comprises: constructing a training set and establishing a regression model; determining the regression coefficients of the regression model by gradient descent; and establishing a logistic regression model from the regression model and its coefficients. During execution, the invention effectively improves the efficiency of solving the problem and avoids re-running the ADR algorithm whenever a new protein interaction network appears, which gives it good practicability.

Description

A method for extracting representative instances of uncertain graphs
Technical field
The present invention relates to the technical field of uncertain graph data mining, and in particular to a method for extracting representative instances of uncertain graphs.
Background
An uncertain graph extends the traditional graph data representation with a description of the uncertainty of the data. Owing to random and measurement errors in data acquisition, failures and delays in data transmission, the incompleteness and inconsistency of data integrated from multiple sources, data privacy protection, and many other causes, a large amount of graph data is uncertain, and the traditional graph data model cannot express this uncertainty. The uncertain graph model assigns to each edge of the traditional graph data model a probability of existence, which represents the uncertainty of the data. The uncertain graph model fits the uncertainty of the data well and has been used for data mining in fields such as social networks and protein interaction networks.
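As a minimal sketch of this data model, an uncertain graph can be held as a mapping from edges to existence probabilities. The toy graph, the function name `instance_probability`, and the assumption that edges exist independently of one another are illustrative and are not taken from the patent text.

```python
# Hypothetical toy uncertain graph: (u, v) -> probability that the edge exists.
uncertain_graph = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.4,
    ("B", "C"): 0.7,
    ("C", "D"): 0.2,
}


def instance_probability(edges, instance):
    """Probability of one deterministic instance, assuming independent edges.

    `instance` is the set of edges that are kept; every other edge of the
    uncertain graph is treated as absent.
    """
    prob = 1.0
    for edge, p in edges.items():
        prob *= p if edge in instance else (1.0 - p)
    return prob


# Probability that exactly the edges A-B and B-C exist.
p_instance = instance_probability(uncertain_graph, {("A", "B"), ("B", "C")})
print(round(p_instance, 4))
```

Under the independence assumption, each deterministic instance of the graph has its own probability, which is why a single well-chosen representative instance is attractive for downstream mining.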
When analyzing uncertain graph data, extracting a representative instance from the uncertain graph, so as to reduce the influence of uncertainty on the results, has become an urgent problem. The quality of the extracted instance directly affects the correctness of data mining results on the uncertain graph. The existing methods for extracting instances from uncertain graphs are:
(1) Monte Carlo sampling (Monte Carlo method): this method randomly selects a large number of instances from the subgraphs contained in the uncertain graph, performs data mining on each selected instance, and takes the average as the final result. It is widely used in data mining on uncertain graphs. However, to guarantee accuracy it must extract a large number of instances, which increases the overhead in space and time.
(2) MP (Most Probability) and GD (Greedy Probability): these methods are simple to implement. They select only edges with a high probability of existence and add or remove edges of the instance following a greedy strategy. Their overhead is small, but the extracted instance does not represent the uncertain graph well and the resulting error is large.
(3) ADR (Average Degree Rewiring) and ABM (Approximate B-Matching): these methods consider not only the existence probability of each edge in the uncertain graph but also the degree of each vertex, which improves the accuracy of instance extraction. However, their complexity is higher and the algorithms are time-consuming to execute.
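The first two baselines can be sketched briefly. The code below is an illustration under stated assumptions, not the cited algorithms themselves: edges are sampled independently, the mined statistic is simply the edge count, and the probability cutoff of 0.5 and all function names are hypothetical. The greedy edge adjustment of method (2) and the degree-aware methods of (3) are not reproduced.

```python
import random

# Hypothetical toy uncertain graph: (u, v) -> existence probability.
edges = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.4,
    ("B", "C"): 0.7,
    ("C", "D"): 0.2,
}


def sample_instance(edges, rng):
    """Draw one deterministic instance by keeping each edge with its probability."""
    return [e for e, p in edges.items() if rng.random() < p]


def monte_carlo_average(edges, statistic, n_samples=10_000, seed=0):
    """Average a statistic over many sampled instances (baseline 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += statistic(sample_instance(edges, rng))
    return total / n_samples


def high_probability_instance(edges, threshold=0.5):
    """Keep only edges whose probability reaches the cutoff (baseline 2, simplified)."""
    return sorted(e for e, p in edges.items() if p >= threshold)


mc_edge_count = monte_carlo_average(edges, len)  # estimates the expected edge count
mp_instance = high_probability_instance(edges)
print(mc_edge_count, mp_instance)
```

The sampling estimate approaches the expected edge count (the sum of the edge probabilities, 2.2 for this toy graph) only as the number of samples grows, which is the cost the text attributes to method (1).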
Summary of the invention
In view of the above deficiencies of the prior art, the purpose of the present invention is to provide a method for extracting representative instances of uncertain graphs.
To achieve the above and other related purposes, the present invention provides a method for extracting representative instances of uncertain graphs, the method comprising:
constructing a training set and establishing a regression model;
determining the regression coefficients of the regression model by gradient descent;
establishing a logistic regression model from the regression model and its coefficients.
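Optionally, the regression model is: $h_\theta(x) = \frac{1}{1 + e^{-z}}$, with $z = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 + \theta_5 x_5$.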
Optionally, establishing the regression model specifically comprises:
extracting a representative instance, the deterministic graph G, from the uncertain graph g;
taking as feature values, for each edge in the deterministic graph G, the degrees of the vertices of the edge, the expected degrees of the vertices of the edge, and the existence probability Pe of the edge;
establishing a linear expression from the feature values, the linear expression then serving as the training set;
mapping the value of the linear expression onto the interval [0,1].
Optionally, determining the coefficients of the regression model by gradient descent specifically comprises:
constructing a loss function J;
solving for the regression coefficients of all features by gradient descent.
Optionally, the logistic regression model is: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
where $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^T$.
As described above, the method for extracting representative instances of uncertain graphs according to the present invention has the following beneficial effects:
The invention uses a Logistic regression model. First, the ADR (Average Degree Rewiring) algorithm extracts an instance from an uncertain graph. Features and labels are then extracted from that instance to form the training set, and a regression expression is established. The regression coefficients in the expression are determined by gradient descent, which fixes the regression expression and finally yields a classifier for instance extraction. When an instance must be extracted from a new protein interaction network (an uncertain graph), the network can be fed directly to the classifier; the edges predicted by the classifier form the extracted edge set, which gives the instance (a deterministic graph) extracted from the network. During execution, the invention effectively improves the efficiency of solving the problem and avoids re-running the ADR algorithm whenever a new protein interaction network appears, which gives it good practicability.
Brief description of the drawings
To further explain the present invention, specific embodiments of the invention are described in more detail below with reference to the accompanying drawings. It should be understood that these drawings serve only as typical examples and are not to be taken as limiting the scope of the invention.
Fig. 1 is the uncertain graph of the example given in the preferred embodiment of the present invention;
Fig. 2 is the instance extracted from the example graph of the preferred embodiment using the ADR algorithm;
Fig. 3 is a flow chart of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are illustrated below by specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another.
It should be noted that the drawings provided with the following embodiments illustrate the basic idea of the invention only schematically. They show only the components related to the invention rather than the number, shape, and size of the components in an actual implementation; in an actual implementation the form, number, and proportion of each component may change arbitrarily, and the component layout may be more complex.
The present invention discloses a method for training an instance classifier for uncertain graphs based on the ADR algorithm, using a Logistic regression model. First, the ADR algorithm selects an instance from the uncertain graph, and each edge of the uncertain graph is labeled: 1 if the edge is selected, 0 otherwise. Then the node degrees, the expected node degrees, the edge existence probabilities, and the selection labels of the uncertain graph are analyzed, features are extracted, and a Logistic regression model is constructed. From this regression model a loss function is established, the regression coefficients are determined by gradient descent, and the regression model is finalized, which trains the classifier. When a new uncertain graph is input, only its feature values need to be fed to the classifier, and the classifier determines an instance that can represent the uncertain graph. Because the Logistic regression model is a generalized linear regression model, instance extraction for a new uncertain graph becomes markedly more efficient, which gives the method good practicability.
The invention is described in more detail below, taking the interaction between proteins as an example.
Fig. 1 shows a protein interaction network, in which vertices represent proteins and edges represent protein interactions. High-throughput biological techniques for measuring protein interactions (such as the yeast two-hybrid technique) have inherent errors, so it is uncertain whether an experimentally measured interaction really exists; the weight of an edge therefore indicates the probability that the interaction really exists. Before proteins can be analyzed, the uncertainty in the protein data should be eliminated. The ADR algorithm effectively eliminates the uncertainty of an uncertain graph by extracting an instance (a deterministic graph) that can represent it, after which a series of operations such as protein detection can be performed on the extracted instance. To make instance extraction from uncertain graphs more efficient, the present invention establishes a regression model, which effectively speeds up instance extraction for new uncertain graphs.
As shown in Fig. 3, the present invention provides a method for extracting representative instances of uncertain graphs, the method comprising:
Step 1: construct the training set and establish the regression model.
Step 11: given an uncertain graph g, as shown in Fig. 1, the ADR algorithm extracts its representative instance G, as shown in Fig. 2.
Step 12: analyze the instance G extracted by the ADR algorithm. For each edge, the degrees of its two vertices in graph G, deg(u) and deg(v), the expected degrees of its two vertices, [deg(u)] and [deg(v)], and the probability $p_e$ of the edge are taken as the features $x_1, x_2, x_3, x_4, x_5$. The corresponding function value y is 1 if the edge is selected and 0 otherwise. The samples are arranged in a matrix X,
in which each row is one sample and each column is one feature of the samples.
Step 13: establish the linear expression $z = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 + \theta_5 x_5$ from the extracted features, where $\theta_1$ to $\theta_5$ are the regression coefficients, each initialized to 1. In vector form, $z = \theta^T x$, so that initially $z = x_1 + x_2 + x_3 + x_4 + x_5$.
Step 14: using the Sigmoid function, map the value of the expression z onto the interval [0,1]. The Sigmoid function is computed as:
$$g(z) = \frac{1}{1 + e^{-z}}$$
where $z = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 + \theta_5 x_5$, or in vector form $z = \theta^T x$.
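Steps 12 to 14 can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the toy graph `g`, the hand-picked instance `G` (a stand-in for the ADR output, which is not implemented here), and all function names are hypothetical, and the expected degree [deg(u)] is taken to be the unrounded sum of the probabilities of the edges incident to u.

```python
import math


def degree_table(edge_list):
    """Degree of every vertex in a deterministic edge collection."""
    deg = {}
    for u, v in edge_list:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def expected_degree_table(uncertain_edges):
    """Expected degree of every vertex: sum of incident edge probabilities."""
    exp = {}
    for (u, v), p in uncertain_edges.items():
        exp[u] = exp.get(u, 0.0) + p
        exp[v] = exp.get(v, 0.0) + p
    return exp


def build_training_set(uncertain_edges, instance_edges):
    """One row [x1..x5] and one label y per edge of the uncertain graph."""
    deg = degree_table(instance_edges)
    exp = expected_degree_table(uncertain_edges)
    X, y = [], []
    for (u, v), p in uncertain_edges.items():
        X.append([deg.get(u, 0), deg.get(v, 0), exp[u], exp[v], p])
        y.append(1 if (u, v) in instance_edges else 0)
    return X, y


def sigmoid(z):
    """Numerically stable Sigmoid g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """Sigmoid of the linear expression z = theta^T x."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


g = {("A", "B"): 0.9, ("A", "C"): 0.4, ("B", "C"): 0.7, ("C", "D"): 0.2}
G = {("A", "B"), ("B", "C")}  # hypothetical stand-in for the ADR result
X, y = build_training_set(g, G)
theta = [1.0] * 5  # all coefficients start at 1, as in Step 13
print(X[0], y[0], hypothesis(theta, X[0]))
```

With every coefficient at 1 the linear expression is just the sum of the features, so the initial outputs sit close to 1 for all edges; the coefficients only become informative after training.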
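Step 2: determine the regression coefficients of the regression model by gradient descent.
Step 21: construct the loss function J.
The Cost function and the J function are as follows. They are derived from maximum likelihood estimation (the total number of samples is m, and each sample has n features):
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)), & y = 1 \\ -\log(1 - h_\theta(x)), & y = 0 \end{cases}$$
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
where $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ is the value obtained in Step 14.
Step 22: find the minimum of the J function by gradient descent:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta), \quad j = 1, \dots, n$$
where $\alpha$ is the learning rate. The update of θ can be written as:
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Step 23: repeat Step 22 until the number of iterations reaches the set value, then execute Step 24.
Step 24: obtain the regression coefficients of all features, $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^T$, where $\theta_1'$ to $\theta_5'$ are the regression coefficients.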
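Steps 21 to 24 can be sketched as batch gradient descent on the cross-entropy loss that follows from maximum likelihood estimation. The learning rate, the iteration count, the toy feature rows, and the function names are assumptions chosen for illustration; the patent text does not state them. As in the five-feature expression above, no intercept term is used.

```python
import math


def sigmoid(z):
    """Numerically stable Sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(theta, x):
    """Model output for one sample: Sigmoid of theta^T x."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def loss(theta, X, y, eps=1e-12):
    """Cross-entropy loss J(theta), averaged over the m samples."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = min(max(predict(theta, x_i), eps), 1.0 - eps)  # keep log() finite
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / len(X)


def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Repeat the Step 22 update a fixed number of times (Step 23)."""
    m, n = len(X), len(X[0])
    theta = [1.0] * n  # coefficients start at 1, as in Step 13
    for _ in range(iterations):
        errors = [predict(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
        grad = [sum(e * x_i[j] for e, x_i in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g_j for t, g_j in zip(theta, grad)]
    return theta


# Hypothetical training rows [deg(u), deg(v), [deg(u)], [deg(v)], p_e] and labels.
X = [
    [1, 2, 1.3, 1.6, 0.9],
    [1, 1, 1.3, 1.3, 0.4],
    [2, 1, 1.6, 1.3, 0.7],
    [1, 0, 1.3, 0.2, 0.2],
]
y = [1, 0, 1, 0]

initial_loss = loss([1.0] * 5, X, y)
theta_trained = gradient_descent(X, y)
final_loss = loss(theta_trained, X, y)
print(initial_loss, final_loss)
```

A fixed iteration count mirrors the stopping rule of Step 23; a tolerance on the change in J would be an alternative, at the cost of one extra loss evaluation per iteration.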
Step 3: establish the expression of the logistic regression model from the regression model and its coefficients: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, with $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^T$.
This embodiment uses a Logistic regression model. First, the ADR (Average Degree Rewiring) algorithm extracts an instance from an uncertain graph. Features and labels are then extracted from that instance to form the training set, and a regression expression is established. The regression coefficients in the expression are determined by gradient descent, which fixes the regression expression and finally yields a classifier for instance extraction. When an instance must be extracted from a new protein interaction network (an uncertain graph), the network can be fed directly to the classifier; the edges predicted by the classifier form the extracted edge set, which gives the instance (a deterministic graph) extracted from the network. During execution, the algorithm effectively improves the efficiency of solving the problem and avoids re-running the ADR algorithm whenever a new protein interaction network appears, which gives it good practicability.
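Applying the trained classifier to a new network can be sketched as below. The coefficient values, the protein names, the feature rows, and the decision threshold of 0.5 are all hypothetical; the text only states that the edges predicted by the classifier form the extracted edge set.

```python
import math


def sigmoid(z):
    """Numerically stable Sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def extract_instance(theta, edge_features, threshold=0.5):
    """Return the edges whose predicted value reaches the threshold."""
    selected = []
    for edge, x in edge_features.items():
        z = sum(t * xi for t, xi in zip(theta, x))
        if sigmoid(z) >= threshold:
            selected.append(edge)
    return selected


# Hypothetical trained coefficients (not taken from the patent).
theta = [-1.0, 0.5, 0.0, 0.0, 1.0]

# Hypothetical feature vectors for the edges of a new interaction network.
new_network = {
    ("P1", "P2"): [1, 2, 1.3, 1.6, 0.9],
    ("P3", "P4"): [1, 0, 1.3, 0.2, 0.2],
}

instance = extract_instance(theta, new_network)
print(instance)
```

Each edge costs one dot product and one Sigmoid evaluation, so prediction is linear in the number of edges, which is the efficiency gain the text claims over re-running ADR.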
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the present invention.

Claims (5)

1. A method for extracting representative instances of uncertain graphs, characterized in that the method comprises:
constructing a training set and establishing a regression model;
determining the regression coefficients of the regression model by gradient descent;
establishing a logistic regression model from the regression model and its coefficients.
2. The method for extracting representative instances of uncertain graphs according to claim 1, characterized in that the regression model is: $h_\theta(x) = \frac{1}{1 + e^{-z}}$, with $z = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 + \theta_5 x_5$.
3. The method for extracting representative instances of uncertain graphs according to claim 2, characterized in that establishing the regression model specifically comprises:
extracting a representative instance, the deterministic graph G, from the uncertain graph g;
taking as feature values, for each edge in the deterministic graph G, the degrees of the vertices of the edge, the expected degrees of the vertices of the edge, and the existence probability Pe of the edge;
establishing a linear expression from the feature values, the linear expression then serving as the training set;
mapping the value of the linear expression onto the interval [0,1].
4. The method for extracting representative instances of uncertain graphs according to claim 3, characterized in that determining the coefficients of the regression model by gradient descent specifically comprises:
constructing a loss function J;
solving for the regression coefficients of all features by gradient descent.
5. The method for extracting representative instances of uncertain graphs according to claim 4, characterized in that the logistic regression model is: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
where $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^T$.
CN201811486200.7A 2018-12-06 2018-12-06 A kind of abstracting method of uncertain figure representative instance Pending CN109727635A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811486200.7A CN109727635A (en) 2018-12-06 2018-12-06 A kind of abstracting method of uncertain figure representative instance

Publications (1)

Publication Number Publication Date
CN109727635A true CN109727635A (en) 2019-05-07

Family

ID=66295300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811486200.7A Pending CN109727635A (en) 2018-12-06 2018-12-06 A kind of abstracting method of uncertain figure representative instance

Country Status (1)

Country Link
CN (1) CN109727635A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617435A (en) * 2013-12-16 2014-03-05 苏州大学 Image sorting method and system for active learning
CN106250928A (en) * 2016-07-30 2016-12-21 哈尔滨工业大学深圳研究生院 Parallel logic homing method based on Graphics Processing Unit and system
CN106411965A (en) * 2016-12-22 2017-02-15 北京知道创宇信息技术有限公司 Method for determining network server providing counterfeit service, equipment and calculating equipment thereof
CN107678803A (en) * 2017-09-30 2018-02-09 广东欧珀移动通信有限公司 Using management-control method, device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋少莹: "《不确定图的代表实例发现算法》", 《硕士论文》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190507