CN109727635A - Method for extracting representative instances of uncertain graphs - Google Patents
Method for extracting representative instances of uncertain graphs
- Publication number
- CN109727635A (application number CN201811486200.7A)
- Authority
- CN
- China
- Prior art keywords
- uncertain
- regression
- regression model
- representative instance
- coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention proposes a method for extracting representative instances of uncertain graphs, comprising: constructing a training set and establishing a regression model; determining the regression coefficients of the regression model using gradient descent; and establishing a logistic regression model according to the regression model and its coefficients. The method improves the efficiency of solving the extraction problem because the ADR algorithm no longer has to be re-run whenever a new protein interaction network appears, and it therefore has good practicability.
Description
Technical field
The present invention relates to the technical field of uncertain graph data mining, and in particular to a method for extracting representative instances of uncertain graphs.
Background
An uncertain graph extends the traditional graph data representation so that it also describes the uncertainty of the data. Because of random and measurement errors in data acquisition, failures and delays in data transmission, incompleteness and inconsistency of multi-source integrated data, data privacy protection, and many other causes, a large amount of graph data is uncertain, and the traditional graph data model cannot express this uncertainty. The uncertain graph model attaches to each edge of a traditional graph the probability that the edge exists, and in this way represents the uncertainty of the data. The model captures data uncertainty well and has been used for data mining in fields such as social networks and protein interaction networks.
When uncertain graph data is analyzed, extracting a representative instance from the uncertain graph, so as to reduce the influence of uncertainty on the results, has become an urgent problem. The quality of the extracted instance directly affects the correctness of data mining results on the uncertain graph. Existing methods for extracting instances from uncertain graphs include:
(1) Monte Carlo sampling (Monte Carlo method): this method randomly selects a large number of instances from the subgraphs contained in the uncertain graph, performs data mining on each selected instance, and takes the average of the results. It is widely used in data mining on uncertain graphs. However, guaranteeing its accuracy requires extracting a large number of instances, which increases the cost in both space and time.
(2) MP (Most Probability) and GD (Greedy Probability): these methods are simple to implement. They select only the edges with a higher existence probability and add or remove edges from the instance following a greedy strategy. Their overhead is small, but the extracted instance does not represent the uncertain graph well, so the resulting error is large.
(3) ADR (Average Degree Rewiring) and ABM (Approximate B-Matching): these methods consider not only the existence probability of each edge in the uncertain graph but also the degree of each vertex, which improves the accuracy of instance extraction. However, their complexity is higher and they take longer to run.
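The trade-offs among the approaches listed above can be illustrated with a minimal Python sketch. It is not the published ADR, ABM, or greedy algorithms: the toy graph `EDGES`, the 0.5 cut-off in `most_probable_instance`, and the `degree_discrepancy` measure are all assumptions introduced here for illustration. The sketch shows Monte Carlo averaging over sampled instances, a simple probability threshold, and one way to check how far an instance's vertex degrees are from the expected degrees.

```python
import random
from collections import defaultdict

# Hypothetical uncertain graph: (u, v, existence probability of the edge).
EDGES = [("a", "b", 0.9), ("b", "c", 0.5), ("a", "c", 0.2)]


def sample_instance(edges, rng):
    """Monte Carlo step: keep each edge independently with its probability."""
    return [(u, v) for u, v, p in edges if rng.random() < p]


def monte_carlo_estimate(edges, metric, num_samples=1000, seed=0):
    """Average a metric over many sampled instances."""
    rng = random.Random(seed)
    total = sum(metric(sample_instance(edges, rng)) for _ in range(num_samples))
    return total / num_samples


def most_probable_instance(edges, threshold=0.5):
    """Keep only edges at or above the threshold (assumed cut-off)."""
    return [(u, v) for u, v, p in edges if p >= threshold]


def expected_degrees(edges):
    """Expected degree of a vertex: sum of its incident edge probabilities."""
    exp_deg = defaultdict(float)
    for u, v, p in edges:
        exp_deg[u] += p
        exp_deg[v] += p
    return dict(exp_deg)


def degree_discrepancy(edges, instance):
    """Sum over vertices of |degree in instance - expected degree|."""
    deg = defaultdict(int)
    for u, v in instance:
        deg[u] += 1
        deg[v] += 1
    return sum(abs(deg[v] - e) for v, e in expected_degrees(edges).items())


avg_edge_count = monte_carlo_estimate(EDGES, metric=len, num_samples=5000)
mp_instance = most_probable_instance(EDGES)
mp_discrepancy = degree_discrepancy(EDGES, mp_instance)
```

The Monte Carlo estimate needs thousands of samples to settle, while the threshold rule is a single pass but ignores vertex degrees, which is the gap the degree-aware methods in item (3) address.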
Summary of the invention
In view of the above deficiencies of the prior art, the purpose of the present invention is to provide a method for extracting representative instances of uncertain graphs.
To achieve the above and other related purposes, the present invention provides a method for extracting representative instances of uncertain graphs, the method comprising:
constructing a training set and establishing a regression model;
determining the regression coefficients of the regression model using gradient descent;
establishing a logistic regression model according to the regression model and its coefficients.
Optionally, the regression model is:
Optionally, establishing the regression model specifically includes:
extracting a representative instance, the deterministic graph G, from the uncertain graph g;
taking as feature values the degrees of the two vertices of each edge in the deterministic graph G, the expected degrees of the two vertices of each edge, and the existence probability Pe of each edge;
establishing a linear expression from the feature values, the linear expression then serving as the training set;
mapping the value of the linear expression onto the interval [0,1].
Optionally, determining the coefficients of the regression model using gradient descent specifically includes:
constructing a loss function J;
solving for the regression coefficients of all features using gradient descent.
Optionally, the logistic regression model is:
where $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^{T}$.
As described above, the method for extracting representative instances of uncertain graphs of the present invention has the following beneficial effects:
The invention uses a logistic regression model. First, the ADR (Average Degree Rewiring) algorithm extracts an instance from an uncertain graph. Features and labels are then extracted from that instance to form the training set, and a regression expression is established. The regression coefficients in the expression are determined by gradient descent, which fixes the regression expression and yields a classifier for instance extraction. When an instance must be extracted from a new protein interaction network (an uncertain graph), the network can be fed directly to the classifier; the classifier's predictions give the set of extracted edges, from which the instance (a deterministic graph) of the network is obtained. The invention thereby improves the efficiency of solving the problem, avoids re-running the ADR algorithm each time a new protein interaction network appears, and has good practicability.
Brief description of the drawings
To further explain the content of the present invention, specific embodiments are described in more detail below with reference to the accompanying drawings. It should be understood that the drawings serve only as typical examples and should not be taken as limiting the scope of the invention.
Fig. 1 is the uncertain graph of the example used in the preferred embodiment of the present invention;
Fig. 2 is the instance extracted from the example graph of the preferred embodiment using the ADR algorithm;
Fig. 3 is a flow chart of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are illustrated below through specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another.
It should also be noted that the drawings provided with the following embodiments illustrate the basic concept of the invention only schematically. They show only the components related to the invention rather than the number, shape, and size of the components in an actual implementation; in practice, the form, quantity, and proportion of each component may vary freely, and the component layout may be more complex.
The present invention discloses a method for training an instance classifier for uncertain graphs based on the ADR algorithm, using a logistic regression model. First, the ADR algorithm selects an instance from the uncertain graph, and each edge of the uncertain graph is labeled 1 if it was selected and 0 otherwise. Next, the vertex degrees, expected vertex degrees, edge existence probabilities, and selection labels in the uncertain graph are analyzed to extract features and construct the logistic regression model. A loss function is established for the resulting regression model, the regression coefficients are determined by gradient descent, and the regression model is thereby fixed, which trains the classifier. When a new uncertain graph is input, only its feature values need to be given to the classifier, which then determines an instance that represents the uncertain graph. Because the logistic regression model is a generalized linear regression model, extracting an instance from a new uncertain graph is efficient, and the method has good practicability.
The present invention is described in more detail below, taking the problem of interactions between proteins as an example.
Fig. 1 shows a protein interaction network, in which vertices represent proteins and edges represent protein interactions. Because high-throughput biological measurement techniques for protein interactions (such as the yeast two-hybrid technique) always carry some error, it is uncertain whether an experimentally measured interaction really exists, so the weight of an edge represents the probability that the interaction really exists. Before the proteins are analyzed, this uncertainty should first be eliminated. The ADR algorithm effectively eliminates the uncertainty from the uncertain graph and extracts an instance (a deterministic graph) that represents it, after which a series of protein analysis operations can be carried out on the extracted instance. The present invention aims to improve the efficiency of instance extraction from uncertain graphs and therefore establishes a regression model, which speeds up instance extraction for new uncertain graphs.
As shown in Fig. 3, the present invention provides a method for extracting representative instances of uncertain graphs, the method comprising:
Step 1: construct a training set and establish a regression model.
Step 11: given an uncertain graph g, as shown in Fig. 1, extract its representative instance G with the ADR algorithm, as shown in Fig. 2.
Step 12: analyze the instance G extracted by the ADR algorithm. For each edge e = (u, v), take the degrees of its two vertices, deg(u) and deg(v), the expected degrees of its two vertices, [deg(u)] and [deg(v)], and its existence probability $p_e$ as the features $x_1, x_2, x_3, x_4, x_5$. The corresponding label y is 1 if the edge is selected and 0 otherwise. These values form a matrix X in which each row is a sample and each column is a feature of the sample.
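Step 12 can be sketched in Python as follows. This is a minimal illustration, not the filing's implementation: the function name `build_training_set` and the toy data are hypothetical, the instance passed in is assumed to come from an ADR implementation that is not shown, and degrees are counted over the candidate edges of the uncertain graph (an assumption made so that the same features can be computed for a new graph that has no extracted instance yet).

```python
from collections import defaultdict


def build_training_set(uncertain_edges, instance_edges):
    """Return (X, y): one row of five features and one 0/1 label per edge."""
    deg = defaultdict(int)        # number of candidate edges at each vertex
    exp_deg = defaultdict(float)  # sum of incident edge probabilities
    for u, v, p in uncertain_edges:
        deg[u] += 1
        deg[v] += 1
        exp_deg[u] += p
        exp_deg[v] += p

    # Edges are undirected: (u, v) and (v, u) denote the same edge.
    selected = {frozenset(edge) for edge in instance_edges}

    X, y = [], []
    for u, v, p in uncertain_edges:
        # Features x1..x5: deg(u), deg(v), [deg(u)], [deg(v)], p_e.
        X.append([deg[u], deg[v], exp_deg[u], exp_deg[v], p])
        y.append(1 if frozenset((u, v)) in selected else 0)
    return X, y


# Hypothetical uncertain graph g and extracted instance G.
uncertain_edges = [("a", "b", 0.9), ("b", "c", 0.5), ("a", "c", 0.2)]
instance_edges = [("b", "a")]
X, y = build_training_set(uncertain_edges, instance_edges)
```

Each row of `X` corresponds to one edge of the uncertain graph, so the training set has as many samples as the graph has candidate edges.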
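Step 13: establish the linear expression $z = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 + \theta_5 x_5$ from the extracted features, where $\theta_1, \dots, \theta_5$ are the regression coefficients, each initialized to 1. In vector form, $z = \theta^{T}x$, so initially z is simply the sum of the features.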
Step 14: map the value of the expression z onto the interval [0,1] with the sigmoid function, which is computed as
$g(z) = \frac{1}{1 + e^{-z}}$
where $z = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 + \theta_5 x_5$, written in vector form as $z = \theta^{T}x$.
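Steps 13 and 14 amount to a dot product followed by the logistic function. The sketch below assumes plain Python lists for θ and x; the function names and the sample feature vector are illustrative only. The two-branch form of `sigmoid` is a common way to avoid overflow in `math.exp` for large negative z and is a design choice of this sketch.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_expression(theta, x):
    """z = theta_1*x_1 + ... + theta_5*x_5, i.e. theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))


theta = [1.0] * 5              # coefficients initialized to 1, as in step 13
x = [2, 2, 1.1, 1.4, 0.9]      # hypothetical feature vector for one edge
z = linear_expression(theta, x)
h = sigmoid(z)                 # value mapped into the unit interval
```

With all coefficients equal to 1, `z` is just the sum of the five features, so `h` starts close to 1 for every edge until training adjusts θ.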
Step 2: determine the regression coefficients of the regression model using gradient descent.
Step 21: construct the loss function J.
The cost function and the function J are as follows; they are derived by maximum likelihood estimation (there are m samples in total, each with n features):
Step 22: minimize the function J by gradient descent.
The update of θ can be written as:
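The formulas for steps 21 and 22 can be written out in the standard maximum-likelihood form for logistic regression. This is a sketch under assumptions and not necessarily the exact expressions of the original filing: the hypothesis symbol $h_\theta$, the sample index $(i)$, the learning rate $\alpha$, and the $1/m$ normalization are notation introduced here.

```latex
h_\theta(x) = g(\theta^{T}x) = \frac{1}{1 + e^{-\theta^{T}x}}

\mathrm{Cost}\bigl(h_\theta(x), y\bigr) =
\begin{cases}
-\log\bigl(h_\theta(x)\bigr), & y = 1 \\
-\log\bigl(1 - h_\theta(x)\bigr), & y = 0
\end{cases}

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
  \Bigl[\, y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
  + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]

\theta_j := \theta_j - \alpha \, \frac{1}{m}\sum_{i=1}^{m}
  \bigl( h_\theta\bigl(x^{(i)}\bigr) - y^{(i)} \bigr)\, x_j^{(i)},
  \qquad j = 1, \dots, n
```

The update rule is the gradient of $J(\theta)$ with respect to $\theta_j$ scaled by $\alpha$, applied to all coefficients simultaneously in each iteration.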
Step 23: repeat step 22 until the number of iterations reaches the set value, then go to step 24.
Step 24: obtain the regression coefficients of all features, $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^{T}$, where $\theta_1'$ through $\theta_5'$ are the learned regression coefficients.
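Steps 21 to 24 can be sketched as batch gradient descent in Python. The sketch assumes the standard cross-entropy loss for logistic regression, a learning rate `alpha` of 0.1, 500 iterations, and a three-sample toy training set; none of these values come from the source. As in the description, there is no intercept term and all coefficients start at 1.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    """h(x) = sigmoid(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def loss(theta, X, y, eps=1e-12):
    """Average cross-entropy J(theta); eps guards against log(0)."""
    total = 0.0
    for x, label in zip(X, y):
        h = min(max(predict_proba(theta, x), eps), 1.0 - eps)
        total += label * math.log(h) + (1 - label) * math.log(1.0 - h)
    return -total / len(X)


def gradient_descent(X, y, alpha=0.1, num_iters=500):
    """Run a fixed number of batch updates (steps 22 and 23)."""
    m, n = len(X), len(X[0])
    theta = [1.0] * n  # initial coefficients, as in step 13
    for _ in range(num_iters):
        grad = [0.0] * n
        for x, label in zip(X, y):
            error = predict_proba(theta, x) - label
            for j in range(n):
                grad[j] += error * x[j]
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta


# Hypothetical training set: rows are [deg(u), deg(v), E[deg(u)], E[deg(v)], p_e].
X = [[2, 2, 1.1, 1.4, 0.9], [2, 2, 1.4, 0.7, 0.5], [2, 2, 1.1, 0.7, 0.2]]
y = [1, 0, 0]
initial_loss = loss([1.0] * 5, X, y)
theta = gradient_descent(X, y)
final_loss = loss(theta, X, y)
```

Stopping after a fixed iteration count mirrors step 23; a convergence test on the change in `loss` would be an alternative, but the source does not describe one.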
Step 3: establish the expression of the logistic regression model from the regression model and its coefficients:
This embodiment uses a logistic regression model. First, the ADR (Average Degree Rewiring) algorithm extracts an instance from an uncertain graph. Features and labels are then extracted from that instance to form the training set, and a regression expression is established. The regression coefficients in the expression are determined by gradient descent, which fixes the regression expression and yields a classifier for instance extraction. When an instance must be extracted from a new protein interaction network (an uncertain graph), the network can be fed directly to the classifier; the classifier's predictions give the set of extracted edges, from which the instance (a deterministic graph) of the network is obtained. This improves the efficiency of solving the problem, avoids re-running the ADR algorithm each time a new protein interaction network appears, and has good practicability.
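Applying the trained classifier to a new uncertain graph can be sketched as below. The coefficient vector `theta`, the 0.5 decision threshold, and the toy graph are hypothetical values chosen for illustration; in practice θ would come from the gradient descent of step 2. Features are computed from the uncertain graph alone, which is an assumption of this sketch.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def edge_features(uncertain_edges):
    """One row [deg(u), deg(v), E[deg(u)], E[deg(v)], p_e] per edge."""
    deg = defaultdict(int)
    exp_deg = defaultdict(float)
    for u, v, p in uncertain_edges:
        deg[u] += 1
        deg[v] += 1
        exp_deg[u] += p
        exp_deg[v] += p
    return [[deg[u], deg[v], exp_deg[u], exp_deg[v], p]
            for u, v, p in uncertain_edges]


def extract_instance(uncertain_edges, theta, threshold=0.5):
    """Keep every edge whose predicted probability reaches the threshold."""
    instance = []
    for (u, v, _), x in zip(uncertain_edges, edge_features(uncertain_edges)):
        z = sum(t * xi for t, xi in zip(theta, x))
        if sigmoid(z) >= threshold:
            instance.append((u, v))
    return instance


theta = [-0.5, -0.5, 0.0, 0.0, 4.0]  # hypothetical trained coefficients
new_graph = [("a", "b", 0.9), ("b", "c", 0.4), ("a", "c", 0.2)]
instance = extract_instance(new_graph, theta)
```

Prediction is a single pass over the edges, which is the source of the efficiency gain claimed over re-running ADR on every new network.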
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the present invention.
Claims (5)
1. A method for extracting representative instances of uncertain graphs, characterized in that the method comprises:
constructing a training set and establishing a regression model;
determining the regression coefficients of the regression model using gradient descent;
establishing a logistic regression model according to the regression model and its coefficients.
2. The method for extracting representative instances of uncertain graphs according to claim 1, characterized in that the regression model is:
3. The method for extracting representative instances of uncertain graphs according to claim 2, characterized in that establishing the regression model specifically includes:
extracting a representative instance, the deterministic graph G, from the uncertain graph g;
taking as feature values the degrees of the two vertices of each edge in the deterministic graph G, the expected degrees of the two vertices of each edge, and the existence probability Pe of each edge;
establishing a linear expression from the feature values, the linear expression then serving as the training set;
mapping the value of the linear expression onto the interval [0,1].
4. The method for extracting representative instances of uncertain graphs according to claim 3, characterized in that determining the coefficients of the regression model using gradient descent specifically includes:
constructing a loss function J;
solving for the regression coefficients of all features using gradient descent.
5. The method for extracting representative instances of uncertain graphs according to claim 4, characterized in that the logistic regression model is:
where $\theta = [\theta_1', \theta_2', \theta_3', \theta_4', \theta_5']^{T}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811486200.7A CN109727635A (en) | 2018-12-06 | 2018-12-06 | A kind of abstracting method of uncertain figure representative instance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109727635A (en) | 2019-05-07 |
Family
ID=66295300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811486200.7A Pending CN109727635A (en) | 2018-12-06 | 2018-12-06 | A kind of abstracting method of uncertain figure representative instance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109727635A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617435A (en) * | 2013-12-16 | 2014-03-05 | 苏州大学 | Image sorting method and system for active learning |
CN106250928A (en) * | 2016-07-30 | 2016-12-21 | 哈尔滨工业大学深圳研究生院 | Parallel logic homing method based on Graphics Processing Unit and system |
CN106411965A (en) * | 2016-12-22 | 2017-02-15 | 北京知道创宇信息技术有限公司 | Method for determining network server providing counterfeit service, equipment and calculating equipment thereof |
CN107678803A (en) * | 2017-09-30 | 2018-02-09 | 广东欧珀移动通信有限公司 | Using management-control method, device, storage medium and electronic equipment |
- 2018-12-06: CN application CN201811486200.7A filed, published as CN109727635A (status: active, Pending)
Non-Patent Citations (1)
Title |
---|
宋少莹 (Song Shaoying): "Algorithms for discovering representative instances of uncertain graphs", Master's thesis * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020259582A1 (en) | Neural network model training method and apparatus, and electronic device | |
CN104063876B (en) | Interactive image segmentation method | |
CN110533086B (en) | Semi-automatic image data labeling method | |
CN110991532B (en) | Scene graph generation method based on relational visual attention mechanism | |
CN105893349A (en) | Category label matching and mapping method and device | |
CN112364880A (en) | Omics data processing method, device, equipment and medium based on graph neural network | |
CN110019519A (en) | Data processing method, device, storage medium and electronic device | |
Strupczewski et al. | On seasonal approach to flood frequency modelling. Part I: Two‐component distribution revisited | |
CN106997373A (en) | A kind of link prediction method based on depth confidence network | |
CN109783629A (en) | A kind of micro-blog event rumour detection method of amalgamation of global event relation information | |
CN106339366A (en) | Method and device for requirement identification based on artificial intelligence (AI) | |
CN106815215B (en) | The method and apparatus for generating annotation repository | |
CN109948242A (en) | Network representation learning method based on feature Hash | |
Santiago et al. | A methodology for the characterization of flow conductivity through the identification of communities in samples of fractured rocks | |
Wu et al. | Model validation using invariant signatures and logic-based inference for automated building code compliance checking | |
Kalfarisi et al. | Detecting and geolocating city-scale soft-story buildings by deep machine learning for urban seismic resilience | |
CN114331380A (en) | Method, system, equipment and storage medium for predicting occupational flow relationship | |
Zorn et al. | Replacing energy simulations with surrogate models for design space exploration | |
CN109543114A (en) | Heterogeneous Information network linking prediction technique, readable storage medium storing program for executing and terminal | |
Bloch et al. | Graph-based learning for automated code checking–Exploring the application of graph neural networks for design review | |
CN109727635A (en) | A kind of abstracting method of uncertain figure representative instance | |
WO2024021350A1 (en) | Image recognition model training method and apparatus, computer device, and storage medium | |
CN104834958B (en) | A kind of method and apparatus judged the step of answer | |
Mao et al. | Building façade semantic segmentation based on K-means classification and graph analysis | |
CN105761152A (en) | Topic participation prediction method based on triadic group in social network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190507 |