CN110321377A - Method and device for determining true values of multi-source heterogeneous data - Google Patents

Method and device for determining true values of multi-source heterogeneous data

Info

Publication number
CN110321377A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910340361.3A
Other languages
Chinese (zh)
Other versions
CN110321377B (en)
Inventor
许海涛
王铮
周贤伟
林福宏
吕兴
安建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by University of Science and Technology Beijing USTB
Priority to CN201910340361.3A
Publication of CN110321377A
Application granted
Publication of CN110321377B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method and device for determining the true values of multi-source heterogeneous data, which can jointly process heterogeneous conflicting data and improve the accuracy of truth discovery. The method comprises: S1, obtaining heterogeneous conflicting data from different data sources; S2, for the conflicting data describing the same object, constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose goal is to maximize the weighted sum of statement value credibility; S3, for each object, updating the weights of all data sources using a true value selection strategy based on exhaustive search; S4, calculating the value of F from the updated weights of all data sources and judging from that value whether the optimization model F has converged; if it has not converged, returning to S3 and continuing; if it has converged, the optimal true values obtained for all objects form the optimal truth set. The present invention relates to the field of data mining.

Description

Method and device for determining true values of multi-source heterogeneous data
Technical field
The present invention relates to the field of data mining, and in particular to a method and device for determining the true values of multi-source heterogeneous data.
Background
With the arrival of the big data era, data has become a vast treasure trove. Many websites and companies collect these data to provide services to governments, enterprises and the public. People can obtain descriptions of the same entity from a variety of data sources, which brings great convenience to daily life. However, different data sources may describe the same thing differently, and some of this information does not match the facts; it causes serious data conflicts and impairs people's judgment of the truth.
An effective way to resolve data conflicts is truth discovery, that is, finding the true description among conflicting data that describe the same entity. The statements that data sources make about different aspects of an entity may be heterogeneous, but existing truth discovery methods each target a single data type and cannot process heterogeneous data jointly. They also neglect the strategy for selecting the true value, and therefore cannot find the true value correctly.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for determining the true values of multi-source heterogeneous data, so as to solve the problems in the prior art that heterogeneous data cannot be processed jointly and that the solution is easily trapped in a local optimum.
To solve the above technical problem, an embodiment of the present invention provides a method for determining the true values of multi-source heterogeneous data, comprising:
S1, obtaining heterogeneous conflicting data from different data sources;
S2, for the conflicting data describing the same object, constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F, with the data source weights and the statement value credibility as optimization variables and with maximizing the weighted sum of statement value credibility as the goal;
S3, for each object, using a true value selection strategy based on exhaustive search: selecting a statement value in the statement value set as the reference true value and determining the G value from the selected reference true value; when the G value is at its maximum, the current reference true value is the optimal true value of the current object; and updating the weights of all data sources according to the optimal true values obtained for all objects;
S4, calculating the F value from the updated weights of all data sources and judging from the F value whether the optimization model F has converged; if it has not converged, returning to S3 and continuing; if it has converged, the optimal true values obtained for all objects form the optimal truth set.
Further, for the conflicting data describing the same object, the optimization model F, constructed over all objects of all data sources with the goal of maximizing the weighted sum of statement value credibility, is expressed as:
$$F = \max \sum_{k=1}^{K} \sum_{n=1}^{N} w_n \, f(A_{n,k}) \quad \text{s.t.} \quad T_k \subseteq A_{*,k}$$
where w_n is the weight of data source S_n, N is the number of data sources, K is the number of objects, A_{n,k} is the statement value that data source S_n provides for object O_k, f(A_{n,k}) is the credibility of statement value A_{n,k}, s.t. denotes the constraint, A_{*,k} is the statement value set of object O_k, and T_k is the true value of object O_k and a subset of A_{*,k};
where, for object O_k across all data sources, the objective function G(k), constructed with the goal of maximizing the weighted sum of statement value credibility, is expressed as:
$$G(k) = \sum_{n=1}^{N} w_n \, f(A_{n,k})$$
where, when a statement value of object O_k maximizes the objective function G(k), that statement value is the true value of object O_k, expressed as:
$$T_k = \arg\max_{A_{n,k} \in A_{*,k}} G(k)$$
Further, the weight w_n of data source S_n is expressed as:
$$w_n = \frac{\sum_{k=1}^{K} f(A_{n,k})}{\sum_{i=1}^{N} \sum_{k=1}^{K} f(A_{i,k})}$$
Further, the credibility f(A_{n,k}) of statement value A_{n,k} is expressed as:
[formula not reproduced in the extracted text]
where β is the support factor, N_k is the number of statement values that the data sources S_n provide for object O_k, sim(·) is the similarity function, and sup(·) is the statement value support function.
Further, if the conflicting data are categorical data, the similarity function is the 0-1 similarity, expressed as:
$$sim(A_{n,k}, T_k) = \begin{cases} 1, & A_{n,k} = T_k \\ 0, & \text{otherwise} \end{cases}$$
Further, if the conflicting data are continuous data, the similarity function is expressed as:
[formula not reproduced in the extracted text]
Further, the statement value support is expressed as:
$$sup(A_{n,k}, A_{i,k}) = sim(A_{n,k}, A_{i,k})$$
Further, the step of, for each object, using a true value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference true value, determining the G value from the selected reference true value, taking the current reference true value as the optimal true value of the current object when the G value is at its maximum, and updating the weights of all data sources according to the optimal true values obtained for all objects, comprises:
S31, for each object, using the true value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference true value;
S32, for each reference true value, calculating the similarity between each statement value of the object and that reference true value, determining the statement value credibility in combination with the statement value support, and determining the G value from the determined statement value credibility and the data source weights obtained in the previous iteration;
S33, judging whether the G value is the maximum; if so, the current reference true value is the optimal true value of the current object; otherwise, returning to S31 and continuing to iterate;
S34, forming a preliminary optimal truth set from the optimal true values obtained for all objects, and updating the weights of all data sources according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects.
An embodiment of the present invention also provides a device for determining the true values of multi-source heterogeneous data, comprising:
an obtaining module, configured to obtain heterogeneous conflicting data from different data sources;
a constructing module, configured to construct, for the conflicting data describing the same object, and for each object and for all objects across all data sources respectively, an objective function G and an optimization model F, with the data source weights and the statement value credibility as optimization variables and with maximizing the weighted sum of statement value credibility as the goal;
an update module, configured to, for each object, use a true value selection strategy based on exhaustive search: select a statement value in the statement value set as the reference true value and determine the G value from the selected reference true value; when the G value is at its maximum, the current reference true value is the optimal true value of the current object; and update the weights of all data sources according to the optimal true values obtained for all objects;
a determining module, configured to calculate the F value from the updated weights of all data sources and judge from the F value whether the optimization model F has converged; if it has not converged, control returns to the update module; if it has converged, the optimal true values obtained for all objects form the optimal truth set.
The beneficial effects of the above technical solution of the present invention are as follows:
In the above solution, heterogeneous conflicting data are obtained from different data sources. For the conflicting data describing the same object, an objective function G and an optimization model F are constructed, for each object and for all objects across all data sources respectively, with the data source weights and the statement value credibility as optimization variables and with maximizing the weighted sum of statement value credibility as the goal. For each object, a true value selection strategy based on exhaustive search selects a statement value in the statement value set as the reference true value, and the G value is determined from the selected reference true value; when the G value is at its maximum, the current reference true value is the optimal true value of the current object, and the weights of all data sources are updated according to the optimal true values obtained for all objects. The F value is calculated from the updated weights of all data sources, and whether the optimization model F has converged is judged from the F value; if it has converged, the optimal true values obtained for all objects form the optimal truth set. In this way, the true value selection strategy based on exhaustive search makes it possible to process heterogeneous conflicting data jointly, to overcome the influence of local optima and to find the global optimal solution accurately, thereby improving the accuracy of truth discovery.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for determining true values of multi-source heterogeneous data provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the principle of the method for determining true values of multi-source heterogeneous data provided by an embodiment of the present invention;
Fig. 3 is a detailed flow diagram of the method for determining true values of multi-source heterogeneous data provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the device for determining true values of multi-source heterogeneous data provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
To address the existing problems that heterogeneous data cannot be processed jointly and that the solution is easily trapped in a local optimum, the present invention provides a method and device for determining the true values of multi-source heterogeneous data.
Embodiment one
As shown in Fig. 1, the method for determining true values of multi-source heterogeneous data provided by an embodiment of the present invention comprises:
S1, obtaining heterogeneous conflicting data from different data sources;
S2, for the conflicting data describing the same object, constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F, with the data source weights and the statement value credibility as optimization variables and with maximizing the weighted sum of statement value credibility as the goal;
S3, for each object, using a true value selection strategy based on exhaustive search: selecting a statement value in the statement value set as the reference true value and determining the G value from the selected reference true value; when the G value is at its maximum, the current reference true value is the optimal true value of the current object; and updating the weights of all data sources according to the optimal true values obtained for all objects;
S4, calculating the F value from the updated weights of all data sources and judging from the F value whether the optimization model F has converged; if it has not converged, returning to S3 and continuing; if it has converged, the optimal true values obtained for all objects form the optimal truth set.
With the method described in this embodiment, the objective function G and the optimization model F are built over heterogeneous conflicting data from different data sources, the optimal true value of each object is chosen by exhaustive search over its statement value set, the data source weights are updated from those optimal true values, and the process repeats until the optimization model F converges. In this way, the true value selection strategy based on exhaustive search makes it possible to process heterogeneous conflicting data jointly, to overcome the influence of local optima and to find the global optimal solution accurately, thereby improving the accuracy of truth discovery.
In this embodiment, a statement value set is provided for each object. The statement values in the set have been deduplicated, so the set contains no repeated values.
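The deduplication described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the input layout (a dict keyed by hypothetical `(source, object)` pairs), the function name `build_statement_sets` and the sample values are all assumptions.

```python
def build_statement_sets(claims):
    """Collect, per object, the distinct statement values in first-seen order.

    claims: dict mapping (source, object) -> statement value (assumed layout).
    """
    statement_sets = {}
    for (_source, obj), value in claims.items():
        values = statement_sets.setdefault(obj, [])
        if value not in values:  # deduplicate: keep each value only once
            values.append(value)
    return statement_sets


# Hypothetical sample data: three sources describing two objects.
claims = {
    ("S1", "height"): 8848,
    ("S2", "height"): 8848,
    ("S3", "height"): 8844,
    ("S1", "country"): "Nepal",
}
statement_sets = build_statement_sets(claims)
```

A list is used instead of a `set` so that unhashable values would also work and the candidate order stays deterministic.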
In this embodiment, based on observations of real-world conflicting data, the following heuristics are proposed:
(1) the true value of an object appears identical or similar across most data sources;
(2) a data source that provides more statement values of high credibility is more likely to provide true statement values;
(3) statement values provided by data sources with high weights are more likely to be close to the true value of the object.
In this embodiment, for ease of study, it is assumed that the objects are independent of one another and that each object has exactly one true value. As shown in Fig. 2, for conflicting data describing the same object, the truth discovery problem is converted into an optimization problem based on the above heuristics, and a truth discovery optimization model for multi-source heterogeneous data is designed. The optimization model takes maximizing the weighted sum of statement value credibility as its goal and the data source weights and the statement value credibility as its optimization variables; the two optimization variables are updated iteratively to keep optimizing the model. The optimization model is expressed as:
$$F = \max \sum_{k=1}^{K} \sum_{n=1}^{N} w_n \, f(A_{n,k}) \quad \text{s.t.} \quad T_k \subseteq A_{*,k} \tag{1}$$
where F is the optimization model constructed over all objects of all data sources with the goal of maximizing the weighted sum of statement value credibility; w_n is the weight of data source S_n, N is the number of data sources, K is the number of objects, A_{n,k} is the statement value that data source S_n provides for object O_k, f(A_{n,k}) is the credibility of statement value A_{n,k}, s.t. denotes the constraint, A_{*,k} is the statement value set of object O_k, and T_k is the true value of object O_k and a subset of A_{*,k}.
In the optimization model, the data source weights and the statement value credibility are both unknown variables, so an iterative method can be used to update both and keep optimizing the model.
In this embodiment, because the objects are mutually independent, the objective function G(k), constructed for object O_k across all data sources with the goal of maximizing the weighted sum of statement value credibility, can be expressed as:
$$G(k) = \sum_{n=1}^{N} w_n \, f(A_{n,k}) \tag{2}$$
When a statement value of object O_k maximizes the objective function G(k), that statement value is the true value of object O_k, expressed as:
$$T_k = \arg\max_{A_{n,k} \in A_{*,k}} G(k) \tag{3}$$
In this embodiment, each iteration is treated as one round of optimization. Each round uses the true value selection strategy based on exhaustive search: every statement value in the statement value set is taken in turn as the reference true value, and the statement value that maximizes the objective function G(k) is output as the true value of object O_k. The model is then optimized further until the optimization model F converges, at which point the truth set obtained from the model is the optimal truth set.
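The exhaustive selection for a single object can be sketched as below. This is an illustrative reading, not the patent's code: the names `select_truth` and `zero_one_credibility` are hypothetical, and the credibility used here is a plain 0-1 match with the reference value (the statement value support term is left out to keep the example short).

```python
def select_truth(claims_k, weights, credibility):
    """Try every distinct statement value as the reference true value and
    return the one that maximizes G(k) = sum_n w_n * f(A_{n,k}).

    claims_k: dict source -> statement value for one object O_k (assumed layout).
    weights: dict source -> w_n.
    credibility: callable (value, reference) -> f(A_{n,k}).
    """
    candidates = list(dict.fromkeys(claims_k.values()))  # deduplicated, ordered
    best_value, best_g = None, float("-inf")
    for reference in candidates:
        g = sum(weights[s] * credibility(v, reference) for s, v in claims_k.items())
        if g > best_g:
            best_value, best_g = reference, g
    return best_value, best_g


def zero_one_credibility(value, reference):
    """Simplified credibility (assumption): 1 if the value matches, else 0."""
    return 1.0 if value == reference else 0.0


# Hypothetical sample: two lower-weight sources agree against one heavier source.
claims_k = {"S1": "red", "S2": "blue", "S3": "blue"}
weights = {"S1": 0.4, "S2": 0.3, "S3": 0.3}
truth, g_value = select_truth(claims_k, weights, zero_one_credibility)
```

Because every candidate is evaluated, the cost per object grows with the number of distinct statement values times the number of sources, which is the price paid for not relying on a local search.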
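In this embodiment, the data source weights are calculated from the statement value credibility. The weight w_n of data source S_n is expressed as:
$$w_n = \frac{\sum_{k=1}^{K} f(A_{n,k})}{\sum_{i=1}^{N} \sum_{k=1}^{K} f(A_{i,k})} \tag{4}$$
where the numerator $\sum_{k=1}^{K} f(A_{n,k})$ is the sum of the credibilities of the statement values that data source S_n provides for all objects, and the denominator $\sum_{i=1}^{N} \sum_{k=1}^{K} f(A_{i,k})$ is the sum of the statement value credibilities of all objects over all data sources.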
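As a small worked example of this normalization, the sketch below computes each source's share of the total credibility. The function name `update_weights`, the dict layout and the equal-weight fallback for an all-zero total are assumptions added for illustration.

```python
def update_weights(credibility):
    """Normalize per-source credibility sums into data source weights.

    credibility: dict source -> list of f(A_{n,k}) values, one per object.
    """
    per_source = {s: sum(vals) for s, vals in credibility.items()}
    total = sum(per_source.values())
    if total == 0:
        # Assumed fallback (not in the source): equal weights when no
        # statement value has any credibility.
        return {s: 1.0 / len(per_source) for s in per_source}
    return {s: v / total for s, v in per_source.items()}


# Hypothetical credibilities for three sources over two objects.
weights = update_weights({"S1": [1.0, 1.0], "S2": [1.0, 0.0], "S3": [0.0, 0.0]})
```

Here S1 contributes 2 of the 3 units of total credibility, S2 contributes 1 and S3 none, so the weights are 2/3, 1/3 and 0 and sum to 1.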
In this embodiment, the statement value credibility is composed of the statement value similarity and the statement value support. The credibility f(A_{n,k}) of statement value A_{n,k} is expressed as:
[formula (5) is not reproduced in the extracted text]
where β is the support factor, N_k is the number of statement values that the data sources S_n provide for object O_k, sim(·) is the similarity function, and sup(·) is the statement value support function.
In this embodiment, conflicting data can be divided into two data types, categorical data and continuous data, which are two common kinds of heterogeneous data. A corresponding similarity function is given for each type.
Categorical data use the 0-1 similarity function, so the similarity function for categorical data is expressed as:
$$sim(A_{n,k}, T_k) = \begin{cases} 1, & A_{n,k} = T_k \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
Continuous data use the normalized square-root relative error as the similarity function, so the similarity function for continuous data is expressed as:
[formula (7) is not reproduced in the extracted text]
In this embodiment, using a similarity function matched to each data type makes it possible to process several types of data jointly, so that data source quality is assessed better and the accuracy of truth discovery improves.
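The two similarity functions can be sketched as follows. The 0-1 similarity follows directly from the description. The continuous similarity is only an assumed stand-in, since the patent's exact normalized square-root relative error formula is not available in this text: it takes one minus the square root of the relative error, clipped to the range 0 to 1. All function names and the `kind` parameter are hypothetical.

```python
import math


def categorical_similarity(a, b):
    """0-1 similarity: 1 if the two categorical values are equal, else 0."""
    return 1.0 if a == b else 0.0


def continuous_similarity(a, b):
    """ASSUMED form, not the patent's formula: 1 - sqrt(relative error),
    with the error normalized by the larger magnitude and clipped to [0, 1]."""
    scale = max(abs(a), abs(b))
    if scale == 0:
        return 1.0  # both values are zero, so they agree exactly
    return 1.0 - min(1.0, math.sqrt(abs(a - b) / scale))


def similarity(a, b, kind):
    """Dispatch on the declared data type of the object being described."""
    if kind == "continuous":
        return continuous_similarity(a, b)
    if kind == "categorical":
        return categorical_similarity(a, b)
    raise ValueError(f"unknown data type: {kind}")


# Hypothetical usage: the same interface serves both data types.
same_city = similarity("Beijing", "Beijing", "categorical")
near_value = similarity(75, 100, "continuous")
```

Keeping both functions behind one `similarity` interface is what lets the rest of the procedure treat categorical and continuous objects uniformly.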
In this embodiment, the statement value support is the similarity between a statement value and the other statement values that describe the same object, expressed as:
$$sup(A_{n,k}, A_{i,k}) = sim(A_{n,k}, A_{i,k}) \tag{8}$$
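Because the support depends only on pairs of statement values, it can be computed once per object before the iterations begin. The sketch below builds the pairwise support table for one object; the names `support_matrix` and `zero_one` and the sample values are assumptions for illustration.

```python
def support_matrix(values, sim):
    """Pairwise support sup(A_n, A_i) = sim(A_n, A_i) for one object's values."""
    return [[sim(a, b) for b in values] for a in values]


def zero_one(a, b):
    """0-1 similarity for categorical values."""
    return 1.0 if a == b else 0.0


# Hypothetical statement values from three sources for the same object.
values = ["Paris", "Lyon", "Paris"]
sup = support_matrix(values, zero_one)
```

Row n of `sup` lists how strongly every source's value backs the value given by source n; with a symmetric similarity function the table is symmetric as well.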
In this embodiment, introducing the statement value support into the calculation of the statement value credibility can speed up the solution of the optimization problem, help it escape local optima and approach the current optimum quickly, which improves the accuracy of the result and accelerates the convergence of the method.
In a specific implementation of the foregoing method, further, the step of, for each object, using a true value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference true value, determining the G value from the selected reference true value, taking the current reference true value as the optimal true value of the current object when the G value is at its maximum, and updating the weights of all data sources according to the optimal true values obtained for all objects, comprises:
S31, for each object, using the true value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference true value;
S32, for each reference true value, calculating the similarity between each statement value of the object and that reference true value, determining the statement value credibility in combination with the statement value support, and determining the G value from the determined statement value credibility and the data source weights obtained in the previous iteration;
S33, judging whether the G value is the maximum; if so, the current reference true value is the optimal true value of the current object; otherwise, returning to S31 and continuing to iterate;
S34, forming a preliminary optimal truth set from the optimal true values obtained for all objects, and updating the weights of all data sources according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects.
In a specific implementation of the foregoing method, further, the step of calculating the F value from the updated weights of all data sources, judging from the F value whether the optimization model F has converged, returning to S3 if it has not converged, and forming the optimal truth set from the optimal true values of all objects if it has converged, comprises:
calculating the value of the optimization model F from the updated weights of all data sources and the statement value credibility set corresponding to the preliminary optimal truth set of all objects;
judging from the obtained F value whether the optimization model F has converged; if it has not converged, returning to S3 and continuing; if it has converged, the current preliminary optimal truth set is the final optimal truth set.
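The text does not state how convergence of F is tested. A minimal sketch, assuming convergence means that F changes by no more than a small tolerance between consecutive iterations, is shown below; the function name `has_converged` and the default tolerance are hypothetical.

```python
def has_converged(previous_f, current_f, tol=1e-6):
    """Assumed criterion: F is considered converged when its change between
    two consecutive iterations is at most `tol`."""
    if previous_f is None:  # first iteration: nothing to compare against
        return False
    return abs(current_f - previous_f) <= tol


# Hypothetical sequence of F values over three iterations.
history = [2.10, 2.45, 2.45]
flags = [
    has_converged(prev, curr)
    for prev, curr in zip([None] + history[:-1], history)
]
```

In practice a cap on the number of iterations is usually combined with such a test, so that the loop ends even if F oscillates.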
As shown in Fig. 3, for a better understanding of the method described in this embodiment, its workflow is described in detail below. The method can specifically include the following steps:
Step 1: first perform data mining, for example with web crawlers, to obtain from the network heterogeneous conflicting data from different data sources (for example, microblogs such as Sina and Tencent, public accounts, and search engines such as Baidu and Google), and initialize the weights of all data sources to the same value;
Step 2: before the iterations begin, calculate the statement value support of all statement values according to formula (8);
Step 3: start iterating. After each iteration begins, for each object, use the true value selection strategy based on exhaustive search to select the statement values in the statement value set in turn as the reference true value. For each reference true value, calculate the similarity between each statement value of the object and that reference true value, and determine the statement value credibility by formula (5) in combination with the statement value support. From the determined statement value credibility and the data source weights obtained in the previous iteration, calculate the G value according to formula (2); when the G value is at its maximum, the reference true value at that point is the optimal true value;
Step 4: form a preliminary optimal truth set from the optimal true values obtained for all objects, and update the weights of all data sources by formula (4) according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects, which ends the current iteration; otherwise (that is, while the G value of an object has not yet reached its maximum), return to Step 3 and continue iterating;
Step 5: calculate the value of the optimization model F from the updated weights of all data sources and the statement value credibility set corresponding to the preliminary optimal truth set of all objects, and repeat the iterative process of Steps 3 and 4 until the optimization model F converges; then output the current truth set as the optimal truth set.
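Steps 1 to 5 can be put together in a compact sketch. This is an illustration under stated assumptions, not the patented implementation: formula (5) is not available in this text, so `credibility` uses an assumed form (similarity to the reference true value plus β times the mean support from the other sources); the convergence test on F, the parameter defaults, the data layout and all names are likewise hypothetical, and only categorical data are handled.

```python
def zero_one_sim(a, b):
    """0-1 similarity for categorical values."""
    return 1.0 if a == b else 0.0


def credibility(value, reference, others, sim, beta):
    """ASSUMED form of f(A_{n,k}): similarity to the reference true value
    plus beta times the mean support from the other sources' values."""
    support = sum(sim(value, o) for o in others) / len(others) if others else 0.0
    return sim(value, reference) + beta * support


def discover_truths(claims, sim=zero_one_sim, beta=0.5, tol=1e-9, max_iter=100):
    """claims: dict object -> dict source -> statement value (assumed layout)."""
    sources = sorted({s for per_src in claims.values() for s in per_src})
    weights = {s: 1.0 / len(sources) for s in sources}  # Step 1: equal weights
    truths, prev_f = {}, None
    for _ in range(max_iter):
        chosen = {}  # object -> credibilities under its best reference value
        for obj, per_src in claims.items():
            best_g = float("-inf")
            for ref in dict.fromkeys(per_src.values()):  # Step 3: exhaustive search
                cred = {
                    s: credibility(
                        v, ref, [o for t, o in per_src.items() if t != s], sim, beta
                    )
                    for s, v in per_src.items()
                }
                g = sum(weights[s] * c for s, c in cred.items())  # G(k)
                if g > best_g:
                    best_g, truths[obj], chosen[obj] = g, ref, cred
        # Step 4: update weights from the credibilities of the chosen truths.
        totals = {s: sum(c.get(s, 0.0) for c in chosen.values()) for s in sources}
        grand = sum(totals.values())
        if grand > 0:
            weights = {s: totals[s] / grand for s in sources}
        # Step 5: evaluate F with the updated weights and test convergence.
        f_value = sum(weights[s] * c for cr in chosen.values() for s, c in cr.items())
        if prev_f is not None and abs(f_value - prev_f) <= tol:
            break
        prev_f = f_value
    return truths, weights


# Hypothetical data: S2 always agrees with the majority, S3 usually does not.
claims = {
    "capital": {"S1": "Paris", "S2": "Paris", "S3": "Lyon"},
    "color": {"S1": "red", "S2": "blue", "S3": "blue"},
    "city": {"S1": "Rome", "S2": "Rome", "S3": "Milan"},
}
truths, weights = discover_truths(claims)
```

On this sample the source that sides with the majority on every object ends up with the largest weight, which matches heuristic (2) above. The support is recomputed inside the loop for brevity; Step 2 of the description computes it once before iterating.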
The method described in this embodiment can be used in the field of data mining to clean conflicting data mined from the network and to find the true values among them. This helps improve the information quality of the network environment, provides more accurate information services for the public, businesses and governments, and reduces losses caused by conflicting data.
The beneficial effects of the above technical solution of the present invention are as follows:
(1) The truth discovery optimization model for multi-source heterogeneous data proposed by the present invention solves for the true values by optimization and is not easily affected by the data distribution. It uses a similarity function matched to each data type, so several types of data can be processed jointly, which assesses data source quality better and improves the accuracy of truth discovery.
(2) The present invention introduces the statement value support into the calculation of the statement value credibility. Adding the support can speed up the solution of the optimization problem, help it escape local optima and approach the current optimum quickly, improving the accuracy of the result and accelerating the convergence of the algorithm.
(3) The present invention proposes a true value selection strategy based on exhaustive search. Although exhaustive search is computationally more expensive, it can overcome the influence of local optima and find the global optimal solution accurately.
Embodiment two
The present invention also provides a kind of specific embodiments of multi-source heterogeneous data true value determining device, since the present invention mentions The multi-source heterogeneous data true value determining device supplied determines that the specific embodiment of method is opposite with aforementioned multi-source heterogeneous data true value Answer, the multi-source heterogeneous data true value determining device can by execute above method specific embodiment in process step come It achieves the object of the present invention, therefore above-mentioned multi-source heterogeneous data true value determines the explanation in method specific embodiment, It is below specific in the present invention suitable for the specific embodiment of multi-source heterogeneous data true value determining device provided by the invention It will not be described in great detail in embodiment.
As shown in figure 4, the embodiment of the present invention also provides a kind of multi-source heterogeneous data true value determining device, comprising:
Module 11 is obtained, for obtaining the isomery colliding data from different data sources;
Module 12 is constructed, for the colliding data to description same target, every an object and institute for all data sources There is object, using data source weight and statement value credibility as optimized variable, is constructed respectively to maximize the weighting of statement value credibility With the objective function G and Optimized model F for target;
Update module 13, using the true value selection strategy based on the method for exhaustion, selects statement value for being directed to every an object Statement value in set, which is used as, refers to true value, determines G value according to the reference true value of selection, when G value maximum, current reference True value is that the optimal true value of existing object updates the weight of all data sources according to the optimal true value of obtained all objects;
Determining module 14 calculates F value, is sentenced according to obtained F value for the weight according to updated all data sources Whether disconnected Optimized model F restrains, if not restraining, returns to update module 13 and continues to execute;If convergence, obtained all objects Optimal true value form optimal truth set.
Multi-source heterogeneous data true value determining device described in the embodiment of the present invention is obtained from the different of different data sources Structure colliding data;To the colliding data of description same target, every an object and all objects for all data sources, with number It is optimized variable according to source weight and statement value credibility, is constructed respectively to maximize statement value credibility weighted sum as target Objective function G and Optimized model F;Statement value collection is selected using the true value selection strategy based on the method for exhaustion for every an object Statement value in conjunction, which is used as, refers to true value, determines G value according to the reference true value of selection, and when G value maximum, current reference is true Value is that the optimal true value of existing object updates the weight of all data sources according to the optimal true value of obtained all objects;Root According to the weight of updated all data sources, F value is calculated, judges whether Optimized model F restrains according to obtained F value, if receiving It holds back, then the optimal true value of all objects obtained forms optimal truth set;In this way, the true value selection strategy based on the method for exhaustion, energy It is enough that Combined Treatment is carried out to isomery colliding data, and overcome local optimum to influence and accurately find globally optimal solution, to improve The accuracy rate of true value discovery.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A multi-source heterogeneous data truth value determination method, characterized by comprising:
S1: obtaining heterogeneous conflicting data from different data sources;
S2: for the conflicting data describing the same object, constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose goal is to maximize the weighted sum of claimed value credibility, with the data source weights and the claimed value credibility as the optimization variables;
S3: for each object, using a truth value selection strategy based on exhaustive search to select a claimed value from the claimed value set as the reference truth value, and determining the value of G from the selected reference truth value, wherein, when G reaches its maximum, the current reference truth value is the optimal truth value of the current object; and updating the weights of all data sources according to the optimal truth values obtained for all objects;
S4: calculating the value of F from the updated weights of all data sources, and judging from that value whether the optimization model F has converged; if it has not converged, returning to S3 and continuing execution; if it has converged, the optimal truth values obtained for all objects form the optimal truth value set.
2. The multi-source heterogeneous data truth value determination method according to claim 1, characterized in that, for the conflicting data describing the same object, the optimization model F, constructed for all objects of all data sources with the goal of maximizing the weighted sum of claimed value credibility, is expressed as:
wherein $w_n$ is the weight of data source $S_n$, $N$ denotes the number of data sources, $K$ denotes the number of objects, $A_{n,k}$ is the claimed value that data source $S_n$ provides for object $O_k$, $f(A_{n,k})$ is the credibility of claimed value $A_{n,k}$, s.t. denotes the constraint conditions, $A_{*,k}$ denotes the claimed values of object $O_k$, and $T_k$ is the truth value of object $O_k$ and is a subset of $A_{*,k}$;
wherein, for object $O_k$ across all data sources, the objective function $G(k)$, constructed with the goal of maximizing the weighted sum of claimed value credibility, is expressed as:
wherein, when a claimed value of object $O_k$ makes the objective function $G(k)$ reach its maximum, that claimed value is the truth value of object $O_k$, expressed as:
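The formulas referred to in this claim are not reproduced in this text. One form consistent with the symbol definitions above is sketched below, purely as an assumption: the objective is written as the weighted sum of claimed value credibility, and only the constraint that can be read from the definitions ($T_k$ being a subset of $A_{*,k}$) is shown. Any further constraints, for example on the weights, cannot be recovered from the surrounding text.

```latex
% Assumed reconstruction; not taken from the original formula images.
\max_{\{w_n\},\,\{T_k\}} \; F = \sum_{k=1}^{K} \sum_{n=1}^{N} w_n \, f(A_{n,k})
\qquad \text{s.t.} \quad T_k \subseteq A_{*,k}, \quad k = 1, \dots, K

G(k) = \sum_{n=1}^{N} w_n \, f(A_{n,k})

T_k = \arg\max_{A \in A_{*,k}} G(k)
```

Here $G(k)$ depends on the candidate $A$ through the credibility $f$, which is evaluated against the candidate taken as the reference truth value.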
3. The multi-source heterogeneous data truth value determination method according to claim 2, characterized in that the weight $w_n$ of data source $S_n$ is expressed as:
4. The multi-source heterogeneous data truth value determination method according to claim 3, characterized in that the credibility $f(A_{n,k})$ of claimed value $A_{n,k}$ is expressed as:
wherein $\beta$ is the support factor, $N_k$ is the number of claimed values provided by data source $S_n$ for object $O_k$, $\mathrm{sim}(\cdot)$ denotes the similarity function, and $\mathrm{sup}(\cdot)$ denotes the claimed value support function.
5. The multi-source heterogeneous data truth value determination method according to claim 4, characterized in that, if the conflicting data is categorical data, the similarity function is expressed as:
6. The multi-source heterogeneous data truth value determination method according to claim 5, characterized in that, if the conflicting data is continuous data, the similarity function is expressed as:
7. The multi-source heterogeneous data truth value determination method according to claim 6, characterized in that the claimed value support is expressed as:
$\mathrm{sup}(A_{n,k}, A_{i,k}) = \mathrm{sim}(A_{n,k}, A_{i,k})$.
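The similarity and support functions named in claims 5 to 7 can be illustrated with a short Python sketch. Only the identity between support and similarity comes from the text. The two similarity definitions are assumptions chosen for illustration (exact match for categorical data, exponential decay with distance for continuous data), since the original formulas are not reproduced here. The names `sim_categorical`, `sim_continuous`, and the `scale` parameter are hypothetical.

```python
import math

def sim_categorical(a, b):
    # Assumed categorical similarity: 1.0 for identical labels, else 0.0.
    return 1.0 if a == b else 0.0

def sim_continuous(a, b, scale=1.0):
    # Assumed continuous similarity: equals 1.0 when the values coincide
    # and decays toward 0.0 as their absolute difference grows.
    return math.exp(-abs(a - b) / scale)

def sup(a, b, sim):
    # Support between two claimed values equals their similarity,
    # as stated in claim 7.
    return sim(a, b)
```

Passing the similarity function as an argument lets the same support function serve both data types, which mirrors the joint handling of heterogeneous data described in the text.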
8. The multi-source heterogeneous data truth value determination method according to claim 1, characterized in that the step of, for each object, using a truth value selection strategy based on exhaustive search to select a claimed value from the claimed value set as the reference truth value, determining the value of G from the selected reference truth value, wherein, when G reaches its maximum, the current reference truth value is the optimal truth value of the current object, and updating the weights of all data sources according to the optimal truth values obtained for all objects, comprises:
S31: for each object, using the truth value selection strategy based on exhaustive search to select a claimed value from the claimed value set as the reference truth value;
S32: calculating the similarity between each claimed value of the object and the reference truth value, determining the claimed value credibility by combining the similarity with the claimed value support, and determining the value of G from the determined claimed value credibility and the data source weights obtained in the previous iteration;
S33: judging whether the value of G is the maximum; if so, the current reference truth value is the optimal truth value of the current object; otherwise, returning to S31 and continuing the iteration;
S34: forming a preliminary optimal truth value set from the optimal truth values obtained for all objects, and updating the weights of all data sources according to the claimed value credibility set corresponding to the preliminary optimal truth value set of all objects.
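Steps S31 to S33 for a single object with continuous data can be sketched as below. The credibility expression (similarity to the reference truth value plus `beta` times the mean support) and the exponential similarity are assumptions made for illustration, and `select_truth` is a hypothetical name. The sketch returns the G value of every candidate so that the exhaustive comparison is visible.

```python
import math

def sim(a, b, scale=1.0):
    # Assumed continuous similarity, decaying with absolute difference.
    return math.exp(-abs(a - b) / scale)

def credibility(values, n, ref, beta):
    # Assumed form: similarity to the reference truth value plus
    # beta times the mean support from the other claimed values.
    others = [sim(values[n], v) for i, v in enumerate(values) if i != n]
    support = sum(others) / len(others) if others else 0.0
    return sim(values[n], ref) + beta * support

def select_truth(values, weights, beta=0.5):
    g_by_candidate = {}
    for ref in values:
        # S31: take this claimed value as the reference truth value.
        # S32: G is the weighted sum of credibility under that reference,
        # using the source weights from the previous iteration.
        g_by_candidate[ref] = sum(
            w * credibility(values, n, ref, beta)
            for n, w in enumerate(weights))
    # S33: the candidate with the largest G is the optimal truth value.
    best = max(g_by_candidate, key=g_by_candidate.get)
    return best, g_by_candidate

# Three sources report a temperature; the second source is weighted higher.
values = [20.0, 20.5, 35.0]
weights = [1.0, 2.0, 1.0]
best, table = select_truth(values, weights)
```

Because every claimed value is tried as the reference, the cost per object grows with the number of distinct claimed values, which is the price paid for avoiding a locally optimal choice.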
9. The multi-source heterogeneous data truth value determination method according to claim 8, characterized in that the step of calculating the value of F from the updated weights of all data sources, judging from that value whether the optimization model F has converged, returning to S3 and continuing execution if it has not converged, and forming the optimal truth value set from the optimal truth values obtained for all objects if it has converged, comprises:
calculating the value of the optimization model F from the updated weights of all data sources and the claimed value credibility set corresponding to the preliminary optimal truth value set of all objects;
judging from the obtained value of F whether the optimization model F has converged; if it has not converged, returning to S3 and continuing execution; if it has converged, the current preliminary optimal truth value set is the optimal truth value set.
10. A multi-source heterogeneous data truth value determination device, characterized by comprising:
an obtaining module, configured to obtain heterogeneous conflicting data from different data sources;
a construction module, configured to, for the conflicting data describing the same object, construct, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose goal is to maximize the weighted sum of claimed value credibility, with the data source weights and the claimed value credibility as the optimization variables;
an update module, configured to, for each object, use a truth value selection strategy based on exhaustive search to select a claimed value from the claimed value set as the reference truth value, and determine the value of G from the selected reference truth value, wherein, when G reaches its maximum, the current reference truth value is the optimal truth value of the current object, and to update the weights of all data sources according to the optimal truth values obtained for all objects;
a determining module, configured to calculate the value of F from the updated weights of all data sources and judge from that value whether the optimization model F has converged; if it has not converged, control returns to the update module and execution continues; if it has converged, the optimal truth values obtained for all objects form the optimal truth value set.
CN201910340361.3A 2019-04-25 2019-04-25 Multi-source heterogeneous data truth value determination method and device Active CN110321377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910340361.3A CN110321377B (en) 2019-04-25 2019-04-25 Multi-source heterogeneous data truth value determination method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910340361.3A CN110321377B (en) 2019-04-25 2019-04-25 Multi-source heterogeneous data truth value determination method and device

Publications (2)

Publication Number Publication Date
CN110321377A (en) 2019-10-11
CN110321377B (en) 2021-07-23

Family

ID=68113240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910340361.3A Active CN110321377B (en) 2019-04-25 2019-04-25 Multi-source heterogeneous data truth value determination method and device

Country Status (1)

Country Link
CN (1) CN110321377B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933052A (en) * 2014-03-17 2015-09-23 华为技术有限公司 Data true value estimation method and data true value estimation device
US20170344558A1 (en) * 2016-05-26 2017-11-30 International Business Machines Corporation Graph method for system sensitivity analyses
CN107193967A (en) * 2017-05-25 2017-09-22 南开大学 A kind of multi-source heterogeneous industry field big data handles full link solution
US20180365779A1 (en) * 2017-06-14 2018-12-20 Global Tel*Link Corporation Administering pre-trial judicial services
CN108564101A (en) * 2017-12-29 2018-09-21 天津南大通用数据技术股份有限公司 A kind of data fusion method and device based on more hierarchical cluster attributes
CN109284316A (en) * 2018-09-11 2019-01-29 中国人民解放军战略支援部队信息工程大学 True value based on data source Multi-attributes finds method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冯钦 等: "基于多蚁群同步优化的多真值发现算法", 《计算机软件及计算机应用》 *
马如霞 等: "MTruths:Web信息多真值发现方法", 《计算机研究与发展》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535693A (en) * 2020-04-20 2021-10-22 中国移动通信集团湖南有限公司 Data true value determination method and device for mobile platform and electronic equipment
CN113535693B (en) * 2020-04-20 2023-04-07 中国移动通信集团湖南有限公司 Data true value determination method and device for mobile platform and electronic equipment
CN111708816A (en) * 2020-05-15 2020-09-25 西安交通大学 Multi-truth-value conflict resolution method based on Bayesian model
CN115932702A (en) * 2023-03-14 2023-04-07 武汉格蓝若智能技术股份有限公司 Voltage transformer online operation calibration method and device based on virtual standard device

Also Published As

Publication number Publication date
CN110321377B (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant