CN110321377A - Method and device for determining the true value of multi-source heterogeneous data
- Publication number: CN110321377A
- Application number: CN201910340361.3A
- Authority: CN (China)
- Prior art keywords: value, data, true value, statement, optimal
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/256—Integrating or interfacing systems involving database management systems in federated or virtual databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a method and device for determining the true value of multi-source heterogeneous data, which can jointly process heterogeneous conflicting data and improve the accuracy of true value discovery. The method comprises: S1, obtaining heterogeneous conflicting data from different data sources; S2, for the conflicting data describing the same object, constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility; S3, for each object, applying a true value selection strategy based on the exhaustive method and updating the weights of all data sources; S4, calculating the F value from the updated weights of all data sources and judging from the F value whether the optimization model F has converged; if it has not, returning to S3; if it has, the optimal true values obtained for all objects form the optimal truth set. The invention relates to the field of data mining.
Description
Technical field
The present invention relates to the field of data mining, and in particular to a method and device for determining the true value of multi-source heterogeneous data.
Background art
With the arrival of the big data era, data has become a vast resource. Many websites and companies collect it to provide services to governments, enterprises, and the public. People can obtain descriptions of the same entity from a variety of data sources, which is a great convenience. For the same thing, however, different data sources may provide different descriptions, and some of that information does not match the facts. The resulting data conflicts are serious and impair people's judgment of the truth.
One effective way to resolve data conflicts is true value discovery, that is, finding the true description among conflicting data that describe the same entity. The statements that data sources make about different aspects of an entity may be heterogeneous, but existing true value discovery methods each handle only a single data type and cannot jointly process heterogeneous data. They also neglect the true value selection strategy, and therefore cannot reliably find the true value.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for determining the true value of multi-source heterogeneous data, so as to solve the problems in the prior art that heterogeneous data cannot be jointly processed and that the search is easily trapped in a local optimum.
To solve the above technical problem, an embodiment of the present invention provides a method for determining the true value of multi-source heterogeneous data, comprising:
S1, obtaining heterogeneous conflicting data from different data sources;
S2, for the conflicting data describing the same object, taking the data source weights and the statement value credibility as optimization variables, and constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility;
S3, for each object, applying a true value selection strategy based on the exhaustive method: selecting a statement value from the statement value set as the reference true value and determining the G value from the selected reference true value, the reference true value at which G is maximal being the optimal true value of the current object; then updating the weights of all data sources according to the optimal true values obtained for all objects;
S4, calculating the F value from the updated weights of all data sources and judging from the F value whether the optimization model F has converged; if it has not, returning to S3; if it has, the optimal true values obtained for all objects form the optimal truth set.
Further, for the conflicting data describing the same object, the optimization model F, constructed for all objects of all data sources with the target of maximizing the weighted sum of statement value credibility, is expressed as:
where w_n is the weight of data source S_n, N is the number of data sources, K is the number of objects, A_{n,k} is the statement value that data source S_n provides for object O_k, f(A_{n,k}) is the credibility of statement value A_{n,k}, s.t. denotes the constraint, A_{*,k} is the statement value set of object O_k, and T_k is the true value of object O_k and a subset of A_{*,k};
where, for object O_k across all data sources, the objective function G(k), constructed with the target of maximizing the weighted sum of statement value credibility, is expressed as:
where, when a statement value of object O_k makes the objective function G(k) reach its maximum, that statement value is the true value of object O_k, expressed as:
Further, the weight w_n of data source S_n is expressed as:
Further, the credibility f(A_{n,k}) of statement value A_{n,k} is expressed as:
where β is the support factor, N_k is the number of statement values provided by data source S_n for object O_k, sim(·) denotes the similarity function, and sup(·) denotes the statement value support function.
Further, if the conflicting data is categorical data, the similarity function is expressed as:
Further, if the conflicting data is continuous data, the similarity function is expressed as:
Further, the statement value support is expressed as:
sup(A_{n,k}, A_{i,k}) = sim(A_{n,k}, A_{i,k}).
Further, the step of, for each object, applying the true value selection strategy based on the exhaustive method, selecting a statement value from the statement value set as the reference true value, determining the G value from the selected reference true value, taking the reference true value at which G is maximal as the optimal true value of the current object, and updating the weights of all data sources according to the optimal true values obtained for all objects, comprises:
S31, for each object, applying the true value selection strategy based on the exhaustive method and selecting a statement value from the statement value set as the reference true value;
S32, calculating the similarity between each statement value of the object and each reference true value, determining the statement value credibility in combination with the statement value support, and determining the G value from the determined statement value credibility and the data source weights obtained in the previous iteration;
S33, judging whether the G value is the maximum; if so, the current reference true value is the optimal true value of the current object; otherwise, returning to S31 and continuing the iteration;
S34, forming the preliminary optimal truth set from the optimal true values obtained for all objects, and updating the weights of all data sources according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects.
An embodiment of the present invention also provides a device for determining the true value of multi-source heterogeneous data, comprising:
an obtaining module, configured to obtain heterogeneous conflicting data from different data sources;
a construction module, configured to, for the conflicting data describing the same object, take the data source weights and the statement value credibility as optimization variables and construct, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility;
an update module, configured to, for each object, apply the true value selection strategy based on the exhaustive method, select a statement value from the statement value set as the reference true value, determine the G value from the selected reference true value, take the reference true value at which G is maximal as the optimal true value of the current object, and update the weights of all data sources according to the optimal true values obtained for all objects;
a determining module, configured to calculate the F value from the updated weights of all data sources and judge from the F value whether the optimization model F has converged; if it has not, control returns to the update module; if it has, the optimal true values obtained for all objects form the optimal truth set.
The beneficial effects of the above technical solution of the present invention are as follows:
In the above solution, heterogeneous conflicting data is obtained from different data sources. For the conflicting data describing the same object, the data source weights and the statement value credibility are taken as optimization variables, and an objective function G and an optimization model F, both targeting the maximum weighted sum of statement value credibility, are constructed for each object and for all objects across all data sources respectively. For each object, the true value selection strategy based on the exhaustive method selects a statement value from the statement value set as the reference true value, and the G value is determined from that reference; the reference true value at which G is maximal is the optimal true value of the current object. The weights of all data sources are then updated according to the optimal true values obtained for all objects. The F value is calculated from the updated weights, and whether the optimization model F has converged is judged from it; on convergence, the optimal true values obtained for all objects form the optimal truth set. In this way, the true value selection strategy based on the exhaustive method can jointly process heterogeneous conflicting data, overcome the influence of local optima, and accurately find the globally optimal solution, thereby improving the accuracy of true value discovery.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for determining the true value of multi-source heterogeneous data provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the principle of the method for determining the true value of multi-source heterogeneous data provided by an embodiment of the present invention;
Fig. 3 is a detailed flow diagram of the method for determining the true value of multi-source heterogeneous data provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the device for determining the true value of multi-source heterogeneous data provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
To address the existing problems that heterogeneous data cannot be jointly processed and that the search is easily trapped in a local optimum, the present invention provides a method and device for determining the true value of multi-source heterogeneous data.
Embodiment one
As shown in Fig. 1, the method for determining the true value of multi-source heterogeneous data provided by an embodiment of the present invention comprises:
S1, obtaining heterogeneous conflicting data from different data sources;
S2, for the conflicting data describing the same object, taking the data source weights and the statement value credibility as optimization variables, and constructing, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility;
S3, for each object, applying a true value selection strategy based on the exhaustive method: selecting a statement value from the statement value set as the reference true value and determining the G value from the selected reference true value, the reference true value at which G is maximal being the optimal true value of the current object; then updating the weights of all data sources according to the optimal true values obtained for all objects;
S4, calculating the F value from the updated weights of all data sources and judging from the F value whether the optimization model F has converged; if it has not, returning to S3; if it has, the optimal true values obtained for all objects form the optimal truth set.
The method for determining the true value of multi-source heterogeneous data described in this embodiment obtains heterogeneous conflicting data from different data sources. For the conflicting data describing the same object, it takes the data source weights and the statement value credibility as optimization variables and constructs, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F that target the maximum weighted sum of statement value credibility. For each object, the true value selection strategy based on the exhaustive method selects a statement value from the statement value set as the reference true value and determines the G value from it; the reference true value at which G is maximal is the optimal true value of the current object. The weights of all data sources are updated according to the optimal true values obtained for all objects, the F value is calculated from the updated weights, and whether the optimization model F has converged is judged from the F value; on convergence, the optimal true values obtained for all objects form the optimal truth set. In this way, the true value selection strategy based on the exhaustive method can jointly process heterogeneous conflicting data, overcome the influence of local optima, and accurately find the globally optimal solution, thereby improving the accuracy of true value discovery.
In this embodiment, a statement value set is given for each object. The statement values in the set have been de-duplicated, so the set contains no repeated values.
In this embodiment, based on observation of real-world conflicting data, the following heuristics are proposed:
(1) the true value of an object appears identical or similar in most data sources;
(2) a data source that provides more high-credibility statement values is more likely to provide true statement values;
(3) a statement value provided by a high-weight data source is more likely to be close to the true value of the object.
In this embodiment, for ease of study, it is assumed that the objects are mutually independent and that each object has exactly one true value. As shown in Fig. 2, for the conflicting data describing the same object, the true value discovery problem is converted into an optimization problem on the basis of the above heuristics, and an optimization model for true value discovery over multi-source heterogeneous data is designed. The model targets the maximum weighted sum of statement value credibility, where the weights are the data source weights, and takes the data source weights and the statement value credibility as optimization variables. The two optimization variables are updated iteratively so that the model is optimized step by step. The optimization model is expressed as:
where F is the optimization model constructed for all objects of all data sources with the target of maximizing the weighted sum of statement value credibility; w_n is the weight of data source S_n; N is the number of data sources; K is the number of objects; A_{n,k} is the statement value that data source S_n provides for object O_k; f(A_{n,k}) is the credibility of statement value A_{n,k}; s.t. denotes the constraint; A_{*,k} is the statement value set of object O_k; and T_k is the true value of object O_k and a subset of A_{*,k}.
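The formula image for the optimization model is not reproduced in this text. Read literally, the definitions above suggest a form like the following. This is a reconstruction offered as an assumption, not the patent's own typesetting, and the original may carry further constraints (for example, on the weights) that cannot be recovered here.

```latex
\max_{\{w_n\},\,\{f(A_{n,k})\}} \; F \;=\; \sum_{k=1}^{K} \sum_{n=1}^{N} w_n \, f(A_{n,k})
\qquad \text{s.t.} \quad T_k \subseteq A_{*,k}, \quad k = 1, \dots, K
```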
In the optimization model, both the data source weights and the statement value credibility are unknown variables, so an iterative method can be used to update the two in turn and optimize the model step by step.
In this embodiment, because the objects are mutually independent, the objective function G(k), constructed for object O_k across all data sources with the target of maximizing the weighted sum of statement value credibility, can be expressed as:
When a statement value of object O_k makes the objective function G(k) reach its maximum, that statement value is the true value of object O_k, expressed as:
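The two formula images referred to here are likewise missing from this text. Under the same definitions, and assuming that F simply decomposes over independent objects as the sum of the G(k), a plausible reconstruction is the per-object weighted sum followed by an arg max over the statement value set. The symbol A for the candidate reference value is introduced here only for illustration.

```latex
G(k) \;=\; \sum_{n=1}^{N} w_n \, f(A_{n,k}),
\qquad
T_k \;=\; \operatorname*{arg\,max}_{A \,\in\, A_{*,k}} \; G(k)
```

Here the credibility f(A_{n,k}) is evaluated with the candidate A taken as the reference true value, so G(k) changes as A ranges over the statement value set.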
In this embodiment, each iteration is regarded as one optimization pass. Each pass uses the true value selection strategy based on the exhaustive method: every statement value in the statement value set is taken in turn as the reference true value, and the statement value that maximizes the objective function G(k) is output as the true value of object O_k. The model continues to be optimized until the optimization model F converges, and the truth set obtained at that point is the optimal truth set.
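The exhaustive selection for a single object can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_truth`, the dictionary layout, and the `credibility` callback are all hypothetical, and the credibility formula itself is supplied by the caller because it is not reproduced in this text.

```python
from typing import Callable, Dict, Hashable, Tuple

# credibility(value, reference, claims) -> float; its concrete form is an assumption
Credibility = Callable[[Hashable, Hashable, Dict[str, Hashable]], float]


def select_truth(
    claims: Dict[str, Hashable],   # source name -> statement value for one object
    weights: Dict[str, float],     # source name -> current data source weight
    credibility: Credibility,
) -> Tuple[Hashable, float]:
    """Try every distinct statement value as the reference true value and
    return the one that maximizes the weighted credibility sum G."""
    best_ref, best_g = None, float("-inf")
    # dict.fromkeys de-duplicates while preserving first-seen order.
    for ref in dict.fromkeys(claims.values()):
        g = sum(weights[s] * credibility(v, ref, claims) for s, v in claims.items())
        if g > best_g:
            best_ref, best_g = ref, g
    return best_ref, best_g


# Usage with a toy 0-1 credibility: two of three equally weighted sources say "a".
example_claims = {"s1": "a", "s2": "a", "s3": "b"}
example_weights = {"s1": 1 / 3, "s2": 1 / 3, "s3": 1 / 3}
best, g_value = select_truth(
    example_claims, example_weights, lambda v, ref, _claims: 1.0 if v == ref else 0.0
)
```

Because every candidate is evaluated, the cost per object grows with the number of distinct statement values times the number of sources, which is the trade-off the text accepts in exchange for not missing the maximizer.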
In this embodiment, the data source weight is calculated from the statement value credibility. The weight w_n of data source S_n is expressed as:
where Σ_{k=1}^{K} f(A_{n,k}) denotes the sum of the credibility of the statement values that data source S_n provides for all objects, and Σ_{n=1}^{N} Σ_{k=1}^{K} f(A_{n,k}) denotes the sum of the credibility of the statement values that all data sources provide for all objects.
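The weight update can be sketched in a few lines. The formula image is missing from this text, so the sketch assumes the simplest reading of the two sums described above: each source's weight is its own credibility total divided by the grand total. The function name `update_weights` and the nested-dictionary layout are hypothetical.

```python
from typing import Dict


def update_weights(cred: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """cred[source][object] is the credibility of that source's statement value.

    ASSUMPTION: weight = (source's credibility sum) / (sum over all sources).
    """
    totals = {source: sum(per_obj.values()) for source, per_obj in cred.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Degenerate case: fall back to equal weights rather than divide by zero.
        return {source: 1.0 / len(cred) for source in cred}
    return {source: total / grand_total for source, total in totals.items()}


new_weights = update_weights({
    "s1": {"o1": 1.0, "o2": 1.0},
    "s2": {"o1": 0.0, "o2": 1.0},
})
```

Under this reading the weights always sum to 1, so a source gains weight only at the expense of the others.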
In this embodiment, the statement value credibility is composed jointly of the statement value similarity and the statement value support. The credibility f(A_{n,k}) of statement value A_{n,k} is expressed as:
where β is the support factor, N_k is the number of statement values provided by data source S_n for object O_k, sim(·) denotes the similarity function, and sup(·) denotes the statement value support function.
In this embodiment, conflicting data can be divided into two data types: categorical data and continuous data. These are two common kinds of heterogeneous data, and a corresponding similarity function is given for each.
Categorical data uses a 0-1 similarity function; the similarity function for categorical data is expressed as:
Continuous data uses the normalized square-root relative error as its similarity function; the similarity function for continuous data is expressed as:
In this embodiment, using a similarity function matched to each type of data source makes it possible to process multiple types of data source jointly, so that data source quality is assessed better and the accuracy of true value discovery improves.
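A per-type similarity can be sketched as below. Only the 0-1 rule for categorical data follows directly from the text. The continuous form is an assumption: the source names a "normalized square-root relative error" but its formula is not reproduced here, so `sim_continuous` and its `scale` parameter are hypothetical stand-ins chosen to return 1 for identical values and fall towards 0 as the gap grows.

```python
import math
from typing import Hashable


def sim_categorical(a: Hashable, b: Hashable) -> float:
    """0-1 similarity: 1 when the two statement values are equal, else 0."""
    return 1.0 if a == b else 0.0


def sim_continuous(a: float, b: float, scale: float) -> float:
    """ASSUMED form: 1 minus the square root of the gap normalized by `scale`,
    clipped to [0, 1]. `scale` (e.g. the spread of the object's statement
    values) is a hypothetical normalizer, not taken from the patent."""
    if scale <= 0:
        return sim_categorical(a, b)
    return 1.0 - min(1.0, math.sqrt(abs(a - b) / scale))


same = sim_continuous(10.0, 10.0, 5.0)
far = sim_continuous(0.0, 100.0, 5.0)
```

Dispatching on the object's data type and calling the matching function is what lets one optimization loop handle both kinds of data.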
In this embodiment, the statement value support is the similarity between a statement value and the other statement values describing the same object, expressed as follows:
sup(A_{n,k}, A_{i,k}) = sim(A_{n,k}, A_{i,k})    (8)
In this embodiment, introducing the statement value support into the calculation of statement value credibility accelerates the solution of the optimization problem, helps it escape local optima and approach the current optimum quickly, improves the accuracy of the result, and speeds up the convergence of the method for determining the true value of multi-source heterogeneous data.
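Since formula (8) defines support as plain pairwise similarity, it does not depend on the reference true value and can be computed once before the iterations begin. A minimal sketch, in which the function name `support_table` and the index-pair keys are hypothetical:

```python
from typing import Callable, Dict, Hashable, List, Tuple


def support_table(
    values: List[Hashable],
    sim: Callable[[Hashable, Hashable], float],
) -> Dict[Tuple[int, int], float]:
    """Support of statement value i from statement value j (i != j) for one
    object, following sup(A_n, A_i) = sim(A_n, A_i)."""
    return {
        (i, j): sim(a, b)
        for i, a in enumerate(values)
        for j, b in enumerate(values)
        if i != j
    }


table = support_table(["a", "a", "b"], lambda a, b: 1.0 if a == b else 0.0)
```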
In a specific embodiment of the foregoing method, further, the step of, for each object, applying the true value selection strategy based on the exhaustive method, selecting a statement value from the statement value set as the reference true value, determining the G value from the selected reference true value, taking the reference true value at which G is maximal as the optimal true value of the current object, and updating the weights of all data sources according to the optimal true values obtained for all objects, comprises:
S31, for each object, applying the true value selection strategy based on the exhaustive method and selecting a statement value from the statement value set as the reference true value;
S32, calculating the similarity between each statement value of the object and each reference true value, determining the statement value credibility in combination with the statement value support, and determining the G value from the determined statement value credibility and the data source weights obtained in the previous iteration;
S33, judging whether the G value is the maximum; if so, the current reference true value is the optimal true value of the current object; otherwise, returning to S31 and continuing the iteration;
S34, forming the preliminary optimal truth set from the optimal true values obtained for all objects, and updating the weights of all data sources according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects.
In a specific embodiment of the foregoing method, further, the step of calculating the F value from the updated weights of all data sources, judging from the F value whether the optimization model F has converged, returning to S3 if it has not, and forming the optimal truth set from the optimal true values of all objects if it has, comprises:
calculating the value of the optimization model F from the updated weights of all data sources and the statement value credibility set corresponding to the preliminary optimal truth set of all objects;
judging from the F value whether the optimization model F has converged; if it has not, returning to S3; if it has, the current preliminary optimal truth set is the final optimal truth set.
As shown in Fig. 3, to aid understanding of the method described in this embodiment, its workflow is described in detail below. The method may specifically comprise the following steps:
Step 1: first perform data mining, for example with web crawlers, to obtain from the network heterogeneous conflicting data from different data sources (for example, microblogs such as Sina and Tencent, public platforms, and search engines such as Baidu and Google), and initialize all data source weights to the same value;
Step 2: before the iterations start, calculate the statement value support of all statement values according to formula (8);
Step 3: start iterating. In each iteration, for each object, apply the true value selection strategy based on the exhaustive method: select the statement values in the statement value set one by one as the reference true value, calculate the similarity between each statement value of the object and each reference true value, and determine the statement value credibility by formula (5) in combination with the statement value support. From the determined statement value credibility and the data source weights obtained in the previous iteration, calculate the G value according to formula (2); the reference true value at which G is maximal is the optimal true value;
Step 4: form the preliminary optimal truth set from the optimal true values obtained for all objects, and update the weights of all data sources by formula (4) according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects, which ends the current iteration; otherwise (that is, while the maximum G value has not yet been found for an object), return to Step 3 and continue iterating;
Step 5: calculate the value of the optimization model F from the updated weights of all data sources and the statement value credibility set corresponding to the preliminary optimal truth set of all objects, and repeat the iterative process of Steps 3 and 4 until the optimization model F converges; then output the current truth set as the optimal truth set.
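Steps 1 to 5 can be put together in a short end-to-end sketch. Several pieces are assumptions because the corresponding formula images are absent from this text: the `credibility` function (similarity to the reference plus β times mean support) stands in for formula (5), the weight update uses a plain credibility ratio for formula (4), and convergence is taken to mean that F changes by less than a tolerance. All function and parameter names are hypothetical, and support is computed inline rather than cached as in Step 2.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Sim = Callable[[Hashable, Hashable], float]


def zero_one(a: Hashable, b: Hashable) -> float:
    """0-1 similarity for categorical statement values."""
    return 1.0 if a == b else 0.0


def credibility(value, ref, others: List[Hashable], sim: Sim, beta: float) -> float:
    """ASSUMED form, not the patent's formula (5): similarity to the reference
    true value plus beta times the mean support from the other statement values."""
    support = sum(sim(value, o) for o in others) / len(others) if others else 0.0
    return sim(value, ref) + beta * support


def discover_truths(
    claims: Dict[str, Dict[str, Hashable]],  # object -> {source: statement value}
    sims: Dict[str, Sim],                    # object -> similarity for its data type
    beta: float = 0.5,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Tuple[Dict[str, Hashable], Dict[str, float]]:
    sources = sorted({s for per_src in claims.values() for s in per_src})
    weights = {s: 1.0 / len(sources) for s in sources}  # Step 1: equal weights
    truths: Dict[str, Hashable] = {}
    prev_f = None
    for _ in range(max_iter):
        chosen_cred: Dict[str, Dict[str, float]] = {}
        for obj, per_src in claims.items():
            sim, best_g = sims[obj], float("-inf")
            # Step 3: try every distinct statement value as the reference.
            for ref in dict.fromkeys(per_src.values()):
                cred = {
                    s: credibility(v, ref, [u for t, u in per_src.items() if t != s], sim, beta)
                    for s, v in per_src.items()
                }
                g = sum(weights[s] * c for s, c in cred.items())
                if g > best_g:
                    best_g, truths[obj], chosen_cred[obj] = g, ref, cred
        # Step 4: re-weight each source by its share of total credibility (assumed ratio).
        totals = {s: sum(c.get(s, 0.0) for c in chosen_cred.values()) for s in sources}
        grand = sum(totals.values())
        if grand > 0:
            weights = {s: t / grand for s, t in totals.items()}
        # Step 5: evaluate F with the updated weights and test for convergence.
        f_value = sum(weights[s] * c for cred in chosen_cred.values() for s, c in cred.items())
        if prev_f is not None and abs(f_value - prev_f) < tol:
            break
        prev_f = f_value
    return truths, weights


toy_claims = {
    "o1": {"s1": "a", "s2": "a", "s3": "b"},
    "o2": {"s1": "x", "s2": "y", "s3": "y"},
}
toy_truths, toy_weights = discover_truths(toy_claims, {"o1": zero_one, "o2": zero_one})
```

In the toy data, source s2 agrees with the majority on both objects, so under these assumptions it ends up with the largest weight. A continuous object would be handled by registering a different similarity function for it in `sims`.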
The method for determining the true value of multi-source heterogeneous data described in this embodiment can be used in the field of data mining to clean conflicting data mined from the network and find the true values within it. This helps improve the information quality of the network environment, provides more accurate information services to the public, enterprises, and governments, and reduces losses caused by conflicting data.
The beneficial effects of the above technical solution of the present invention are as follows:
(1) The optimization model for true value discovery over multi-source heterogeneous data proposed by the present invention solves for the true values by optimization and is not easily affected by the data distribution. Because a corresponding similarity function is used for each data type, multiple types of data can be processed jointly, so that data source quality is assessed better and the accuracy of true value discovery improves.
(2) The present invention introduces the statement value support into the calculation of statement value credibility. Adding the support accelerates the solution of the optimization problem, helps it escape local optima and approach the current optimum quickly, improves the accuracy of the result, and speeds up the convergence of the algorithm.
(3) The present invention proposes a true value selection strategy based on the exhaustive method. Although the exhaustive method is more complex, it can overcome the influence of local optima and accurately find the globally optimal solution.
Embodiment two
The present invention also provides a specific embodiment of a device for determining the true value of multi-source heterogeneous data. The device corresponds to the specific embodiment of the foregoing method and can achieve the object of the present invention by executing the process steps of that method embodiment. The explanations given in the method embodiment therefore also apply to the device embodiment provided by the present invention and are not repeated below.
As shown in Fig. 4, an embodiment of the present invention also provides a device for determining the true value of multi-source heterogeneous data, comprising:
an obtaining module 11, configured to obtain heterogeneous conflicting data from different data sources;
a construction module 12, configured to, for the conflicting data describing the same object, take the data source weights and the statement value credibility as optimization variables and construct, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility;
an update module 13, configured to, for each object, apply the true value selection strategy based on the exhaustive method, select a statement value from the statement value set as the reference true value, determine the G value from the selected reference true value, take the reference true value at which G is maximal as the optimal true value of the current object, and update the weights of all data sources according to the optimal true values obtained for all objects;
a determining module 14, configured to calculate the F value from the updated weights of all data sources and judge from the F value whether the optimization model F has converged; if it has not, control returns to the update module 13; if it has, the optimal true values obtained for all objects form the optimal truth set.
The device for determining the true value of multi-source heterogeneous data described in this embodiment obtains heterogeneous conflicting data from different data sources. For the conflicting data describing the same object, it takes the data source weights and the statement value credibility as optimization variables and constructs, for each object and for all objects across all data sources respectively, an objective function G and an optimization model F that target the maximum weighted sum of statement value credibility. For each object, the true value selection strategy based on the exhaustive method selects a statement value from the statement value set as the reference true value and determines the G value from it; the reference true value at which G is maximal is the optimal true value of the current object. The weights of all data sources are updated according to the optimal true values obtained for all objects, the F value is calculated from the updated weights, and whether the optimization model F has converged is judged from the F value; on convergence, the optimal true values obtained for all objects form the optimal truth set. In this way, the true value selection strategy based on the exhaustive method can jointly process heterogeneous conflicting data, overcome the influence of local optima, and accurately find the globally optimal solution, thereby improving the accuracy of true value discovery.
The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method for determining truth values of multi-source heterogeneous data, characterized by comprising:
S1, obtaining heterogeneous conflicting data from different data sources;
S2, for the conflicting data describing the same object, taking the data source weights and the statement value credibility as optimization variables, and constructing, for each object and for all objects of all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility;
S3, for each object, using a truth value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference truth value and determining the value of G according to the selected reference truth value, the reference truth value at which G is largest being the optimal truth value of the current object; and updating the weights of all data sources according to the optimal truth values obtained for all objects;
S4, calculating the value of F according to the updated weights of all data sources, and judging from the obtained value of F whether the optimization model F has converged; if not, returning to S3 and continuing; if so, the optimal truth values obtained for all objects form the optimal truth set.
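The control flow of steps S3 and S4 can be sketched as a short loop. This is a minimal illustration, not the claimed implementation: the function name `run_truth_discovery`, the three callables passed in, the uniform initial weights, and the convergence test (change in F below `tol`) are all assumptions introduced here, since the claim does not fix them.

```python
from typing import Callable, Dict, Hashable, Tuple

# claims[object_id][source_id] = statement value offered by that source
Claims = Dict[Hashable, Dict[Hashable, object]]
Weights = Dict[Hashable, float]
Truths = Dict[Hashable, object]


def run_truth_discovery(
    claims: Claims,
    select_truth: Callable[[Dict[Hashable, object], Weights], object],
    update_weights: Callable[[Claims, Truths, Weights], Weights],
    model_value: Callable[[Claims, Truths, Weights], float],
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Tuple[Truths, Weights]:
    """Alternate S3 and S4 until the value of F stops changing."""
    sources = {s for by_source in claims.values() for s in by_source}
    # Assumption: uniform initial weights (the claim states no initialisation).
    weights: Weights = {s: 1.0 / len(sources) for s in sources}
    truths: Truths = {}
    previous_f = None
    for _ in range(max_iter):
        # S3: choose the optimal truth value of every object, then update weights.
        truths = {k: select_truth(by_source, weights)
                  for k, by_source in claims.items()}
        weights = update_weights(claims, truths, weights)
        # S4: evaluate F and test convergence (assumed: |change in F| < tol).
        f_value = model_value(claims, truths, weights)
        if previous_f is not None and abs(f_value - previous_f) < tol:
            break
        previous_f = f_value
    return truths, weights
```

Passing the selection, weight-update, and F-evaluation steps as callables keeps the loop independent of the specific formulas, which this text does not reproduce.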
2. The method for determining truth values of multi-source heterogeneous data according to claim 1, characterized in that, for the conflicting data describing the same object and for all objects of all data sources, the constructed optimization model F, whose target is to maximize the weighted sum of statement value credibility, is expressed as:
where w_n is the weight of data source S_n, N is the number of data sources, K is the number of objects, A_{n,k} is the statement value provided by data source S_n for object O_k, f(A_{n,k}) is the credibility of statement value A_{n,k}, s.t. denotes the constraints, A_{*,k} is the set of statement values of object O_k, and T_k is the truth value of object O_k and is a subset of A_{*,k};
where, for object O_k across all data sources, the constructed objective function G(k), whose target is to maximize the weighted sum of statement value credibility, is expressed as:
where, when a statement value of object O_k makes the objective function G(k) reach its maximum, that statement value is the truth value of object O_k, expressed as:
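The expressions referred to in this claim do not appear in this text. Purely as an assumption, one form consistent with the symbol definitions above (a weighted sum of credibilities, maximized subject to T_k being drawn from A_{*,k}) would be the following; any further constraints in the original formulas, for example a normalization of the weights, are unknown and are not shown.

```latex
% Assumed reading of F, G(k) and T_k; not reproduced from the patent's formulas.
\begin{aligned}
F:\quad & \max \sum_{k=1}^{K} \sum_{n=1}^{N} w_n \, f(A_{n,k})
          \qquad \text{s.t.}\quad T_k \subseteq A_{*,k},\; k = 1,\dots,K \\
G(k) \;=\; & \sum_{n=1}^{N} w_n \, f(A_{n,k}) \\
T_k \;=\; & \operatorname*{arg\,max}_{A_{i,k} \in A_{*,k}} \; G(k)
\end{aligned}
```

Here the dependence of G(k) on the candidate A_{i,k} enters through the credibility f, which is evaluated with A_{i,k} taken as the reference truth value.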
3. The method for determining truth values of multi-source heterogeneous data according to claim 2, characterized in that the weight w_n of data source S_n is expressed as:
4. The method for determining truth values of multi-source heterogeneous data according to claim 3, characterized in that the credibility f(A_{n,k}) of statement value A_{n,k} is expressed as:
where β is the support factor, N_k is the number of statement values provided by data source S_n for object O_k, sim(·) denotes the similarity function, and sup(·) denotes the statement value support function.
5. The method for determining truth values of multi-source heterogeneous data according to claim 4, characterized in that, if the conflicting data is categorical data, the similarity function is expressed as:
6. The method for determining truth values of multi-source heterogeneous data according to claim 5, characterized in that, if the conflicting data is continuous data, the similarity function is expressed as:
7. The method for determining truth values of multi-source heterogeneous data according to claim 6, characterized in that the statement value support is expressed as:
sup(A_{n,k}, A_{i,k}) = sim(A_{n,k}, A_{i,k}).
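The roles of the similarity and support functions in claims 5 to 7 can be illustrated as follows. Only the identity sup = sim is taken from the text; the concrete similarity forms (exact match for categorical data, exponential decay of the absolute difference for continuous data) and the `scale` parameter are assumptions chosen for illustration, because the corresponding formulas are not reproduced here.

```python
import math
from typing import Callable


def sim_categorical(a, b) -> float:
    """Assumed categorical similarity: 1 for identical values, 0 otherwise."""
    return 1.0 if a == b else 0.0


def sim_continuous(a: float, b: float, scale: float = 1.0) -> float:
    """Assumed continuous similarity: decays with the absolute difference.

    `scale` is a hypothetical parameter that sets how quickly similarity drops.
    """
    return math.exp(-abs(a - b) / scale)


def sup(a, b, sim: Callable[[object, object], float]) -> float:
    """Claim 7: the support between two statement values equals their similarity."""
    return sim(a, b)
```

With these definitions, `sup(20.0, 20.0, sim_continuous)` evaluates to 1.0, and the support shrinks toward 0 as two continuous statement values move apart.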
8. The method for determining truth values of multi-source heterogeneous data according to claim 1, characterized in that said step of, for each object, using the truth value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference truth value, determining the value of G according to the selected reference truth value, the reference truth value at which G is largest being the optimal truth value of the current object, and updating the weights of all data sources according to the optimal truth values obtained for all objects comprises:
S31, for each object, using the truth value selection strategy based on exhaustive search, selecting a statement value in the statement value set as the reference truth value;
S32, calculating, for each reference truth value, the similarity between each statement value of the object and that reference truth value, determining the statement value credibility in combination with the statement value support, and determining the value of G according to the determined statement value credibility and the data source weights obtained in the previous iteration;
S33, judging whether the value of G is the maximum; if so, the current reference truth value is the optimal truth value of the current object; otherwise, returning to S31 and continuing the iteration;
S34, forming a preliminary optimal truth set from the optimal truth values obtained for all objects, and updating the weights of all data sources according to the statement value credibility set corresponding to the preliminary optimal truth set of all objects.
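Steps S31 to S33 for a single object can be sketched as below. The credibility used here (similarity to the reference truth value plus `beta` times the average support from the other sources' statement values) is an assumed stand-in for the formula of claim 4, which is not reproduced in this text; the function names are likewise hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

Sim = Callable[[object, object], float]


def credibility(n: Hashable, reference, by_source: Dict[Hashable, object],
                sim: Sim, beta: float) -> float:
    """Assumed credibility of source n's statement value under a reference truth."""
    others = [v for s, v in by_source.items() if s != n]
    # Support equals similarity; averaged over the other statement values.
    support = sum(sim(by_source[n], v) for v in others) / len(others) if others else 0.0
    return sim(by_source[n], reference) + beta * support


def select_optimal_truth(by_source: Dict[Hashable, object],
                         weights: Dict[Hashable, float],
                         sim: Sim, beta: float = 0.5) -> Tuple[object, float]:
    """Return the statement value that maximizes G, together with that G."""
    best_value, best_g = None, float("-inf")
    # S31: try every distinct statement value as the reference truth value.
    for reference in dict.fromkeys(by_source.values()):
        # S32: G is the weighted sum of credibilities under this reference,
        # using the data source weights from the previous iteration.
        g = sum(weights[n] * credibility(n, reference, by_source, sim, beta)
                for n in by_source)
        # S33: keep the reference truth value with the largest G.
        if g > best_g:
            best_value, best_g = reference, g
    return best_value, best_g
```

Because every candidate is evaluated, the cost per object grows with the number of distinct statement values, which is the price paid for not relying on a local search.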
9. The method for determining truth values of multi-source heterogeneous data according to claim 8, characterized in that said step of calculating the value of F according to the updated weights of all data sources, judging from the obtained value of F whether the optimization model F has converged, returning to S3 and continuing if not, and, if so, the optimal truth values obtained for all objects forming the optimal truth set comprises:
calculating the value of the optimization model F according to the updated weights of all data sources and the statement value credibility set corresponding to the preliminary optimal truth set of all objects;
judging, from the obtained value of F, whether the optimization model F has converged; if not, returning to S3 and continuing; if so, the current preliminary optimal truth set is the optimal truth set.
10. A device for determining truth values of multi-source heterogeneous data, characterized by comprising:
an obtaining module, configured to obtain heterogeneous conflicting data from different data sources;
a construction module, configured to, for the conflicting data describing the same object, take the data source weights and the statement value credibility as optimization variables and construct, for each object and for all objects of all data sources respectively, an objective function G and an optimization model F whose target is to maximize the weighted sum of statement value credibility;
an update module, configured to, for each object, use a truth value selection strategy based on exhaustive search to select a statement value in the statement value set as the reference truth value and determine the value of G according to the selected reference truth value, the reference truth value at which G is largest being the optimal truth value of the current object, and to update the weights of all data sources according to the optimal truth values obtained for all objects;
a determining module, configured to calculate the value of F according to the updated weights of all data sources and judge from the obtained value of F whether the optimization model F has converged; if not, execution returns to the update module and continues; if so, the optimal truth values obtained for all objects form the optimal truth set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910340361.3A CN110321377B (en) | 2019-04-25 | 2019-04-25 | Multi-source heterogeneous data truth value determination method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910340361.3A CN110321377B (en) | 2019-04-25 | 2019-04-25 | Multi-source heterogeneous data truth value determination method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110321377A (en) | 2019-10-11 |
CN110321377B (en) | 2021-07-23 |
Family
ID=68113240
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910340361.3A (Active), CN110321377B (en) | 2019-04-25 | 2019-04-25 | Multi-source heterogeneous data truth value determination method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321377B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708816A (en) * | 2020-05-15 | 2020-09-25 | 西安交通大学 | Multi-truth-value conflict resolution method based on Bayesian model |
CN113535693A (en) * | 2020-04-20 | 2021-10-22 | 中国移动通信集团湖南有限公司 | Data true value determination method and device for mobile platform and electronic equipment |
CN115932702A (en) * | 2023-03-14 | 2023-04-07 | 武汉格蓝若智能技术股份有限公司 | Voltage transformer online operation calibration method and device based on virtual standard device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933052A (en) * | 2014-03-17 | 2015-09-23 | 华为技术有限公司 | Data true value estimation method and data true value estimation device |
CN107193967A (en) * | 2017-05-25 | 2017-09-22 | 南开大学 | A kind of multi-source heterogeneous industry field big data handles full link solution |
US20170344558A1 (en) * | 2016-05-26 | 2017-11-30 | International Business Machines Corporation | Graph method for system sensitivity analyses |
CN108564101A (en) * | 2017-12-29 | 2018-09-21 | 天津南大通用数据技术股份有限公司 | A kind of data fusion method and device based on more hierarchical cluster attributes |
US20180365779A1 (en) * | 2017-06-14 | 2018-12-20 | Global Tel*Link Corporation | Administering pre-trial judicial services |
CN109284316A (en) * | 2018-09-11 | 2019-01-29 | 中国人民解放军战略支援部队信息工程大学 | True value based on data source Multi-attributes finds method |
2019-04-25: CN application CN201910340361.3A, patent CN110321377B, status Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933052A (en) * | 2014-03-17 | 2015-09-23 | 华为技术有限公司 | Data true value estimation method and data true value estimation device |
US20170344558A1 (en) * | 2016-05-26 | 2017-11-30 | International Business Machines Corporation | Graph method for system sensitivity analyses |
CN107193967A (en) * | 2017-05-25 | 2017-09-22 | 南开大学 | A kind of multi-source heterogeneous industry field big data handles full link solution |
US20180365779A1 (en) * | 2017-06-14 | 2018-12-20 | Global Tel*Link Corporation | Administering pre-trial judicial services |
CN108564101A (en) * | 2017-12-29 | 2018-09-21 | 天津南大通用数据技术股份有限公司 | A kind of data fusion method and device based on more hierarchical cluster attributes |
CN109284316A (en) * | 2018-09-11 | 2019-01-29 | 中国人民解放军战略支援部队信息工程大学 | True value based on data source Multi-attributes finds method |
Non-Patent Citations (2)
Title |
---|
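Feng Qin et al.: "Multi-truth discovery algorithm based on synchronous optimization of multiple ant colonies" (基于多蚁群同步优化的多真值发现算法), 《计算机软件及计算机应用》 * |
Ma Ruxia et al.: "MTruths: a multi-truth discovery method for Web information" (MTruths:Web信息多真值发现方法), 《计算机研究与发展》 * |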
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113535693A (en) * | 2020-04-20 | 2021-10-22 | 中国移动通信集团湖南有限公司 | Data true value determination method and device for mobile platform and electronic equipment |
CN113535693B (en) * | 2020-04-20 | 2023-04-07 | 中国移动通信集团湖南有限公司 | Data true value determination method and device for mobile platform and electronic equipment |
CN111708816A (en) * | 2020-05-15 | 2020-09-25 | 西安交通大学 | Multi-truth-value conflict resolution method based on Bayesian model |
CN115932702A (en) * | 2023-03-14 | 2023-04-07 | 武汉格蓝若智能技术股份有限公司 | Voltage transformer online operation calibration method and device based on virtual standard device |
Also Published As
Publication number | Publication date |
---|---|
CN110321377B (en) | 2021-07-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110321377A (en) | A kind of multi-source heterogeneous data true value determines method and device | |
CN106411896B (en) | Network security situation prediction method based on APDE-RBF neural network | |
CN102202012B (en) | Group dividing method and system of communication network | |
CN101699432B (en) | Ordering strategy-based information filtering system | |
CN108846472A (en) | A kind of optimization method of Adaptive Genetic Particle Swarm Mixed Algorithm | |
CN108287881A (en) | A kind of optimization method found based on random walk relationship | |
US20170300580A1 (en) | System and method for identifying contacts of a target user in a social network | |
Zhao et al. | Multi-strategy ensemble firefly algorithm with equilibrium of convergence and diversity | |
CN115525038A (en) | Equipment fault diagnosis method based on federal hierarchical optimization learning | |
CN114385376B (en) | Client selection method for federal learning of lower edge side of heterogeneous data | |
CN109657147A (en) | Microblogging abnormal user detection method based on firefly and weighting extreme learning machine | |
CN110852435A (en) | Neural evolution calculation model | |
CN108052743B (en) | Method and system for determining step approach centrality | |
Salehi et al. | Enhanced genetic algorithm for spam detection in email | |
Tang et al. | A symmetric points search and variable grouping method for large-scale multi-objective optimization | |
CN104850646B (en) | A kind of Frequent tree mining method for digging for single uncertain figure | |
Lee et al. | Automatic clustering with differential evolution using cluster number oscillation method | |
CN114004326A (en) | ELM neural network optimization method based on improved suburb algorithm | |
Gong et al. | Parallel genetic algorithms on line topology of heterogeneous computing resources | |
Zhao et al. | A web service composition method based on merging genetic algorithm and ant colony algorithm | |
Gao et al. | High utility itemsets mining based on hybrid harris hawk optimization and beluga whale optimization algorithms | |
CN116305297B (en) | Data analysis method and system for distributed database | |
Ling et al. | An MOEA/D‐ACO with PBI for Many‐Objective Optimization | |
CN107301564A (en) | Abnormal consumer behavior detection method based on clustering algorithm and echo state network | |
Dai et al. | Group-based competitive influence maximization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |