CN109635004A - Object description providing method, apparatus and device for a database - Google Patents
- Publication number
- CN109635004A (application CN201811527172.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an object description providing method for a database, comprising: receiving the descriptions that each database provides for an object, where the set of descriptions one database provides for the object is a value set and the union of the descriptions all databases provide for the object is the possible value set of the object; calculating the confidence of each description value in the possible value set, ranking the description values by confidence, and selecting the N description values with the highest confidence as the truth values of the object to obtain a target truth set, where N is greater than 1; and displaying the descriptions in the target truth set. The invention can perform multi-truth discovery for an object from the conflicting data that different data sources provide for it, giving a wider scope of application and higher accuracy. It also lets the user view multiple descriptions of one object, which improves the user experience. The invention also discloses an apparatus, a device and a computer-readable storage medium based on the above method.
Description
Technical Field
The invention relates to the technical field of databases, in particular to a method for providing object description of a database. The invention also relates to an object description providing device, equipment and a computer readable storage medium of the database.
Background
With the rapid development of network technology and the widespread use of intelligent devices, data is being generated at an unprecedented rate. However, while big data is changing many aspects of modern society, different databases (or data sources) are often observed to provide conflicting descriptions of the same real-world object. These conflicts are usually caused by input errors, outdated data, missing records and the like, and applying such data in practice may cause serious damage and economic loss. For example, conflicting data used for drug recommendation in medical systems, or for stock price forecasting in the stock market, can cause serious problems. Finding the truth values (correct descriptions) that best match reality among multiple data sources, so as to resolve the conflicts, has therefore become an active research topic.
A great deal of work has addressed the truth discovery problem. However, existing algorithms usually assume that each object has only one truth value and are not suitable for finding multiple truth values for the same object. In the real world it is common for an object to have several truth values: a book typically has multiple authors, and a movie may have several directors. Existing single-truth discovery algorithms therefore have a narrow scope of application and low accuracy.
Therefore, how to provide an object description providing method for a database that can discover multiple truth values is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide an object description providing method for a database that can perform multi-truth discovery for an object from the conflicting data that different data sources provide for it, with a wider scope of application and higher accuracy. It also lets the user see multiple descriptions of one object, which improves the user experience. Another object of the present invention is to provide an apparatus, a device and a computer-readable storage medium based on the above method.
In order to solve the above technical problem, the present invention provides a method for providing object description of a database, comprising:
step s1: receiving the descriptions provided by each database for an object, wherein the set of descriptions provided by one database for the object is a value set, and the union of the descriptions provided by all the databases for the object is the possible value set of the object;
step s2: calculating the confidence of each description value in the possible value set, ranking the description values by confidence, and selecting the N description values with the highest confidence as the truth values of the object to obtain a target truth set; N is greater than 1;
step s3: displaying the descriptions in the target truth set.
Preferably, step s2 includes:
step s21: calculating the credibility of each database according to the current truth set of the object; when no truth set exists yet, the initial credibility is a preset value;
step s22: calculating in turn the confidence of each description in the possible value set of the object according to the credibility;
step s23: sorting the confidences in descending order;
step s24: following the sorted order, adding the description corresponding to each confidence to the truth set in turn to obtain a candidate truth set, and calculating the value of the objective function for the candidate truth set currently obtained; the objective function is:
where A* is the current candidate truth set, and A is a value set provided by the database s for the object e; sim(A*, A) is the cosine similarity between A* and A; E is the set of all objects; S(v) is the set of all databases that provide a value set for the object to which the descriptions in A* belong; t(s) is the credibility of the database s;
step s25: if the value of the objective function obtained by the current calculation is smaller than the value corresponding to the previous candidate truth set, taking the previous candidate truth set, that is, the set of all descriptions for which the objective function was calculated before the current one, as the target truth set.
Preferably, after the step s25 and before the step s3, the method further includes:
judging whether the change between the confidence vector of the target truth set and the confidence vector of the previous target truth set is smaller than a preset vector difference; if so, proceeding to step s3; otherwise, taking the target truth set as the current truth set of the object and returning to step s21.
Preferably, the process of step s21 includes:
calculating the credibility of each database according to the current truth set of the object and the credibility relation; when no truth set exists yet, the initial credibility is a preset value; the credibility relation is:
where t(s) is the credibility of the database s; V(s) is the collection of all value sets provided by the database s, and V is one value set in V(s); A* is the truth set of the object corresponding to V, and A is the value set of the object corresponding to V.
Preferably, the process of step s22 includes:
calculating in turn the confidence of each description in the possible value set of the object according to the credibility and the confidence relation; the confidence relation is:
where t(s) is the credibility of the database s; c(v) is the confidence of the description v; S(v) is the set of all databases that provide a value set for the object to which the description v belongs; V is a value set to which the description v belongs.
Preferably, after step s22 and before step s23, the method further includes:
determining, according to a similarity rule, a list of similar description values that are similar to a description in the possible value set;
calculating the degree of similarity between each similar description value in the list and the description;
combining the degree of similarity between each similar description value and the description with the confidence of the description to obtain the harmonized confidence of the description;
repeating the above operations until the harmonized confidence of every description is obtained;
accordingly, in step s23 the harmonized confidences are sorted in descending order.
Preferably, the process of calculating the degree of similarity between each similar description value in the list and the description includes:
calculating the degree of similarity between each similar description value in the list and the description according to a support relation; the support relation is:
where sup(v', v) is the degree of similarity between the similar description value v' and the description v; Z' is the word set of v'; Z is the word set of v; m is the number of words in Z'; n is the number of words in Z; Z'[i] is the i-th word in Z'; and Z[j] is the j-th word in Z. isSame(Z'[i], Z[j]) ∈ {0, 1}: it takes 1 when Z'[i] and Z[j] are equal, and 0 otherwise;
the process of combining the degree of similarity between each similar description value and the description with the confidence of the description to obtain the harmonized confidence of the description includes:
substituting the degree of similarity between each similar description value and the description, together with the confidence of the description, into a harmonized confidence relation to obtain the harmonized confidence of the description; the harmonized confidence relation is:
where c(v) is the harmonized confidence of the description v; ρ is a proportional parameter; c(v') is the confidence of the similar description value v'; and the remaining symbol denotes the list of similar description values.
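As a reading aid, the support and harmonized confidence computations can be sketched in Python. The exact relations are not reproduced in this text, so two points below are assumptions rather than the patent's formulas: the support is the count of equal word pairs divided by max(m, n), and the harmonized confidence is c(v) plus ρ times the support-weighted sum of the confidences of the similar values. The function names, the whitespace word split and the sample data are hypothetical.

```python
from typing import Dict, List


def support(similar: str, description: str) -> float:
    """Word-overlap support sup(v', v). Dividing by max(m, n) is an assumption."""
    z_prime: List[str] = similar.split()   # word set Z' of v'
    z: List[str] = description.split()     # word set Z of v
    if not z_prime or not z:
        return 0.0
    # Sum of isSame(Z'[i], Z[j]) over all pairs (i, j).
    matches = sum(1 for a in z_prime for b in z if a == b)
    return matches / max(len(z_prime), len(z))


def harmonized_confidence(description: str,
                          confidence: Dict[str, float],
                          similar_values: List[str],
                          rho: float = 0.5) -> float:
    """Assumed rule: c(v) + rho * sum over v' of sup(v', v) * c(v')."""
    boost = sum(support(v2, description) * confidence[v2] for v2 in similar_values)
    return confidence[description] + rho * boost


# Hypothetical confidences for two spellings of one author name.
confidence = {"J. K. Rowling": 0.6, "Joanne Rowling": 0.3}
result = harmonized_confidence("J. K. Rowling", confidence, ["Joanne Rowling"])
print(round(result, 4))
```

In this sketch a description gains confidence from similar descriptions in proportion to their word overlap, so near-duplicate spellings reinforce each other instead of competing.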
In order to solve the above technical problem, the present invention further provides an object description providing apparatus for a database, including:
a receiving module, configured to receive the descriptions provided by each database for an object, wherein the set of descriptions provided by one database for the object is a value set, and the union of the descriptions provided by all the databases for the object is the possible value set of the object;
a multi-truth screening module, configured to calculate the confidence of each description value in the possible value set, rank the description values by confidence, and select the N description values with the highest confidence as the truth values of the object to obtain a target truth set; N is greater than 1;
and a display module, configured to display the descriptions in the target truth set.
In order to solve the above technical problem, the present invention further provides an object description providing device for a database, including:
a memory for storing a computer program;
a processor for implementing the steps of the object description providing method of the database as claimed in any one of the above when executing the computer program.
To solve the above technical problem, the present invention further provides a computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the object description providing method of the database according to any one of the above.
The invention provides an object description providing method for a database. After receiving the descriptions provided by each database for an object, the method calculates the confidence of each description value in the possible value set, ranks the description values by confidence, and selects the N description values with the highest confidence as the truth values of the object, where N is greater than 1. The possible value set here is the union of the descriptions provided by all the databases for the object. It will be appreciated that the truth set of an object should be as close as possible to all the value sets provided by the conflicting data sources. The invention can therefore select the truth set that is closest to the value sets of the object, and this truth set contains multiple descriptions. As a result, the invention can perform multi-truth discovery for an object from the conflicting data that different data sources provide for it, and has a wider scope of application and higher accuracy than single-truth discovery algorithms. Compared with schemes in which only one description of an object can be viewed, the invention lets the user view multiple descriptions of one object, which improves the user experience. The invention also provides an apparatus, a device and a computer-readable storage medium based on the method.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the prior art and the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flowchart of an object description providing method for a database provided by the present invention;
FIG. 2 is a flowchart of another object description providing method for a database provided by the present invention;
FIG. 3 is a flowchart of yet another object description providing method for a database provided by the present invention;
FIG. 4 is a schematic structural diagram of an object description providing apparatus for a database provided by the present invention.
Detailed Description
The core of the invention is to provide an object description providing method for a database that can perform multi-truth discovery for an object from the conflicting data that different data sources provide for it, with a wider scope of application and higher accuracy. It also lets the user see multiple descriptions of one object, which improves the user experience. Another core of the invention is to provide an apparatus, a device and a computer-readable storage medium based on the method.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention provides an object description providing method for a database. Referring to FIG. 1, which is a flowchart of the method, the method comprises the following steps:
Step s1: receiving the descriptions provided by each database for the object, wherein the set of descriptions provided by one database for the object is a value set, and the union of the descriptions provided by all the databases for the object is the possible value set of the object;
The databases supply the mutually conflicting descriptions in the truth discovery problem. A database may be the database of a website or an internal database of a bank, a hospital and the like. An object is a unique entity that can be identified in the real world, for example a book or a movie. The truth set of an object is the set of all values in the object's possible value set that are consistent with the real world.
Step s2: calculating the confidence of each description value in the possible value set, ranking the description values by confidence, and selecting the N description values with the highest confidence as the truth values of the object to obtain a target truth set; N is greater than 1;
The truth set of an object should be as close as possible to all the value sets provided by the conflicting data sources, that is, to all the value sets that the different databases provide for the object. The target truth set is therefore the set of the N values, chosen from all the descriptions the databases give for the object, that is most similar to all the value sets the databases provide for the object. Since the present invention addresses the multi-truth problem, N is greater than 1.
Step s3: displaying the descriptions in the target truth set.
A database's description of an object can be understood as the content stored in the database that defines the object. The invention aims to select, from all the descriptions of an object in multiple databases, those that are closest to the real world and to display them, so that the descriptions the user finally sees are consistent with the real world. This avoids, as far as possible, the damage and economic loss caused by wrong descriptions misleading the user, and gives a better user experience.
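Steps s1 to s3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names and sample data are hypothetical, and the confidence used here is a simple placeholder (the fraction of databases that list a description) standing in for the confidence calculation described later.

```python
from typing import Dict, List, Set


def possible_value_set(value_sets: Dict[str, Set[str]]) -> Set[str]:
    """Step s1: union of the value sets that all databases provide for one object."""
    result: Set[str] = set()
    for values in value_sets.values():
        result |= values
    return result


def top_n_descriptions(confidence: Dict[str, float], n: int) -> List[str]:
    """Step s2: rank descriptions by confidence (descending) and keep the first n."""
    if n <= 1:
        raise ValueError("N must be greater than 1 for multi-truth discovery")
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(confidence, key=lambda v: (-confidence[v], v))
    return ranked[:n]


# Hypothetical data: three databases describing the authors of one book.
value_sets = {
    "db1": {"Alice", "Bob"},
    "db2": {"Alice", "Bob", "Carol"},
    "db3": {"Alice", "Dave"},
}
candidates = possible_value_set(value_sets)
# Placeholder confidence: the fraction of databases that list each description.
confidence = {
    v: sum(v in vs for vs in value_sets.values()) / len(value_sets)
    for v in candidates
}
target_truth_set = top_n_descriptions(confidence, n=2)
print(target_truth_set)  # step s3: display the selected descriptions
```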
In summary, after receiving the descriptions provided by each database for an object, the method calculates the confidence of each description value in the possible value set, ranks the description values by confidence, and selects the N description values with the highest confidence as the truth values of the object, where N is greater than 1. The possible value set here is the union of the descriptions provided by all the databases for the object. It will be appreciated that the truth set of an object should be as close as possible to all the value sets provided by the conflicting data sources, that is, it should have the greatest similarity to the value sets that the different databases provide for the object. The method therefore selects the N description values with the highest confidence as the truth set of the object. As a result, the invention can perform multi-truth discovery for an object from the conflicting data that different data sources provide for it, and has a wider scope of application and higher accuracy than single-truth discovery algorithms. Compared with schemes in which only one description of an object can be viewed, the invention lets the user view multiple descriptions of one object, which improves the user experience.
In general, a database provides description values for multiple attributes of an object, and the credibility of the database may differ from attribute to attribute, so each attribute type needs to be processed separately. To simplify the discussion, the present invention assumes that an object has only one attribute.
Preferably, referring to FIG. 2, which is a flowchart of another object description providing method for a database provided by the present invention, step s2 includes:
Step s21: calculating the credibility of each database according to the current truth set of the object; when no truth set exists yet, the initial credibility is a preset value;
Here, the credibility of a database is measured by the average similarity between all the value sets provided by the database and the truth sets of the corresponding objects. The higher the credibility of a database, the more similar its value sets are to the truth sets of the objects, and vice versa.
Step s22: calculating in turn the confidence of each description in the possible value set of the object according to the credibility;
In statistics, the confidence interval of a probability sample is an interval estimate of a population parameter. It shows the range around the measured result within which the true value of the parameter falls with a given probability. That probability is called the confidence level, and it indicates how plausible the measured value is. A description with a high confidence is therefore more likely to be a truth value of the object, and the truth set of the object (i.e., the target truth set) can be assembled from several descriptions with higher confidence.
Step s23: sorting the confidences in descending order;
The purpose of the sorting is as follows. Since a higher confidence indicates that a description is more likely to be a truth value, the objective function value of a description with higher confidence will generally be larger when the objective function is subsequently calculated. Sorting the confidences in descending order therefore makes it convenient to calculate the objective function for the corresponding descriptions in sequence and to confirm quickly which descriptions are truth values.
Step s24: following the sorted order, adding the description corresponding to each confidence to the truth set in turn to obtain a candidate truth set, and calculating the value of the objective function for the candidate truth set currently obtained; the objective function is:
where A* is the current candidate truth set, and A is a value set provided by the database s for the object e; sim(A*, A) is the cosine similarity between A* and A; E is the set of all objects; S(v) is the set of all databases that provide a value set for the object to which the descriptions in A* belong; t(s) is the credibility of the database s;
Step s25: if the value of the objective function obtained by the current calculation is smaller than the value corresponding to the previous candidate truth set, taking the previous candidate truth set, that is, the set of all descriptions for which the objective function was calculated before the current one, as the target truth set.
As mentioned above, the objective function value indicates how similar the candidate truth set is to all the value sets of the object; the greater the similarity, the more likely it is that the descriptions in the candidate truth set belong to the target truth set. The possible values are added to the candidate truth set one at a time in descending order of confidence. After each addition, the similarity between the candidate truth set and all the value sets of the object is calculated, and the candidate truth set with the larger similarity sum is retained. Finally, the N descriptions in the candidate truth set with the largest similarity sum form the target truth set.
The objective function described above is derived as follows.
Assume that the set of databases is S = {s1, s2, s3, …, sm}, where m is the number of databases, and that the set of objects is E = {e1, e2, e3, …, en}, where n is the number of objects. The value set provided by the database s for the object e is denoted by V. The possible value set of the object e is then the union of all the value sets provided by S for e; it is denoted by V*, and its length is denoted by L. The object e may have multiple truth values; Truth denotes the truth set of the object, which is a subset of V*. The multi-truth discovery problem in the present invention is therefore: given a set S of conflicting databases and a set E of objects, find the truth set Truth of each object within its possible value set V*.
Cosine similarity is commonly used to calculate the similarity between document vectors. The words in a text are mapped into a vector space so that each document is represented by a vector of word frequencies, and the similarity between two documents is obtained by calculating the cosine similarity between their vectors. For example, the similarity of the following two sentences is calculated:
sentence A: "I love mother, also love father";
sentence B: "I love father, love mother more".
First, the sentences are segmented into words to obtain the word set {I, love, mother, also, father, more}, and the word frequencies are counted:
sentence A: I 1, love 2, mother 1, also 1, father 1, more 0;
sentence B: I 1, love 2, mother 1, also 0, father 1, more 1.
This gives the word frequency vectors A = (1, 2, 1, 1, 1, 0) and B = (1, 2, 1, 0, 1, 1). The similarity of the two sentences is obtained by calculating the cosine similarity of these two vectors.
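The calculation for this example can be checked with a short Python sketch. The function name `cosine_similarity` is illustrative, and returning 0.0 for a zero vector is a convention chosen here rather than something the text specifies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


sentence_a = (1, 2, 1, 1, 1, 0)  # word frequencies of sentence A
sentence_b = (1, 2, 1, 0, 1, 1)  # word frequencies of sentence B
similarity = cosine_similarity(sentence_a, sentence_b)
print(round(similarity, 3))
```

The dot product is 7 and both vectors have squared length 8, so the similarity is 7/8 = 0.875.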
Cosine similarity is therefore used here to calculate the similarity between value sets. Let A* denote the vector of the truth set of the object, and let A denote the vector of the value set V; the length of A equals the length L of the object's possible value set V*.
The i-th element of A is therefore:
$$A[i] = \begin{cases} 1, & V^*[i] \in V \\ 0, & \text{otherwise} \end{cases}$$
where V*[i] denotes the i-th element of the object's possible value set V*, so that A and A* are both arrays of 1s and 0s. For example, if the possible value set is V* = {a, b, c, d, e} and the value set is V = {a, c, d}, then the binary vector A of V is (1, 0, 1, 1, 0). The invention uses cosine similarity to calculate the similarity between two value vectors. The cosine similarity relation is:
$$\mathrm{sim}(A_1, A_2) = \frac{A_1 \cdot A_2}{\lVert A_1 \rVert \, \lVert A_2 \rVert}$$
where A1 and A2 are the value vectors of two value sets.
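The binary encoding of a value set can be sketched in Python as follows. The function name is hypothetical, and the possible value set is held in a list so that the element order V*[i] is fixed.

```python
from typing import List, Sequence, Set


def to_binary_vector(possible_values: Sequence[str], value_set: Set[str]) -> List[int]:
    """Element i is 1 if the i-th possible value appears in the value set, else 0."""
    return [1 if v in value_set else 0 for v in possible_values]


possible_values = ["a", "b", "c", "d", "e"]   # V*, in a fixed order
vector = to_binary_vector(possible_values, {"a", "c", "d"})
print(vector)  # [1, 0, 1, 1, 0]
```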
The objective function of the invention is then obtained from the cosine similarity relation and the credibility of the databases.
At this point, multi-truth discovery has been converted into an optimization problem. According to the objective function, several descriptions with high confidence can be selected from the possible value set of the object to form the target truth set.
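As a reading aid, one form of the objective that is consistent with the symbol definitions given for step s24 is sketched below. The exact expression is an assumption, since only the symbol definitions are given here; the subscripts on A* and A and the notation S(e) for the databases that describe object e are added for illustration.

```latex
\max_{A^*} \; \sum_{e \in E} \; \sum_{s \in S(e)} t(s) \cdot \mathrm{sim}\!\left(A^*_e,\, A_{s,e}\right)
```

Under this reading, each database votes for the candidate truth set in proportion to its credibility t(s) and to how closely its own value set matches the candidate.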
In addition, the invention measures the credibility of a database by calculating the average similarity between all the value sets provided by the database and the truth sets of the corresponding objects. Preferably, the process of step s21 includes:
calculating the credibility of each database according to the current truth set of the object and the credibility relation; when no truth set exists yet, the initial credibility is a preset value; the credibility relation is:
where t(s) is the credibility of the database s; V(s) is the collection of all value sets provided by the database s, and V is one value set in V(s); A* is the truth set of the object corresponding to V, and A is the value set of the object corresponding to V.
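The credibility calculation, described above as the average similarity between a database's value sets and the current truth sets, can be sketched in Python. This is an illustration under assumptions: the function names, the dictionary layout keyed by object identifier and the sample data are hypothetical, and a database with no value sets is given credibility 0.0 by convention.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def encode(possible: Sequence[str], values: Set[str]) -> List[int]:
    """Binary vector of a value set over the ordered possible value set."""
    return [1 if v in values else 0 for v in possible]


def credibility(provided: Dict[str, Set[str]],
                truth: Dict[str, Set[str]],
                possible: Dict[str, List[str]]) -> float:
    """t(s): mean cosine similarity between each value set of one database
    and the current truth set of the same object."""
    sims = [
        cosine(encode(possible[e], truth[e]), encode(possible[e], provided[e]))
        for e in provided
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical data for two objects and one database.
possible = {"book1": ["Alice", "Bob", "Carol"], "book2": ["Dan", "Eve"]}
truth = {"book1": {"Alice", "Bob"}, "book2": {"Dan"}}
provided_by_db1 = {"book1": {"Alice", "Bob"}, "book2": {"Dan", "Eve"}}
t_db1 = credibility(provided_by_db1, truth, possible)
print(round(t_db1, 4))
```

Here the database matches the truth set of the first object exactly and adds one spurious value for the second, so its credibility falls between the two similarities.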
Preferably, in the multi-truth finding problem, descriptions with high confidence are more likely to be true values of the object, so the invention selects several descriptions with high confidence as the truth set of the object. Following the objective function, this embodiment adopts a greedy selection strategy: the descriptions of the object are sorted by confidence, and descriptions with high confidence are then preferentially selected as true values of the object. For a description v, its confidence is calculated by weighted voting over the databases. The process of step s22 includes:
sequentially calculating the confidence of each description in the possible value set of the object according to the credibility and the confidence relationship; the confidence relationship is:
wherein t(s) is the credibility of database s; c(v) is the confidence of description v; S(v) is the set of all databases that provide a value set for the object to which description v belongs; and V is the value set to which v belongs.
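Weighted voting can be illustrated with a short sketch. Because the confidence relationship is missing from this text, the version below simply sums the credibility of every database whose value set contains the description; any normalization in the original formula is unknown. The function name `description_confidence` and the dictionary layout are hypothetical.

```python
def description_confidence(claims, trust):
    """Trust-weighted vote for each description of one object.

    claims: {database: set of descriptions it provides for the object}
    trust:  {database: credibility t(s)}
    Assumption: confidence is the plain sum of t(s) over supporting databases.
    """
    conf = {}
    for db, values in claims.items():
        for v in values:
            conf[v] = conf.get(v, 0.0) + trust[db]
    return conf

claims = {"s1": {"a", "b"}, "s2": {"a"}}
trust = {"s1": 0.9, "s2": 0.5}
conf = description_confidence(claims, trust)
```

Here description "a" is backed by both databases and "b" by only one, so "a" ranks first when the confidences are sorted.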
Specifically, the following is the algorithm provided by the invention for calculating a target truth set, in which the array obtained after sorting the confidences is a vector W. According to W, the time complexity of the algorithm is O(ML).
Algorithm-1: Entity truths finding
Input: V*, {t(s) | s ∈ S};
Output: Truth.
Steps:
Of course, the above is only a specific embodiment, and the invention is not limited to the content of the specific implementation algorithm.
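Since the steps of Algorithm-1 are not reproduced in this text, the greedy selection can only be sketched from the prose and from steps s23 to s25 of the claims: add descriptions in order of decreasing confidence and stop as soon as the objective value drops. The objective used below, a credibility-weighted sum of cosine similarities between the candidate set and each database's value set, is an assumption, as are the names `greedy_truth_set` and `objective`.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_truth_set(possible, claims, trust, conf):
    """Pick descriptions in confidence order until the objective decreases.

    possible: ordered list of all descriptions of the object (V*)
    claims:   {database: set of descriptions}
    trust:    {database: t(s)}
    conf:     {description: confidence c(v)}
    """
    def encode(values):
        return [1 if v in values else 0 for v in possible]

    def objective(candidate):
        # Assumed objective: sum over databases of t(s) * sim(A*, A).
        cand_vec = encode(candidate)
        return sum(trust[s] * cosine(cand_vec, encode(vals))
                   for s, vals in claims.items())

    ranked = sorted(possible, key=lambda v: conf[v], reverse=True)
    truth, best = [], float("-inf")
    for v in ranked:
        score = objective(truth + [v])
        if score < best:        # objective dropped: keep the previous candidate set
            break
        truth.append(v)
        best = score
    return truth

possible = ["a", "b", "c"]
claims = {"s1": {"a", "b"}, "s2": {"a", "b"}, "s3": {"c"}}
trust = {"s1": 0.9, "s2": 0.9, "s3": 0.2}
conf = {"a": 1.8, "b": 1.8, "c": 0.2}
result = greedy_truth_set(possible, claims, trust, conf)
```

In this toy input, adding "c" lowers the assumed objective, so selection stops with "a" and "b". With a different objective function the stopping point could differ.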
In a preferred embodiment, after step s25 and before step s3, the method further comprises:
step s26: judging whether the change between the confidence vector of the target truth set and the confidence vector of the previous target truth set is smaller than a preset vector difference; if so, proceeding to step s3; otherwise, taking the target truth set as the current truth set of the object and returning to step s21. The preset vector difference may be 0.01%, but the invention is not limited thereto.
As described above, if the credibility of the databases is known, the truth sets of the objects can be derived, whereas the credibility of the databases can only be derived from the truth sets of the objects. However, because the truth sets of the objects are unknown at the beginning, the credibility of the databases cannot be calculated directly. This embodiment therefore uses an iterative method to derive the credibility of the databases and the truth sets of the objects jointly. Although neither the actual credibility of the data sources nor the actual truth sets of the objects are known at the start, each iteration moves closer to both, and the iteration stops once the convergence condition is satisfied. The general flow of the iteration is given below:
First, the credibility of every database is set to an initial value T0 (T0 is an estimated average credibility; for example, T0 = 0.9), and then the iterative computation starts. Each iteration is divided into two steps: 1) calculating the truth sets of the objects using the database credibility obtained from the last iteration; 2) calculating the credibility of the databases using the most recently obtained truth sets of the objects. This repeats until a steady state is finally reached. The steady state is measured by the change in the credibility of the databases, which is represented by a vector, namely the difference between the vector of the currently obtained target truth set and the vector of the previously calculated target truth set. The invention uses cosine similarity to measure this change between two iterations. Once the change after an iteration is smaller than the preset vector difference, the iteration stops, and the target truth set obtained at that moment is the final truth set and is displayed.
Specifically, the following is the algorithm of the above iterative process provided by the invention. If the algorithm iterates K times, its time complexity is O(KMNL).
Algorithm-3: Multi-truth Finding (OptMTF)
Input: S, E;
Output: {Truth | e ∈ E}.
Steps:
1. initialize the trustworthiness of sources and get {t(s) | s ∈ S};
2. n = 0;
3. do
4.   n++;
5.   while (e ∈ E)
6.     compute Truth according to the algorithm 2;
7.   end while
8.   while (s ∈ S)
9.     compute t(s) according to the formula (4);
10.  end while
11. until (the convergence condition is satisfied)
12. return {Truth | e ∈ E};
Of course, the above is only a specific embodiment, and the invention is not limited to the content of the specific implementation algorithm.
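The outer loop of the iterative process can be sketched as below. The per-object truth computation and the per-database credibility update are passed in as callables (`find_truth` and `compute_trust`, both hypothetical names), so the sketch makes no claim about how the patent computes them. The convergence test, one minus the cosine similarity of consecutive credibility vectors compared against 0.01%, is one possible reading of the description, and `max_iter` is an added safety bound.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def opt_mtf(objects, sources, find_truth, compute_trust,
            t0=0.9, eps=1e-4, max_iter=100):
    """Alternate truth finding and credibility updates until the change is small.

    objects:       list of object identifiers (E)
    sources:       list of database identifiers (S)
    find_truth:    callable (object, trust dict) -> truth set for that object
    compute_trust: callable (database, truths dict) -> credibility t(s)
    """
    trust = {s: t0 for s in sources}          # step 1: initial credibility T0
    truths = {}
    for _ in range(max_iter):
        truths = {e: find_truth(e, trust) for e in objects}
        new_trust = {s: compute_trust(s, truths) for s in sources}
        old_vec = [trust[s] for s in sources]
        new_vec = [new_trust[s] for s in sources]
        trust = new_trust
        # Assumed convergence condition on the credibility vectors.
        if 1.0 - cosine(old_vec, new_vec) < eps:
            break
    return truths, trust
```

Passing the two computations in as parameters keeps the loop independent of the exact objective function and credibility relationship, which are not recoverable from this text.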
In the multi-truth discovery problem, different expressions of the same description are common in practice, so the effect of combining similar descriptions can be significant. However, existing truth discovery algorithms do not consider different expressions of a description and neglect the support that similar descriptions lend to a true value. For example, the true value "Shell Wood" has another expression "Wood"; because "Shell Wood" contains "Wood", "Shell Wood" has a higher probability of being a true value. In practice, many false descriptions are caused by incomplete data or a missing part, and such false but similar descriptions can be used to raise the confidence of the true value, thereby improving the accuracy of truth discovery. Therefore, the invention adds the support that similar descriptions give to a description into the confidence of that description, which raises the probability that the description becomes a true value and further improves the accuracy of truth discovery.
Preferably, referring to Fig. 3, which is a flowchart of another object description providing method for a database provided by the invention, the following steps are performed after step s22 and before step s23:
step s220: judging whether the reconciliation confidences of all descriptions have been obtained; if so, proceeding to step s231; otherwise, proceeding to step s221;
step s221: determining, according to a similarity rule, a list of similar description values in the possible value set that are similar to the description;
step s222: calculating the degree of similarity between each similar description value in the list and the description;
step s223: integrating the degree of similarity of each similar description value with the confidence of the description to obtain the reconciliation confidence of the description;
accordingly, step s23 is modified to step s231: sorting the reconciliation confidences from largest to smallest.
It can be understood that this embodiment obtains the reconciliation confidence by finding the similar descriptions of each description and integrating their similarity to the description being calculated into that description's confidence. Because a similar description and the description being calculated probably refer to the same thing, the two are integrated into a reconciliation confidence, which raises the probability that the description being calculated becomes a true value. This avoids the situation in which an incompletely stated description receives a low confidence during truth discovery and is ultimately left out of the truth set, and thus improves the accuracy of truth discovery.
Further, the process of step s222 includes:
calculating the degree of similarity between each similar description value in the list and the description according to the support relationship; the support relationship is:
wherein sup(v', v) is the degree of similarity between the similar description value v' and the description v; Z' is the word set of v' and Z is the word set of v; m is the number of words in Z' and n is the number of words in Z; Z'[i] is the i-th word in Z' and Z[j] is the j-th word in Z; isSame(Z'[i], Z[j]) ∈ {0, 1} takes 1 when Z'[i] and Z[j] are equal and 0 otherwise. For example, when v' is "Wood" and v is "Shell Wood", sup(v', v) = 1 but sup(v, v') = 1/2. It can be concluded that v has a higher probability of being a true value.
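The support relationship is not reproduced in this text, but the definitions and the worked example suggest counting the word matches between Z' and Z and dividing by m, the number of words in Z'. The sketch below follows that inferred reading; the function name `support` and the whitespace tokenization are assumptions.

```python
def support(v_prime, v):
    """Inferred support sup(v', v): matching word pairs divided by the word count of v'."""
    z_prime = v_prime.split()   # word set Z' of v' (whitespace split is an assumption)
    z = v.split()               # word set Z of v
    if not z_prime:
        return 0.0
    matches = sum(1 for w1 in z_prime for w2 in z if w1 == w2)   # sum of isSame
    return matches / len(z_prime)

forward = support("Wood", "Shell Wood")    # v' = "Wood", v = "Shell Wood"
backward = support("Shell Wood", "Wood")   # roles swapped
```

The asymmetry matches the example in the description: the shorter expression fully supports the longer one, while the longer one only half supports the shorter.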
Further, the process of step s223 includes:
substituting the degree of similarity of each similar description value and the confidence of the description into the reconciliation confidence relationship to obtain the reconciliation confidence of the description; the reconciliation confidence relationship is:
wherein c(v) is the reconciliation confidence of description v; ρ is a proportional parameter; c(v') is the confidence of the similar description value v'; and sim(v) is the list of similar description values. Here ρ is a parameter between 0 and 1 that controls the influence of the similar values. To obtain the reconciliation confidence c(v) of v, its similar value list sim(v) must first be obtained. This relies on a heuristic: different expressions of a description hardly ever appear in the same value set as the description itself. A simple method is therefore adopted, in which the similar description values in sim(v) must satisfy two conditions: 1) their support for the description v is greater than zero; 2) they do not appear in any value set that contains the description v. For example, "o'least timing j" has two similar values, "o'least linear i" and "timing j", according to condition 1. However, "o'least timing j" and "o'least linear i" appear in the same value set, so they are likely to be different values. Applying condition 2 therefore keeps only the more plausible values in the similar value list. The specific algorithm is shown below.
Algorithm-2: Support list computing
Input: V*, B = {c(v) | v ∈ V*};
Output:
Steps:
Of course, the above is only a specific embodiment, and the invention is not limited to the content of the specific implementation algorithm.
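The two conditions for the similar value list and the reconciliation (harmonic) confidence can be sketched as follows. The list construction follows the two stated conditions directly. The combining formula, c(v) plus ρ times the support-weighted confidences of the similar values, is an assumption because the original relationship is missing from this text. The names `similar_list` and `reconciliation_confidence`, the default ρ = 0.5, and the description "Shell Stone" are hypothetical.

```python
def support(v_prime, v):
    """Inferred support: matching word pairs divided by the word count of v'."""
    z_prime, z = v_prime.split(), v.split()
    if not z_prime:
        return 0.0
    return sum(1 for a in z_prime for b in z if a == b) / len(z_prime)

def similar_list(v, possible, value_sets):
    """Descriptions with positive support for v that never share a value set with v."""
    co_occurring = set()
    for vs in value_sets:
        if v in vs:
            co_occurring |= vs
    return [u for u in possible
            if u != v and u not in co_occurring and support(u, v) > 0]

def reconciliation_confidence(v, conf, possible, value_sets, rho=0.5):
    """Assumed form: c(v) + rho * sum of sup(v', v) * c(v') over v' in sim(v)."""
    boost = sum(support(u, v) * conf[u]
                for u in similar_list(v, possible, value_sets))
    return conf[v] + rho * boost

possible = ["Shell Wood", "Wood", "Shell Stone"]
value_sets = [{"Shell Wood", "Shell Stone"}, {"Wood"}]
conf = {"Shell Wood": 0.9, "Wood": 0.4, "Shell Stone": 0.9}
sims = similar_list("Shell Wood", possible, value_sets)
boosted = reconciliation_confidence("Shell Wood", conf, possible, value_sets)
```

In this toy input, "Shell Stone" shares a word with "Shell Wood" but appears in the same value set, so condition 2 excludes it and only "Wood" contributes to the boost.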
The invention further provides an object description providing device for a database. Referring to Fig. 4, which is a schematic structural diagram of the device, the device includes:
a receiving module 1, configured to receive the descriptions provided by each database for an object, wherein the set of descriptions provided by one database for the object is a value set, and the union of the descriptions provided by all databases for the object is the possible value set of the object;
a multi-truth screening module 2, configured to calculate the confidence of each description value in the possible value set, sort the description values by confidence, and select the N description values with the highest confidence as the true values of the object to obtain a target truth set, where N is greater than 1;
and a display module 3, configured to display the descriptions in the target truth set.
The object description providing device provided by the invention is used to implement the object description providing method described above, so the device and the method correspond to each other one to one.
The present invention also provides an object description providing device for a database, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the object description providing method of the database as in any one of the above when executing the computer program.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the object description providing method of the database according to any one of the above.
The above embodiments are only preferred embodiments of the present invention, and the above embodiments can be combined arbitrarily, and the combined embodiments are also within the scope of the present invention. It should be noted that other modifications and variations that may suggest themselves to persons skilled in the art without departing from the spirit and scope of the invention are intended to be included within the scope of the invention as defined by the appended claims.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Claims (10)
1. An object description providing method for a database, comprising:
step s1: receiving descriptions provided by each database for an object, wherein a set of descriptions provided by one database for the object is a value set, and the union of the descriptions provided by all the databases for the object is the possible value set of the object;
step s2: calculating the confidence of each description value in the possible value set, sorting the description values by confidence, and selecting the N description values with the highest confidence as the true values of the object to obtain a target truth set, wherein N is greater than 1;
step s3: displaying the descriptions in the target truth set.
2. The method according to claim 1, wherein said step s2 comprises:
step s21: calculating the credibility of each database according to the current truth set of the object, wherein the initial credibility, used when no truth set exists yet, is a specific value;
step s22: sequentially calculating the confidence of each description in the possible value set of the object according to the credibility;
step s23: sorting the confidences from largest to smallest;
step s24: adding, in sorted order, the description corresponding to each confidence to a truth set to obtain a candidate truth set, and calculating the value of the objective function for the currently obtained candidate truth set; the objective function is:
wherein A* is the current candidate truth set, and A is a value set provided by database s for object e; sim(A*, A) is the cosine similarity between A* and A; E is the set of all objects; S(v) is the set of all databases that provide a value set for the object to which the descriptions in A* belong; t(s) is the credibility of database s;
step s25: if the value of the objective function currently calculated is smaller than the value of the objective function corresponding to the previous candidate truth set, taking the previous candidate truth set, that is, the set of all descriptions used in the previous calculation of the objective function, as the target truth set.
3. The method of claim 2, wherein after step s25 and before step s3, further comprising:
judging whether the change between the confidence vector of the target truth set and the confidence vector of the previous target truth set is smaller than a preset vector difference; if so, proceeding to step s3; otherwise, taking the target truth set as the current truth set of the object and returning to step s21.
4. A method according to claim 2 or 3, wherein the process of step s21 comprises:
calculating the credibility of each database according to the current truth set of the object and the credibility relationship, wherein the initial credibility, used when no truth set exists yet, is a specific value; the credibility relationship is:
wherein t(s) is the credibility of database s; V(s) is the collection of all value sets provided by database s, and V is one value set in V(s); A* is the truth set of the object corresponding to V, and A is the value set of the object corresponding to V.
5. A method according to claim 2 or 3, wherein the process of step s22 comprises:
sequentially calculating the confidence of each description in the possible value set of the object according to the credibility and the confidence relationship; the confidence relationship is:
wherein t(s) is the credibility of database s; c(v) is the confidence of description v; S(v) is the set of all databases that provide a value set for the object to which description v belongs; and V is the value set to which v belongs.
6. The method of claim 5, wherein after step s22 and before step s23, the method further comprises:
determining, according to a similarity rule, a list of similar description values in the possible value set that are similar to the description;
calculating the degree of similarity between each similar description value in the list and the description;
integrating the degree of similarity of each similar description value to the description with the confidence of the description to obtain the reconciliation confidence of the description;
repeating the above operations until the reconciliation confidences of all descriptions are obtained;
accordingly, in step s23 the reconciliation confidences are sorted from largest to smallest.
7. The method according to claim 6, wherein the calculating of the similarity degree between each similar description value in the similar description value list and the description comprises:
calculating the degree of similarity between each similar description value in the list and the description according to a support relationship; the support relationship is:
wherein sup(v', v) is the degree of similarity between the similar description value v' and the description v; Z' is the word set of v' and Z is the word set of v; m is the number of words in Z' and n is the number of words in Z; Z'[i] is the i-th word in Z' and Z[j] is the j-th word in Z; isSame(Z'[i], Z[j]) ∈ {0, 1} takes 1 when Z'[i] and Z[j] are equal and 0 otherwise;
the process of integrating the degree of similarity of each similar description value to the description with the confidence of the description to obtain the reconciliation confidence of the description comprises:
substituting the degree of similarity of each similar description value to the description and the confidence of the description into a reconciliation confidence relationship to obtain the reconciliation confidence of the description; the reconciliation confidence relationship is:
wherein c(v) is the reconciliation confidence of description v; ρ is a proportional parameter; c(v') is the confidence of the similar description value v'; and sim(v) is the list of similar description values.
8. An object description providing apparatus for a database, comprising:
a receiving module, configured to receive the descriptions provided by each database for an object, wherein the set of descriptions provided by one database for the object is a value set, and the union of the descriptions provided by all the databases for the object is the possible value set of the object;
a multi-truth screening module, configured to calculate the confidence of each description value in the possible value set, sort the description values by confidence, and select the N description values with the highest confidence as the true values of the object to obtain a target truth set, wherein N is greater than 1;
and a display module, configured to display the descriptions in the target truth set.
9. An object description providing apparatus for a database, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the object description providing method of a database according to any of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the object description providing method of a database according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811527172.9A CN109635004B (en) | 2018-12-13 | 2018-12-13 | Object description providing method, device and equipment of database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811527172.9A CN109635004B (en) | 2018-12-13 | 2018-12-13 | Object description providing method, device and equipment of database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635004A (en) | 2019-04-16 |
CN109635004B (en) | 2023-05-05 |
Family
ID=66073719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811527172.9A Active CN109635004B (en) | 2018-12-13 | 2018-12-13 | Object description providing method, device and equipment of database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635004B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110297974A (en) * | 2019-06-20 | 2019-10-01 | 百度(中国)有限公司 | Sort processing method, device, equipment and storage medium |
CN112651505A (en) * | 2020-12-18 | 2021-04-13 | 广州大学 | Truth value discovery method and system for knowledge verification |
CN113808230A (en) * | 2021-08-26 | 2021-12-17 | 华南理工大学 | Method, system, device and storage medium for improving electrical impedance imaging accuracy |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100083233A1 (en) * | 2008-09-26 | 2010-04-01 | Microsoft Corporation | Symbolic Runtime Checking of Quantified Contracts |
CN101819578A (en) * | 2010-01-25 | 2010-09-01 | 青岛普加智能信息有限公司 | Retrieval method, method and device for establishing index and retrieval system |
CN103026368A (en) * | 2010-07-30 | 2013-04-03 | 高通股份有限公司 | Object recognition using incremental feature extraction |
US8484225B1 (en) * | 2009-07-22 | 2013-07-09 | Google Inc. | Predicting object identity using an ensemble of predictors |
CA2904526A1 (en) * | 2013-03-13 | 2014-09-18 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing data upload, processing, and predictive query api exposure |
WO2016050066A1 (en) * | 2014-09-29 | 2016-04-07 | 华为技术有限公司 | Method and device for parsing interrogative sentence in knowledge base |
US20160239711A1 (en) * | 2013-10-18 | 2016-08-18 | Vision Semanatics Limited | Visual Data Mining |
CN107784111A (en) * | 2017-11-06 | 2018-03-09 | 北京锐安科技有限公司 | Data digging method, device, equipment and storage medium |
CN107870959A (en) * | 2016-09-23 | 2018-04-03 | 奥多比公司 | Inquired about in response to video search and associated video scene is provided |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100083233A1 (en) * | 2008-09-26 | 2010-04-01 | Microsoft Corporation | Symbolic Runtime Checking of Quantified Contracts |
US8484225B1 (en) * | 2009-07-22 | 2013-07-09 | Google Inc. | Predicting object identity using an ensemble of predictors |
CN101819578A (en) * | 2010-01-25 | 2010-09-01 | 青岛普加智能信息有限公司 | Retrieval method, method and device for establishing index and retrieval system |
CN103026368A (en) * | 2010-07-30 | 2013-04-03 | 高通股份有限公司 | Object recognition using incremental feature extraction |
CA2904526A1 (en) * | 2013-03-13 | 2014-09-18 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing data upload, processing, and predictive query api exposure |
CN105229633A (en) * | 2013-03-13 | 2016-01-06 | 萨勒斯福斯通讯有限公司 | For realizing system, method and apparatus disclosed in data upload, process and predicted query API |
US20160239711A1 (en) * | 2013-10-18 | 2016-08-18 | Vision Semanatics Limited | Visual Data Mining |
WO2016050066A1 (en) * | 2014-09-29 | 2016-04-07 | 华为技术有限公司 | Method and device for parsing interrogative sentence in knowledge base |
CN107870959A (en) * | 2016-09-23 | 2018-04-03 | 奥多比公司 | Inquired about in response to video search and associated video scene is provided |
CN107784111A (en) * | 2017-11-06 | 2018-03-09 | 北京锐安科技有限公司 | Data digging method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Chen Shaobin et al.: "Structural design and fast retrieval method of an image database retrieval system", Computer & Digital Engineering * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110297974A (en) * | 2019-06-20 | 2019-10-01 | 百度(中国)有限公司 | Sort processing method, device, equipment and storage medium |
CN110297974B (en) * | 2019-06-20 | 2021-07-20 | 百度(中国)有限公司 | Sorting processing method, device, equipment and storage medium |
CN112651505A (en) * | 2020-12-18 | 2021-04-13 | 广州大学 | Truth value discovery method and system for knowledge verification |
CN112651505B (en) * | 2020-12-18 | 2022-01-14 | 广州大学 | Truth value discovery method and system for knowledge verification |
CN113808230A (en) * | 2021-08-26 | 2021-12-17 | 华南理工大学 | Method, system, device and storage medium for improving electrical impedance imaging accuracy |
Also Published As
Publication number | Publication date |
---|---|
CN109635004B (en) | 2023-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11227118B2 (en) | Methods, devices, and systems for constructing intelligent knowledge base | |
CN110188168B (en) | Semantic relation recognition method and device | |
WO2022116537A1 (en) | News recommendation method and apparatus, and electronic device and storage medium | |
CN108804641B (en) | Text similarity calculation method, device, equipment and storage medium | |
CN110569496B (en) | Entity linking method, device and storage medium | |
WO2019223552A1 (en) | Article recommendation method and apparatus, and computer device and storage medium | |
US10762150B2 (en) | Searching method and searching apparatus based on neural network and search engine | |
CN107992477A (en) | Text subject determines method, apparatus and electronic equipment | |
CN111008272A (en) | Knowledge graph-based question and answer method and device, computer equipment and storage medium | |
CN113656698B (en) | Training method and device for interest feature extraction model and electronic equipment | |
CN110765348B (en) | Hot word recommendation method and device, electronic equipment and storage medium | |
CN109635004B (en) | Object description providing method, device and equipment of database | |
CN112395487A (en) | Information recommendation method and device, computer-readable storage medium and electronic equipment | |
CN112131261A (en) | Community query method and device based on community network and computer equipment | |
WO2014107194A1 (en) | Identifying relevant user content | |
CN110135769A (en) | Kinds of goods attribute fill method and device, storage medium and electric terminal | |
CN111160699A (en) | Expert recommendation method and system | |
CN110262906B (en) | Interface label recommendation method and device, storage medium and electronic equipment | |
CN117435685A (en) | Document retrieval method, document retrieval device, computer equipment, storage medium and product | |
CN113139558A (en) | Method and apparatus for determining a multi-level classification label for an article | |
CN111221880B (en) | Feature combination method, device, medium, and electronic apparatus | |
CN115098728A (en) | Video retrieval method and device | |
CN112650869A (en) | Image retrieval reordering method and device, electronic equipment and storage medium | |
CN113239284B (en) | Method and device for determining sequence of recommendation information | |
CN115795023B (en) | Document recommendation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||