CN110399540B - Instance retrieval method integrating correlation function and D-HS index - Google Patents
- Publication number
- CN110399540B (application CN201910662850.0A; earlier publication CN110399540A)
- Authority
- CN
- China
- Prior art keywords
- fuzzy
- formula
- interval
- correlation function
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
Abstract
An instance retrieval method fusing a correlation function and a D-HS index is disclosed. First, the concept of a fuzzy correlation function and its calculation formulas are proposed by combining fuzzy numbers with the correlation function, and a similarity measure that combines single-dimensional correlation functions is proposed using the analytic hierarchy process. Compared with the full-dimensional correlation function calculation, this measure preserves index precision while improving index speed in high-dimensional retrieval. Second, a Sigmoid function applies a nonlinear transformation to the data, which effectively addresses the low index accuracy caused by uneven data distribution in the traditional D-HS index method. Finally, fusing the two improvements makes the traditional D-HS index more rigorous and accurate, avoids computing correlation-function similarity against every instance in the instance library, and resolves the retrieval utility problem to a certain extent.
Description
Technical Field
The invention relates to an instance retrieval method.
Background
How to measure the distance between instances is the core problem of instance retrieval. Correlation functions, mixed-attribute distances, nearest-neighbor methods, and refinements of these have been proposed in succession and are very effective at improving index precision. However, instance retrieval must reconcile a basic tension between index precision and index speed: increasingly complex calculation methods improve precision but bring a huge amount of computation, which reduces index speed. In addition, index speed is limited not only by index precision but also by the number of instances in the instance library. While building an instance library, designers often keep updating it to enrich the instance knowledge as much as possible, so part of the computation in instance retrieval is redundant: it slows indexing without benefiting index precision. This is the retrieval utility problem.
To address the tension between index speed and index precision, the invention combines fuzzy numbers with the correlation function and proposes the concept of a fuzzy correlation function together with its calculation formulas. For multi-attribute products it abandons the computationally heavy full-dimensional correlation function calculation in favor of a similarity measure that combines single-dimensional correlation functions using the analytic hierarchy process. The D-HS index is also improved based on the Sigmoid function, and the measure is fused with the improved D-HS index, which avoids part of the redundant computation in instance retrieval and effectively addresses the retrieval utility problem.
Disclosure of Invention
The invention provides an instance retrieval method integrating a correlation function and a D-HS index. It aims to resolve the basic tension between index precision and index speed and the retrieval utility problem, both common in instance retrieval, and to achieve faster and more reasonable retrieval.
The technical solution adopted by the invention to solve these technical problems is as follows:
An instance retrieval method fusing a correlation function and a D-HS index comprises the following steps:
S1: Improve the D-HS indexing method based on a Sigmoid function. To address the low index accuracy caused by the non-uniform distribution of instance attribute values in the instance library, the invention applies a nonlinear Sigmoid transformation to the attribute values. Without changing the overall distribution or ordering of the data, the transformation expands the dense regions of the data and compresses the sparse regions, which improves the distinguishability of the data and the ability to divide it into levels. The Sigmoid function is expressed as follows:
In the formula, the coefficient $\beta$ is the median of the sample data and the coefficient $\alpha = \ln 9 / t$, where the parameter $t$ is the smaller of the distances from the 90% quantile and the 10% quantile of the sample data to the median.
S11: Take the natural logarithm (base e) of the original data in the instance library to obtain the ln data, then sort the ln data in descending order.
S12: Compute the coefficients $\alpha$ and $\beta$ of formula (1) from the ln data, then substitute the ln data into formula (1) to obtain the Sigmoid data.
S13: Divide the index intervals according to the Sigmoid data to form an index table.
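Steps S11 to S13 can be sketched in Python as follows. Formula (1) is not reproduced in this text, so the sketch assumes the common logistic form $f(x) = 1 / (1 + e^{-\alpha (x - \beta)})$, which is consistent with $\alpha = \ln 9 / t$ because it maps $\beta \pm t$ to 0.9 and 0.1. The helper names, the linear-interpolation quantile, and the equal-width division of the Sigmoid range are illustrative assumptions, not details taken from the patent.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an ascending list (assumed convention)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def sigmoid_params(ln_data):
    """beta is the median; alpha = ln 9 / t, where t is the smaller distance
    from the 10% and 90% quantiles to the median (assumes t > 0)."""
    s = sorted(ln_data)
    beta = quantile(s, 0.5)
    t = min(beta - quantile(s, 0.1), quantile(s, 0.9) - beta)
    return math.log(9) / t, beta


def sigmoid(x, alpha, beta):
    """Assumed logistic form of formula (1)."""
    return 1.0 / (1.0 + math.exp(-alpha * (x - beta)))


def build_index_table(raw, n_intervals=5):
    # S11: natural logarithm, then descending order.
    ln_data = sorted((math.log(v) for v in raw), reverse=True)
    # S12: coefficients from the ln data, then the Sigmoid data.
    alpha, beta = sigmoid_params(ln_data)
    sig = [sigmoid(v, alpha, beta) for v in ln_data]
    # S13: equal-width intervals over (0, 1); the actual division rule
    # of the D-HS index is not given here, so this is a placeholder.
    table = [min(int(s * n_intervals), n_intervals - 1) for s in sig]
    return alpha, beta, sig, table
```

Because the transform is monotone, the Sigmoid data keep the descending order of the ln data, which matches the statement that the ordering of the data is unchanged.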
S2: Obtain the preliminary index set $S_P$. Convert the relevant attribute values of the target instance into Sigmoid data and look them up in the index table. Using formula (2), compute the attribute matching degree of each instance in the instance library relative to the target instance, then reject the instances whose attribute matching degree is too low (depending on the number of instances, generally 60% to 80% of the instances remain after rejection) to form the preliminary index set $S_P$.
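The filtering in S2 can be illustrated with a small Python sketch. Formula (2) is not reproduced in this text, so the sketch substitutes a stand-in matching degree: the number of attributes whose index interval equals that of the target instance. The function names, the stand-in measure, and the threshold are all hypothetical.

```python
def matching_degree(instance_bins, target_bins):
    """Stand-in for formula (2): count attributes that fall in the
    same index interval as the target (an assumption, not the patent's formula)."""
    return sum(1 for a, b in zip(instance_bins, target_bins) if a == b)


def preliminary_index_set(library_bins, target_bins, threshold):
    """Keep only the instances whose matching degree reaches the threshold."""
    return {
        name: bins
        for name, bins in library_bins.items()
        if matching_degree(bins, target_bins) >= threshold
    }


# Hypothetical interval indices for three instances over three attributes.
library = {"c1": [0, 1, 2], "c2": [0, 1, 1], "c3": [2, 2, 2]}
kept = preliminary_index_set(library, target_bins=[0, 1, 1], threshold=2)
```

Only the instances in `kept` go on to the more expensive correlation function calculation, which is where the speed gain of the fused method comes from.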
S3: Compute the correlation functions of the instances in the preliminary index set $S_P$. The correlation function has so far not been applied to measuring fuzzy parameters, which include the fuzzy numerical type, the fuzzy interval-valued type, and the fuzzy conceptual type. The invention therefore combines the correlation function with the representation of fuzzy numbers and proposes the concept of a fuzzy correlation function and the related calculation formulas. In addition, the invention proposes a similarity measure that combines single-dimensional correlation functions using the analytic hierarchy process.
S31: The correlation function is similar to the membership function in fuzzy set theory and quantitatively describes the degree to which an object belongs to a certain interval or has a certain property. Formula (3) is the correlation function expression, and formulas (4) and (5) are the expressions for the side distance and the place value, respectively.
$D(x, X_0, X) = \rho(x, x_0, X) - \rho(x, x_0, X_0)$ (5)
In formula (4), $x$ and $x_0$ denote a point in space and the optimal point, respectively; $X_0$ denotes a range constraint in space; $x_1$ and $x_2$ denote the intersections of the line through $x$ and $x_0$ with the boundary of the interval $X$, where $x_1$ is the point closer to $x_0$; $\mathrm{Ext}(x_0x_1)$ denotes the ray along the line $x_0x_1$ with origin $x_1$, and likewise for $\mathrm{Ext}(x_0x_2)$; $x = x_0$ means that the point $x$ coincides with the optimal point $x_0$; in $-\max\{|x_0M|\}$, $M$ ranges over the vertices of the interval $X$, so the expression is the negative of the maximum distance from the optimal point $x_0$ to the vertices of the interval.
From formula (5), the place value $D(x, X_0, X)$ equals the difference between the side distance $\rho(x, x_0, X)$ with respect to the interval $X$ and the side distance $\rho(x, x_0, X_0)$ with respect to the interval $X_0$. The intervals $X$ and $X_0$ denote the feasible interval and the ideal interval, respectively. When $X$ and $X_0$ have no common boundary, $D(x, X_0, X)$ is always less than 0; when a common boundary exists, $D(x, X_0, X)$ is always less than or equal to 0.
S32: Fuzzy numbers are usually represented as triangular or trapezoidal fuzzy numbers; formulas (6) and (7) are their respective membership function expressions.
In the formula, $s_1$ and $s_2$ are the two support points and $s_M$ is the peak point; the membership degree is 0 at the support points and outside them, and 1 at the peak point. In practical applications, the triangular fuzzy number is written in shorthand form.
In the formula, $t_1$ and $t_2$ are the two support points; the membership degree is 0 at the support points and outside them, and 1 on the flat top of the trapezoid. In practical applications, the trapezoidal fuzzy number is likewise written in shorthand form.
Suppose the m-th and n-th attributes of an instance $c_i$ are of the fuzzy numerical type (an "about a value" expression) and the fuzzy interval-valued type (an "about an interval" expression), respectively. The m-th attribute can then be represented by a triangular fuzzy number, in which:
Based on this representation of fuzzy numbers, the invention draws an analogy with the three elements of the correlation function calculation (the optimal point, the ideal interval, and the feasible interval). For a triangular fuzzy number, $s_M$ is treated as the optimal point $x_0$ and the interval $[s_1, s_2]$ as the feasible interval $[X_1, X_2]$; the ideal interval $[x_1, x_2]$ is then derived from the optimal point and the feasible interval according to the user's requirements and the design standard. Similarly, for a trapezoidal fuzzy number, the flat-top interval is treated as the ideal interval $[x_1, x_2]$ and the interval $[t_1, t_2]$ as the feasible interval $[X_1, X_2]$, and the optimal point $x_0$ is obtained by combining the intervals.
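The membership functions and the analogy above can be sketched in Python. Formulas (6) and (7) are not reproduced in this text, so the sketch uses the standard piecewise-linear triangular and trapezoidal membership functions, which match the stated properties (0 at and outside the support points, 1 at the peak or on the flat top). The names `m1` and `m2` for the flat-top endpoints and the dictionary keys are hypothetical.

```python
def triangular_membership(x, s1, s_m, s2):
    """Standard triangular membership; assumes s1 < s_m < s2."""
    if x <= s1 or x >= s2:
        return 0.0
    if x <= s_m:
        return (x - s1) / (s_m - s1)
    return (s2 - x) / (s2 - s_m)


def trapezoidal_membership(x, t1, m1, m2, t2):
    """Standard trapezoidal membership; assumes t1 < m1 <= m2 < t2,
    where [m1, m2] is the flat top (hypothetical names)."""
    if x <= t1 or x >= t2:
        return 0.0
    if x < m1:
        return (x - t1) / (m1 - t1)
    if x <= m2:
        return 1.0
    return (t2 - x) / (t2 - m2)


def triangle_to_elements(s1, s_m, s2):
    """Analogy from S32: peak -> optimal point, support -> feasible interval.
    The ideal interval is supplied separately from user requirements."""
    return {"optimal_point": s_m, "feasible_interval": (s1, s2)}


def trapezoid_to_elements(t1, m1, m2, t2):
    """Analogy from S32: flat top -> ideal interval, support -> feasible interval."""
    return {"ideal_interval": (m1, m2), "feasible_interval": (t1, t2)}
```

The mapping functions return only the elements that the fuzzy number determines directly; the remaining element (the ideal interval for a triangle, the optimal point for a trapezoid) still has to be chosen as the text describes.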
S33: Correlation function measure for fuzzy numerical parameters. Based on the expressions of the fuzzy number and the correlation function and the relation between them, the invention proposes the side distance $\rho(x, x_0, X)$ for the fuzzy numerical type; its expression and mathematical calculation are given in formulas (10) and (11), respectively.
In the formulas, the area symbols denote the areas of the named figures, for example the area of triangle $Qxx_0$ and the area of trapezoid $Qx_0xM$, and likewise for the remaining terms. The maximum term takes the larger of the areas of triangle $QX_1x_0$ and triangle $QX_2x_0$; because the invention specifies that $X_1$ is the point closer to $x_0$, this fixes which of the two areas is the larger.
The correlation function requires the side distances with respect to both the feasible interval $X$ and the ideal interval $X_0$. For fuzzy numerical parameters the two calculations are analogous: the feasible interval $X = [X_1, X_2]$ is simply replaced by the ideal interval $X_0 = [x_1, x_2]$. The calculation diagram and mathematical calculation of the side distance $\rho(x, x_0, X_0)$ are therefore not described in detail.
Substituting the calculated side distances $\rho(x, x_0, X)$ and $\rho(x, x_0, X_0)$ into formulas (3) and (5) gives the correlation function value of the fuzzy numerical parameter.
S34: Correlation function measure for fuzzy interval-valued parameters. The invention proposes the side distance $\rho(x, x_0, X)$ for the fuzzy interval-valued type; its expression and mathematical calculation are given in formulas (12) and (13), respectively.
Unlike fuzzy numerical parameters, the two side distances of a fuzzy interval-valued parameter are calculated differently. The expression for the side distance $\rho(x, x_0, X_0)$ with respect to the ideal interval $X_0$ is given in formula (14), and formula (15) is the corresponding mathematical calculation.
Substituting the calculated side distances $\rho(x, x_0, X)$ and $\rho(x, x_0, X_0)$ into formulas (3) and (5) gives the correlation function value of the fuzzy interval-valued parameter.
S35: Correlation function measure for fuzzy conceptual parameters. For fuzzy conceptual parameters, the invention introduces a fuzzy semantic conversion table (Table 1) to quantize fuzzy words. The quantized value is then treated as a fuzzy numerical parameter, that is, represented by a triangular fuzzy number (for example, "very good" performance converts to the fuzzy numerical parameter "about 1.0", and "very poor" performance converts to "about 0.2"), and the correlation function value is obtained from the corresponding formulas.
Table 1 Fuzzy semantic conversion table (values as listed in step S35 of claim 1)
Fuzzy term | Quantized value |
---|---|
Very poor | 0.2 |
Poor | 0.4 |
Average | 0.6 |
Good | 0.8 |
Very good | 1.0 |
Intermediate between adjacent terms | 0.1, 0.3, 0.5, 0.7, 0.9 |
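The conversion in S35 can be sketched in Python using the quantized values listed in step S35 of claim 1. The English term labels follow a reading of the translated claim in which the two extremes are "very poor" and "very good"; the half-width of the triangular fuzzy number is not given in the text and is a hypothetical parameter.

```python
# Quantized values from step S35 of claim 1; the labels are a reading
# of the machine-translated wording and should be treated as assumptions.
FUZZY_SCALE = {
    "very poor": 0.2,
    "poor": 0.4,
    "average": 0.6,
    "good": 0.8,
    "very good": 1.0,
}


def to_triangular(term, half_width=0.1):
    """Turn a fuzzy word into an 'about v' triangular fuzzy number
    (s1, s_M, s2). The half_width is an illustrative assumption."""
    peak = FUZZY_SCALE[term]
    return (peak - half_width, peak, peak + half_width)


about_good = to_triangular("good")
```

Once a fuzzy word has been converted this way, it is handled exactly like a fuzzy numerical parameter in S33.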
S36: Overall similarity measure. The correlation function formulas for deterministic and fuzzy parameters given above yield the correlation function value of each attribute dimension between instances. Combining these values with the attribute weights obtained by the analytic hierarchy process gives the combined correlation function between the target instance and each instance in the preliminary index set $S_P$, which characterizes their similarity. The calculation formula is as follows:
In the formula, $k(A^*)$ denotes the combined correlation function, $k(A_i)$ the correlation function of attribute $i$, and $w_i$ the corresponding weight.
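The combination step can be sketched in Python. The combining formula itself is not reproduced in this text; the sketch assumes the weighted sum $k(A^*) = \sum_i w_i \, k(A_i)$ suggested by the symbol definitions. The function names and the sample numbers are hypothetical.

```python
def combined_correlation(k_values, weights):
    """Assumed weighted sum of single-dimensional correlation values."""
    if len(k_values) != len(weights):
        raise ValueError("one weight is required per attribute")
    return sum(w * k for w, k in zip(weights, k_values))


def rank_instances(per_instance_k, weights):
    """Sort instances by combined correlation, most similar first."""
    scored = {
        name: combined_correlation(ks, weights)
        for name, ks in per_instance_k.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up single-dimensional values for two instances and three attributes.
weights = [0.5, 0.3, 0.2]
ranking = rank_instances({"a": [0.8, 0.6, 0.4], "b": [0.2, 0.9, 0.1]}, weights)
```

Computing each dimension separately and combining afterwards is what lets the method skip the full-dimensional correlation function calculation.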
S4: After retrieval, reuse, and modification, similar instances can be stored back in the library as new instances. The distribution of the attribute values in each dimension of the instance library therefore keeps changing, and so does the distribution of the attribute values after the nonlinear transformation, which means the division of the corresponding intervals is not fixed. The interval division can be carried out when the instance library is updated and maintained, and does not need to be performed before each retrieval.
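The maintenance policy in S4 (re-divide intervals on update, not on every query) can be sketched as a small Python class. The equal-width division, the class name, and the rebuild counter are illustrative assumptions; the actual division rule of the D-HS index is not specified here.

```python
import bisect


class CachedIntervalIndex:
    """Interval boundaries are recomputed only when the library changes."""

    def __init__(self, values, n_intervals=4):
        self.values = list(values)  # must be non-empty
        self.n_intervals = n_intervals
        self.rebuild_count = 0
        self._rebuild()

    def _rebuild(self):
        # Placeholder division rule: equal-width intervals over the data range.
        lo, hi = min(self.values), max(self.values)
        width = (hi - lo) / self.n_intervals
        self.boundaries = [lo + width * i for i in range(1, self.n_intervals)]
        self.rebuild_count += 1

    def add_instance(self, value):
        """Store a new instance and refresh the division once."""
        self.values.append(value)
        self._rebuild()

    def interval_of(self, value):
        """Query against the cached boundaries; never triggers a rebuild."""
        return bisect.bisect_right(self.boundaries, value)
```

Queries through `interval_of` reuse the cached boundaries, so the cost of re-dividing intervals is paid once per library update rather than once per retrieval.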
Compared with the prior art, the invention has the advantages that:
1. The concept of a fuzzy correlation function and its calculation formulas are proposed by combining fuzzy numbers with the correlation function, and a similarity measure that combines single-dimensional correlation functions is proposed using the analytic hierarchy process.
2. A Sigmoid function applies a nonlinear transformation to the data, which effectively addresses the low index accuracy caused by uneven data distribution in the traditional D-HS indexing method.
3. Fusing the improved methods makes the traditional D-HS index more rigorous and accurate, avoids computing correlation-function similarity against every instance in the instance library, and resolves the retrieval utility problem to a certain extent.
4. Instances are stored and accessed by partition, which facilitates the dynamic update and maintenance of the data in the instance library.
Drawings
FIG. 1 is a diagram of the improvement and fusion of the correlation function with the D-HS index.
FIG. 2 is a flow chart of the operation of the present invention.
FIGS. 3a to 3b are diagrams of triangular and trapezoidal fuzzy numbers, where FIG. 3a shows a triangular fuzzy number and FIG. 3b shows a trapezoidal fuzzy number.
FIGS. 4a to 4b are diagrams of the correlation function calculation based on fuzzy numbers, where FIG. 4a shows the triangular fuzzy number case and FIG. 4b shows the trapezoidal fuzzy number case.
FIGS. 5a to 5e are calculation diagrams of the fuzzy numerical side distance $\rho(x, x_0, X)$, where FIG. 5a shows $x \in \mathrm{Ext}(x_0X_1)$; FIG. 5b shows $x \in x_0X_1$, $|x_0X_1| \neq 0$, $x \neq x_0$; FIG. 5c shows $x \in x_0X_2$; FIG. 5d shows $x \in \mathrm{Ext}(x_0X_2)$; and FIG. 5e shows $x = x_0$.
FIGS. 6a to 6g are calculation diagrams of the fuzzy interval-valued side distance $\rho(x, x_0, X)$, where FIG. 6a shows $x \in \mathrm{Ext}(x_0X_1)$; FIG. 6b shows $x \in x_1X_1$, $|x_1X_1| \neq 0$; FIG. 6c shows $x \in x_0x_1$, $|x_0x_1| \neq 0$, $x \neq x_0$; FIG. 6d shows $x \in x_0x_2$; FIG. 6e shows $x \in x_2X_2$; FIG. 6f shows $x \in \mathrm{Ext}(x_0X_2)$; and FIG. 6g shows $x = x_0$.
FIGS. 7a to 7e are calculation diagrams of the fuzzy interval-valued side distance $\rho(x, x_0, X_0)$, where FIG. 7a shows $x \in \mathrm{Ext}(x_0x_1)$; FIG. 7b shows $x \in x_0x_1$, $|x_0x_1| \neq 0$, $x \neq x_0$; FIG. 7c shows $x \in x_0x_2$; FIG. 7d shows $x \in \mathrm{Ext}(x_0x_2)$; and FIG. 7e shows $x = x_0$.
FIGS. 8a to 8c compare the original data of the instance library before and after the nonlinear transformation, where FIG. 8a shows the original normalized data, FIG. 8b shows the ln normalized data, and FIG. 8c shows the Sigmoid data.
FIGS. 9a to 9b are fitting graphs of the fuzzy correlation function and the correlation function, where FIG. 9a shows the fuzzy correlation function and FIG. 9b shows the correlation function.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The embodiment takes 150 instances from a vacuum pump instance library as sample data. The method comprises the following steps:
S1: Obtain the relevant attribute values of the target instance from the user and obtain the corresponding weights using the analytic hierarchy process. Table 2 lists the attribute values and corresponding weights of the target instance; the Sigmoid data of an interval-type parameter is the value corresponding to the midpoint of the interval.
Table 2 Target instance attribute values and corresponding weights
S2: Retrieve from the instance library the original data of the various vacuum pump models and the interval divisions obtained after the nonlinear transformation to form the index tables. Tables 3, 4, and 5 are the index tables for three key attributes of the vacuum pump: pumping rate, operating power, and failure rate. Because the retrieval dimension is high and the number of instances is large, the index tables for all 9 attribute dimensions are not listed for reasons of space.
Table 3 Index table: pumping rate
Table 4 Index table: operating power
Table 5 Index table: failure rate
S3: Convert the relevant attribute values of the target instance into Sigmoid data, look them up in the index tables, and use formula (2) to compute the attribute matching degree of each instance in the instance library relative to the target instance. The results are shown in Table 6.
Table 6 Attribute matching degree calculation results
S4: Based on the distribution of the attribute matching degrees of the instances in Table 6, take the similar instances with SA ≥ 4 to form the preliminary index set $S_P$. Using the calculation method set forth in section 2, obtain the correlation function of each attribute dimension for the instances in $S_P$, and combine these with the corresponding weights to obtain the combined correlation function. Table 7 gives the single-dimensional and combined correlation function results for each instance in $S_P$.
Table 7 Single-dimensional and combined correlation function results (preliminary index set $S_P$)
Comparing the combined correlation function $k(A)$ in Table 7 shows that instances $c_{39}$ and $c_{44}$ are the most similar to the target instance and can be given priority as the objects of subsequent instance reuse and modification. Table 8 lists the models and relevant attributes of these two instances.
Table 8 Models and relevant attributes of instances $c_{39}$ and $c_{44}$
S5: After retrieval, reuse, and modification, the above instances are stored back in the library as new instances, and the interval division of the instance library is updated.
The embodiments are described for illustration only. The scope of the invention is not limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that those skilled in the art can conceive on the basis of the inventive concept.
Claims (1)
1. An instance retrieval method fusing a correlation function and a D-HS index, comprising the following steps:
S1: improving the D-HS indexing method based on a Sigmoid function; to address the low index accuracy caused by the non-uniform distribution of the attribute values in the instance library, the attribute values are subjected to a nonlinear transformation using a Sigmoid function; without changing the overall distribution or ordering of the data, the transformation expands the dense regions of the data and compresses the sparse regions, thereby improving the distinguishability of the data and the ability to divide it into levels; the Sigmoid function is expressed as follows:
wherein the coefficient $\beta$ is the median of the sample data, the coefficient $\alpha = \ln 9 / t$, and the parameter $t$ is the smaller of the distances from the 90% quantile and the 10% quantile of the sample data to the median;
S11: taking the natural logarithm (base e) of the original data in the instance library to obtain the ln data, and then sorting the ln data in descending order;
S12: computing the coefficients $\alpha$ and $\beta$ of formula (1) from the ln data, and then substituting the ln data into formula (1) to obtain the Sigmoid data;
S13: dividing the index intervals according to the Sigmoid data to form an index table;
S2: obtaining the preliminary index set $S_P$; converting the relevant attribute values of the target instance into Sigmoid data, looking them up in the index table, computing with formula (2) the attribute matching degree of each instance in the instance library relative to the target instance, and rejecting the instances whose attribute matching degree is too low to form the preliminary index set $S_P$;
S3: computing the correlation functions of the instances in the preliminary index set $S_P$; the correlation function has so far not been applied to measuring fuzzy parameters, which include the fuzzy numerical type, the fuzzy interval-valued type, and the fuzzy conceptual type; the correlation function is therefore combined with the representation of fuzzy numbers, and the concept of a fuzzy correlation function and the related calculation formulas are proposed; in addition, a similarity measure that combines single-dimensional correlation functions is proposed using the analytic hierarchy process;
S31: the correlation function is similar to the membership function in fuzzy set theory and quantitatively describes the degree to which an object belongs to a certain interval or has a certain property; formula (3) is the correlation function expression, and formulas (4) and (5) are the expressions for the side distance and the place value, respectively;
$D(x, X_0, X) = \rho(x, x_0, X) - \rho(x, x_0, X_0)$ (5)
in formula (4), $x$ and $x_0$ denote a point in space and the optimal point, respectively; $X_0$ denotes a range constraint in space; $x_1$ and $x_2$ denote the intersections of the line through $x$ and $x_0$ with the boundary of the interval $X$, where $x_1$ is the point closer to $x_0$; $\mathrm{Ext}(x_0x_1)$ denotes the ray along the line $x_0x_1$ with origin $x_1$, and likewise for $\mathrm{Ext}(x_0x_2)$; $x = x_0$ means that the point $x$ coincides with the optimal point $x_0$; in $-\max\{|x_0M|\}$, $M$ ranges over the vertices of the interval $X$, so the expression is the negative of the maximum distance from the optimal point $x_0$ to the vertices of the interval;
from formula (5), the place value $D(x, X_0, X)$ equals the difference between the side distance $\rho(x, x_0, X)$ with respect to the interval $X$ and the side distance $\rho(x, x_0, X_0)$ with respect to the interval $X_0$; the intervals $X$ and $X_0$ denote the feasible interval and the ideal interval, respectively; when $X$ and $X_0$ have no common boundary, $D(x, X_0, X)$ is always less than 0, and when a common boundary exists, $D(x, X_0, X)$ is always less than or equal to 0;
S32: fuzzy numbers are usually represented as triangular or trapezoidal fuzzy numbers, and formulas (6) and (7) are the membership function expressions of the triangular fuzzy number and the trapezoidal fuzzy number, respectively;
in the formula, $s_1$ and $s_2$ are the two support points and $s_M$ is the peak point; the membership degree is 0 at the support points and outside them, and 1 at the peak point; in practical applications, the triangular fuzzy number is written in shorthand form;
in the formula, $t_1$ and $t_2$ are the two support points; the membership degree is 0 at the support points and outside them, and 1 on the flat top of the trapezoid; in practical applications, the trapezoidal fuzzy number is likewise written in shorthand form;
suppose the m-th and n-th attributes of an instance $c_i$ are of the fuzzy numerical type and the fuzzy interval-valued type, respectively, expressed as "about a value" and "about an interval"; the m-th attribute is then represented by a triangular fuzzy number, in which:
based on this representation of fuzzy numbers, an analogy is drawn with the three elements of the correlation function calculation, namely the optimal point, the ideal interval, and the feasible interval; for a triangular fuzzy number, $s_M$ is treated as the optimal point $x_0$ and the interval $[s_1, s_2]$ as the feasible interval $[X_1, X_2]$, and the ideal interval $[x_1, x_2]$ is then derived from the optimal point and the feasible interval according to the user's requirements and the design standard; similarly, for a trapezoidal fuzzy number, the flat-top interval is treated as the ideal interval $[x_1, x_2]$ and the interval $[t_1, t_2]$ as the feasible interval $[X_1, X_2]$, and the optimal point $x_0$ is obtained by combining the intervals;
S33: correlation function measure for fuzzy numerical parameters; based on the expressions of the fuzzy number and the correlation function and the relation between them, the side distance $\rho(x, x_0, X)$ for the fuzzy numerical type is proposed, and its expression and mathematical calculation are given in formulas (10) and (11), respectively;
in the formulas, the area symbols denote the areas of the named figures, for example the area of triangle $Qxx_0$ and the area of trapezoid $Qx_0xM$, and likewise for the remaining terms; the maximum term takes the larger of the areas of triangle $QX_1x_0$ and triangle $QX_2x_0$, and because $X_1$ is specified as the point closer to $x_0$, this fixes which of the two areas is the larger;
the correlation function requires the side distances with respect to both the feasible interval $X$ and the ideal interval $X_0$; for fuzzy numerical parameters the two calculations are analogous, that is, the feasible interval $X = [X_1, X_2]$ is replaced by the ideal interval $X_0 = [x_1, x_2]$, so the calculation diagram and mathematical calculation of the side distance $\rho(x, x_0, X_0)$ are not described in detail;
substituting the calculated side distances $\rho(x, x_0, X)$ and $\rho(x, x_0, X_0)$ into formulas (3) and (5) gives the correlation function value of the fuzzy numerical parameter;
S34: correlation function measure for fuzzy interval-valued parameters; the side distance $\rho(x, x_0, X)$ for the fuzzy interval-valued type is proposed, and its expression and mathematical calculation are given in formulas (12) and (13), respectively;
unlike fuzzy numerical parameters, the two side distances of a fuzzy interval-valued parameter are calculated differently; the expression for the side distance $\rho(x, x_0, X_0)$ with respect to the ideal interval $X_0$ is given in formula (14), and formula (15) is the corresponding mathematical calculation;
substituting the calculated side distances $\rho(x, x_0, X)$ and $\rho(x, x_0, X_0)$ into formulas (3) and (5) gives the correlation function value of the fuzzy interval-valued parameter;
S35: correlation function measure for fuzzy conceptual parameters; for fuzzy conceptual parameters, fuzzy semantics are converted into corresponding values, namely: performance very poor: 0.2; performance poor: 0.4; performance average: 0.6; performance good: 0.8; performance very good: 1.0; intermediate values between the adjacent meanings above: 0.1, 0.3, 0.5, 0.7, 0.9; the quantized value is then treated as a fuzzy numerical parameter, that is, represented by a triangular fuzzy number: very good performance is converted into the fuzzy numerical parameter "about 1.0", and very poor performance into the fuzzy numerical parameter "about 0.2"; the correlation function value is then obtained from the corresponding formulas;
S36: overall similarity measure; the correlation function formulas for deterministic and fuzzy parameters given above yield the correlation function value of each attribute dimension between instances; combining these values with the attribute weights obtained by the analytic hierarchy process gives the combined correlation function between the target instance and each instance in the preliminary index set $S_P$, which characterizes their similarity; the calculation formula is as follows:
in the formula, $k(A^*)$ denotes the combined correlation function, $k(A_i)$ the correlation function of attribute $i$, and $w_i$ the corresponding weight;
S4: after retrieval, reuse, and modification, similar instances are stored back in the library as new instances, so the distribution of the attribute values in each dimension of the instance library keeps changing, as does the distribution of the attribute values after the nonlinear transformation, and the division of the corresponding intervals is therefore not fixed; the interval division is carried out when the instance library is updated and maintained, and does not need to be performed before each retrieval.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910662850.0A CN110399540B (en) | 2019-07-22 | 2019-07-22 | Instance retrieval method integrating correlation function and D-HS index |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910662850.0A CN110399540B (en) | 2019-07-22 | 2019-07-22 | Instance retrieval method integrating correlation function and D-HS index |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110399540A (en) | 2019-11-01 |
CN110399540B (en) | 2021-08-24 |
Family
ID=68324860
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910662850.0A Active CN110399540B (en) | 2019-07-22 | 2019-07-22 | Instance retrieval method integrating correlation function and D-HS index |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399540B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104317838A (en) * | 2014-10-10 | 2015-01-28 | 浙江大学 | Cross-media Hash index method based on coupling differential dictionary |
CN104462018A (en) * | 2014-11-21 | 2015-03-25 | 浙江工业大学 | Similar case retrieval method based on multidimensional correlation function |
CN105512122A (en) * | 2014-09-22 | 2016-04-20 | 华为技术有限公司 | Ordering method and ordering device for information retrieval system |
CN108897791A (en) * | 2018-06-11 | 2018-11-27 | 云南师范大学 | A kind of image search method based on depth convolution feature and semantic similarity amount |
CN109460423A (en) * | 2018-10-18 | 2019-03-12 | 浙江工业大学 | A kind of low-carbon similar case retrieval method based on D-HS |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9529843B2 (en) * | 2011-09-02 | 2016-12-27 | Oracle International Corporation | Highly portable and dynamic user interface component to specify and perform simple to complex filtering on data using natural language-like user interface |
Non-Patent Citations (1)
Title |
---|
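Research and implementation of a similar instance retrieval method based on multidimensional correlation functions; 赵燕伟 (Zhao Yanwei) et al.; 《数学的实践与认识》; 2015-10-31; full text * |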
Also Published As
Publication number | Publication date |
---|---|
CN110399540A (en) | 2019-11-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||